Troubleshooting

CMI

CMI portal "Internal server error" (500)

You may need to reset the password of the 'viappsadm' db connection user

Default: 'viappspass'

cmi-api/application/configs/application.ini

resources.db.params.username = "viappsadm" 
resources.db.params.password = "some_password" 

Run this script

MYSQL_ROOT_PASS=viapps #<-- (write SQL root password here)
VIAPPS_ADM_PASS=some_password #<-- write the same password as in the .ini file above

echo "DROP USER 'viappsadm'@'localhost';" | mysql -u root -p${MYSQL_ROOT_PASS}
echo "CREATE USER 'viappsadm'@'localhost' IDENTIFIED BY '${VIAPPS_ADM_PASS}' ;"  | mysql -u root -p${MYSQL_ROOT_PASS}
echo "GRANT ALL PRIVILEGES ON virtualapp.* to 'viappsadm'@'localhost';" | mysql -u root -p${MYSQL_ROOT_PASS}
echo "FLUSH PRIVILEGES;"  | mysql -u root -p${MYSQL_ROOT_PASS}
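To keep the SQL password from drifting away from the one in application.ini, the value can be read from the file instead of being typed twice. A minimal sketch; ini_value is a hypothetical helper (not part of CMI) and assumes simple key = "value" lines as shown above:

```shell
# ini_value KEY FILE: prints the value of KEY from a simple key = "value"
# ini file, without the surrounding double quotes.
# Hypothetical helper, not shipped with CMI.
ini_value() {
    awk -F= -v key="$1" '
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)          # trim the key
            if (k == key) {
                v = substr($0, index($0, "=") + 1)  # everything after the first "="
                gsub(/^[ \t]*"?|"?[ \t]*$/, "", v)  # trim blanks and quotes
                print v
                exit
            }
        }' "$2"
}
```

Possible use in the script above:
  VIAPPS_ADM_PASS=$(ini_value resources.db.params.password cmi-api/application/configs/application.ini)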

You may need to reset the password of the CMI portal 'admin' user

Default: 'viapps'

MYSQL_ROOT_PASS=viapps #<-- (write SQL root password here)
CMI_PASS=login_pass #<-- write desired login password

echo "update USERS set PASSWORD=md5('$CMI_PASS') where NAME='admin';" | mysql -u root -p${MYSQL_ROOT_PASS} virtualapp 

Chef server is not running

  1. How to check
    1. Port 4000 listening
      # netstat -tan | grep 4000 | grep LISTEN
      tcp        0      0 0.0.0.0:4000                0.0.0.0:*                   LISTEN
      
    2. cmi-client key working
      # knife client list
        [...]
        chef-validator
        chef-webui
        cmi-client
      
  2. Simple restart
    1. Execute from CMI as root: setup-chef-server.sh
  3. Full restart
    1. Exec:
      # mv /etc/cron.d/cmi-check /tmp   # disable cron keepalive
      # service chef-solr stop
      # service chef-server-webui stop
      # service chef-server stop
      # service chef-expander stop
      # service rabbitmq-server stop
      # service couchdb stop
      # setup-chef-server.sh           # restart all services
      # mv /tmp/cmi-check /etc/cron.d  # if chef working, re-enable cron keepalive.
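The port check above uses a plain "grep 4000", which would also match ports such as 14000 or 40001. A stricter variant, as a sketch; port_is_listening is a hypothetical helper that reads "netstat -tan" output on stdin:

```shell
# port_is_listening PORT: reads `netstat -tan` output on stdin and succeeds
# only if a local address ending in :PORT is in LISTEN state.
# Hypothetical helper, not shipped with CMI.
port_is_listening() {
    awk -v port="$1" '
        $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }'
}
```

Possible use:
  netstat -tan | port_is_listening 4000 && echo "chef-server is listening"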
      

cmi-client key not working

  1. Test cmi-client key
    # knife node list
    ERROR: Failed to authenticate to http://localhost:4000 as cmi-client with key /etc/chef/client.pem
    
    1. How to fix
      1. Test chef-webui key is working
        # knife node list -u chef-webui -k "/etc/chef/webui.pem" 
        [no errors]
        
        1. If not working, reset it
          # mv /etc/chef/client.pem /etc/chef/client.pem_old
          # service chef-server restart
          # knife node list -u chef-webui -k /etc/chef/webui.pem
          [no errors]
          
      2. Either re-register cmi-client using the chef-webui user
        # knife client reregister cmi-client -u chef-webui -k /etc/chef/webui.pem > /etc/chef/client.pem
        
      3. Or delete cmi-client and create it again
        # knife client delete cmi-client -u chef-webui -k "/etc/chef/webui.pem" 
        # export EDITOR=vi
        # echo ZZ | knife client create cmi-client -a -f "/etc/chef/client.pem" -u chef-webui -k "/etc/chef/webui.pem" 
        

Error "SQLSTATE[23000]: Integrity constraint violation: 1048 Column VERSION cannot be NULL" when importing an appliance node into a CMI

This happens if you upgraded from a pre-1.1 CMI viapps version to 1.3.1-11 or below. The problem is expected to be fixed in future patch releases.
The error may show up when importing an orphaned CMIX node or a CMIX from another CMI.

To fix it by hand, execute the following command in your CMI MySQL database (virtualapp):

mysql> ALTER TABLE APPLIANCES MODIFY VERSION varchar(25);
Query OK, 7 rows affected (0.12 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> select ID,NAME from APPLIANCES;
+----+-------------------------+
| ID | NAME                    |
+----+-------------------------+
| 35 | ipsec3.viapps.org       |
| 44 | dns01                   |
| 45 | dns02                   |
| 46 | smtpgw.viappslabs.org   |
| 47 | px03.viapps.org         |
| 48 | lb01.viappslabs.org     |
| 50 | pcivault.viappslabs.org |
+----+-------------------------+
7 rows in set (0.00 sec)

mysql> desc APPLIANCES;
+-----------------------+------------------+------+-----+-------------------------------+----------------+
| Field                 | Type             | Null | Key | Default                       | Extra          |
+-----------------------+------------------+------+-----+-------------------------------+----------------+
| ID                    | int(10)          | NO   | PRI | NULL                          | auto_increment |
| NAME                  | varchar(50)      | NO   |     | NULL                          |                |
| DESCRIPTION           | varchar(100)     | NO   |     | NULL                          |                |
| FARM                  | varchar(50)      | NO   |     | NULL                          |                |
| IP                    | varchar(20)      | NO   |     | NULL                          |                |
| IPMANAGEMENT          | varchar(50)      | NO   |     | NULL                          |                |
| FLAVOURSID            | int(10) unsigned | NO   |     | NULL                          |                |
| STATUSID              | int(10) unsigned | NO   |     | NULL                          |                |
| STEPSTATUSID          | int(10)          | NO   |     | 1                             |                |
| MESSAGE               | varchar(200)     | YES  |     | NULL                          |                |
| OWNER                 | varchar(50)      | YES  |     | root                          |                |
| URI                   | varchar(200)     | YES  |     | http://localhost:80/cmix-api/ |                |
| TOKEN                 | varchar(200)     | YES  |     | NULL                          |                |
| MAIL                  | varchar(100)     | YES  |     | NULL                          |                |
| VERSION               | varchar(25)      | YES  |     | NULL                          |                |
| AUTOMODE              | int(1) unsigned  | NO   |     | 0                             |                |
| ENABLED_STATS         | tinyint(1)       | YES  |     | 1                             |                |
| ENABLED_ELASTICSEARCH | tinyint(1)       | YES  |     | 0                             |                |
| CREATED               | timestamp        | NO   |     | 0000-00-00 00:00:00           |                |
| CREATED_BY            | int(10)          | YES  |     | NULL                          |                |
| MODIFIED              | timestamp        | NO   |     | CURRENT_TIMESTAMP             |                |
| MODIFIED_BY           | int(10)          | YES  |     | NULL                          |                |
+-----------------------+------------------+------+-----+-------------------------------+----------------+
22 rows in set (0.00 sec)
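To confirm the fix without reading the whole table by eye, the Null column can be tested in a script. A sketch; column_is_nullable is a hypothetical helper and assumes the tab-separated output that the mysql client prints in batch mode (-B):

```shell
# column_is_nullable COLUMN: reads tab-separated `DESC <table>` output on
# stdin (Field, Type, Null, ...) and succeeds if COLUMN has Null = YES.
# Hypothetical helper, not shipped with CMI.
column_is_nullable() {
    awk -F'\t' -v col="$1" '
        $1 == col { seen = 1; ok = ($3 == "YES") }
        END { exit (seen && ok) ? 0 : 1 }'
}
```

Possible use:
  mysql -u root -p -B -e 'DESC APPLIANCES' virtualapp | column_is_nullable VERSION && echo "VERSION accepts NULL"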

CMIX installation fails when installing first package (httpd)

  1. Check /etc/yum.repos.d/vaf-cmix.repo on the node. It should point to the CMI management IP configured in CMI/Configuration/General/REPO
    [VAF]
    name=Virtual Appliance Factory
    baseurl=http://10.10.109.1/repo/x86_64
    enabled=1
    gpgcheck=0
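To compare the configured URL with the CMI management IP, the baseurl can be pulled out of the repo file. A sketch; repo_baseurls is a hypothetical helper:

```shell
# repo_baseurls FILE: prints every baseurl value found in a yum .repo file.
# Hypothetical helper, not shipped with CMIX.
repo_baseurls() {
    sed -n 's/^[[:space:]]*baseurl[[:space:]]*=[[:space:]]*//p' "$1"
}
```

Possible use on the node:
  repo_baseurls /etc/yum.repos.d/vaf-cmix.repo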
    

CMIX installation fails with a yum repo error

  1. Check /etc/yum.repos.d/vaf-cmix.repo is the only file in this folder. If not:
    1. Remove/backup any other *.repo file
    2. Run "yum clean all" to clear yum cache
    3. Test with "yum install httpd" (answering either "N" or "Y" when prompted is fine)
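The "only file in this folder" check can be scripted. A sketch; extra_repo_files is a hypothetical helper that lists every *.repo file other than vaf-cmix.repo, so empty output means the folder is clean:

```shell
# extra_repo_files [DIR]: prints every *.repo file in DIR (default
# /etc/yum.repos.d) except vaf-cmix.repo.
# Hypothetical helper, not shipped with CMIX.
extra_repo_files() {
    dir=${1:-/etc/yum.repos.d}
    for f in "$dir"/*.repo; do
        [ -e "$f" ] || continue    # the glob matched nothing
        [ "$(basename "$f")" = "vaf-cmix.repo" ] || printf '%s\n' "$f"
    done
}
```

Possible use: run extra_repo_files and move whatever it lists to a backup folder before running "yum clean all".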

How to reset mysql root password to "viapps"

  1. Run /root/first_run.sh to set all CMI parameters again, or:
    ------------------------------------------------------------------------
    [root@cmi01]# service mysqld stop
    [root@cmi01]# su - mysql
    -bash-4.1$
    -bash-4.1$ cd /tmp/
    -bash-4.1$ cat root_pass.sql
    UPDATE mysql.user SET Password=PASSWORD('viapps') WHERE User='root';
    FLUSH PRIVILEGES;
    -bash-4.1$
    -bash-4.1$ mysqld_safe --init-file=/tmp/root_pass.sql &
    -bash-4.1$ exit
    [root@cmi01 tmp]# service mysqld stop
    [root@cmi01 tmp]# rm /tmp/root_pass.sql
    [root@cmi01 tmp]# service mysqld start
    [root@cmi01 tmp]# mysql -u root -pviapps
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 8
    [....]
    mysql> exit
    

rsyslog stopped and logrotate doesn't work

  1. Check for a corrupted logrotate status file
    # service rsyslog status
    rsyslog dead but pid exist
    # logrotate -f /etc/logrotate.d/syslog
    [frozen] ^C
    # ls -l /var/lib/logrotate.status ; tail -f /var/lib/logrotate.status
    [very long file, last execution not finished]
    # rm /var/lib/logrotate.status
    # logrotate -f /etc/logrotate.d/syslog
    # service rsyslog restart
    

Huge chef.couch and .chef_design/*.views under /var/lib/couchdb

  1. Exec:
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact/clients
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact/cookbooks
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact/data_bags
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact/environments
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact/id_map
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact/nodes
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact/roles
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact/sandboxes
    curl -H "Content-Type: application/json" -X POST http://localhost:5984/chef/_compact/users
    
  2. Replace the CMI cron file /etc/cron.d/compact_chef_db with this update

Apply fix
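The ten compaction requests listed above differ only in the last path component, so they can be driven from one list. A sketch; the function and variable names are hypothetical, and the URLs are exactly the ones from the curl commands:

```shell
COUCH_URL=http://localhost:5984
CHEF_VIEWS="clients cookbooks data_bags environments id_map nodes roles sandboxes users"

# compact_urls: prints the database compaction URL, then one URL per view group.
compact_urls() {
    printf '%s\n' "$COUCH_URL/chef/_compact"
    for view in $CHEF_VIEWS; do
        printf '%s\n' "$COUCH_URL/chef/_compact/$view"
    done
}

# compact_chef_db: POSTs to every URL, as the individual curl commands do.
compact_chef_db() {
    compact_urls | while read -r url; do
        curl -H "Content-Type: application/json" -X POST "$url"
    done
}
```

Running compact_urls alone prints the requests without sending anything, which is a cheap dry run before calling compact_chef_db.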

CMI HA Cluster

Split-Brain. DRBD disks Standalone in both nodes

Reference

  1. status node1
    [root@node1 ~]# service drbd status
    [...]
    m:res    cs          ro               ds                 p       mounted  fstype
    1:disk1  StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4
    
  2. status node2
    [root@node2 ~]# service drbd status
    [...]
    m:res    cs          ro                 ds                 p       mounted  fstype
    1:disk1  StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----
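The connection state can be extracted from the status output so the check can be repeated on both nodes. A sketch; drbd_field is a hypothetical helper that assumes the column layout shown in the reference output (cs, ro, ds as columns 2 to 4):

```shell
# drbd_field RESOURCE COLUMN: reads `service drbd status` output on stdin and
# prints one column (2 = cs, 3 = ro, 4 = ds) of the line for RESOURCE.
# Hypothetical helper, not part of DRBD.
drbd_field() {
    awk -v res="$1" -v col="$2" '
        { n = split($1, id, ":") }                   # first field is "<minor>:<resource>"
        n == 2 && id[2] == res { print $col; exit }'
}
```

Possible use: if the following prints StandAlone on both nodes, the cluster is in the split-brain state described here.
  service drbd status | drbd_field disk1 2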
    

Discard node2 disk data and sync to node1

  1. Node2
    1. Force "Standalone" status if needed
      [root@node2]# drbdadm disconnect disk1
    2. Force resource disk1 to be "Secondary" role, if needed
      [root@node2]# drbdadm secondary disk1
    3. Connect to Primary discarding own data
      [root@node2]# drbdadm connect --discard-my-data disk1
  2. Node1
    1. Connect disk1, so sync to node2 can take effect
      [root@node1]# drbdadm connect disk1
    2. Sync in progress ...
      [root@node1]# service drbd status
      [...]
      m:res    cs          ro                 ds                     p  mounted       fstype
      ...      sync'ed:    29.5%              (49532/67880)K
      1:disk1  SyncSource  Primary/Secondary  UpToDate/Inconsistent  C  /cmi_cluster  ext4
      

Heartbeat fails to start due to mysqld and/or rabbitmq-server failure

Procedure applied in a support issue.

  1. heartbeat didn't work due to a mysqld failure: it couldn't create its socket in /var/lib/mysql ("No space left on device")
  2. cluster disk mounted manually in node2
    node2# drbdadm  primary disk1
    node2# mount /dev/drbd1 /cmi_cluster
    node2# df -k | grep cmi_cluster # (ok, 70% reported)
    node2# touch /cmi_cluster/test # => "No space left on device" (?)
    node2# cd /root && tar cvf backup.tar /cmi_cluster/path/to/some/logs && rm -rf /cmi_cluster/path/to/some/logs
    node2# touch /cmi_cluster/test # ok
    node2# cd /root && umount /cmi_cluster && drbdadm secondary disk1
    node2# /usr/share/heartbeat/hb_takeover # try to start cluster services
    
  3. mysqld starts ok, heartbeat now fails due to a rabbitmq-server start failure
  4. stop rabbitmq-server and dependent services (just in case they were running, they shouldn't be) and delete
    rabbitmq data
    node1# cat /etc/ha.d/haresources # look for drbd disk dependent services
    node1# service chef-solr stop ; service chef-server-webui stop ; service chef-server stop
    node1# service chef-expander stop ; service rabbitmq-server stop
    node1# drbdadm  primary disk1 && mount /dev/drbd1 /cmi_cluster
    node1# cd /cmi_cluster/lib/rabbitmq/mnesia
    node1# tar cvf backup.tar rabbit*  # just in case 
    node1# rm -rf rabbit*
    node1# cd /root && umount /cmi_cluster && drbdadm secondary disk1
    
  5. After this, the hb_takeover/hb_standby commands worked and the CMI cluster ran fine.

How to test takeover

  1. Monitor /var/log/ha-log in both nodes.
  2. In active node issue:
    /usr/share/heartbeat/hb_standby
    
    1. Active node will release VIP, stop services and umount cluster disk.
    2. Slave node will take VIP up, mount cluster disk and start services (in the reverse order they were stopped)

cluster disk doesn't mount

Cluster disk, mounted at /cmi_cluster, doesn't work.

If "cat /proc/drbd" shows the disk in "Diskless" status and "drbdadm attach disk1" or "drbdadm up all" returns an error with "no metadata", try:

drbdadm create-md disk1

Then "cat /proc/drbd" should show the cluster disk synchronizing from the master. When finished, it should be available for takeover.

cluster disk in Secondary/Secondary status

If cluster disk is in "Secondary/Secondary" status:

[root@cmi2 ~]# /etc/init.d/drbd status
[...]
m:res    cs         ro                   ds                 p  mounted       fstype
1:disk1  Connected  Secondary/Secondary  UpToDate/UpToDate  C  /cmi_cluster  ext4

And the following errors appear in ha-log:

[root@node2 ~]# tail -f /var/log/ha-log
cmi2 heartbeat: [1686]: ERROR: should_drop_message: attempted replay attack [cmi2.teste.produban]? [gen = 1390477951, curgen = 1390477952]
cmi2 heartbeat: [1686]: ERROR: should_drop_message: attempted replay attack [cmi2.teste.produban]? [gen = 1390477951, curgen = 1390477952]
cmi2 heartbeat: [1686]: WARN: nodename cmi2.teste.produban uuid changed to cmi1.teste.produban
cmi2 heartbeat: [1686]: WARN: nodename cmi1.teste.produban uuid changed to cmi2.teste.produban
cmi2 heartbeat: [1686]: ERROR: should_drop_message: attempted replay attack [cmi2.teste.produban]? [gen = 1390477951, curgen = 1390477952]
cmi2 heartbeat: [1686]: ERROR: should_drop_message: attempted replay attack [cmi2.teste.produban]? [gen = 1390477951, curgen = 1390477952]

Then remove the hb_uuid and hb_generation files and restart heartbeat:

[root@node2 ~]# cd /var/lib/heartbeat/
[root@node2 heartbeat]# ll hb_*
-rw-r--r-- 1 root root 16 Jan 21 17:03 hb_generation
-rw-r--r-- 1 root root 16 Jan 20 19:23 hb_uuid
[root@node2 heartbeat]# rm -f hb_generation hb_uuid
[root@node2 heartbeat]# /etc/init.d/heartbeat restart

chef services are set to start outside heartbeat

After "setup-chef-server.sh" has been run on the CMI, the following services have to be disabled at boot:

chkconfig collectd off
chkconfig mysqld off
chkconfig httpd off
chkconfig rsyslog off
chkconfig couchdb off
chkconfig rabbitmq-server off
chkconfig postfix off
chkconfig chef-expander off
chkconfig chef-server off
chkconfig chef-server-webui off
chkconfig chef-solr off
chkconfig crond off
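The same list can be applied in a loop, which also makes a dry run possible. A sketch; the variable and function names are hypothetical, and the service list is exactly the one above:

```shell
CMI_BOOT_DISABLED_SERVICES="collectd mysqld httpd rsyslog couchdb rabbitmq-server postfix chef-expander chef-server chef-server-webui chef-solr crond"

# disable_at_boot: runs "chkconfig <service> off" for each listed service.
# Set CHKCONFIG="echo chkconfig" to print the commands instead of running them.
disable_at_boot() {
    for svc in $CMI_BOOT_DISABLED_SERVICES; do
        ${CHKCONFIG:-chkconfig} "$svc" off
    done
}
```

Possible use: set CHKCONFIG="echo chkconfig" and call disable_at_boot to review the commands, then unset CHKCONFIG and call it again as root.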

CMIX HA Cluster

CMIX VMs were removed from datacenter before Cluster was removed from CMI portal

Cluster deletion fails due to a communication failure with CMIX nodes that no longer exist, and nodes cannot be deleted from the Appliances tab while they are still registered as cluster components.
To manually unregister a Cluster and its nodes:

[root@cmi1 ~]# MYSQL_ROOT_PASS=viapps

[root@cmi1 ~]# echo "SELECT * FROM CLUSTERS;" | mysql -u root -p$MYSQL_ROOT_PASS virtualapp
ID      NAME    DESCRIPTION     FLAVOURSID      MASTER_SERVICE_IP   [.....]      
2       FW      firewall reference      1       10.10.109.103       [.....] 
[root@cmi1 ~]#  echo "DELETE  FROM CLUSTERS WHERE ID='2';" | mysql -u root -p$MYSQL_ROOT_PASS virtualapp
[root@cmi1 ~]#  echo "SELECT * FROM CLUSTERS;" | mysql -u root -p$MYSQL_ROOT_PASS virtualapp

[root@cmi1 ~]#  echo "SELECT * FROM APPLIANCES;" | mysql -u root -p$MYSQL_ROOT_PASS virtualapp
ID      NAME    DESCRIPTION     FARM    IP      IPMANAGEMENT    FLAVOURSID     [.....]
9       fws.slave       fw slave        FW VAR  172.16.104.104  10.10.109.104  [.....]
10      fwm.master      fw master       FW VAR  172.16.104.103  10.10.109.103  [.....] 
[root@cmi1 ~]#
[root@cmi1 ~]#  echo "DELETE FROM APPLIANCES WHERE ID='9';" | mysql -u root -p$MYSQL_ROOT_PASS virtualapp
[root@cmi1 ~]#  echo "DELETE FROM APPLIANCES WHERE ID='10';" | mysql -u root -p$MYSQL_ROOT_PASS virtualapp
[root@cmi1 ~]#  echo "SELECT * FROM APPLIANCES;" | mysql -u root -p$MYSQL_ROOT_PASS virtualapp
[root@cmi1 ~]# 

CMIX Firewall

Control loss

  1. Log on to the firewall node by ssh or, if that is not possible, via console.
  2. Manually insert rules to recover CMI control and heartbeat (for clusters)
    [root@fw ~]# iptables -I INPUT -s <CMI_IP> -j ACCEPT
    [root@fw ~]# iptables -I INPUT -p udp --dport 694 -j ACCEPT
    
  3. Manage firewall from CMI, adjust rules/policies and restart service

Debug squid ACL/Rules

From squid docs
Ref levels

I set up my access controls, but they don't work! why?
If ACLs are giving you problems and you don't know why they aren't working, you can use this tip to debug them.
In squid.conf enable debugging for section 33 at level 2. For example:

debug_options ALL,1 33,2

Then restart or reconfigure squid.
From now on, your cache.log should contain a line for every request that explains if it was allowed, or denied, and which ACL was the last one that it matched.

If this does not give you sufficient information to nail down the problem you can also enable detailed debug information on ACL processing

debug_options ALL,1 33,2 28,9

Then restart or reconfigure squid as above.
From now on, your cache.log should contain detailed traces of all access list processing. Be warned that this can be quite some lines per request.

Debug CMI web response times

  1. edit /etc/httpd/conf/httpd.conf
  2. add %T/%D to common and combined LogFormat definitions
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T/%D" combined
    LogFormat "%h %l %u %t \"%r\" %>s %b %T/%D" common
    
  3. service httpd reload
  4. check:
    # tail -f /var/log/httpd/cmi-portal-access_log
    10.10.36.181 - - [16/Nov/2014:19:38:34 +0100] "POST /cmix-portal/service-rest?vappId=5 HTTP/1.1" 201 230 0/675256 <-- 0 secs, 675256 microsecs (approx. 0.7 secs)
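Once %T/%D is logged, slow requests can be filtered out of the access log. A sketch; slow_requests is a hypothetical helper and assumes %T/%D is the last field of every line, as in the LogFormat definitions above (lines logged before the change should be ignored or rotated away first):

```shell
# slow_requests MIN_US: reads access-log lines on stdin whose last field is
# %T/%D (seconds/microseconds) and prints those taking at least MIN_US
# microseconds.
# Hypothetical helper, not shipped with CMI.
slow_requests() {
    awk -v min="$1" '
        split($NF, t, "/") == 2 && t[2] + 0 >= min + 0 { print }'
}
```

Possible use, listing requests that took half a second or more:
  slow_requests 500000 < /var/log/httpd/cmi-portal-access_log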
    

After enabling elasticsearch on a node, Integrated Log Viewer (kibana) doesn't show data for this node.

(especially when the node has multiple hostnames)

Sometimes elasticsearch is still indexing; wait a few minutes and check again.

On the CMI, check which hostname rsyslog is receiving for the node (it may be the short name or the full FQDN).

[root@cmilab ~]# tail -n1 /var/log/syslog/hosts/px.viappslabs.org/2017/01/12/messages.log  
Jan 12 13:31:01 px.viappslabs.org auditd[1695]: Audit daemon rotating log files

In this example the hostname is px.viappslabs.org. Now check whether this hostname matches the one in the rsyslog file that redirects to elasticsearch.

[root@cmilab ~]# grep hostname /etc/rsyslog.d/elasticsearch.px.white.viappslabs.org.conf.remote 
if $hostname == "px.white.viappslabs.org" then 

The hostname doesn't match, so correct the hostname inside the conf file and restart rsyslog.
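The comparison can be scripted so it is easy to repeat for several nodes. A sketch; both helpers are hypothetical and assume the log line and conf rule formats shown in the example above:

```shell
# conf_hostname FILE: prints the hostname tested by the first
# `if $hostname == "..." then` rule in an rsyslog conf file.
conf_hostname() {
    sed -n 's/^[[:space:]]*if[[:space:]]*\$hostname[[:space:]]*==[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# last_log_hostname FILE: prints the hostname field (4th) of the last
# syslog line in FILE.
last_log_hostname() {
    tail -n 1 "$1" | awk '{ print $4 }'
}
```

Possible use: if the two commands print different names, correct the conf file as described.
  conf_hostname /etc/rsyslog.d/elasticsearch.px.white.viappslabs.org.conf.remote
  last_log_hostname /var/log/syslog/hosts/px.viappslabs.org/2017/01/12/messages.log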