[Pacemaker] cibadmin sets node to offline

Jamie thisbodydrop at gmail.com
Wed Aug 6 13:43:08 UTC 2014


Hi,

I have set up a 2-node cluster, using the following packages:

pacemaker                           1.1.10+git20130802-1ubuntu2
corosync                            2.3.3-1ubuntu1

My cluster config is as so:

node $id="12303" ldb03
node $id="12304" ldb04
primitive p_fence_ldb03 stonith:external/vcenter \
        params VI_SERVER="10.17.248.10" \
        VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" \
        HOSTLIST="ldb03=ldb03" RESETPOWERON="0" pcmk_host_check="static-list" \
        pcmk_host_list="ldb03" \
        op start interval="0" timeout="500s"
primitive p_fence_ldb04 stonith:external/vcenter \
        params VI_SERVER="10.17.248.10" \
        VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" \
        HOSTLIST="ldb04=ldb04" RESETPOWERON="0" pcmk_host_check="static-list" \
        pcmk_host_list="ldb04" \
        op start interval="0" timeout="500s"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
        params device="nfsserver:/LDB_Cluster1" directory="/var/lib/mysql" \
        fstype="nfs" \
        options="relatime,rw,hard,nointr,rsize=32768,wsize=32768,bg,vers=3,proto=tcp" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="60s" timeout="60s" \
        meta is-managed="true"
primitive p_ip_1 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.11" cidr_netmask="25" \
        op monitor interval="30s" \
        meta target-role="Started" is-managed="true"
primitive p_ip_2 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.12" cidr_netmask="25" \
        op monitor interval="30s" \
        meta target-role="Started" is-managed="true"
primitive p_ip_3 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.13" cidr_netmask="25" \
        op monitor interval="30s" \
        meta target-role="Started" is-managed="true"
primitive p_mysql ocf:heartbeat:mysql \
        params datadir="/var/lib/mysql" binary="/usr/bin/mysqld_safe" \
        socket="/var/run/mysqld/mysqld.sock" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="30" \
        meta target-role="Started" is-managed="true"
group g_mysql p_fs_mysql p_mysql p_ip_1 p_ip_2 p_ip_3
location l_fence_ldb03 p_fence_ldb03 -inf: ldb03
location l_fence_ldb04 p_fence_ldb04 -inf: ldb04
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-42f2063" \
        cluster-infrastructure="corosync" \
        no-quorum-policy="ignore" \
        stonith-enabled="true" \
        stop-all-resources="false" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1407325251"


This exact configuration worked during setup, but I have since encountered 
a problem with my inactive node, ldb03. Corosync shows this node as up:

root at ldb03:~# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.12303.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.12303.ip (str) = r(0) ip(10.10.10.8)
runtime.totem.pg.mrp.srp.members.12303.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.12303.status (str) = joined
runtime.totem.pg.mrp.srp.members.12304.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.12304.ip (str) = r(0) ip(10.10.10.9)
runtime.totem.pg.mrp.srp.members.12304.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.12304.status (str) = joined

and both crm status and crm node status show it as online:

Last updated: Wed Aug  6 14:16:24 2014
Last change: Wed Aug  6 14:02:00 2014 via crm_resource on ldb04
Stack: corosync
Current DC: ldb04 (12304) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
7 Resources configured
Online: [ ldb03 ldb04 ]

root at ldb03:~# crm node status
<nodes>
  <node id="12304" uname="ldb04"/>
  <node id="12303" uname="ldb03"/>
</nodes>


However, after seeing this entry in my logs:

Aug  6 13:26:23 ldb03 cibadmin[2140]:   notice: crm_log_args: Invoked:
cibadmin -M -c -o status --xml-text <node_state id="ldb03" uname="ldb03"
ha="active" in_ccm="false" crmd="offline" join="member" expected="down"
crm-debug-origin="manual_clear" shutdown="0"/>

I noticed that cibadmin now shows it as normal(offline):
root at ldb03:~# crm node show
ldb04(12304): normal
ldb03(12303): normal(offline)

The offline state is not visible anywhere except through cibadmin: it is 
not in cib.xml, not in corosync-quorumtool, and a tcpdump shows multicast 
traffic from both hosts.
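In case it helps, this is how I have been inspecting the live status section (my understanding is that this section is kept in memory by the CIB and never written out to cib.xml, which would explain why the file looks clean):

```shell
# Query only the in-memory status section of the live CIB;
# the stale <node_state> entry for ldb03 shows up here but
# not in /var/lib/pacemaker/cib/cib.xml on disk.
cibadmin --query -o status
```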

I tried (hesitantly) to delete the offending entry using cibadmin, but I 
couldn't quite get the syntax right. Any tips on how to get this node to 
show as online again, and subsequently be able to run resources? Currently, 
running crm resource move has no effect: no errors, and nothing noticeable 
in the log files either.
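For reference, my failed delete attempt looked something like the following (I'm guessing at which attributes cibadmin uses to match the element, which may well be what I'm getting wrong):

```shell
# Try to remove the stale node_state entry from the status section.
# --delete matches an element by its tag name plus the attributes
# supplied in --xml-text.
cibadmin --delete -o status --xml-text '<node_state id="ldb03"/>'
```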

Sorry for the long post; I can attach more logs/config if necessary.

Thanks,

Jamie.




