[Pacemaker] cibadmin sets node to offline
Jamie
thisbodydrop at gmail.com
Wed Aug 6 15:43:08 CEST 2014
Hi,
I have set up a two-node cluster using the following packages:
pacemaker 1.1.10+git20130802-1ubuntu2
corosync 2.3.3-1ubuntu1
My cluster configuration is as follows:
node $id="12303" ldb03
node $id="12304" ldb04
primitive p_fence_ldb03 stonith:external/vcenter \
params VI_SERVER="10.17.248.10" VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" HOSTLIST="ldb03=ldb03" RESETPOWERON="0" pcmk_host_check="static-list" pcmk_host_list="ldb03" \
op start interval="0" timeout="500s"
primitive p_fence_ldb04 stonith:external/vcenter \
params VI_SERVER="10.17.248.10" VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" HOSTLIST="ldb04=ldb04" RESETPOWERON="0" pcmk_host_check="static-list" pcmk_host_list="ldb04" \
op start interval="0" timeout="500s"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
params device="nfsserver:/LDB_Cluster1" directory="/var/lib/mysql" fstype="nfs" options="relatime,rw,hard,nointr,rsize=32768,wsize=32768,bg,vers=3,proto=tcp" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="120s" \
op monitor interval="60s" timeout="60s" \
meta is-managed="true"
primitive p_ip_1 ocf:heartbeat:IPaddr2 \
params ip="10.10.10.11" cidr_netmask="25" \
op monitor interval="30s" \
meta target-role="Started" is-managed="true"
primitive p_ip_2 ocf:heartbeat:IPaddr2 \
params ip="10.10.10.12" cidr_netmask="25" \
op monitor interval="30s" \
meta target-role="Started" is-managed="true"
primitive p_ip_3 ocf:heartbeat:IPaddr2 \
params ip="10.10.10.13" cidr_netmask="25" \
op monitor interval="30s" \
meta target-role="Started" is-managed="true"
primitive p_mysql ocf:heartbeat:mysql \
params datadir="/var/lib/mysql" binary="/usr/bin/mysqld_safe" socket="/var/run/mysqld/mysqld.sock" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120" \
op monitor interval="20" timeout="30" \
meta target-role="Started" is-managed="true"
group g_mysql p_fs_mysql p_mysql p_ip_1 p_ip_2 p_ip_3 \
location l_fence_ldb03 p_fence_ldb03 -inf: ldb03
location l_fence_ldb04 p_fence_ldb04 -inf: ldb04
property $id="cib-bootstrap-options" \
dc-version="1.1.10-42f2063" \
cluster-infrastructure="corosync" \
no-quorum-policy="ignore" \
stonith-enabled="true" \
stop-all-resources="false" \
expected-quorum-votes="2" \
last-lrm-refresh="1407325251"
This exact configuration worked during setup, but I have run into a problem
with my inactive node, ldb03. Corosync shows this node as up:
root@ldb03:~# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.12303.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.12303.ip (str) = r(0) ip(10.10.10.8)
runtime.totem.pg.mrp.srp.members.12303.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.12303.status (str) = joined
runtime.totem.pg.mrp.srp.members.12304.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.12304.ip (str) = r(0) ip(10.10.10.9)
runtime.totem.pg.mrp.srp.members.12304.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.12304.status (str) = joined
and crm status and crm node status show it as online:
Last updated: Wed Aug 6 14:16:24 2014
Last change: Wed Aug 6 14:02:00 2014 via crm_resource on ldb04
Stack: corosync
Current DC: ldb04 (12304) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
7 Resources configured
Online: [ ldb03 ldb04 ]
root@ldb03:~# crm node status
<nodes>
<node id="12304" uname="ldb04"/>
<node id="12303" uname="ldb03"/>
</nodes>
However, after seeing this entry in my logs:
Aug 6 13:26:23 ldb03 cibadmin[2140]: notice: crm_log_args: Invoked: cibadmin -M -c -o status --xml-text <node_state id="ldb03" uname="ldb03" ha="active" in_ccm="false" crmd="offline" join="member" expected="down" crm-debug-origin="manual_clear" shutdown="0"/>
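To make the fields in that entry easier to read, here is a small shell sketch
that pulls individual attributes out of a node_state element. The helper name
node_state_attr is hypothetical (it is not part of Pacemaker), and it only does
text matching with sed rather than real XML parsing, so treat it as a reading
aid and not a tool for editing the CIB.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute from a single-line
# XML element passed as the first argument. Plain text matching, not a real
# XML parser, so it assumes double-quoted values without embedded quotes.
node_state_attr() {
    xml=$1
    attr=$2
    # The leading space keeps a short name such as "id" from matching the
    # tail of a longer attribute name.
    printf '%s\n' "$xml" | sed -n "s/.* $attr=\"\([^\"]*\)\".*/\1/p"
}

# The element from the log entry, joined onto one line.
entry='<node_state id="ldb03" uname="ldb03" ha="active" in_ccm="false" crmd="offline" join="member" expected="down" crm-debug-origin="manual_clear" shutdown="0"/>'

# Print the fields that describe membership and daemon state.
for name in in_ccm crmd join expected crm-debug-origin; do
    printf '%s=%s\n' "$name" "$(node_state_attr "$entry" "$name")"
done
```

For the logged entry this reports in_ccm as false and crmd as offline, which
are the two fields that describe the node as not being an active member.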
I noticed that the node is reported as normal(offline):
root@ldb03:~# crm node show
ldb04(12304): normal
ldb03(12303): normal(offline)
The offline state does not show up anywhere except via cibadmin: not in
cib.xml and not in corosync-quorumtool, and a tcpdump shows multicast traffic
from both hosts.
I tried (hesitantly) to delete the entry using cibadmin, but I couldn't quite
get the syntax right. Any tips on how to get this node to show as online and
subsequently be able to run resources? Currently, running crm resource move
has no effect: there are no errors and nothing noticeable in the log files
either.
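Before changing anything, the live status section can be inspected read-only.
The following is only a sketch: show_node_state is a hypothetical wrapper name,
and it assumes cibadmin is on the PATH of a node in the running cluster. It
uses the query option (-Q) with the same -o status scope that appears in the
logged command, and modifies nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the node_state opening tag for one node from the
# live status section of the CIB. Read-only; nothing in the CIB is changed.
show_node_state() {
    node=$1
    # -Q queries the CIB; -o status limits the query to the status section.
    cibadmin -Q -o status | grep "<node_state .*uname=\"$node\""
}

# Example call (needs a running cluster, so it is left commented out):
# show_node_state ldb03
```

The id, in_ccm and crmd values it prints can then be compared with the ones in
the logged node_state entry.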
Sorry for the long post. I can attach more logs or configuration if necessary.
Thanks,
Jamie.