[Pacemaker] cibadmin sets node to offline
James Gibbard
thisbodydrop at gmail.com
Wed Aug 6 17:12:40 CEST 2014
Hi,
Sorry, I have managed to fix this now. I noticed in this log line:
Aug 6 13:26:23 ldb03 cibadmin[2140]: notice: crm_log_args: Invoked:
cibadmin -M -c -o status --xml-text <node_state id="ldb03" uname="ldb03"
ha="active" in_ccm="false" crmd="offline" join="member" expected="down"
crm-debug-origin="manual_clear" shutdown="0"/>
that the node_state id is "ldb03", not the node's numeric corosync ID, 12303.
I removed the stale entry using: crm_node -R "ldb03" --force
and rebooted.
The nodes are now in sync.
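
For anyone else who hits this, the sequence was roughly the following (the
crm_node -l output shown is only what I would expect from this version, so
treat it as a sketch):

crm_node -l
# should list the corosync node IDs Pacemaker knows about, e.g.:
#   12303 ldb03 member
#   12304 ldb04 member
# the stale status entry was keyed on the uname "ldb03" instead of 12303

crm_node -R "ldb03" --force    # remove the stale node entry
reboot                         # node rejoins under its numeric ID
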
Thanks,
Jamie.
On Wed, Aug 6, 2014 at 2:43 PM, Jamie <thisbodydrop at gmail.com> wrote:
> Hi,
>
> I have set up a 2-node cluster using the following packages:
>
> pacemaker 1.1.10+git20130802-1ubuntu2
> corosync 2.3.3-1ubuntu1
>
> My cluster config is as follows:
>
> node $id="12303" ldb03
> node $id="12304" ldb04
> primitive p_fence_ldb03 stonith:external/vcenter \
> params VI_SERVER="10.17.248.10"
> VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml"
> HOSTLIST="ldb03=ldb03" RESETPOWERON="0" pcmk_host_check="static-list"
> pcmk_host_list="ldb03" \
> op start interval="0" timeout="500s"
> primitive p_fence_ldb04 stonith:external/vcenter \
> params VI_SERVER="10.17.248.10"
> VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml"
> HOSTLIST="ldb04=ldb04" RESETPOWERON="0" pcmk_host_check="static-list"
> pcmk_host_list="ldb04" \
> op start interval="0" timeout="500s"
> primitive p_fs_mysql ocf:heartbeat:Filesystem \
> params device="nfsserver:/LDB_Cluster1" directory="/var/lib/mysql"
> fstype="nfs"
>
> options="relatime,rw,hard,nointr,rsize=32768,wsize=32768,bg,vers=3,proto=tcp
> " \
> op start interval="0" timeout="60s" \
> op stop interval="0" timeout="120s" \
> op monitor interval="60s" timeout="60s" \
> meta is-managed="true"
> primitive p_ip_1 ocf:heartbeat:IPaddr2 \
> params ip="10.10.10.11" cidr_netmask="25" \
> op monitor interval="30s" \
> meta target-role="Started" is-managed="true"
> primitive p_ip_2 ocf:heartbeat:IPaddr2 \
> params ip="10.10.10.12" cidr_netmask="25" \
> op monitor interval="30s" \
> meta target-role="Started" is-managed="true"
> primitive p_ip_3 ocf:heartbeat:IPaddr2 \
> params ip="10.10.10.13" cidr_netmask="25" \
> op monitor interval="30s" \
> meta target-role="Started" is-managed="true"
> primitive p_mysql ocf:heartbeat:mysql \
> params datadir="/var/lib/mysql" binary="/usr/bin/mysqld_safe"
> socket="/var/run/mysqld/mysqld.sock" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op monitor interval="20" timeout="30" \
> meta target-role="Started" is-managed="true"
> group g_mysql p_fs_mysql p_mysql p_ip_1 p_ip_2 p_ip_3 \
> location l_fence_ldb03 p_fence_ldb03 -inf: ldb03
> location l_fence_ldb04 p_fence_ldb04 -inf: ldb04
> property $id="cib-bootstrap-options" \
> dc-version="1.1.10-42f2063" \
> cluster-infrastructure="corosync" \
> no-quorum-policy="ignore" \
> stonith-enabled="true" \
> stop-all-resources="false" \
> expected-quorum-votes="2" \
> last-lrm-refresh="1407325251"
>
>
> This exact configuration worked during setup, but I have since encountered
> a problem with my inactive node ldb03. Corosync shows this node as up:
>
> root@ldb03:~# corosync-cmapctl | grep members
> runtime.totem.pg.mrp.srp.members.12303.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.12303.ip (str) = r(0) ip(10.10.10.8)
> runtime.totem.pg.mrp.srp.members.12303.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.12303.status (str) = joined
> runtime.totem.pg.mrp.srp.members.12304.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.12304.ip (str) = r(0) ip(10.10.10.9)
> runtime.totem.pg.mrp.srp.members.12304.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.12304.status (str) = joined
>
> and crm status and crm node status show it as online:
>
> Last updated: Wed Aug 6 14:16:24 2014
> Last change: Wed Aug 6 14:02:00 2014 via crm_resource on ldb04
> Stack: corosync
> Current DC: ldb04 (12304) - partition with quorum
> Version: 1.1.10-42f2063
> 2 Nodes configured
> 7 Resources configured
> Online: [ ldb03 ldb04 ]
>
> root@ldb03:~# crm node status
> <nodes>
> <node id="12304" uname="ldb04"/>
> <node id="12303" uname="ldb03"/>
> </nodes>
>
>
> But after seeing this entry in my logs:
> Aug 6 13:26:23 ldb03 cibadmin[2140]: notice: crm_log_args: Invoked:
> cibadmin -M -c -o status --xml-text <node_state id="ldb03" uname="ldb03"
> ha="active" in_ccm="false" crmd="offline" join="member" expected="down"
> crm-debug-origin="manual_clear" shutdown="0"/>
>
> I noticed that cibadmin shows it as normal(offline):
> root@ldb03:~# crm node show
> ldb04(12304): normal
> ldb03(12303): normal(offline)
>
> The offline state is not present anywhere but in cibadmin: not in cib.xml,
> not in corosync-quorumtool, and a tcpdump shows multicast traffic from both
> hosts.
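>
> (The stale entry should also be visible in the raw status section of the CIB,
> presumably via something like: cibadmin -Q -o status | grep node_state )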
>
> I tried (hesitantly) to delete the entry using cibadmin, but I couldn't quite
> get the syntax right. Any tips on how to get this node to show as online and
> subsequently be able to run resources? Currently, when I run crm resource
> move, it has no effect: no errors, and nothing noticeable in the log files
> either.
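>
> For reference, I assume the delete would need to be something along these
> lines, matching the stray node_state entry by its id, but I have not managed
> to get it right:
>
> cibadmin --delete -o status --xml-text '<node_state id="ldb03"/>'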
>
> Sorry for the long thread; I can attach more logs/config if necessary.
>
> Thanks,
>
> Jamie.
>