[Pacemaker] problem with pacemaker/corosync on CentOS 6.3
Jake Smith
jsmith at argotec.com
Fri Jul 20 14:49:15 UTC 2012
----- Original Message -----
> From: fatcharly at gmx.de
> To: pacemaker at oss.clusterlabs.org
> Sent: Friday, July 20, 2012 6:08:45 AM
> Subject: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
>
> Hi,
>
> I'm using a pacemaker+corosync bundle to run a pound-based
> load balancer. After an update to CentOS 6.3 the nodes disagree about
> node status: via crm_mon everything looks fine on one node,
> while on the other node everything is offline. Everything was fine
> on CentOS 6.2.
>
> Node powerpound:
>
> ============
> Last updated: Fri Jul 20 12:04:29 2012
> Last change: Thu Jul 19 17:58:31 2012 via crm_attribute on pilotpound
> Stack: openais
> Current DC: powerpound - partition with quorum
> Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
>
> Online: [ powerpound pilotpound ]
>
> HA_IP_1 (ocf::heartbeat:IPaddr2): Started powerpound
> HA_IP_2 (ocf::heartbeat:IPaddr2): Started powerpound
> HA_IP_3 (ocf::heartbeat:IPaddr2): Started powerpound
> HA_IP_4 (ocf::heartbeat:IPaddr2): Started powerpound
> HA_IP_5 (ocf::heartbeat:IPaddr2): Started powerpound
> Clone Set: pingclone [ping-gateway]
> Started: [ pilotpound powerpound ]
>
>
> Node pilotpound:
>
> ============
> Last updated: Fri Jul 20 12:04:32 2012
> Last change: Thu Jul 19 17:58:17 2012 via crm_attribute on pilotpound
> Stack: openais
> Current DC: NONE
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
>
> OFFLINE: [ powerpound pilotpound ]
>
>
>
>
>
> from /var/log/messages on pilotpound:
>
> Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback:
> Discarding cib_apply_diff message (35909) from powerpound: not in
> our membership
> Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback:
> Discarding cib_apply_diff message (35910) from powerpound: not in
> our membership
>
>
>
> How could this have happened, and what can I do to solve this problem?
Pretty sure it had nothing to do with the upgrade; I had this the other day on Ubuntu 12.04 after a reboot of both nodes. I believe a couple of experts called it a "transient" bug. See:
https://bugzilla.redhat.com/show_bug.cgi?id=820821
https://bugzilla.redhat.com/show_bug.cgi?id=5040
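A quick way to check whether a node is in this state is to count the cib warnings shown in the log excerpt above. This is a minimal shell sketch, not a pacemaker tool: the function name `count_membership_drops` is hypothetical, and it assumes syslog writes each cib message on a single line to /var/log/messages (the CentOS 6 default), which can be overridden by passing another path.

```shell
#!/bin/sh
# Hypothetical helper (not part of pacemaker/corosync): count cib warnings
# about peer messages being discarded because the sender is
# "not in our membership". Assumes one syslog line per message.
count_membership_drops() {
    log="${1:-/var/log/messages}"
    # grep -c prints 0 (and exits 1) when nothing matches; for a missing
    # file it prints nothing, so fall back to 0 in that case.
    n=$(grep -c 'cib_peer_callback: .*not in our membership' "$log" 2>/dev/null || true)
    echo "${n:-0}"
}
```

For example, `[ "$(count_membership_drops)" -gt 0 ] && crm_mon -1` on each node: a non-zero count on the node whose crm_mon shows `Current DC: NONE` matches what is described above.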
>
> Any suggestions are welcome
I fixed it by stopping/killing pacemaker/corosync on the offending node (pilotpound), then clearing these files out on the same node:
rm /var/lib/heartbeat/crm/cib*
rm /var/lib/pengine/*
Then I restarted corosync/pacemaker and the node rejoined fine.
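The same recovery steps, written out as a shell sketch. Assumptions not taken from the thread: the SysV init scripts are named `pacemaker` and `corosync` (if pacemaker is instead loaded as a corosync plugin, `ver: 0`, stopping corosync stops pacemaker too and the pacemaker service lines do not apply), and both function names are hypothetical. The optional root-prefix argument exists only so the file removal can be tried against a scratch directory; leave it empty on a real node.

```shell
#!/bin/sh
# Sketch of the manual recovery for the node that shows everything OFFLINE.
# Assumptions: SysV services named "pacemaker" and "corosync"; function
# names are hypothetical. Run as root on the offending node only.

# Remove the local CIB copy and the saved policy-engine files, as above.
# Optional $1 is a root prefix (empty = the real filesystem).
wipe_local_cib_state() {
    root="${1:-}"
    rm -f "$root"/var/lib/heartbeat/crm/cib*
    rm -f "$root"/var/lib/pengine/*
}

recover_node() {
    # Stop pacemaker before corosync; ignore errors if they already died.
    service pacemaker stop || true
    service corosync stop || true

    # Refuse to wipe state while corosync is still alive; kill it by hand first.
    if pgrep -x corosync >/dev/null 2>&1; then
        echo "corosync still running; kill it before wiping the CIB" >&2
        return 1
    fi

    wipe_local_cib_state "${1:-}"

    # Bring the stack back up: corosync first, then pacemaker.
    service corosync start || return 1
    service pacemaker start
}
```

Usage, as root on pilotpound only: `recover_node`, then run `crm_mon -1` on both nodes and check that they report the same DC and the same online nodes.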
HTH
Jake