[Pacemaker] problem with pacemaker/corosync on CentOS 6.3
Jake Smith
jsmith at argotec.com
Fri Jul 20 16:18:37 UTC 2012
----- Original Message -----
> From: fatcharly at gmx.de
> To: "Jake Smith" <jsmith at argotec.com>, "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Friday, July 20, 2012 11:50:52 AM
> Subject: Re: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
>
> Hi Jake,
>
> I erased the files as mentioned and started the services. This is
> what I get on pilotpound after crm_mon:
>
> ============
> Last updated: Fri Jul 20 17:45:58 2012
> Last change:
> Current DC: NONE
> 0 Nodes configured, unknown expected votes
> 0 Resources configured.
> ============
>
>
> Looks like the system didn't join the cluster.
Wish I had something for you but I don't. When this happened to me, I followed this thread, which gave me the directions I passed on to you:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/78397
Not sure after that...
There was another thread on this list, back in March I think, where Florian mentioned stopping all cluster nodes and restarting to clear up the same kind of error...
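In case it helps, here is a sketch of how those recovery steps could be scripted on one node. The service names and file paths come from this thread; the DRY_RUN guard is my own addition so the script only prints what it would do until you explicitly disable it. Treat it as a sketch, not a supported procedure - only run it on the node that is out of membership, never on the healthy DC.

```shell
#!/bin/sh
# Recovery sketch for a node stuck out of cluster membership.
# Paths and steps are from this thread; DRY_RUN is an added safety guard.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"   # print instead of executing
    else
        "$@"
    fi
}

# 1. Stop the cluster stack on the offending node only.
run service pacemaker stop
run service corosync stop

# 2. Clear the locally cached CIB and policy-engine files so the node
#    pulls a fresh copy from the DC when it rejoins.
run rm -f /var/lib/heartbeat/crm/cib*
run rm -f /var/lib/pengine/*

# 3. Restart the stack; check crm_mon afterwards to confirm the rejoin.
run service corosync start
run service pacemaker start
```

Run it once with the default DRY_RUN=1 to review the commands, then again with DRY_RUN=0 to actually execute them.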
>
> Any suggestions are welcome
>
> Kind regards
>
> fatcharly
>
> ------- Original Message --------
> > Date: Fri, 20 Jul 2012 10:49:15 -0400 (EDT)
> > From: Jake Smith <jsmith at argotec.com>
> > To: The Pacemaker cluster resource manager
> > <pacemaker at oss.clusterlabs.org>
> > Subject: Re: [Pacemaker] problem with pacemaker/corosync on CentOS
> > 6.3
>
> >
> > ----- Original Message -----
> > > From: fatcharly at gmx.de
> > > To: pacemaker at oss.clusterlabs.org
> > > Sent: Friday, July 20, 2012 6:08:45 AM
> > > Subject: [Pacemaker] problem with pacemaker/corosync on CentOS
> > > 6.3
> > >
> > > Hi,
> > >
> > > I'm using a pacemaker+corosync bundle to run a pound-based
> > > loadbalancer. After an update to CentOS 6.3 there is a mismatch
> > > in the node status: via crm_mon, on one node everything looks
> > > fine, while on the other node everything is offline. Everything
> > > was fine on CentOS 6.2.
> > >
> > > Node powerpound:
> > >
> > > ============
> > > Last updated: Fri Jul 20 12:04:29 2012
> > > Last change: Thu Jul 19 17:58:31 2012 via crm_attribute on
> > > pilotpound
> > > Stack: openais
> > > Current DC: powerpound - partition with quorum
> > > Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> > > 2 Nodes configured, 2 expected votes
> > > 7 Resources configured.
> > > ============
> > >
> > > Online: [ powerpound pilotpound ]
> > >
> > > HA_IP_1 (ocf::heartbeat:IPaddr2): Started powerpound
> > > HA_IP_2 (ocf::heartbeat:IPaddr2): Started powerpound
> > > HA_IP_3 (ocf::heartbeat:IPaddr2): Started powerpound
> > > HA_IP_4 (ocf::heartbeat:IPaddr2): Started powerpound
> > > HA_IP_5 (ocf::heartbeat:IPaddr2): Started powerpound
> > > Clone Set: pingclone [ping-gateway]
> > > Started: [ pilotpound powerpound ]
> > >
> > >
> > > Node pilotpound:
> > >
> > > ============
> > > Last updated: Fri Jul 20 12:04:32 2012
> > > Last change: Thu Jul 19 17:58:17 2012 via crm_attribute on
> > > pilotpound
> > > Stack: openais
> > > Current DC: NONE
> > > 2 Nodes configured, 2 expected votes
> > > 7 Resources configured.
> > > ============
> > >
> > > OFFLINE: [ powerpound pilotpound ]
> > >
> > >
> > >
> > >
> > >
> > > from /var/log/messages on pilotpound:
> > >
> > > Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback:
> > > Discarding cib_apply_diff message (35909) from powerpound: not in
> > > our membership
> > > Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback:
> > > Discarding cib_apply_diff message (35910) from powerpound: not in
> > > our membership
> > >
> > >
> > >
> > > How could this have happened, and what can I do to solve this problem?
> >
> > Pretty sure it had nothing to do with the upgrade - I had this the
> > other day on Ubuntu 12.04 after a reboot of both nodes. I believe
> > a couple of experts called it a "transient" bug. See:
> > https://bugzilla.redhat.com/show_bug.cgi?id=820821
> > https://bugzilla.redhat.com/show_bug.cgi?id=5040
> >
> > >
> > > Any suggestions are welcome
> >
> > I fixed it by stopping/killing pacemaker/corosync on the offending
> > node (pilotpound), then cleared these files out on the same node:
> > rm /var/lib/heartbeat/crm/cib*
> > rm /var/lib/pengine/*
> >
> > Then I restarted corosync/pacemaker and the node rejoined fine.
> >
> > HTH
> >
> > Jake
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started:
> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
>