[Pacemaker] Nodes not rejoining cluster
Andrew Beekhof
andrew at beekhof.net
Sun Apr 15 11:51:19 UTC 2012
On Sat, Mar 31, 2012 at 3:33 AM, Florian Haas <florian at hastexo.com> wrote:
> On Fri, Mar 30, 2012 at 6:09 PM, Gregg Stock <gregg at damagecontrolusa.com> wrote:
>> That looks good. They were all the same and had the correct ip addresses.
>
> So you've got both healthy rings, and all 5 nodes have 5 members in
> the membership list?
>
> Then this would make it a Pacemaker problem. IIUC the code causing
> Pacemaker to discard the update from a node that is "not in our
> membership" has actually been removed from 1.1.7[1] so an upgrade may
> not be a bad idea, but you'll probably have to wait for a few more
> days until packages become available.
>
> Still, out of curiosity, and since you're saying this is a test
> cluster: what happens if you shut down corosync and Pacemaker on *all*
> the nodes, and bring it back up?
>
> We've had a few people report these "not in our membership" issues on
> the list before, and they seem to appear in a very sporadic and
> transient fashion, so the root cause (which may well be totally
> trivial) hasn't really been found out -- as far as I can tell, at
> least. Hence, my question of whether the issue persists after a full
> cluster shutdown.
>
> Florian
>
> [1] https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f
> -- note Andrew will rightfully flame me to a crisp if I've
> misinterpreted that commit, so caveat lector. :)
It's related, but as mentioned off-list, I've seen the same behaviour
even with that patch.
Somehow the process list never makes it to one of the peers (the
others get it fine), which causes much confusion.
The above patch merely makes the cib ignore the process list; the
crmd will still be affected.
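
To make the distinction concrete, here is a toy model in Python. It is
not Pacemaker code: every function and variable name is hypothetical,
and the checks are a deliberate simplification of what is described
above. It only illustrates why ignoring the process list in one daemon
does not help another daemon that still relies on it.

```python
# Toy model only. None of these names exist in Pacemaker; they are
# assumptions made purely for illustration.

def cib_accepts(peer_view, sender, ignore_process_list):
    """CIB-like check. With the patch applied (ignore_process_list=True),
    only membership is consulted, not the sender's process list."""
    if ignore_process_list:
        return sender in peer_view
    return bool(peer_view.get(sender))


def crmd_accepts(peer_view, sender):
    """crmd-like check. Still requires a non-empty process list."""
    return bool(peer_view.get(sender))


# What one peer knows: node name -> processes it has heard about.
# The process list for node5 never arrived on this peer.
view = {
    "node1": {"cib", "crmd"},
    "node5": set(),
}

patched_cib = cib_accepts(view, "node5", ignore_process_list=True)
unpatched_cib = cib_accepts(view, "node5", ignore_process_list=False)
crmd_result = crmd_accepts(view, "node5")
```

In this sketch the patched CIB-like check accepts node5 because it is a
known member, while the crmd-like check still rejects it because the
process list is empty on this peer.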