[ClusterLabs] data loss of network would cause Pacemaker exit abnormally
Ken Gaillot
kgaillot at redhat.com
Wed Aug 31 15:39:52 UTC 2016
On 08/30/2016 01:58 PM, chenhj wrote:
> Hi,
>
> This is a continuation of the email below (I did not subscribe to this mailing list):
>
> http://clusterlabs.org/pipermail/users/2016-August/003838.html
>
>>From the above, I suspect that the node with the network loss was the
>>DC, and from its point of view, it was the other node that went away.
>
> Yes. The node with the network loss was the DC (node2).
>
> Could someone explain what the following messages mean, and
> why the pacemakerd process exits instead of rejoining the CPG group?
>
>> Aug 27 12:33:59 [46849] node3 pacemakerd: error: pcmk_cpg_membership:
>> We're not part of CPG group 'pacemakerd' anymore!
This means the node was kicked out of the membership. I don't remember
what that implies; I'm guessing the node exits because the cluster will
most likely fence it after kicking it out.
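
For illustration, here is a minimal, self-contained sketch (not Pacemaker's
actual code) of how a corosync CPG client can detect this condition in its
configuration-change callback: the local node ID shows up in the callback's
"left" list. The exit-and-wait-to-be-fenced reaction is my assumption about
a reasonable daemon response, not something taken from the Pacemaker source:

    /* Sketch: react to the local node being removed from a CPG group.
     * Uses corosync's libcpg; error handling kept minimal. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <corosync/cpg.h>

    static void confchg_cb(cpg_handle_t handle,
                           const struct cpg_name *group,
                           const struct cpg_address *members, size_t n_members,
                           const struct cpg_address *left, size_t n_left,
                           const struct cpg_address *joined, size_t n_joined)
    {
        unsigned int local_nodeid = 0;

        (void) members; (void) n_members; (void) joined; (void) n_joined;

        if (cpg_local_get(handle, &local_nodeid) != CS_OK) {
            return;
        }
        for (size_t i = 0; i < n_left; i++) {
            if (left[i].nodeid == local_nodeid) {
                /* This is the situation behind the quoted log message:
                 * the group's membership no longer includes this node. */
                fprintf(stderr, "We're not part of CPG group '%.*s' anymore!\n",
                        (int) group->length, group->value);
                /* Assumed reaction: give up and let the cluster fence or
                 * restart this node rather than run on with stale state. */
                exit(EXIT_FAILURE);
            }
        }
    }

The callback would be registered via cpg_initialize()/cpg_model_initialize()
and cpg_join() on the 'pacemakerd' group; that wiring is omitted here.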
>
>>> [root@node3 ~]# rpm -q corosync
>>> corosync-1.4.1-7.el6.x86_64
>>That is quite old ...
>>> [root@node3 ~]# cat /etc/redhat-release
>>> CentOS release 6.3 (Final)
>>> [root@node3 ~]# pacemakerd -F
>> Pacemaker 1.1.14-1.el6 (Build: 70404b0)
>>and I doubt that many people have tested Pacemaker 1.1.14 against
>>corosync 1.4.1 ... quite far away from
>>each other release-wise ...
>
> pacemaker 1.1.14 + corosync-1.4.7 can also reproduce this problem, but
> it seems to happen with lower probability.
The corosync 2 series is a major improvement, but some config changes
are necessary.
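
For reference, a corosync 2.x configuration for a small cluster like the one
in this thread could look roughly like the sketch below. The cluster name,
the udpu transport, and the node IDs are assumptions; the main differences
from a 1.4 setup are the explicit nodelist, the votequorum provider, and the
fact that the old Pacemaker service/plugin stanza is no longer used:

    totem {
        version: 2
        # assumed name; pick your own
        cluster_name: mycluster
        # unicast UDP; multicast is also possible
        transport: udpu
    }

    nodelist {
        node {
            ring0_addr: node2
            nodeid: 2
        }
        node {
            ring0_addr: node3
            nodeid: 3
        }
    }

    quorum {
        provider: corosync_votequorum
        # only if the cluster really has exactly two nodes
        two_node: 1
    }

    logging {
        to_syslog: yes
    }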