[Pacemaker] node can't join after reboot in "ver:1" config

Shuichi Ihara sihara at ddn.com
Sun Mar 27 11:07:36 EDT 2011


Hi,

We have a two-node cluster and just changed "ver" from 0 to 1 in corosync.conf, so that the daemons {stonithd, cib, lrmd, attrd, pengine, crmd} are started by pacemakerd instead of by corosync.

service {
	name: pacemaker
	ver: 1
}
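
A quick way to check which "ver" value a given corosync.conf carries is to read it out of the service block. This is only a minimal Python sketch: it assumes the simple "key: value" layout shown above with no nested braces inside the service block, and pacemaker_ver is a hypothetical helper, not a corosync or Pacemaker tool.

```python
import re

def pacemaker_ver(conf_text):
    """Return the 'ver' value of the pacemaker service block, or None if absent."""
    # Inspect each 'service { ... }' block (assumes no nested braces inside it).
    for block in re.findall(r"service\s*\{([^}]*)\}", conf_text):
        # Collect the 'key: value' pairs found in this block.
        settings = dict(
            (key.strip(), value.strip())
            for key, value in re.findall(r"^\s*([^:#\s]+)\s*:\s*(.*)$", block, re.M)
        )
        if settings.get("name") == "pacemaker" and "ver" in settings:
            return int(settings["ver"])
    return None

sample = """
service {
	name: pacemaker
	ver: 1
}
"""
print(pacemaker_ver(sample))
```

In practice the text would come from the node's own corosync.conf rather than from the inline sample.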

We see a problem after this change. We've set up STONITH with IPMI, and it had been working well in the ver:0 configuration (e.g. when all heartbeat connections are lost, Node-A fences Node-B via IPMI and then keeps running the resources).
With the ver:1 configuration STONITH itself still works fine, but after the fenced node has rebooted and started "corosync; pacemakerd" to rejoin the cluster, it was denied, with the following messages logged.

Mar 27 14:53:04 r08 cib: [5645]: WARN: cib_peer_callback: Discarding cib_apply_diff message (137) from r07: not in our membership
Mar 27 14:53:04 corosync [TOTEM ] Received ringid(192.168.1.126:256) seq 69
Mar 27 14:53:04 corosync [TOTEM ] Delivering 68 to 69
Mar 27 14:53:04 corosync [TOTEM ] Delivering MCAST message with seq 69 to pending delivery queue
Mar 27 14:53:04 corosync [TOTEM ] Received ringid(192.168.1.126:256) seq 6a
Mar 27 14:53:04 corosync [TOTEM ] Delivering 69 to 6a
Mar 27 14:53:04 corosync [TOTEM ] Delivering MCAST message with seq 6a to pending delivery queue
Mar 27 14:53:04 corosync [TOTEM ] releasing messages up to and including 6a
Mar 27 14:53:04 r08 cib: [5645]: WARN: cib_peer_callback: Discarding cib_apply_diff message (138) from r07: not in our membership
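
To see how often the cib discards messages and from which peer, the log can be filtered for these warnings. This is a minimal Python sketch, assuming the message format is exactly that of the two WARN lines above; count_discards and the regular expression are only illustrative and not part of Pacemaker.

```python
import re
from collections import Counter

# Pattern modeled on the two WARN lines quoted above; other formats are not covered.
DISCARD_RE = re.compile(
    r"cib_peer_callback: Discarding (?P<op>\S+) message \((?P<id>\d+)\) "
    r"from (?P<peer>\S+): not in our membership"
)

def count_discards(lines):
    """Return a Counter mapping peer node name -> number of discarded CIB messages."""
    counts = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            counts[match.group("peer")] += 1
    return counts

sample = [
    "Mar 27 14:53:04 r08 cib: [5645]: WARN: cib_peer_callback: "
    "Discarding cib_apply_diff message (137) from r07: not in our membership",
    "Mar 27 14:53:04 corosync [TOTEM ] Delivering 68 to 69",
    "Mar 27 14:53:04 r08 cib: [5645]: WARN: cib_peer_callback: "
    "Discarding cib_apply_diff message (138) from r07: not in our membership",
]
print(count_discards(sample))
```

For the excerpt above this counts two discarded messages from r07 and ignores the TOTEM lines.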

If pacemakerd and corosync are stopped on both nodes and then started again, both nodes come back correctly. Again, there is no problem with "ver: 0" in the service section of corosync.conf. Here is our software stack.

corosynclib-1.3.0
pacemaker-1.1.5
cluster-glue-1.0.7

Any advice is appreciated.

Thanks
Ihara







