[Pacemaker] node can't join after reboot in "ver:1" config
Shuichi Ihara
sihara at ddn.com
Sun Mar 27 15:07:36 UTC 2011
Hi,
We have a two-node cluster and have just changed "ver" from 0 to 1 in the service section of corosync.conf, so that {stonithd, cib, lrmd, attrd, pengine, crmd} are started by pacemakerd instead of by corosync.
service {
    name: pacemaker
    ver: 1
}
We see a problem after this change. We have set up STONITH with IPMI, and it had been working well in the ver:0 configuration (e.g. when all heartbeat connections are lost, Node-A STONITHs Node-B via IPMI, and Node-A then keeps the resource service).
With the ver:1 configuration STONITH itself still works well. However, after the killed node has rebooted and started "corosync; pacemakerd" to rejoin the cluster, it was denied, with the following errors:
Mar 27 14:53:04 r08 cib: [5645]: WARN: cib_peer_callback: Discarding cib_apply_diff message (137) from r07: not in our membership
Mar 27 14:53:04 corosync [TOTEM ] Received ringid(192.168.1.126:256) seq 69
Mar 27 14:53:04 corosync [TOTEM ] Delivering 68 to 69
Mar 27 14:53:04 corosync [TOTEM ] Delivering MCAST message with seq 69 to pending delivery queue
Mar 27 14:53:04 corosync [TOTEM ] Received ringid(192.168.1.126:256) seq 6a
Mar 27 14:53:04 corosync [TOTEM ] Delivering 69 to 6a
Mar 27 14:53:04 corosync [TOTEM ] Delivering MCAST message with seq 6a to pending delivery queue
Mar 27 14:53:04 corosync [TOTEM ] releasing messages up to and including 6a
Mar 27 14:53:04 r08 cib: [5645]: WARN: cib_peer_callback: Discarding cib_apply_diff message (138) from r07: not in our membership
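To see how many CIB updates are being dropped and from which peer, a log like the one above can be summarised with a small script. This is only a sketch: the regular expression is an assumption derived from the two WARN lines quoted above (other Pacemaker versions may word the message differently), and the function name count_discarded is hypothetical.

```python
import re
from collections import Counter

# Pattern assumed from the WARN lines in this report; it is not taken
# from Pacemaker's source and may not match other versions.
DISCARD_RE = re.compile(
    r"cib_peer_callback: Discarding (?P<op>\S+) message \((?P<id>\d+)\) "
    r"from (?P<peer>\S+): not in our membership"
)


def count_discarded(lines):
    """Count discarded CIB messages, keyed by (peer, operation)."""
    counts = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            counts[(match.group("peer"), match.group("op"))] += 1
    return counts


# Sample input copied from the log excerpt in this report.
SAMPLE = [
    "Mar 27 14:53:04 r08 cib: [5645]: WARN: cib_peer_callback: "
    "Discarding cib_apply_diff message (137) from r07: not in our membership",
    "Mar 27 14:53:04 corosync [TOTEM ] Delivering 68 to 69",
    "Mar 27 14:53:04 r08 cib: [5645]: WARN: cib_peer_callback: "
    "Discarding cib_apply_diff message (138) from r07: not in our membership",
]

if __name__ == "__main__":
    for (peer, op), n in count_discarded(SAMPLE).items():
        print(f"{peer}: {n} {op} message(s) discarded")
```

To run it against a real log, pass an open file object instead of SAMPLE; the log file location depends on the syslog setup and is not stated in this report.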
When pacemakerd and corosync are stopped on both nodes and then restarted, both nodes come back correctly. Again, there is no problem with "ver: 0" in the service section of corosync.conf. Here is my software stack:
corosynclib-1.3.0
pacemaker-1.1.5
cluster-glue-1.0.7
Any advice is appreciated.
Thanks
Ihara