[Pacemaker] benefits of cman?
Matthew O'Connor
matt at ecsorl.com
Sat May 19 04:00:43 UTC 2012
OK, I answered my own question below...for the most part.
On 05/18/2012 02:26 PM, Matthew O'Connor wrote:
> By the way, will Pacemaker or Corosync log something to the syslog if it
> decides to fence a member? Will it attempt to fence one that has flat
> disappeared, or only one that it has become unable to stop services on?
> I ask because I have a node that recently started spitting out
> "rcu_sched_state detected stall on cpu..." whenever I'm not around. The
> surviving node recognizes that it has lost contact with this defunct
> node, but by that point the DLM and/or OCFS2 is totally hosed and the
> surviving node requires a hard-restart. I guess my hope is that, were
> fencing actually working on my cluster, the fence would happen before
> the surviving node's DLM/OCFS2 drivers melted down (assuming the real
> issue at hand isn't wiping out DLM/OCFS everywhere before the bad-node
> is determined offline by the good-node).
I understand now that the DLM expects STONITH to be working, or else it
will block forever - or until the failed node re-establishes contact.
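
For anyone else puzzling over this, here is a toy Python model of that rule. It is only a sketch of the behaviour as I understand it, not how the DLM is actually implemented; the names (ToyLockManager, peer_lost, peer_rejoined, can_grant_locks) are made up for illustration and do not correspond to any real DLM or Pacemaker API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLockManager:
    """Toy model only: lock activity stays blocked while a lost peer is unresolved."""

    fencing_works: bool
    unresolved_peers: set = field(default_factory=set)

    def peer_lost(self, node: str) -> None:
        # Contact with a peer is lost; its state is now unknown.
        self.unresolved_peers.add(node)
        if self.fencing_works:
            # Assumption: a successful fence confirms the peer is really
            # down, so recovery can proceed without it.
            self.unresolved_peers.discard(node)

    def peer_rejoined(self, node: str) -> None:
        # The failed node re-established contact on its own.
        self.unresolved_peers.discard(node)

    def can_grant_locks(self) -> bool:
        # Blocked for as long as any lost peer is neither fenced nor back.
        return not self.unresolved_peers
```

With fencing_works=False, can_grant_locks() stays False after peer_lost("node2") until peer_rejoined("node2") is called, which matches the "blocks forever" behaviour I was seeing. With fencing_works=True the same loss is resolved right away.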
By the way, my thanks go out to the writer of the libvirt-based STONITH
method. It worked great for me, and it was great to see it nuke my
misbehaving virtual test node! OCFS2 also responded much better in that
test environment - fencing makes such a difference...
Thanks again for the info on cman+corosync+pacemaker!