[Pacemaker] benefits of cman?

Matthew O'Connor matt at ecsorl.com
Sat May 19 04:00:43 UTC 2012


OK, I answered my own question below...for the most part.

On 05/18/2012 02:26 PM, Matthew O'Connor wrote:
> By the way, will Pacemaker or Corosync log something to the syslog if it
> decides to fence a member?  Will it attempt to fence one that has flat
> disappeared, or only one that it has become unable to stop services on?
> I ask because I have a node that recently started spitting out
> "rcu_sched_state detected stall on cpu..." whenever I'm not around.  The
> surviving node recognizes that it has lost contact with this defunct
> node, but by that point the DLM and/or OCFS2 is totally hosed and the
> surviving node requires a hard-restart.  I guess my hope is that, were
> fencing actually working on my cluster, the fence would happen before
> the surviving node's DLM/OCFS2 drivers melted down (assuming the real
> issue at hand isn't wiping out DLM/OCFS2 everywhere before the bad-node
> is determined offline by the good-node).
I understand now that the DLM expects STONITH to be working; without it,
the DLM blocks until the failed node re-establishes contact, possibly
forever.  By the way, my thanks go out to the writer of the
libvirt-based STONITH method.  It worked great for me, and it was great
to see it nuke my misbehaving virtual test node!  OCFS2 also responded
much better in that test environment; fencing makes such a difference.
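
For anyone else who trips over this, here is a toy Python sketch of how
I now picture the blocking behaviour.  It is only a mental model: the
class and method names (ToyLockSpace, can_grant_locks, and so on) are
made up for illustration and are not the real DLM or Pacemaker API.

```python
from enum import Enum


class NodeState(Enum):
    ONLINE = "online"   # peer is in contact
    LOST = "lost"       # contact lost, fate unknown
    FENCED = "fenced"   # STONITH confirmed the peer is down


class ToyLockSpace:
    """Hypothetical stand-in for a DLM lockspace (not the real API)."""

    def __init__(self, peers):
        # Every peer starts out as a healthy member.
        self.peers = {name: NodeState.ONLINE for name in peers}

    def set_state(self, name, state):
        self.peers[name] = state

    def can_grant_locks(self):
        # Granting locks is only safe when no peer is in an unknown
        # state: a LOST peer might still be writing to shared storage.
        return all(s is not NodeState.LOST for s in self.peers.values())


ls = ToyLockSpace(["node2"])

# node2 stops responding; without fencing, locking stays blocked.
ls.set_state("node2", NodeState.LOST)
blocked_without_fencing = not ls.can_grant_locks()

# STONITH confirms node2 is down; locking can resume.
ls.set_state("node2", NodeState.FENCED)
resumes_after_fencing = ls.can_grant_locks()

print(blocked_without_fencing, resumes_after_fencing)
```

The other way out of the LOST state in this model is the peer going
back to ONLINE, which matches the "until the failed node re-establishes
contact" case.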

Thanks again for the info on cman+corosync+pacemaker!






