[Pacemaker] why so long to stonith?

Andrew Beekhof andrew at beekhof.net
Wed Apr 24 05:16:14 UTC 2013


On 24/04/2013, at 5:34 AM, Brian J. Murrell <brian at interlinx.bc.ca> wrote:

> Using pacemaker 1.1.8 on RHEL 6.4, I did a test where I just killed
> (-KILL) corosync on a peer node.  Pacemaker seemed to take a long time
> to transition to stonithing it though after noticing it was AWOL:

[snip]

> As you can see, 3 minutes and 10 seconds went by before pacemaker
> transitioned from noticing the node was unresponsive to stonithing it.
> 
> This smacks of some kind of misconfigured timeout, but I'm not aware
> of any timeout that would have this effect.
> 
> Thoughts?
> b.

Almost certainly you are hitting:

    https://bugzilla.redhat.com/show_bug.cgi?id=951340

I am doing my best to convince the people who make decisions that this is worthy of an update before 6.5.
The mystery at the moment is why some clusters (i.e. all the ones we tested on internally) seem unaffected.


