[Pacemaker] behaviour of failure-timeout

Kashif Jawed Siddiqui kashifjs at huawei.com
Thu Sep 13 21:24:08 EDT 2012


Hi,

>>I could not find any detailed explanation in the doc, how
>>"failure-timeout" behaves, can someone clarify that?
failure-timeout clears the failcount once it has been increased (i.e. the resource has failed) and the timeout has passed without a new failure.
The cluster is primarily event driven, but it can also act on a time basis. The interval for these time-based checks is set by the cluster property cluster-recheck-interval. The default is 15 minutes.
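
To illustrate the timing, here is a rough Python sketch. It is a simplified model, not Pacemaker's actual code: it assumes no other cluster events occur, that rechecks fire at fixed multiples of the interval, and the function and parameter names are made up for this example. It shows that an expired failure is only acted on at the next recheck.

```python
import math

def expiry_noticed_at(last_failure, failure_timeout, recheck_interval, first_recheck=0.0):
    """Time (in seconds) of the first periodic recheck at or after the failure expires.

    Hypothetical helper: assumes rechecks fire at first_recheck + k * recheck_interval
    and that no other cluster event triggers an earlier recalculation.
    """
    if recheck_interval <= 0:
        raise ValueError("recheck_interval must be positive")
    expires_at = last_failure + failure_timeout
    if expires_at <= first_recheck:
        return first_recheck
    # Round up to the next recheck tick at or after the expiry time.
    ticks = math.ceil((expires_at - first_recheck) / recheck_interval)
    return first_recheck + ticks * recheck_interval

# Failure at t=100s, failure-timeout=60s, default recheck interval of 15 min (900s):
# the failure expires at t=160s but is only noticed at the t=900s recheck.
print(expiry_noticed_at(100, 60, 900))
```

Under these assumptions a short failure-timeout combined with the default cluster-recheck-interval can take up to 15 minutes to have any visible effect, unless some other cluster event happens first.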

>>My rough understanding so far is, that after a failcount is increased,
>>pacemaker "waits for the failure-timeout" to expire and then checks if
>>the failure condition is still on. If not, it will reset the failcount
>>on that node. Now
Yes

>>- How does pacemaker check that, is it using a monitor operation?
As mentioned above, the check is triggered by cluster events or by cluster-recheck-interval.
>>- are there re-checks at later times
Same as above
>>- Are checks only run on the node where the failcount was increased or
>>on all
No, it is run on all nodes.

Regards,
Kashif Jawed Siddiqui


>>Cheers!
>>Mario


_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org