[Pacemaker] clear failcount when monitor is successful?

Johan Huysmans johan.huysmans at inuits.be
Wed Apr 24 04:37:24 EDT 2013


I'm still investigating what happens in my situation.

So I have a cloned resource, with on-fail set to block.
I configured the failure-timeout to 30s.
Another resource group depends on the cloned resource (order & 
colocation constraints configured).
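
For reference, a setup like the one described might look roughly like this
in crm shell syntax. This is only a sketch: d_tomcat is the resource name
from the status output below, but the clone and group names (cl_tomcat,
g_services), the resource agent, and the monitor interval are placeholders
assumed for illustration, not my actual configuration.

```
# Sketch only; cl_tomcat, g_services, the agent and the 15s interval are assumed.
primitive d_tomcat ocf:heartbeat:tomcat \
    op monitor interval="15s" on-fail="block" \
    meta failure-timeout="30s"
clone cl_tomcat d_tomcat

# g_services stands in for the dependent resource group (defined elsewhere).
order o_tomcat_before_services inf: cl_tomcat g_services
colocation c_services_with_tomcat inf: g_services cl_tomcat
```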

--> start situation
* scope=status  name=fail-count-d_tomcat value=0
* depending resource group running on node
* crm_mon shows everything ok

--> a failure occurs
* scope=status  name=fail-count-d_tomcat value=1
* depending resource group stopping on node
* crm_mon shows failure

--> After 30s (= failure-timeout)
* scope=status  name=fail-count-d_tomcat value=1
* depending resource group not running on node
* crm_mon shows NO failure !!!!!

--> After something changes in the cluster or the recheck interval
* scope=status  name=fail-count-d_tomcat value=0
* depending resource group can run on node
* crm_mon shows no failure
* BUT my resource is still monitored and failing!
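
The "recheck interval" above is presumably the cluster-recheck-interval
cluster property. As far as I understand, an expired failure is only
processed when the cluster recalculates its state, which is why the
failcount changes at that point rather than exactly at 30s. A sketch of
how that property is set in crm shell syntax (the 60s value is an arbitrary
example, and shortening it only shortens the delay; it does not address
the monitor still failing):

```
# Sketch only; 60s is an arbitrary example value.
property cluster-recheck-interval="60s"
```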



I find it disturbing that a resource with a failing monitor has a 
failcount of 0, shows as OK in crm_mon, and allows the dependent 
resources to run.


gr.
Johan




On 24-04-13 08:35, Johan Huysmans wrote:
> I tried the failure-timeout.
> But I noticed that when the failure-timeout resets the failcount the 
> resource becomes OK in the crm_mon view.
> However the resource is still failing.
>
> This shouldn't happen. Can this behaviour be changed with some setting?
>
> gr.
> Johan
>
>
> On 24-04-13 07:23, Andrew Beekhof wrote:
>> On 23/04/2013, at 11:24 PM, Johan Huysmans <johan.huysmans at inuits.be> 
>> wrote:
>>
>>> Hi All,
>>>
>>> I have a cloned resource running on both of my nodes; on-fail is 
>>> set to block.
>>> So if the resource fails on a node the failcount increases, but 
>>> whenever the resource automatically recovers the failcount isn't reset.
>>>
>>> Is there a way to reset the failcount to 0, when the monitor is 
>>> successful?
>>
>> No, but you can expire them after a period of time.
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org