[Pacemaker] Expired fail-count doesn't get cleaned up.

Andrew Beekhof andrew at beekhof.net
Mon Aug 13 22:48:13 EDT 2012


On Tue, Aug 14, 2012 at 10:15 AM, Mario Penners <mario.penners at gmail.com> wrote:
> Hi David,
>
> As I understand it, the fail-count only gets reset after a probe is run,
> so you need to run "crm resource reprobe" for the expiry timer to be
> evaluated.

Not in recent versions of 1.1; a reprobe is no longer needed for the
expiry to be evaluated.

>
> However, I do NOT know when the probes are run (I see them in my logs
> only after failover or start/stop actions take place, but not on a
> regular basis).

Probes are run when the machine first comes up, and when the admin calls
"crm resource reprobe".

> So one might have to schedule the crm resource reprobe
> as a cron job, or define it as its own resource?
>
> Cheers!
>
> On Tue, 2012-07-31 at 05:36 -0400, David Coulson wrote:
>> I'm running RHEL6 with the tech preview of pacemaker it ships with. I
>> have a number of resources with failure-timeout="60", which most of
>> the time does what it is supposed to.
>>
>> Last night a resource that is part of a clone failed. While the
>> resource recovered, the fail-count never got cleaned up. About once
>> a second the DC logged the pengine message below. I manually did a
>> resource cleanup, and it seems happy now. Is there something I should be
>> looking for in the logs to indicate that it 'missed' expiring this?
>>
>> Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
>>
>> Migration summary:
>> * Node dresproddns01:
>>     re-openfire-lsb:0: migration-threshold=1000000 fail-count=1 last-failure='Mon Jul 30 21:57:53 2012'
>> * Node dresproddns02:
>>
>>
>> Jul 31 05:32:34 dresproddns02 pengine: [2860]: notice: get_failcount: Failcount for cl-openfire on dresproddns01 has expired (limit was 60s)
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org

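For anyone checking a similar report by hand, the expiry condition behind
the "has expired (limit was 60s)" message can be sketched roughly as below.
This is only an illustration, not Pacemaker's actual code: the helper name
failcount_expired is hypothetical, the strict "greater than" comparison is
an assumption, and the year of the log line is assumed to be 2012 because
syslog does not print it.

```python
from datetime import datetime, timedelta

# Format of the last-failure timestamp as printed in the migration summary.
LAST_FAILURE_FORMAT = "%a %b %d %H:%M:%S %Y"


def failcount_expired(last_failure: str, now: datetime, failure_timeout_s: int) -> bool:
    """Hypothetical helper: True if the failure is older than failure-timeout."""
    failed_at = datetime.strptime(last_failure, LAST_FAILURE_FORMAT)
    return now - failed_at > timedelta(seconds=failure_timeout_s)


# Values taken from the report quoted above (year of the log line assumed).
last_failure = "Mon Jul 30 21:57:53 2012"
log_time = datetime(2012, 7, 31, 5, 32, 34)

print(failcount_expired(last_failure, log_time, 60))
```

With the values from the report, the failure is several hours older than
the 60s failure-timeout, which is consistent with the policy engine
reporting it as expired while the fail-count itself was still listed.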