[Pacemaker] clear failcount when monitor is successful?

Johan Huysmans johan.huysmans at inuits.be
Wed Apr 24 03:14:08 EDT 2013


In our setup a resource can recover on its own, with no human
intervention needed.
Therefore our cluster should also be able to recover automatically.
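
A minimal sketch of what such automatic recovery could look like, assuming a
small Python helper that is run periodically on each node (for example from
cron). The helper name `clear_failcount_if_healthy`, the `is_healthy`
callback and the resource and node names are hypothetical and only serve the
illustration; `crm_resource --cleanup` is the existing Pacemaker command for
clearing a resource's failures on a node. This is an untested sketch, not a
recommended configuration:

```python
import subprocess
from typing import Callable, List


def cleanup_command(resource: str, node: str) -> List[str]:
    """Build the crm_resource call that clears failures of one resource on one node."""
    return ["crm_resource", "--cleanup", "--resource", resource, "--node", node]


def clear_failcount_if_healthy(
    resource: str,
    node: str,
    is_healthy: Callable[[], bool],
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Clear the failures only when our own health check succeeds.

    is_healthy is a hypothetical callback that re-checks the service itself
    (assumption: it mirrors what the resource agent's monitor does).
    Returns True when a cleanup was requested, False otherwise.
    """
    if not is_healthy():
        # Still failing: leave the failcount alone so crm_mon keeps showing it.
        return False
    run(cleanup_command(resource, node), check=True)
    return True
```

It could be called as `clear_failcount_if_healthy("myclone", "node1", check)`,
where `check` is whatever function tests the service. Unlike failure-timeout,
the failcount is only cleared after the check has actually succeeded.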

gr.
Johan

On 24-04-13 08:47, Michael Schwartzkopff wrote:
> On Wednesday, 24 April 2013, 08:35:29, Johan Huysmans wrote:
> > I tried the failure-timeout.
> > But I noticed that when the failure-timeout resets the failcount, the
> > resource is shown as OK in the crm_mon view.
> > However, the resource is still failing.
> >
> > This shouldn't happen. Can this behaviour be changed with some setting?
> >
> > gr.
> > Johan
> >
> > On 24-04-13 07:23, Andrew Beekhof wrote:
> > > On 23/04/2013, at 11:24 PM, Johan Huysmans <johan.huysmans at inuits.be> wrote:
> > >> Hi All,
> > >>
> > >> I have a cloned resource running on both of my nodes, and its on-fail
> > >> is set to block. So if the resource fails on a node the failcount
> > >> increases, but when the resource recovers automatically the failcount
> > >> isn't reset.
> > >>
> > >> Is there a way to reset the failcount to 0 when the monitor is
> > >> successful?
> > >
> > > No, but you can expire the failures after a period of time.
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
>
> When a resource fails and the failcounter increases, that is a reason
> for the administrator to intervene and look at what went wrong. On that
> occasion the admin can also clean the failcounter.
>
> During normal operation of a cluster a failure should not happen.
> A non-zero failcounter is always a sign of a problem.
>
> Greetings,
>
> -- 
> Dr. Michael Schwartzkopff
> Guardinistr. 63
> 81375 München
> Tel: (0163) 172 50 98
>
