[Pacemaker] RFC: What part of the XML configuration do you hate the most?
Dejan Muhamedagic
dejanmm at fastmail.fm
Tue Jun 24 14:26:14 UTC 2008
On Tue, Jun 24, 2008 at 04:02:06PM +0200, Lars Marowsky-Bree wrote:
> On 2008-06-24T15:48:12, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
>
> > > But precisely we have two scenarios to configure to:
> > > a) monitor NG -> stop -> start on the same node
> > > -> monitor NG (Nth time) -> stop -> failover to another node
> > > b) monitor NG -> monitor NG (Nth time) -> stop -> failover to another node
> > >
> > > The current pacemaker behaves as a), I think, but b) is also
> > > useful when you want to ignore a transient error.
> >
> > The b) part has already been discussed on the list and it's
> > supposed to be implemented in lrmd. I still don't have the API
> > defined, but thought about something like
> >
> > max-total-failures (how many times a monitor may fail)
> > max-consecutive-failures (how many times in a row a monitor may fail)
> >
> > These should probably be attributes defined on the monitor
> > operation level.
>
> The "ignore failure reports" clashes a bit with the "react to failures
> ASAP" requirement.
>
> It is my belief that this should be handled by the RA, not in the LRM
> nor the CRM. The monitor op implementation is the place to handle this.
>
> Beyond that, I strongly feel that "transient errors" are a bad
> foundation to build clusters on.
Of course, all that is right. However, there are some situations
where we could bend the rules. I'm not sure what Keisuke-san had
in mind, but for example one could be more forgiving when
monitoring certain stonith resources.
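
To make the difference between the two proposed counters concrete,
here is a minimal sketch in Python. It is not an lrmd API (none is
defined yet). The function name is made up, and it assumes that a
limit of N tolerates N failures and reports the N+1th, which is only
one possible reading of "may fail".

```python
from typing import Iterable, Optional


def first_reported_failure(
    results: Iterable[bool],
    max_total_failures: Optional[int] = None,
    max_consecutive_failures: Optional[int] = None,
) -> Optional[int]:
    """Return the index of the first monitor result that would be
    reported as a failure, or None if nothing is reported.

    results: True for a successful monitor, False for a failed one.
    Hypothetical semantics: a limit of N tolerates N failures and the
    N+1th is reported. With no limit set, the first failure is reported.
    """
    total = 0        # failures seen so far
    consecutive = 0  # failures seen since the last success
    for i, ok in enumerate(results):
        if ok:
            consecutive = 0
            continue
        total += 1
        consecutive += 1
        if max_total_failures is None and max_consecutive_failures is None:
            return i  # no tolerance configured: report immediately
        if max_total_failures is not None and total > max_total_failures:
            return i
        if (max_consecutive_failures is not None
                and consecutive > max_consecutive_failures):
            return i
    return None


history = [True, False, True, False, False, False]
print(first_reported_failure(history))
print(first_reported_failure(history, max_total_failures=2))
print(first_reported_failure(history, max_consecutive_failures=2))
```

With this reading, an isolated failure followed by a success resets the
consecutive counter but still counts against the total, so the two
attributes can trip at different points for the same monitor history.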
Thanks,
Dejan