[Pacemaker] RFC: What part of the XML configuration do you hate the most?
Dejan Muhamedagic
dejanmm at fastmail.fm
Fri Jul 11 12:33:34 UTC 2008
On Mon, Jun 30, 2008 at 04:45:29PM +0200, Lars Marowsky-Bree wrote:
> On 2008-06-27T14:52:08, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
>
> > The fail-counts in lrmd will probably be available for
> > inspection. And they would probably also expire after some time.
> > What I suggested in the previous messages is actually missing
> > the time dimension: There should be maximum failures within
> > a period.
> >
> > > So I think that lrmd should always report failures like now,
> > > and crm/cib should hold all the failed status and make a decision.
> >
> > Of course, it could be done like that as well, though that could
> > make processing in crm much more complex.
>
> The CRM already implements all of the above for failures and restarts,
> and tracks failcounts. This would be a fairly minor addition, not that I
> think it would be a good one - RAs shouldn't report failures if there
> wasn't a failure, period.
>
> > > Another case we've met was when we wrote a RA to check for some hardware.
> > > The status from the hardware rarely failed in very specific timing,
> > > and retrying the check was just fine.
> > That's what I often observed with some stonith devices.
>
> This is a bug in the monitor operation.
It is certainly a bug, but not in the monitor operation. Recall
that most of the complexity typically lies elsewhere, not in the
stonith plugin itself. Often the device simply isn't robust
enough and fails once every few hundred calls, for whatever reason.
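A minimal sketch of the "maximum failures within a period" idea mentioned earlier in the thread. It assumes a hypothetical `FailureWindow` helper; this is not an lrmd or CRM API. Isolated failures, like the occasional one from a flaky stonith device, are tolerated. Only a burst of more than `max_failures` failures within `period` seconds is reported as a real failure.

```python
import time
from collections import deque


class FailureWindow:
    """Hypothetical helper (not part of lrmd/CRM): tolerate sporadic
    failures, but flag the resource once more than `max_failures`
    failures occur within a sliding window of `period` seconds."""

    def __init__(self, max_failures, period, clock=time.monotonic):
        self.max_failures = max_failures
        self.period = period
        self.clock = clock          # injectable clock, handy for testing
        self.failures = deque()     # timestamps of recent failures, oldest first

    def _expire(self, now):
        # Drop failures that have aged out of the window.
        while self.failures and now - self.failures[0] >= self.period:
            self.failures.popleft()

    def record_failure(self):
        """Record one failed check; return True if the threshold is exceeded."""
        now = self.clock()
        self._expire(now)
        self.failures.append(now)
        return len(self.failures) > self.max_failures

    def count(self):
        """Number of failures still inside the window."""
        self._expire(self.clock())
        return len(self.failures)
```

With `max_failures=2` and `period=60`, failures at t=0, 10 and 20 trip the threshold on the third one, while two failures 70 seconds apart never do, because the first has expired by the time the second arrives.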
Thanks,
Dejan