[Pacemaker] monitor on-fail=ignore not restarting when resource reported as stopped

Lars Marowsky-Bree lmb at suse.com
Mon Dec 9 05:53:06 EST 2013


On 2013-12-06T16:06:09, Patrick Hemmer <pacemaker at feystorm.net> wrote:

Hi Patrick,

> > For a resource that pacemaker expects to be started, it's an error if it
> > is found to be stopped. Pacemaker can't tell if it is really cleanly
> > stopped, or died, or ...
> Oh, and I'll quote the OCF spec on this one:
> 
> 1     generic or unspecified error (current practice)
>     The "monitor" operation shall return this for a crashed, hung or
>     otherwise non-functional resource.
> 
> 7     program is not running
>     Note: This is not the error code to be returned by a successful
>     "stop" operation. A successful "stop" operation shall return 0.
>     The "monitor" action shall return this value only for a
>     _cleanly_ stopped resource. If in doubt, it should return 1.
> 
> So the OCF spec very clearly states that OCF_ERR_GENERIC means it's
> failed. OCF_NOT_RUNNING means it shut down cleanly. So yes, pacemaker
> can tell if it cleanly stopped.

Yes. I know. I wrote that.

But for a resource that pacemaker expects to be "started", both mean
that something happened and the resource is no longer in the target
state, so recovery kicks in. In theory, the only action that
OCF_NOT_RUNNING would allow us to skip is the "stop" action before
starting it elsewhere, but we do that anyway as a safety measure.
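
To make that concrete, here is a rough sketch of the decision for a
recurring monitor on a resource whose target state is "started". It is
illustrative only and not Pacemaker source code: the function names are
made up, and it ignores on-fail settings, promotable (master/slave)
return codes and fencing.

```python
# Illustrative sketch only; function names are hypothetical.
OCF_SUCCESS = 0       # resource is running
OCF_ERR_GENERIC = 1   # crashed, hung or otherwise non-functional
OCF_NOT_RUNNING = 7   # cleanly stopped


def needs_recovery(monitor_rc):
    """A resource expected to be started has left its target state
    whenever the recurring monitor returns anything but OCF_SUCCESS."""
    return monitor_rc != OCF_SUCCESS


def recovery_steps(monitor_rc):
    """Return the list of actions to schedule for a given monitor result."""
    if not needs_recovery(monitor_rc):
        return []
    # A clean stop (7) would in theory let us skip "stop", but it is
    # issued anyway as a safety measure before starting again.
    return ["stop", "start"]
```

In this sketch, return codes 1 and 7 lead to the same actions, which is
the point made above.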

There's also a difference in how pacemaker handles this in response to
the initial monitor_0 / probe.

> > If you want Pacemaker to recover failed resources, do not set
> > on-fail="ignore". I still don't quite get why you set that when you
> > obviously don't want the associated behaviour?

I still don't understand what you want.

You want "failures" (i.e., any rc other than 0 or 7) to be ignored, but
"stopped" to be restarted? You can't do that without modifying the
resource agent.
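
If that is really what you are after, the change to the agent's monitor
action would amount to something like the mapping below. This is only a
sketch of the logic, written in Python for brevity (a real agent is
usually a shell script), and remap_monitor_rc is a hypothetical name. It
assumes on-fail is left at its default rather than set to "ignore", so
that a reported "not running" still triggers recovery.

```python
# Illustrative sketch only; remap_monitor_rc is a hypothetical helper.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def remap_monitor_rc(real_rc):
    """Map the agent's real monitor result to what it reports upward.

    "Running" and "cleanly stopped" pass through unchanged; every other
    result is reported as success, which hides the failure from the
    cluster.
    """
    if real_rc in (OCF_SUCCESS, OCF_NOT_RUNNING):
        return real_rc
    return OCF_SUCCESS
```

Note that this hides genuine failures from the cluster, so the resource
would be left alone even when it is crashed or hung.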


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




