[Pacemaker] monitor on-fail=ignore not restarting when resource reported as stopped

Fri Dec 6 11:16:17 EST 2013

On Friday, 6 December 2013, 11:02:11 you wrote:
> ------------------------------------------------------------------------
> *From: *Michael Schwartzkopff <ms at sys4.de>
> *Sent: * 2013-12-06 10:50:19 E
> *To: *The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> *Subject: *Re: [Pacemaker] monitor on-fail=ignore not restarting when
> resource reported as stopped
> 
> > On Friday, 6 December 2013, 10:11:07 Patrick Hemmer wrote:
> >> I have a resource which updates DNS records (Amazon's Route53). When it
> >> performs its `monitor` action, it can sometimes fail because of issues
> >> with Amazon's API. So I want failures to be ignored for the monitor
> >> action, and so I set `op monitor on-fail=ignore`. However now when the
> >> monitor action comes back as 'stopped', pacemaker does nothing. In my
> >> opinion a "stopped" return code should not be a failure condition, and
> >> thus the `on-fail=ignore` should not apply. It basically makes the
> >> monitor option completely useless. It won't do anything on failure, it
> >> won't do anything on stopped, so you might as well not have a monitor
> >> action at all.
> >> 
> >> If this is a bug I can create a bug report, just not sure if this is
> >> deliberate or not.
> > 
> > This is not a bug but expected behaviour. A monitoring operation for a
> > started resource interprets everything besides "Started" as a failure,
> > including the case where your resource is stopped.
> > 
> > And you told the resource to ignore failures.
> > 
> > It would be better to improve your resource agent to detect error
> > conditions. It could read the state it should be in from pacemaker and
> > compare it with the reality.
> 
> It does detect the error condition. It then returns with
> $OCF_ERR_GENERIC. This is the only possible way to respond. It's also
> the right way. If the script got an error trying to query the status, it
> doesn't know if it's really running or not. If it's not running,
> returning $OCF_SUCCESS would be a lie. If it is running, returning
> $OCF_NOT_RUNNING would be a lie.
> 
> The monitor action can also be called by pacemaker even when the
> resource is not running (i.e., prior to starting it, or when pacemaker
> first starts up). Thus returning $OCF_SUCCESS on error is not appropriate.

So where is the problem? If the script returns "ERROR", then pacemaker has to
act accordingly.
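
For reference, the three-way monitor logic discussed above could be sketched roughly as follows. This is only an illustration, not the actual Route53 agent: `query_record`, `route53_monitor`, and the `FAKE_RECORD` / `FAKE_API_FAIL` variables are hypothetical names invented here so the sketch can be run on its own. The numeric values are the standard OCF return codes, which a real agent would normally get from ocf-shellfuncs.

```shell
#!/bin/sh
# Standard OCF return codes (a real agent sources these from ocf-shellfuncs).
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical stand-in for the Route53 lookup. It prints the current record
# value and returns non-zero when the API call itself fails. Here it is driven
# by two made-up variables instead of a real API request.
query_record() {
    [ "${FAKE_API_FAIL:-0}" = "1" ] && return 1
    printf '%s\n' "${FAKE_RECORD:-}"
}

# Monitor sketch: $1 is the record value this node is expected to publish.
route53_monitor() {
    expected=$1
    if ! actual=$(query_record); then
        # The query failed, so the real state is unknown: report an error.
        return "$OCF_ERR_GENERIC"
    fi
    if [ "$actual" = "$expected" ]; then
        # The record matches: the resource is running.
        return "$OCF_SUCCESS"
    fi
    # The query worked but the record is absent or different: cleanly stopped.
    return "$OCF_NOT_RUNNING"
}
```

With this split, an API outage yields $OCF_ERR_GENERIC and a missing record yields $OCF_NOT_RUNNING. As described earlier in the thread, a recurring monitor on a started resource treats both as failures, so `on-fail=ignore` covers both.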

Kind regards,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: München, Amtsgericht München: HRB 199263
Executive board: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein