[Pacemaker] monitor on-fail=ignore not restarting when resource reported as stopped
Michael Schwartzkopff
ms at sys4.de
Fri Dec 6 15:50:19 UTC 2013
On Friday, 6 December 2013, 10:11:07, Patrick Hemmer wrote:
> I have a resource which updates DNS records (Amazon's Route53). When it
> performs its `monitor` action, it can sometimes fail because of issues
> with Amazon's API. So I want failures to be ignored for the monitor
> action, and so I set `op monitor on-fail=ignore`. However now when the
> monitor action comes back as 'stopped', pacemaker does nothing. In my
> opinion a "stopped" return code should not be a failure condition, and
> thus the `on-fail=ignore` should not apply. It basically makes the
> monitor option completely useless. It won't do anything on failure, it
> won't do anything on stopped, so you might as well not have a monitor
> action at all.
>
> If this is a bug I can create a bug report, just not sure if this is
> deliberate or not.
This is not a bug but expected behaviour. A monitor operation on a started
resource interprets every result other than "Started" as a failure, and that
includes the case where your resource is reported as stopped.
And you told the resource to ignore failures.
It would be better to improve your resource agent to detect error conditions.
It could read the state it should be in from pacemaker and compare it with the
reality.
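One possible reading of "improve the resource agent" is sketched below: the
monitor action itself tells "the API did not answer" apart from "the API
answered and the record is gone". This is only a sketch under assumptions.
The names query_record and route53_monitor are hypothetical, the placeholder
lookup must be replaced with the real Route53 query, and treating an
unreachable API as "state unchanged" is a design choice, not something
pacemaker prescribes. Only the numeric OCF return codes are standard.

```shell
#!/bin/sh
# Standard OCF return codes used by resource agents.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical helper (assumption): prints the record's current value,
# prints nothing if the record is absent, and returns non-zero if the
# API call itself failed. This placeholder always simulates an API failure.
query_record() {
    return 1
}

# Monitor sketch: only a positive answer from the API can report "stopped".
# $1 = expected record value, $2 = number of attempts (default 3).
route53_monitor() {
    expected="$1"
    attempts="${2:-3}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if actual=$(query_record); then
            if [ "$actual" = "$expected" ]; then
                return "$OCF_SUCCESS"
            fi
            # The API answered and the record is missing or wrong.
            return "$OCF_NOT_RUNNING"
        fi
        i=$((i + 1))
    done
    # The API never answered: assume the record is unchanged (assumption)
    # and leave a trace on stderr instead of failing the monitor.
    echo "route53 monitor: API unreachable, assuming state unchanged" >&2
    return "$OCF_SUCCESS"
}
```

With a monitor along these lines, on-fail=ignore would no longer be needed:
transient API errors are absorbed inside the agent, while a confirmed
"stopped" result is passed on to pacemaker, which can then act on it.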
Or, the easy way out: make the migration-threshold large (+INF) and add a
failure-timeout to your resource. That way you allow some failures of your
resource, but forget them after some time.
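In crm shell syntax, the migration-threshold and failure-timeout approach
could look roughly like the fragment below. The resource name p_route53, the
agent ocf:custom:route53, and the interval and timeout values are
placeholders I made up for illustration; only the two meta attributes come
from the suggestion above.

```
primitive p_route53 ocf:custom:route53 \
    op monitor interval=60s timeout=30s \
    meta migration-threshold=INFINITY failure-timeout=300s
```

Note that the monitor operation here no longer carries on-fail=ignore, so a
failed or "stopped" monitor result leads to a restart of the resource, the
failures never add up to a forced move to another node, and the recorded
failures expire after the failure-timeout.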
Of course improving the RA would be the best way.
Greetings,
Michael Schwartzkopff
--
[*] sys4 AG
http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München
Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein