[Pacemaker] OCF Resource agent monitor activity failed due to temporary error
Andreas Kurz
andreas at hastexo.com
Thu Apr 19 12:36:24 UTC 2012
On 04/19/2012 01:59 PM, Kulovits Christian - OS ITSC wrote:
> Hi Andreas,
> This is exactly what I want pacemaker to do when my RA is not able to determine the resource's state, but without running into a timeout and restart.
> It is the method for displaying the resource's state that is unavailable, not the resource itself. This typical approach must be coded in every RA instead of once in pacemaker.
You want pacemaker to ignore monitor errors on all unknown return values
and go on with monitoring until a resource "heals" itself?
.... please rethink ... it is a resource agent's job to reliably tell
pacemaker the definite resource state -- and "uhm, hm, don't know now,
please try later" can mean anything -- and how to find that out is very
specific to the resource. IMHO it makes no sense at all to
let the cluster manager do this work.
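
As a rough illustration only, a shell RA could implement the wait-and-retry
approach quoted below along these lines. This is a minimal sketch, not code
from any shipped agent: probe_status, my_status_command, myres_monitor and
RETRY_SLEEP are hypothetical names, and it assumes the status command prints
"running" or "stopped" whenever it knows the state. The exit codes are the
standard OCF values, which a real agent gets by sourcing ocf-shellfuncs.

```shell
#!/bin/sh
# Standard OCF exit codes; defined here only so the sketch is self-contained.
: "${OCF_SUCCESS:=0}"
: "${OCF_NOT_RUNNING:=7}"

# Seconds to wait between status attempts (hypothetical tunable).
RETRY_SLEEP="${RETRY_SLEEP:-2}"

# Hypothetical status probe: replace my_status_command with the resource's
# real status command. Assumed contract: it prints "running" or "stopped"
# when the state is known, and anything else when it cannot tell right now.
probe_status() {
    my_status_command 2>/dev/null
}

# Monitor action: keep asking until the probe gives a definite answer.
# There is deliberately no deadline here; if no answer ever arrives, the
# cluster's monitor operation timeout ends the script.
myres_monitor() {
    while :; do
        case "$(probe_status)" in
            running) return "$OCF_SUCCESS" ;;
            stopped) return "$OCF_NOT_RUNNING" ;;
        esac
        # "Don't know, try later": wait a little and retry.
        sleep "$RETRY_SLEEP"
    done
}
```

Because the loop has no deadline of its own, the timeout configured for the
monitor operation bounds how long it keeps retrying; if the status never
becomes available, the operation times out, the script is killed and
recovery starts.
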
There may be cases where a "degraded" resource state would be a nice
feature, and that is already a topic here on the list ... from time to time.
Regards,
Andreas
> Christian
>
> -----Original Message-----
> From: Andreas Kurz [mailto:andreas at hastexo.com]
> Sent: Thursday, 19 April 2012 13:51
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] OCF Resource agent monitor activity failed due to temporary error
>
> Hi Christian,
>
> On 04/19/2012 01:38 PM, Kulovits Christian - OS ITSC wrote:
>> Hi, Andreas
>>
>> What if the RA gets a response from an external command in the form: "display currently unavailable, try later"? The RA has 3 possible states available: "Running", "Not Running", "Failed". But in this situation it would say "don't know". When I set "on-fail=ignore", this error will be ignored the same way as when the response is "not running", and the resource will never be restarted.
>> Christian
>
> A typical approach is to wait a little bit and retry the monitor
> command until it either succeeds in delivering a valid status (running/not
> running) or the RA monitor operation times out and the script is killed,
> which triggers resource recovery.
>
> Regards,
> Andreas
>
--
Need help with Pacemaker?
http://www.hastexo.com/now