[ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"

Oyvind Albrigtsen oalbrigt at redhat.com
Fri Apr 13 05:59:02 EDT 2018


On 13/04/18 11:53 +0200, Nicolas Huillard wrote:
>On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen wrote:
>> On 13/04/18 11:07 +0200, Nicolas Huillard wrote:
>> > One of my resources is a pppd process, which is started with the
>> > heartbeat/anything RA. That RA just spawns the pppd process with the
>> > correct parameters and returns OCF_SUCCESS if the process started.
>> > The problem is that the service provided by pppd is only available
>> > after some time (a few seconds to 30s), i.e. once it has
>> > successfully negotiated a connection. At that point, the interface
>> > it creates is UP.
>> >
>> > The issue here is that other resources that depend on this
>> > connection
>> > are started by Pacemaker just after it starts pppd, thus before the
>> > interface is UP. This creates various problems.
>> >
>> > I figured that fixing this would require adding a monitor call
>> > inside the start operation, and waiting for a successful monitor
>> > before returning OCF_SUCCESS, within the start timeout.
>> >
>> > Is this a correct approach?
>> > Is there some other standard way to fix this, like a "wait for
>> > condition" Resource Agent?
>>
>> You could try using the monitor_hook parameter to check the status,
>
>The issue here is the monitor will at first return a "fail", which is
>considered fatal by Pacemaker unless property start-failure-is-fatal is
>set to false, which may come with side-effects.
>That's what I do now with a ping RA inserted before the service which
>may fail if the interface is not UP. It works, but triggers some "fail"
>events which are not really "fails" but "not started yet".

You might try setting monitor_hook to e.g. "sleep 30;
<command-for-checking-status>" and see if that works.
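
If the fixed sleep turns out to be too long or too short, a polling loop
might be a better fit: it returns as soon as the check passes and gives up
after a deadline. This is only a rough sketch, not something I have tested
against the anything RA. The function names wait_for and iface_is_up are
made up for illustration, and it assumes the "ip" tool from iproute2 is
installed and that the interface name (e.g. ppp0) is known in advance.

```sh
#!/bin/sh
# Sketch only: poll a status command until it succeeds or a deadline passes.
# wait_for and iface_is_up are hypothetical helper names, not part of any RA.

# usage: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if TIMEOUT_SECONDS elapse first.
wait_for() {
    wf_timeout=$1
    shift
    wf_elapsed=0
    while [ "$wf_elapsed" -lt "$wf_timeout" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        wf_elapsed=$((wf_elapsed + 1))
    done
    return 1
}

# usage: iface_is_up IFACE
# Assumption: "ip link show dev IFACE up" prints the interface only when it
# is up, and prints nothing (or fails) otherwise.
iface_is_up() {
    ip link show dev "$1" up 2>/dev/null | grep -q .
}

# Example call (interface name is an assumption):
#   wait_for 30 iface_is_up ppp0
```

The timeout passed to wait_for should stay below the operation timeout
configured in Pacemaker, otherwise the operation is killed before the loop
gives up on its own.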
>
>> or
>> use the Delay agent between the anything resource and the other
>> resources.
>
>I'll try this. Hoping a sensible delay can be derived from the logs.
>
>Thanks,
>
>-- 
>Nicolas Huillard