[Pacemaker] Pacemaker stop behaviour when underlying resource is unavailable

Tue Dec 18 02:03:51 UTC 2012

On Fri, Dec 14, 2012 at 9:32 PM, pavan tc <pavan.tc at gmail.com> wrote:
> Hi,
>
> I have structured my multi-state resource agent as below to handle the
> case where the underlying resource becomes unavailable for some reason:
>
> monitor()
> {
>     state=get_primitive_resource_state()
>
>     ...
>     ...
>     if ($state == unavailable)
>        return $OCF_NOT_RUNNING
>
>     ...
>     ...
> }
>
> stop()
> {
>     monitor()
>     ret=$?
>
>     if ($ret == $OCF_NOT_RUNNING)
>        return $OCF_SUCCESS
> }
>
> start()
> {
>     start_primitive()
>     if (start_primitive_failure)
>         return $OCF_ERR_GENERIC
> }
>
> The idea is to make sure that stop does not fail when the underlying
> resource goes away.
> (Otherwise I see that the resource gets to an unmanaged state)
> Also, the expectation is that when the resource comes back, it joins the
> cluster without much fuss.
>
> What I see is that pacemaker calls stop twice

That would not be expected. Bug?

> and if it finds that stop returns success, it does not continue with
> monitor any more. I also do not see an attempt to start.

Anywhere?  Or just on the same node?

>
> Is there a way to keep the monitor going in such circumstances?

Not really. You can define a recurring monitor for the Stopped role, though.
But why would the resource come back?  You _really_ should not be starting
services outside of the cluster, not least because we've probably
started it somewhere else in the meantime.
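
For illustration, a Stopped-role monitor in crm shell syntax could look
roughly like the sketch below. The resource names (p_demo, ms_demo), the
agent name (ocf:myprovider:myagent) and all intervals are made up for the
example and are not from your configuration. Note that each monitor op on
a resource needs its own distinct interval.

```shell
# Sketch only: names, agent and intervals are hypothetical placeholders.
crm configure primitive p_demo ocf:myprovider:myagent \
    op monitor interval="10s" role="Master" \
    op monitor interval="20s" role="Slave" \
    op monitor interval="60s" role="Stopped"

# Wrap the primitive in a multi-state (master/slave) resource.
crm configure ms ms_demo p_demo \
    meta master-max="1" clone-max="2" notify="true"
```

With something like this in place, the Stopped-role monitor keeps running
on nodes where the cluster believes the resource is stopped, so an
unexpectedly active instance would at least be noticed.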

> Am I using incorrect resource agent return codes?
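
On the return codes themselves, here is a minimal runnable sketch of the
structure described above, assuming the standard OCF values (0 for
OCF_SUCCESS, 1 for OCF_ERR_GENERIC, 7 for OCF_NOT_RUNNING). The helper
functions and the STATE_FILE marker are hypothetical stand-ins for the
real underlying resource, and the multi-state parts (promote, demote,
reporting a master) are left out.

```shell
#!/bin/sh
# OCF exit codes. A real agent gets these by sourcing ocf-shellfuncs;
# they are defined here only so the sketch is self-contained.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical stand-in for the underlying resource: it counts as
# "running" while this marker file exists. STATE_FILE is an assumption.
STATE_FILE="${STATE_FILE:-/tmp/demo_primitive.state}"

get_primitive_resource_state() {
    if [ -f "$STATE_FILE" ]; then echo running; else echo unavailable; fi
}
start_primitive() { touch "$STATE_FILE"; }
stop_primitive()  { rm -f "$STATE_FILE"; }

agent_monitor() {
    state=$(get_primitive_resource_state)
    if [ "$state" = "unavailable" ]; then
        return "$OCF_NOT_RUNNING"
    fi
    return "$OCF_SUCCESS"
}

agent_stop() {
    agent_monitor
    ret=$?
    # Already gone: stop is idempotent, so report success.
    if [ "$ret" -eq "$OCF_NOT_RUNNING" ]; then
        return "$OCF_SUCCESS"
    fi
    # Otherwise actually stop it, then confirm that it is gone.
    stop_primitive || return "$OCF_ERR_GENERIC"
    agent_monitor
    if [ $? -eq "$OCF_NOT_RUNNING" ]; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}

agent_start() {
    start_primitive || return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}
```

The point of the sketch is that every path through stop and start
returns an explicit code, and that calling stop on an already stopped
resource returns success rather than an error.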
>
> Thanks,
> Pavan