[Pacemaker] prevent starting resources on failed node

Fri Dec 6 10:17:45 EST 2013

[ Hopefully this doesn't cause a duplicate post but my first attempt
returned an error. ]

Using pacemaker 1.1.10 (but I think this issue is more general than that
release), I want to enforce a policy that once a node fails, no
resources can be started/run on it until the user permits it.

I have been successful in achieving this using resource stickiness.
Mostly.  It seems that once the resource has been successfully started
on another node, it stays put, even once the failed node comes back up.
So this is all good.

Where it falls down is when the failed node comes back up before the
resource has been successfully started on another node.  In that case,
pacemaker seems to include the just-failed-and-restarted node in the
list of candidate nodes on which it tries to start the resource.  So it
seems that resource stickiness only applies once the resource has been
started (which is not surprising; it seems a reasonable behaviour).

The question then is: does anyone have any ideas on how to implement
such a policy?  That is, once a node fails, no resources are allowed to
start on it, even if that means the resource is not started at all
(i.e. all other nodes are unable to start it for whatever reason).
Simply not starting the node would be one way to achieve it, yes, but
we cannot rely on the node not being started.

It seems perhaps the installation of a constraint when a node is
stonithed might do the trick, but the question is how to couple/trigger
the installation of a constraint with a stonith action?
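
To make the idea concrete, here is a minimal sketch of the kind of
constraint meant above: a location constraint with a score of -INFINITY,
which forbids a resource from running on a given node.  It is written as
a small Python helper that builds one rsc_location element per resource.
The function names, the id naming scheme and the example resource and
node names are all made up for illustration.  It only shows what would
be installed, not how to trigger it from a stonith action, which remains
the open question.

```python
import xml.etree.ElementTree as ET


def ban_constraint(resource, node):
    """Build a rsc_location element forbidding `resource` on `node`."""
    return ET.Element(
        "rsc_location",
        {
            # Hypothetical id scheme; any id unique within the CIB would do.
            "id": f"ban-{resource}-on-{node}",
            "rsc": resource,
            "node": node,
            # -INFINITY means the resource must never run on this node.
            "score": "-INFINITY",
        },
    )


def ban_all(resources, node):
    """Return one serialized constraint per resource for the failed node."""
    return [
        ET.tostring(ban_constraint(r, node), encoding="unicode")
        for r in resources
    ]


if __name__ == "__main__":
    # Example names only; substitute the real resource and node names.
    for xml in ban_all(["res1", "res2"], "node1"):
        print(xml)
```

The generated elements would belong in the constraints section of the
CIB, and removing them again would be the user's way of permitting the
node to run resources.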

Or is there a better/different way to achieve this?

Cheers,
b.