[Pacemaker] fencing to recover from failed resources
Lars Marowsky-Bree
lmb at novell.com
Thu Jan 13 10:13:42 UTC 2011
On 2011-01-13T11:08:49, Bart Coninckx <bart.coninckx at telenet.be> wrote:
> thx for your answer.
> So do I get this straight:
> - resource undergoes monitor operation
> - monitor reports failure
> - a restart of the resource is issued (stop and start)
> - stop fails
> - PE decides to fence the node because of this regardless of the state of
> other resources
>
> Until I figure out why a stop fails (these are Xen resources, not sure why an
> xm shutdown or xm destroy would fail ...), is there a way to make Pacemaker
> less radical in fencing (without disabling fencing altogether)?
You can set the on-fail behavior for stop operations too.
It defaults to "fence" since a failed stop implies that Pacemaker was
unable to recover the resource, and so it cannot be started again (on
the same node or elsewhere). This typically implies a bug in the
resource agent (which failed to perform the requested action) or a
kernel bug (unkillable processes etc.); hence, the only safe automated
action that Pacemaker can take to bring the resource into a clean state
again is to fence the whole node.
If you don't want that, you can set on-fail="block", for example.
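As a rough sketch in crm shell syntax, it could look something like the
fragment below. The resource name "vm1", the xmfile path and all the
interval/timeout values are made up for illustration; only the on-fail
attribute on the stop operation is the point here.

```
# Hypothetical Xen resource; name, path and timeouts are placeholders.
primitive vm1 ocf:heartbeat:Xen \
        params xmfile="/etc/xen/vm/vm1" \
        op monitor interval="30s" timeout="60s" \
        op stop interval="0" timeout="120s" on-fail="block"
```

With "block", a failed stop no longer gets the node fenced, but Pacemaker
will not perform any further operations on that resource until you have
sorted it out by hand and cleared the failure (e.g. with "crm resource
cleanup vm1").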
Regards,
Lars
--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde