[Pacemaker] fencing to recover from failed resources

Thu Jan 13 10:08:49 UTC 2011

On Thursday 13 January 2011 09:51:16 Lars Marowsky-Bree wrote:
> On 2011-01-12T22:52:14, Bart Coninckx <bart.coninckx at telenet.be> wrote:
> > Jan 12 22:20:34 xen2 pengine: [6633]: WARN: unpack_rsc_op: Processing
> > failed op intranet1_stop_0 on xen1: unknown exec error (-2)
> > 
> > My monitors are set to restart a resource. What makes the PE decide to
> > fence the node instead of first trying to restart the resource as the
> > monitor operation is configured to do?
> 
> The restart consists of a "stop" and a "start"; as you can see from the
> above logs, the "stop" failed.
> 
> 
> Regards,
>     Lars

Hi Lars,

Thanks for your answer.
So do I have this straight:
- resource undergoes monitor operation
- monitor reports failure
- a restart of the resource is issued (stop and start)
- stop fails
- PE decides to fence the node because of this regardless of the state of 
other resources

Until I figure out why a stop fails (these are Xen resources, and I am not
sure why an xm shutdown or xm destroy would fail ...), is there a way to make
Pacemaker less radical in fencing (without disabling fencing altogether)?
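
To make sure I understand the sequence, here is a rough sketch of it as a toy model in Python. This is not Pacemaker code: the function recover() and the action strings are made up for illustration. The only Pacemaker concept it borrows is the per-operation "on-fail" setting, which, as far as I understand it, defaults to "fence" for a stop operation when STONITH is enabled and can be set to "block" instead. That should be checked against the documentation for the installed version.

```python
def recover(monitor_ok, stop_ok, stop_on_fail="fence"):
    """Toy model of the recovery sequence; returns the actions taken in order.

    stop_on_fail mimics the on-fail setting of the stop operation.
    Only "fence" and "block" are modelled here (an assumption made to
    keep the sketch short).
    """
    if stop_on_fail not in ("fence", "block"):
        raise ValueError("unsupported on-fail value: %r" % stop_on_fail)

    actions = []
    if monitor_ok:
        # Healthy resource: nothing to recover.
        return actions

    # A failed monitor triggers a restart, which is a stop followed by a start.
    actions.append("stop")
    if not stop_ok:
        # The cluster cannot tell whether the resource is still running,
        # so it must not start it elsewhere without further action.
        if stop_on_fail == "fence":
            actions.append("fence node")
        else:
            actions.append("block resource")
        return actions

    actions.append("start")
    return actions


if __name__ == "__main__":
    print(recover(monitor_ok=False, stop_ok=False))
    print(recover(monitor_ok=False, stop_ok=False, stop_on_fail="block"))
    print(recover(monitor_ok=False, stop_ok=True))
```

If that model is right, then with "block" the node would not be fenced, but the resource would stay unmanaged on that node until someone cleans it up by hand, so it would not fail over either.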

Thank you!

Bart