[Pacemaker] fencing to recover from failed resources

Wed Jan 12 21:52:14 UTC 2011

Hi,

My two-node cluster is fencing a lot, with messages like these:

Jan 12 22:20:34 xen2 pengine: [6633]: info: get_failcount: intranet1 has failed INFINITY times on xen1
Jan 12 22:20:34 xen2 pengine: [6633]: info: get_failcount: intranet1 has failed INFINITY times on xen1
Jan 12 22:20:34 xen2 pengine: [6633]: WARN: unpack_rsc_op: Processing failed op intranet1_monitor_60000 on xen1: unknown exec error (-2)
Jan 12 22:20:34 xen2 pengine: [6633]: info: get_failcount: intranet1 has failed INFINITY times on xen1
Jan 12 22:20:34 xen2 pengine: [6633]: WARN: unpack_rsc_op: Processing failed op intranet1_stop_0 on xen1: unknown exec error (-2)
Jan 12 22:20:34 xen2 pengine: [6633]: WARN: pe_fence_node: Node xen1 will be fenced to recover from resource failure(s)
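For scanning logs like the excerpt above, here is a minimal Python sketch (not part of the original post) that picks out the pengine "Processing failed op" warnings and splits each operation key into resource, action and interval. It assumes the key follows Pacemaker's `<resource>_<action>_<interval in ms>` naming, as in `intranet1_monitor_60000`; the helper names `failed_ops` and `split_op_key` are hypothetical. Applied to the excerpt, it shows that the failures on xen1 include a stop operation (`intranet1_stop_0`), not only the 60-second monitor.

```python
import re

# Matches pengine warnings such as
# "... Processing failed op intranet1_stop_0 on xen1: unknown exec error (-2)".
FAILED_OP = re.compile(
    r"Processing failed op (?P<key>\S+) on (?P<node>\S+): (?P<reason>.+)$"
)

# Assumed key format: <resource>_<action>_<interval in ms>. The resource name
# may contain underscores, and migrate_to/migrate_from are tried first so they
# are not split in the middle; the non-greedy resource match handles the rest.
OP_KEY = re.compile(
    r"^(?P<rsc>.+?)_(?P<action>migrate_to|migrate_from|[^_]+)_(?P<interval>\d+)$"
)


def split_op_key(key):
    """Hypothetical helper: return (resource, action, interval_ms) for an op key."""
    m = OP_KEY.match(key)
    if m is None:
        raise ValueError(f"unrecognised operation key: {key!r}")
    return m.group("rsc"), m.group("action"), int(m.group("interval"))


def failed_ops(lines):
    """Hypothetical helper: yield (resource, action, interval_ms, node, reason)."""
    for line in lines:
        m = FAILED_OP.search(line)
        if m:
            rsc, action, interval = split_op_key(m.group("key"))
            yield rsc, action, interval, m.group("node"), m.group("reason")


# Lines copied from the log excerpt in this post.
LOG = [
    "Jan 12 22:20:34 xen2 pengine: [6633]: info: get_failcount: intranet1 has failed INFINITY times on xen1",
    "Jan 12 22:20:34 xen2 pengine: [6633]: WARN: unpack_rsc_op: Processing failed op intranet1_monitor_60000 on xen1: unknown exec error (-2)",
    "Jan 12 22:20:34 xen2 pengine: [6633]: WARN: unpack_rsc_op: Processing failed op intranet1_stop_0 on xen1: unknown exec error (-2)",
    "Jan 12 22:20:34 xen2 pengine: [6633]: WARN: pe_fence_node: Node xen1 will be fenced to recover from resource failure(s)",
]

if __name__ == "__main__":
    for op in failed_ops(LOG):
        print(op)
```

The `get_failcount` and `pe_fence_node` lines are skipped because only "Processing failed op" warnings carry an operation key.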

My monitor operations are configured to restart a failed resource. What makes the PE decide to fence the node instead of first trying to restart the resource, as the monitor operation is configured to do?

Thank you!

Bart