[Pacemaker] fencing to recover from failed resources
Michael Smith
msmith at cbnco.com
Thu Jan 13 14:30:48 UTC 2011
Bart Coninckx wrote:
> By the way: things seem better when I change the monitor timeout to 30
> seconds instead of 10 seconds. Very strange though, because the resource
> agent basically does an "xm list --long" while monitoring, which takes less
> than half a second in a console.
I think sometimes xend hangs for a while. 30 seconds should be good.
What is your stop action timeout? I use 90.
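
To see whether the command really does stall now and then, it can be run repeatedly under the same time limit the monitor gets. The following is only a rough sketch in Python: the helper name timed_run is made up for illustration, and it assumes the monitor does nothing more than run "xm list --long", which is a simplification of what the resource agent does.

```python
import subprocess
import time


def timed_run(cmd, timeout):
    """Run cmd once; return (elapsed_seconds, timed_out).

    Hypothetical helper, not part of Pacemaker or any resource agent.
    """
    start = time.monotonic()
    try:
        # Output is discarded; only the duration matters here.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        timed_out = False
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        timed_out = True
    return time.monotonic() - start, timed_out
```

On a Xen host, calling timed_run(["xm", "list", "--long"], 10) in a loop and logging every run that times out or takes more than a second or so should show how often xend stalls, and whether 30 seconds leaves enough headroom.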
There are some old bugs related to spurious resource stops; make sure
you're running at least cluster-glue 1.0.6 and pacemaker 1.1.2.1. I
don't see these versions in the SLES11 HAE SP1 updates repo, but maybe
the fixes were backported to other versions. I've been using the
packages from here:
http://download.opensuse.org/repositories/network:/ha-clustering/SLE_11_SP1/x86_64/

The bugs in question:
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2482
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2479
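
For checking installed versions against the minimums mentioned above (cluster-glue 1.0.6, pacemaker 1.1.2.1), a small comparison along these lines may help. It is a sketch only: the function names are made up, and it assumes plain dotted numeric version strings. Real package versions often carry release suffixes, which would have to be stripped first, and a lower version number with backported fixes would still fail this check.

```python
def parse_version(text):
    """Turn a dotted numeric string such as '1.1.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if installed is at least minimum.

    Shorter versions are padded with zeros, so '1.0.6' equals '1.0.6.0'.
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, meets_minimum("1.1.2", "1.1.2.1") is False, while meets_minimum("1.1.5", "1.1.2.1") is True.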
Mike