[Pacemaker] fencing to recover from failed resources
Lars Marowsky-Bree
lmb at novell.com
Thu Jan 13 15:43:46 UTC 2011
On 2011-01-13T09:30:48, Michael Smith <msmith at cbnco.com> wrote:
> >the resource agent basically does a "xm list --long" while
> >monitoring, which takes less than half a second in a console.
> I think sometimes xend hangs for a while. 30 seconds should be good.
There's a pending fix for this, which introduces a faster way to
query the VM state; it involves a change to the Xen RA and a new utility
in the virt-tools package.
If you want to stay supported, the easiest way to get those is to contact
support and request a PTF; otherwise, the fix should eventually appear as a
maintenance update once testing has completed.
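For reference, the timeouts mentioned in this thread could be set on the
Xen resource roughly as follows. This is only a sketch in crm shell
syntax: the resource name "vm1", the xmfile path, and the 10-second
monitor interval are made-up placeholders, and it assumes the 30 seconds
quoted above refers to the monitor operation timeout and the 90 to the
stop operation timeout.

```
# Hypothetical example; adjust name, path and interval to your setup.
primitive vm1 ocf:heartbeat:Xen \
    params xmfile="/etc/xen/vm/vm1" \
    op monitor interval="10s" timeout="30s" \
    op stop interval="0" timeout="90s"
```

A longer monitor timeout only makes the cluster more tolerant of a slow
"xm list"; it does not address xend hanging in the first place.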
> What is your stop action timeout? I use 90.
>
> There are some old bugs related to spurious resource stops; make
> sure you're running at least cluster-glue 1.0.6 and pacemaker
> 1.1.2.1. I don't see these versions in the SLES11 HAE SP1 updates
> repo, but maybe the fixes were backported to other versions.
We're currently queuing up 1.1.4 for release, just waiting for one or
two features to be completed.
> I've
> been using the packages from here:
>
> http://download.opensuse.org/repositories/network:/ha-clustering/SLE_11_SP1/x86_64/
Customers of SLE HA should generally not install these packages, but
ask support for help instead. Installing third-party packages voids
support, which may not always be intended.
Best,
Lars
--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde