[Pacemaker] fencing to recover from failed resources

Thu Jan 13 15:43:46 UTC 2011

On 2011-01-13T09:30:48, Michael Smith <msmith at cbnco.com> wrote:

> > the resource agent basically does a "xm list --long" while
> > monitoring, which takes less than half a second in a console.
> I think sometimes xend hangs for a while. 30 seconds should be good.

There's a pending fix for this, which introduces a faster way to
query the VM state; it involves a change to the Xen RA and a new utility
in the virt-tools package.

If you want to stay supported, the easiest way to get those is to contact
support and request a PTF (Program Temporary Fix); otherwise, it should
eventually appear as a maintenance update once testing has completed.

> What is your stop action timeout? I use 90.
> 
> There are some old bugs related to spurious resource stops; make
> sure you're running at least cluster-glue 1.0.6 and pacemaker
> 1.1.2.1. I don't see these versions in the SLES11 HAE SP1 updates
> repo, but maybe the fixes were backported to other versions.

We're currently queuing up Pacemaker 1.1.4 for release; we're just waiting
for one or two features to be completed.

> I've
> been using the packages from here:
> 
> http://download.opensuse.org/repositories/network:/ha-clustering/SLE_11_SP1/x86_64/

Customers of SLE HA should generally not install these packages, but
should ask support for help instead. Installing third-party packages
voids support, which may not be what you intend.

Best,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde