[Pacemaker] will a stonith resource be moved from an AWOL node?

Lars Marowsky-Bree lmb at suse.com
Tue Apr 30 11:13:37 EDT 2013


On 2013-04-30T10:55:41, "Brian J. Murrell" <brian at interlinx.bc.ca> wrote:

> From what I think I know of pacemaker, pacemaker wants to be able to
> stonith that AWOL node before moving any resources away from it since
> starting a resource on a new node while the state of the AWOL node is
> unknown is unsafe, right?

Right.

> But of course, if the resource that pacemaker wants to move is the
> stonith resource, there's a bit of a catch-22.  It can't move the
> stonith resource until it can stonith the node, and it cannot stonith
> the node because the node running the stonith resource is AWOL.
> 
> So, is pacemaker supposed to resolve this on its own or am I supposed
> to create a cluster configuration that ensures that enough stonith
> resources exist to mitigate this situation?

Pacemaker 1.1.8's stonith/fencing subsystem ties directly into the CIB,
and will complete the fencing request even if the fencing/stonith
resource is not yet instantiated on the node. (There's a bug in 1.1.8 as
released that causes an annoying delay here, but that has since been fixed.)

That can be a bit confusing if you are used to the previous
behaviour.

(And I'm not sure it's a real win for the complexity of the
project/code, but Andrew and David are.)

> Node node1: UNCLEAN (pending)
> Online: [ node2 ]

> node1 is very clearly completely off.  The cluster has been in this
> state, with node1 off, for several tens of minutes now, and the stonith
> resource is still running on it.

It shouldn't take that long.

I think your easiest path is to update.
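
Until then, if you want to watch for a node stuck in this state, a
rough sketch in Python along these lines might help. It only scans
status text for the "Node ...: UNCLEAN" line shape quoted above; the
function name find_unclean_nodes and the sample text are made up for
illustration, and real crm_mon output has more variants than this
handles.

```python
import re

# Matches lines such as "Node node1: UNCLEAN (pending)".
# Assumption: only the line shape quoted in this thread is handled.
UNCLEAN_RE = re.compile(r"^\s*Node\s+(\S+):\s+UNCLEAN\s*(?:\((\w+)\))?")


def find_unclean_nodes(status_text):
    """Return a dict mapping node name -> qualifier in parentheses (or None)."""
    unclean = {}
    for line in status_text.splitlines():
        match = UNCLEAN_RE.match(line)
        if match:
            unclean[match.group(1)] = match.group(2)
    return unclean


# Hypothetical sample built from the two status lines quoted above.
sample = """\
Node node1: UNCLEAN (pending)
Online: [ node2 ]
"""

print(find_unclean_nodes(sample))
```

A non-empty result that persists for minutes rather than seconds is
the symptom described above.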


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




