[Pacemaker] catch-22: can't fence node A because node A has the fencing resource

Wed Dec 4 00:10:53 EST 2013

04.12.2013, 03:30, "David Vossel" <dvossel at redhat.com>:
> ----- Original Message -----
>
>>  From: "Brian J. Murrell" <brian at interlinx.bc.ca>
>>  To: pacemaker at clusterlabs.org
>>  Sent: Monday, December 2, 2013 2:50:41 PM
>>  Subject: [Pacemaker] catch-22: can't fence node A because node A has the fencing resource
>>
>>  So, I'm migrating my working pacemaker configuration from 1.1.7 to
>>  1.1.10 and am finding what appears to be a new behavior in 1.1.10.
>>
>>  If a given node is running a fencing resource and that node goes AWOL,
>>  it needs to be fenced (of course).  But any other node trying to take
>>  over the fencing resource to fence it appears to first want the current
>>  owner of the fencing resource to fence the node.  Of course that can't
>>  happen since the node that needs to do the fencing is AWOL.
>>
>>  So while I can buy into the general policy that a node needs to be
>>  fenced in order to take over its resources, fencing resources have to
>>  be excepted from this or there can be this catch-22.
>
> We did away with all of the policy engine logic involved with trying to move fencing devices off the target node before executing the fencing action. Behind the scenes, all fencing devices are now essentially clones. If the target node to be fenced has a fencing device running on it, that device can execute anywhere in the cluster to avoid the "suicide" situation.
>
> When you are looking at crm_mon output and see a fencing device is running on a specific node, all that really means is that we are going to attempt to execute fencing actions for that device from that node first.

Means... means... means...
There are basic principles of programming, and one of them is "obvious is better than non-obvious."
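The selection rule David describes in the quoted reply (the location crm_mon reports is only the first node tried, and the fence target itself is never used) can be modeled roughly as follows. This is a simplified sketch, not Pacemaker's actual stonithd logic. The function name `pick_fencing_executor` and its parameters are hypothetical, and node liveness is reduced to a plain list of online node names.

```python
from typing import Optional


def pick_fencing_executor(
    device_location: str,  # node crm_mon reports the device as "running" on
    target: str,           # node that must be fenced
    online_nodes: list[str],
) -> Optional[str]:
    """Return the node that should execute the fencing action, or None.

    Hypothetical model: the reported device location is only the first
    choice. The target is always skipped (no "suicide"), so any other
    online node may run the device when the preferred node is unusable.
    """
    # Preference order: the reported location first, then every other online node.
    candidates = [device_location] + [n for n in online_nodes if n != device_location]
    for node in candidates:
        if node != target and node in online_nodes:
            return node
    # No surviving node can run the device.
    return None
```

In Brian's scenario the device is reported on node A, A goes AWOL, and B and C survive. `pick_fencing_executor("A", "A", ["B", "C"])` returns `"B"` instead of waiting on A, which avoids the catch-22.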