[Pacemaker] how do I avoid infinite reboot cycles by fencing just the offline node?
Oliver Heinz
oheinz at fbihome.de
Mon Jun 14 12:26:57 UTC 2010
I configured an sbd fencing device on the shared storage to prevent data
corruption. It basically works, but when I pull the network plugs on one node
to simulate a failure, one of the nodes is fenced (not necessarily the one
that was unplugged). After the fenced node reboots, it fences the other node,
and this goes on and on...
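(For diagnosing the loop, the slot state on the shared device can be
inspected and cleared with the sbd tool itself; this is just how I read the
sbd man page, so treat it as a sketch:)

  # show the node slots and any pending fence messages on the device
  sbd -d /dev/mapper/3600c0ff000d8d78802faa14b01000000-part1 list
  # manually clear a node's slot before letting it rejoin
  sbd -d /dev/mapper/3600c0ff000d8d78802faa14b01000000-part1 message server-d clear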
I configured pingd and location constraints (locPingVserverC/D below) so that
the resources on the shared device are not started on the node without
network connectivity, but that node still fences the other one.
What I would like to achieve is that in case of a network problem on a node,
that node is fenced (and not some randomly chosen one), and that after a
reboot it just sits there waiting for the network to come up again (instead
of fencing other nodes). Once the network comes back, the node could
automatically rejoin the cluster.
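(What I had in mind is something like sbd's start mode. I found the -S option
in newer sbd documentation and don't know whether the sbd shipped with my
cluster-glue supports it, so this is only a sketch, and the config file
location is an assumption:)

  # /etc/sysconfig/sbd (assumed location): with -S 1 the sbd daemon refuses
  # to start while the node's own slot still holds a fence message, so a
  # fenced node waits instead of rejoining and shooting back
  SBD_DEVICE="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
  SBD_OPTS="-W -S 1"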
Is this possible? Or do I have to disable the cluster stack on bootup and
sort things out manually before joining the cluster?
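(On Debian I suppose that would be something along these lines; again just a
sketch:)

  # keep heartbeat from starting at boot
  update-rc.d -f heartbeat remove
  # after verifying the network by hand, join the cluster manually
  /etc/init.d/heartbeat start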
Can someone please point me in the right direction? Maybe I'm just
overlooking the obvious?
TIA,
Oliver
node $id="00b61c9a-22c6-4689-9930-1fd65d5729fa" server-d \
attributes standby="off"
node $id="0d11e934-91b9-400d-9820-feb2f5895b55" server-c
primitive resDATA ocf:heartbeat:LVM \
params volgrpname="data"
primitive resDataC ocf:heartbeat:Filesystem \
params device="/dev/mapper/data-C" directory="/srv/data/C" fstype="ext4" \
meta is-managed="true"
primitive resDataD ocf:heartbeat:Filesystem \
params device="/dev/mapper/data-D" directory="/srv/data/D" fstype="ext4"
primitive resPingGateway ocf:pacemaker:pingd \
params host_list="gateway"
primitive resSBD stonith:external/sbd \
params sbd_device="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
primitive resVserverTestFramelos ocf:heartbeat:VServer \
params vserver="test-framelos" \
meta is-managed="true"
group grpVserverC resDataC resVserverTestFramelos \
meta target-role="Started"
group grpVserverD resDataD \
meta target-role="Started"
clone cloneDATA resDATA
clone clonePingGateway resPingGateway \
meta target-role="Started"
clone cloneSBD resSBD
location cli-prefer-grpVserverC grpVserverC \
rule $id="cli-prefer-rule-grpVserverC" inf: #uname eq server-c
location cli-prefer-grpVserverD grpVserverD \
rule $id="cli-prefer-rule-grpVserverD" inf: #uname eq server-d
location cli-prefer-resVserverTestFramelos resVserverTestFramelos \
rule $id="cli-prefer-rule-resVserverTestFramelos" inf: #uname eq server-c
location locPingVserverC grpVserverC \
rule $id="locPingVserverC-rule" -inf: not_defined pingd or pingd lte 0
location locPingVserverD grpVserverD \
rule $id="locPingVserverD-rule" -inf: not_defined pingd or pingd lte 0
order ordDataC inf: cloneDATA grpVserverC
order ordDataD inf: cloneDATA grpVserverD
property $id="cib-bootstrap-options" \
dc-version="1.0.8-f2ca9dd92b1d+ sid tip" \
cluster-infrastructure="Heartbeat" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="true" \
default-resource-stickiness="INFINITY" \
last-lrm-refresh="1276514035"