[Pacemaker] how do I avoid infinite reboot cycles by fencing just the offline node?

Dejan Muhamedagic dejanmm at fastmail.fm
Mon Jun 14 10:43:54 EDT 2010


Hi,

On Mon, Jun 14, 2010 at 02:26:57PM +0200, Oliver Heinz wrote:
> 
> I configured an sbd fencing device on the shared storage to prevent data 
> corruption. It basically works, but when I pull the network plugs on one node 
> to simulate a failure, one of the nodes is fenced (not necessarily the one that 
> was unplugged). After the fenced node reboots it fences the other node, and this 
> goes on and on...

Is the networking still down between the nodes? If so, then this
is expected.

> I configured pingd and location constraints so that the resources on the shared 
> device are not started on the node that is without network connectivity, but 
> this node still fences the other node.

Yes. With pingd you can influence the resource placement, but it
can't fix split brain.

> What I would like to achieve is that in case of a network problem on a node, 
> this node is fenced (and not some randomly chosen node), and that after a 
> reboot this node just sits there waiting for the network to come up again (and 
> not fencing other nodes). Once the network comes up, this node could 
> automatically join the cluster again.
> 
> Is this possible?

No. You need to make the network connectivity between the nodes
redundant. Split brain is bad news. The cluster will try its best
to deal with it, but, as you have seen, the outcome won't always
please the users.
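
With Heartbeat, the usual way to make the node-to-node communication
redundant is to configure more than one communication path in ha.cf.
A sketch (the interface names and address below are examples, not
taken from your setup):

```
# /etc/ha.d/ha.cf -- two independent communication paths, so that
# losing a single link no longer results in a split brain
bcast eth1
bcast eth2
# or use a dedicated back-to-back link as a unicast path:
# ucast eth2 192.168.100.2
```

Heartbeat then only declares the peer dead when all paths are lost.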

> Or do I have to disable the cluster stack on bootup, to 
> sort things out manually before joining the cluster? 

I think that's a good idea.
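
On a Debian-style init system that could look roughly like this (a
sketch; the exact commands depend on your distribution):

```shell
# keep Heartbeat from starting automatically at boot
update-rc.d heartbeat disable

# once you have verified that the network between the nodes is up,
# start the cluster stack by hand so the node rejoins cleanly
/etc/init.d/heartbeat start
```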

Oh, and don't make sbd a clone; it doesn't like parallel
operations on the device.
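
In other words, drop the "clone cloneSBD resSBD" line from your
configuration and keep resSBD as a plain primitive:

```
# sbd as an ordinary (non-cloned) stonith primitive; this is your
# existing resSBD definition with only the clone wrapper removed
primitive resSBD stonith:external/sbd \
        params sbd_device="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
```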

Thanks,

Dejan

> Can someone please point me in the right direction? Maybe I'm just overlooking 
> the obvious?
> 
> TIA,
> Oliver
> 
> node $id="00b61c9a-22c6-4689-9930-1fd65d5729fa" server-d \
>         attributes standby="off"
> node $id="0d11e934-91b9-400d-9820-feb2f5895b55" server-c
> primitive resDATA ocf:heartbeat:LVM \
>         params volgrpname="data"
> primitive resDataC ocf:heartbeat:Filesystem \
>         params device="/dev/mapper/data-C" directory="/srv/data/C" fstype="ext4" \
>         meta is-managed="true"
> primitive resDataD ocf:heartbeat:Filesystem \
>         params device="/dev/mapper/data-D" directory="/srv/data/D" fstype="ext4"
> primitive resPingGateway ocf:pacemaker:pingd \
>         params host_list="gateway"
> primitive resSBD stonith:external/sbd \
>         params sbd_device="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
> primitive resVserverTestFramelos ocf:heartbeat:VServer \
>         params vserver="test-framelos" \
>         meta is-managed="true"
> group grpVserverC resDataC resVserverTestFramelos \
>         meta target-role="Started"
> group grpVserverD resDataD \
>         meta target-role="Started"
> clone cloneDATA resDATA
> clone clonePingGateway resPingGateway \
>         meta target-role="Started"
> clone cloneSBD resSBD
> location cli-prefer-grpVserverC grpVserverC \
>         rule $id="cli-prefer-rule-grpVserverC" inf: #uname eq server-c
> location cli-prefer-grpVserverD grpVserverD \
>         rule $id="cli-prefer-rule-grpVserverD" inf: #uname eq server-d and #uname eq server-d
> location cli-prefer-resVserverTestFramelos resVserverTestFramelos \
>         rule $id="cli-prefer-rule-resVserverTestFramelos" inf: #uname eq server-c
> location locPingVserverC grpVserverC \
>         rule $id="locPingVserverC-rule" -inf: not_defined pingd or pingd lte 0
> location locPingVserverD grpVserverD \
>         rule $id="locPingVserverD-rule" -inf: not_defined pingd or pingd lte 0
> order ordDataC inf: cloneDATA grpVserverC
> order ordDataD inf: cloneDATA grpVserverD
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.8-f2ca9dd92b1d+ sid tip" \
>         cluster-infrastructure="Heartbeat" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="true" \
>         default-resource-stickiness="INFINITY" \
>         last-lrm-refresh="1276514035"
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
