[Pacemaker] how do I avoid infinite reboot cycles by fencing just the offline node?
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Jun 14 16:50:52 UTC 2010
Hi,
On Mon, Jun 14, 2010 at 06:29:59PM +0200, Oliver Heinz wrote:
> On Monday, 14 June 2010, at 16:43:54, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Mon, Jun 14, 2010 at 02:26:57PM +0200, Oliver Heinz wrote:
> > > I configured a sbd fencing device on the shared storage to prevent data
> > > corruption. It works basically, but when I pull the network plugs on one
> > > node to simulate a failure one of the nodes is fenced (not necessarily
> > > the one that was unplugged). After the fenced node reboots it fences the
> > > other node, this goes on and on...
> >
> > The networking is still down between the nodes? If so, then this
> > is expected.
> >
> > > I configured pingd and location so that the resources on the shared
> > > device are not started on the node that is without network connectivity,
> > > but still this node fences the other node.
> >
> > Yes. With pingd you can influence the resource placement, but it
> > can't fix split brain.
> >
> > > What I would like to achive is that in case of a network problem on a
> > > node this node is fenced (and not some randomly chosen node) and that
> > > after a reboot this node just sits there waiting for the network to come
> > > up again (and not fencing other nodes). Once the network comes up, this
> > > node could automatically join the cluster again.
> > >
> > > Is this possible?
> >
> > No. You need to make your network connectivity between the nodes
> > redundant.
>
> It will be, but I'm testing the worst-case scenario. I once had a split brain
> because I plugged in a firewire device and the resulting kernel oops blocked
> all i/o (network and even the dumb serial line I had for redundancy) for longer
> than my configured dead time.
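Right. For what it's worth, heartbeat takes several independent
communication paths in ha.cf, so one failed link doesn't mean split
brain. A sketch (eth0/eth1 and /dev/ttyS0 are assumed names, adjust
for your hardware):

```
# /etc/ha.d/ha.cf (fragment)
bcast eth0          # primary network link
bcast eth1          # second, independent network link
serial /dev/ttyS0   # serial link as an extra path
baud 19200
```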
>
> > Split brain is bad news. The cluster will try its best to
> > deal with it, but, as you could see, it won't always please the
> > users.
> >
> > > Or do I have to disable the cluster stack on bootup, and
> > > sort things out manually before joining the cluster?
> >
> > I think that that's a good idea.
>
> I really like the idea of having the other node sit there waiting for
> network recovery and integrating seamlessly once the network is up.
>
> I guess pacemaker fences the other node after it becomes DC, right? So what I
> would probably want is some rules that prevent this node from becoming DC and
> starting to do things. Something like "hmm, I can't reach my gateway, I should
> keep looking for a DC and not elect myself."
There is no way to influence the DC election process.
> Meanwhile I'll add some checks to the init script: "if we can't reach the
> gateway and the other node, we'd better not start the cluster stack."
>
>
> >
> > Oh, and don't make sbd a clone, it doesn't like parallel
> > operations to a device.
>
> Thanks for clarification. After reading this
> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg03851.html
> I thought it wouldn't hurt either.
In the meantime our understanding of the way sbd works has
improved. So it may hurt, and you're covered just as well
with a single sbd instance.
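For reference, the relevant part of the configuration would then be just
the primitive, with the clone definition removed (a sketch using the
resource name and device path from the configuration you posted):

```
# single sbd stonith resource, no clone wrapper
primitive resSBD stonith:external/sbd \
        params sbd_device="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
# i.e. drop the "clone cloneSBD resSBD" line
```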
Thanks,
Dejan
>
> Thanks,
> Oliver
>
>
> >
> > Thanks,
> >
> > Dejan
> >
> > > Can someone please point me in the right direction? Maybe I'm just
> > > overlooking the obvious?
> > >
> > > TIA,
> > > Oliver
> > >
> > > node $id="00b61c9a-22c6-4689-9930-1fd65d5729fa" server-d \
> > >         attributes standby="off"
> > > node $id="0d11e934-91b9-400d-9820-feb2f5895b55" server-c
> > > primitive resDATA ocf:heartbeat:LVM \
> > >         params volgrpname="data"
> > > primitive resDataC ocf:heartbeat:Filesystem \
> > >         params device="/dev/mapper/data-C" directory="/srv/data/C" fstype="ext4" \
> > >         meta is-managed="true"
> > > primitive resDataD ocf:heartbeat:Filesystem \
> > >         params device="/dev/mapper/data-D" directory="/srv/data/D" fstype="ext4"
> > > primitive resPingGateway ocf:pacemaker:pingd \
> > >         params host_list="gateway"
> > > primitive resSBD stonith:external/sbd \
> > >         params sbd_device="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
> > > primitive resVserverTestFramelos ocf:heartbeat:VServer \
> > >         params vserver="test-framelos" \
> > >         meta is-managed="true"
> > > group grpVserverC resDataC resVserverTestFramelos \
> > >         meta target-role="Started"
> > > group grpVserverD resDataD \
> > >         meta target-role="Started"
> > > clone cloneDATA resDATA
> > > clone clonePingGateway resPingGateway \
> > >         meta target-role="Started"
> > > clone cloneSBD resSBD
> > > location cli-prefer-grpVserverC grpVserverC \
> > >         rule $id="cli-prefer-rule-grpVserverC" inf: #uname eq server-c
> > > location cli-prefer-grpVserverD grpVserverD \
> > >         rule $id="cli-prefer-rule-grpVserverD" inf: #uname eq server-d
> > > location cli-prefer-resVserverTestFramelos resVserverTestFramelos \
> > >         rule $id="cli-prefer-rule-resVserverTestFramelos" inf: #uname eq server-c
> > > location locPingVserverC grpVserverC \
> > >         rule $id="locPingVserverC-rule" -inf: not_defined pingd or pingd lte 0
> > > location locPingVserverD grpVserverD \
> > >         rule $id="locPingVserverD-rule" -inf: not_defined pingd or pingd lte 0
> > > order ordDataC inf: cloneDATA grpVserverC
> > > order ordDataD inf: cloneDATA grpVserverD
> > > property $id="cib-bootstrap-options" \
> > >         dc-version="1.0.8-f2ca9dd92b1d+ sid tip" \
> > >         cluster-infrastructure="Heartbeat" \
> > >         expected-quorum-votes="2" \
> > >         no-quorum-policy="ignore" \
> > >         stonith-enabled="true" \
> > >         default-resource-stickiness="INFINITY" \
> > >         last-lrm-refresh="1276514035"
> > >
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker