[Pacemaker] stonith/SBD question in the event of a lost node

mark - pacemaker list m+pacemaker at nerdish.us
Mon Apr 2 18:01:31 CEST 2012


Hi Lars,

On Mon, Apr 2, 2012 at 10:35 AM, Lars Marowsky-Bree <lmb at suse.com> wrote:

> On 2012-04-02T09:33:22, mark - pacemaker list <m+pacemaker at nerdish.us>
> wrote:
>
> > Hello,
> >
> > I'm just looking to verify that I'm understanding/configuring SBD
> > correctly.  It works great in the controlled cases where you unplug a
> node
> > from the network (it gets fenced via SBD) or remove its access to the
> > shared disk (the node suicides).  However, in the event of a hardware
> > failure or power interruption that takes a node offline before SBD can
> > fence it, if that node never comes back into the cluster then its
> resources
> > can't ever start anywhere else.  The surviving nodes will continue to try
> > to fence the dead node at regular intervals but can never succeed.
>
> No, that is not correct.
>
> The node will be fenced implicitly - the poison pill is still written,
> and SBD knows that the node will have either read it (and committed
> suicide), determined that it was unable to read it (and committed
> suicide), or the watchdog will have triggered if SBD itself has failed
> beyond hope (i.e., the node will have committed suicide). Hence, once
> the msgwait timeout has elapsed after the poison pill was written, the
> node will be declared "successfully dead".
>
> What can affect fencing is the inability to write the poison pill to the
> (majority of) sbd device(s), e.g., if the connection between the surviving
> nodes and the (majority of) sbd device(s) is broken.
>
> Or, theoretically, if the node has never been up and claimed its slot on
> them; but that is indeed reasonably unlikely.
>
> So the resources will be claimed afterwards; of course, the
> stonith-timeout needs to be higher than msgwait for this to work.
>
> Are you actually seeing the behaviour you describe (in which case it is
> either a bug or something else going wrong), or is this speculation?
>
>
This is something I actually encountered; unfortunately it was some time
back, so I don't think I'll still have the logs available, but I'll check.
One node of a three-node cluster died spontaneously, and for the next five
hours the two surviving nodes kept attempting to fence it (the cluster
wasn't quite in production yet, so it wasn't monitored and stayed in that
state until we arrived at work in the morning).  As soon as we got the dead
node back up and running and it started talking with the cluster, the
resources started on another node.

I'm very glad to hear that's not how things are supposed to work; that
gives me a reason to set up a test cluster and see whether I can replicate
the problem or discover where I've screwed up the configuration.
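
For what it's worth, here's the sort of sanity check I plan to run on the
test cluster (a sketch only; the device path is from my setup and the
timeout values are purely illustrative):

    # Dump the on-disk SBD header to see the configured msgwait timeout
    sbd -d /dev/mapper/quorumdisk-sbd1 dump

    # stonith-timeout has to exceed msgwait; e.g. with msgwait=10,
    # something like 20s leaves headroom
    crm configure property stonith-timeout=20s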

Debian's corosync/pacemaker scripts don't include a way to start SBD, so
you have to work up something of your own to get it started before corosync
starts (from testing SBD on RHEL, I seem to recall that its scripts start
SBD for you, and that it has to be running before the other components
start).  I'm simply starting it as:

    /usr/sbin/sbd -d /dev/mapper/quorumdisk-sbd1 -W -D watch

... prior to corosync running.  Does that seem reasonable?
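
In case it helps, this is roughly the shape of the wrapper I have in mind;
a sketch only, with the script name, LSB ordering headers, and device path
as assumptions rather than a tested script:

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides:          sbd
    # Required-Start:    $local_fs
    # Required-Stop:     $local_fs
    # X-Start-Before:    corosync
    # Default-Start:     2 3 4 5
    # Default-Stop:      0 1 6
    # Short-Description: Start SBD before corosync/pacemaker
    ### END INIT INFO

    SBD_DEVICE=/dev/mapper/quorumdisk-sbd1

    case "$1" in
      start)
        # Start the sbd daemon watching the shared device
        # (same invocation as above)
        /usr/sbin/sbd -d "$SBD_DEVICE" -W -D watch
        ;;
      stop)
        # Ask the local sbd daemon to exit cleanly
        /usr/sbin/sbd -d "$SBD_DEVICE" message LOCAL exit
        ;;
      *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
    esac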

Thank you,
Mark



> > manual intervention?  I suppose this may be one of the reasons that
> > fencing via power devices is pretty much the best way to go about it?
>
> No, fencing via power devices exposes one to the madness that is
> management board firmware. If I have the choice, I'll always pick SBD.

> Regards,
>    Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
> Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

