[Pacemaker] Mostly STONITH Questions / Seeking Best Practice

David Morton davidmorton78 at gmail.com
Wed Sep 5 17:07:44 EDT 2012


Thanks for the feedback, Lars ... more information / questions below:

On Wed, Sep 5, 2012 at 9:53 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:

> On 2012-09-04T16:31:54, David Morton <davidmorton78 at gmail.com> wrote:
>
> > 1) I'm planning on implementing sfex resources (a small LVM volume on the
> > same volume group as the data being protected) as an additional safety
> > feature alongside the existing external/ipmi STONITH control ... is this
> > best practice in case the IBM IMM is unavailable or credentials change etc
> > and the STONITH is not carried out ?
>
> sfex will not help you if STONITH stalls. The stonith agent will fail if
> the credentials change or the IMM is unavailable, and thus will not
> proceed with the take-over.
>
> (Personally I've not seen many good examples of where I'd use sfex.)
>

What I'm looking for here is not a backup for the existing STONITH
mechanism but an additional level of (storage-based) protection, as we are
using non-clustered filesystems. From what I read in the documentation this
is the purpose of sfex ? It provides a lock resource, so even if STONITH
fails silently and / or you are in a split-brain situation the storage will
not be mounted in more than one location ... and when used in a group, no
database services will start = no risk of data corruption ?

If this is not the purpose of sfex, what is the best mechanism to ensure
filesystems are not mounted more than once in a cluster ? Am I just being
paranoid ? ;)
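
For concreteness, this is roughly what I had in mind (crm shell syntax; the
resource name SFEX_NEWS, the lock LV path and the timings are only
illustrative, and if I recall the tooling correctly the lock area has to be
initialised with sfex_init first):

primitive SFEX_NEWS ocf:heartbeat:sfex \
        params device="/dev/VG_DB_NEWS/sfex_lock" index="1" \
        op monitor interval="10s" timeout="30s"

# The lock sits after the volume group activation but before the
# filesystems, so nothing gets mounted unless the lock is held.
group NEWS VG_DB_NEWS SFEX_NEWS FS_LOGS_NEWS FS_DB_NEWS IP_NEWS_15 IP_NEWS_72 DERBYDB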

>
> > 2) Is there any risk to a healthy node if an unhealthy node with a shared
> > OCFS2 volume mounted goes down ? Quorum policy is set to ignore. It seems
> > not to cause any issues, but I want to clarify this is the designed behavior.
>
> IO will freeze until fencing is completed.
>
> (Theoretically, a failing node can crash the others if it doesn't die
> "cleanly", but first starts spreading bad data around. The risk of this
> increases with tighter coupling of nodes.)
>

So I assume that if IO gets held up, Pacemaker will wait until the monitor
fails and then take down the dependent resources on the healthy node ?
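
For context, the filesystem resources are defined along these lines (device,
mount point and timings here are only illustrative), so I'd expect a frozen
mount to surface as a monitor timeout fairly quickly:

primitive FS_DB_NEWS ocf:heartbeat:Filesystem \
        params device="/dev/VG_DB_NEWS/LV_DB" directory="/srv/news/db" fstype="ext3" \
        op monitor interval="20s" timeout="40s"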

>
> > 3) Does a node need its own STONITH resource to be able to self fence, or
> > is this covered off by internal Pacemaker functionality ? i.e. we currently
> > use location constraints to ensure STONITH resources don't run on themselves
> > as per the documentation.
>
> STONITH is started as needed.
>
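
For reference, the fencing setup is along these lines (node names and
credentials are illustrative), with each external/ipmi resource banned from
the node it is supposed to fence:

primitive STONITH_NODE1 stonith:external/ipmi \
        params hostname="node1" ipaddr="10.1.1.101" userid="admin" passwd="xxxx" \
               interface="lanplus"
location LOC_STONITH_NODE1 STONITH_NODE1 -inf: node1
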
> > 4) What is the best way to disable STONITH non-disruptively for node
> > maintenance ? Is it a case of editing the CIB stonith-enabled directive to
> > false and stopping the STONITH resources, then stopping openais ?
>
> Why do you need to disable STONITH for node maintenance? Just shut down
> the node cleanly (or at least stop the cluster stack on it, which will
> also stop all cluster resources) and it will not be fenced.
>

This is good to hear; it was the clarification I was after ... so for any
maintenance, just ensure Pacemaker is shut down cleanly.
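
In other words, something along these lines should be enough (node name
illustrative; rcopenais being the SLES 11 way of stopping the stack):

# Either move everything off the node first and leave the stack running ...
crm node standby node1
# ... do the maintenance, then ...
crm node online node1

# ... or stop the cluster stack entirely, which also stops all cluster
# resources on that node:
rcopenais stop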

>
> > 5) Is there an OCF-compliant resource agent script for Derby / JavaDB that
> > anyone knows of ? We use an old init-style script at the moment; I'm afraid
> > it will trip us up and STONITH a node on shutdown at some stage.
>
> Why would it do that?
>

The init script is not particularly solid; every once in a while it will
hang on a stop. Sounds like a rainy-day project to create an OCF-compliant
script !
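
Something like the following is the sort of skeleton I have in mind - only a
sketch, with the Derby home, the port and the way the network server is
started and stopped all assumptions to be adapted to our install:

#!/bin/sh
# Minimal OCF resource agent sketch for Apache Derby / JavaDB (illustrative only).
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

DERBY_HOME="${OCF_RESKEY_derby_home:-/opt/derby}"
DERBY_PORT="${OCF_RESKEY_port:-1527}"

derby_monitor() {
    # Treat the instance as running if something is listening on its port.
    netstat -ltn 2>/dev/null | grep -q ":${DERBY_PORT} " \
        && return $OCF_SUCCESS || return $OCF_NOT_RUNNING
}

derby_start() {
    derby_monitor && return $OCF_SUCCESS
    # A real agent would daemonise this properly and capture the logs.
    "${DERBY_HOME}/bin/startNetworkServer" -p "${DERBY_PORT}" &
    while ! derby_monitor; do sleep 1; done   # Pacemaker enforces the op timeout
    return $OCF_SUCCESS
}

derby_stop() {
    derby_monitor || return $OCF_SUCCESS
    "${DERBY_HOME}/bin/stopNetworkServer" -p "${DERBY_PORT}"
    while derby_monitor; do sleep 1; done     # a hung stop ends in the stop timeout and fencing
    return $OCF_SUCCESS
}

case "$1" in
    start)        derby_start ;;
    stop)         derby_stop ;;
    monitor)      derby_monitor ;;
    meta-data)    echo "<resource-agent/>"; exit $OCF_SUCCESS ;;  # metadata omitted here
    validate-all) exit $OCF_SUCCESS ;;
    *)            exit $OCF_ERR_UNIMPLEMENTED ;;
esac
exit $?

That keeps the start/stop polling in the agent and leaves the timeouts to
Pacemaker's operation definitions.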

>
> > group NEWS VG_DB_NEWS FS_LOGS_NEWS FS_DB_NEWS IP_NEWS_15 IP_NEWS_72 DERBYDB
>
> You have an unhealthy obsession with capslock. ;-)
>

Thank you ;)

>
> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
> Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>