Thanks for the feedback, Lars. More information and questions below:<br><br><div class="gmail_quote">On Wed, Sep 5, 2012 at 9:53 PM, Lars Marowsky-Bree <span dir="ltr">&lt;<a href="mailto:lmb@suse.com" target="_blank">lmb@suse.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 2012-09-04T16:31:54, David Morton &lt;<a href="mailto:davidmorton78@gmail.com">davidmorton78@gmail.com</a>&gt; wrote:<br>


<br>

&gt; 1) I&#39;m planning on implementing sfex resources (a small LVM volume on the<br>

&gt; same volume group as the data being protected) as an additional safety<br>

&gt; feature along side the existing external/ipmi STONITH control ... is this<br>

&gt; best practice in case the IBM IMM is unavailable or credentials change etc<br>

&gt; and the STONITH is not carried out ?<br>

<br>

sfex will not help you if STONITH stalls. The stonith agent will fail if<br>

the credentials change or the IMM is unavailable, and thus will not<br>

proceed with the take-over.<br>

<br>

(Personally I&#39;ve not seen many good examples of where I&#39;d use sfex.)<br></blockquote><div> </div><div>What I&#39;m looking for here is not a backup for the existing STONITH mechanism but an additional level of storage-based protection, as we are using non-clustered filesystems. From what I read in the documentation, this is the purpose of sfex: to provide a lock resource, so that even if STONITH fails silently and/or you are in a split-brain situation, the storage will not be mounted in more than one location. And if it is used in a group, no database services will start, so there is no risk of data corruption. Is that correct?</div>
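<div><br></div><div>To make the question concrete, here is a rough, untested sketch of the setup I have in mind. The device path and the resource name SFEX_NEWS are placeholders made up for illustration, and I&#39;m assuming the stock ocf:heartbeat:sfex agent with its device and index parameters:</div>
<pre>
```shell
# Initialise the lock area once, from one node only.
# /dev/vg_db_news/lv_sfex is a hypothetical small LV on the protected volume group.
sfex_init -n 1 /dev/vg_db_news/lv_sfex

# Define the lock resource (SFEX_NEWS is a made-up name).
crm configure primitive SFEX_NEWS ocf:heartbeat:sfex \
    params device="/dev/vg_db_news/lv_sfex" index="1" \
    op monitor interval="10s"
```
</pre>
<div>The lock would then sit in the NEWS group directly after VG_DB_NEWS, so that the filesystems and the database behind it only start on the node holding the lock.</div>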

<div><br></div><div>If this is not the purpose of sfex, what is the best mechanism to ensure filesystems are not mounted more than once in a cluster? Am I just being paranoid? ;)</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>

&gt; 2) Is there any risk to a healthy node if an unhealthy node with a shared<br>

&gt; OCFS2 volume mounted goes down ? Quorum policy is set to ignore. Seems to<br>

&gt; not give any issues but I want to clarify this is the designed behavior.<br>

<br>

IO will freeze until fencing is completed.<br>

<br>

(Theoretically, a failing node can crash the others if it doesn&#39;t die<br>

&quot;cleanly&quot;, but first starts spreading bad data around. The risk of this<br>

increases with tighter coupling of nodes.)<br></blockquote><div><br></div><div>So I assume that if IO gets held up, Pacemaker will wait until the monitor fails and then take down the dependent resources on the healthy node?</div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

&gt; 3) Does a node need its own STONITH resource to be able to self fence or is<br>

&gt; this covered off by internal pacemaker functionality ? ie: We currently use<br>

&gt; location constraints to ensure STONITH resources don&#39;t run on themselves as<br>

&gt; per the documentation.<br>

<br>

STONITH is started as needed.<br>

<br>

&gt; 4) What is the best way to disable STONITH non disruptively for node<br>

&gt; maintenance ? Is it a case of editing the CIB stonith-enabled directive to<br>

&gt; false and stopping the STONITH resources then stopping openais ?<br>

<br>

Why do you need to disable STONITH for node maintenance? Just shut down<br>

the node cleanly (or at least stop the cluster stack on it, which will<br>

also stop all cluster resources) and it will not be fenced.<br></blockquote><div><br></div><div>This is good to hear; it is the clarification I was after. So for any maintenance, just ensure Pacemaker is shut down cleanly.</div>
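<div><br></div><div>As I understand it, the sequence would look roughly like the sketch below. The node name is a placeholder, the standby step is my own addition rather than something prescribed above, and rcopenais assumes the SLES init wrapper for openais:</div>
<pre>
```shell
# NODE is a placeholder for the node being serviced.
NODE="node1"

# Optional (my assumption): move resources away first.
crm node standby "$NODE"

# On the node itself: stop the cluster stack cleanly, which also
# stops any cluster resources still running there.
rcopenais stop

# ... perform the maintenance ...

# Bring the node back into the cluster.
rcopenais start
crm node online "$NODE"
```
</pre>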

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

&gt; 5) Is there an OCF compliant resource agent script for Derby / JavaDB that<br>

&gt; anyone knows of ? We use an old init style script at the moment, I&#39;m afraid<br>

&gt; it will trip us up and STONITH a node on shutdown at some stage.<br>

<br>

Why would it do that?<br></blockquote><div><br></div><div>The init script is not particularly solid; every once in a while it hangs on a stop. Sounds like a rainy-day project to create an OCF-compliant script!</div>
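<div><br></div><div>The part I care most about is a stop action that cannot hang. A minimal sketch of that idea follows, not a finished resource agent: the pidfile path, the STOP_WAIT variable and the function names are all hypothetical, and plain signals stand in for whatever the real Derby shutdown command would be. A real agent would source the OCF shell functions and take its settings from OCF_RESKEY_* parameters.</div>
<pre>
```shell
#!/bin/sh
# Sketch of a stop action with a bounded wait and SIGKILL escalation.
# Return codes follow the OCF convention: 0 = success, 1 = generic error.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical defaults, overridable from the environment.
PIDFILE="${PIDFILE:-/var/run/derby.pid}"
STOP_WAIT="${STOP_WAIT:-30}"   # seconds to wait after SIGTERM

# True if the pidfile exists and the recorded process can be signalled.
is_running() {
    [ -f "$PIDFILE" ] || return 1
    kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

derby_stop() {
    # Stopping something that is already stopped counts as success.
    is_running || return $OCF_SUCCESS

    pid="$(cat "$PIDFILE")"
    kill -TERM "$pid" 2>/dev/null

    waited=0
    while is_running; do
        if [ "$waited" -ge "$STOP_WAIT" ]; then
            # Escalate instead of hanging until the cluster times out.
            kill -KILL "$pid" 2>/dev/null
            sleep 1
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # A stop that still fails here is what would lead to fencing.
    if is_running; then
        return $OCF_ERR_GENERIC
    fi
    rm -f "$PIDFILE"
    return $OCF_SUCCESS
}
```
</pre>
<div>The point of the escalation is that the stop either succeeds within a known time or reports a clear failure, rather than blocking until the operation timeout expires.</div>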

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

&gt; group NEWS VG_DB_NEWS FS_LOGS_NEWS FS_DB_NEWS IP_NEWS_15 IP_NEWS_72 DERBYDB<br>

<br>

You have an unhealthy obsession with capslock. ;-)<br></blockquote><div><br></div><div>Thank you ;)</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Regards,<br>

    Lars<br>

<br>

--<br>

Architect Storage/HA<br>

SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)<br>

&quot;Experience is the name everyone gives to their mistakes.&quot; -- Oscar Wilde<br>

<br>

<br>

_______________________________________________<br>

Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>


</blockquote></div><br>