<br><br><div class="gmail_quote">On Wed, Jun 15, 2011 at 4:20 PM, Dejan Muhamedagic <span dir="ltr">&lt;<a href="mailto:dejanmm@fastmail.fm">dejanmm@fastmail.fm</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Wed, Jun 15, 2011 at 03:26:56PM -0500, mark - pacemaker list wrote:<br>
&gt; On Wed, Jun 15, 2011 at 12:24 PM, imnotpc &lt;<a href="mailto:imnotpc@rock3d.net">imnotpc@rock3d.net</a>&gt; wrote:<br>
&gt;<br>
&gt; &gt;<br>
&gt; &gt; What I was thinking is that the DC is never fenced<br>
&gt;<br>
&gt;<br>
&gt; Is this actually the case?<br>
<br>
</div>In a way it is true. Only DC can order fencing and there is<br>
always exactly one DC in a partition. On split brain, each<br>
partition elects a DC and if the DC has quorum it can try to<br>
fence nodes in other partitions. That&#39;s why in two-node clusters</blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
there&#39;s always a shoot-out. But note that the old DC (before<br>
split brain), if it loses quorum, gets fenced by a new DC from<br>
another partition.<br>
<div class="im"><br>
&gt; It would sure explain the one &quot;gotcha&quot; I&#39;ve<br>
&gt; never been able to work around in a three node cluster with stonith/SBD.  If<br>
&gt; you unplug the network cable from the DC (but it and the other nodes all<br>
&gt; still see the SBD disk via their other NIC(s)), the DC of course becomes<br>
&gt; completely isolated.  It will fence<br>
<br>
</div>Fence? It won&#39;t fence anything unless it has quorum. Do you have<br>
no-quorum-policy=ignore?<br></blockquote><div><br></div><div>I have no-quorum-policy=freeze.</div><div><br></div><div><br></div><div>With this status:</div><div><br></div><div><div>============</div><div>Last updated: Wed Jun 15 16:48:57 2011</div>
<div>Stack: Heartbeat</div><div>Current DC: cn1.testlab.local (814b426f-ab10-445c-9158-a1765d82395e) - partition with quorum</div><div>Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3</div><div>3 Nodes configured, unknown expected votes</div>
<div>5 Resources configured.</div><div>============</div><div><br></div><div>Online: [ cn2.testlab.local cn3.testlab.local cn1.testlab.local ]</div><div><br></div><div> Resource Group: MySQL-history</div><div>     iscsi_mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:iscsi):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div>
<div>     volgrp_mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:LVM):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div><div>     fs_mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:Filesystem):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div>
<div>     ip_mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div><div>     mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:mysql):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div>
<div>     mail_alert_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:MailTo):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div><div> Resource Group: MySQL-hsa</div>
<div>     iscsi_mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:iscsi):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div><div>     volgrp_mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:LVM):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div>
<div>     fs_mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:Filesystem):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div><div>     ip_mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div>
<div>     mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:mysql):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div><div>     mail_alert_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:MailTo):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div>
<div> Resource Group: MySQL-livedata</div><div>     iscsi_mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:iscsi):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div>
<div>     volgrp_mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:LVM):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div><div>     fs_mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:Filesystem):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div>
<div>     ip_mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div><div>     mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:mysql):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div>
<div>     mail_alert_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:MailTo):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div><div> stonith_sbd<span class="Apple-tab-span" style="white-space:pre">        </span>(stonith:external/sbd):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div>
<div> Resource Group: Cluster_Status</div><div>     cluster_status_ip<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div>
<div>     cluster_status_page<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:apache):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div></div><div><br>
</div><div><br></div><div>I isolated cn1 (the DC, while stonith_sbd was running on cn2).  In this case, one of the two good nodes became DC and cn1 was fenced, so things worked as I&#39;d expect.  The outage for cn1&#39;s resources was quite short.</div>
<div><br></div><div>However, with *this* status, which is the same as above except that the stonith_sbd resource is now also located on cn1 (and the Cluster_Status group has moved to cn2), so cn1 is both the DC and the node running stonith_sbd:</div><div><br></div><div><div>
============</div><div>Last updated: Wed Jun 15 16:58:49 2011</div><div>Stack: Heartbeat</div><div>Current DC: cn1.testlab.local (814b426f-ab10-445c-9158-a1765d82395e) - partition with quorum</div><div>Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3</div>
<div>3 Nodes configured, unknown expected votes</div><div>5 Resources configured.</div><div>============</div><div><br></div><div>Online: [ cn2.testlab.local cn3.testlab.local cn1.testlab.local ]</div><div><br></div><div>
 Resource Group: MySQL-history</div><div>     iscsi_mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:iscsi):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div>
<div>     volgrp_mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:LVM):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div><div>     fs_mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:Filesystem):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div>
<div>     ip_mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div><div>     mysql_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:mysql):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div>
<div>     mail_alert_history<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:MailTo):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div><div> Resource Group: MySQL-hsa</div>
<div>     iscsi_mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:iscsi):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div><div>     volgrp_mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:LVM):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div>
<div>     fs_mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:Filesystem):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div><div>     ip_mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div>
<div>     mysql_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:mysql):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div><div>     mail_alert_hsa<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:MailTo):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div>
<div> Resource Group: MySQL-livedata</div><div>     iscsi_mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:iscsi):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div>
<div>     volgrp_mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:LVM):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div><div>     fs_mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:Filesystem):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div>
<div>     ip_mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div><div>     mysql_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:mysql):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div>
<div>     mail_alert_livedata<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:MailTo):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn3.testlab.local</div><div> stonith_sbd<span class="Apple-tab-span" style="white-space:pre">        </span>(stonith:external/sbd):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn1.testlab.local</div>
<div> Resource Group: Cluster_Status</div><div>     cluster_status_ip<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div>
<div>     cluster_status_page<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:apache):<span class="Apple-tab-span" style="white-space:pre">        </span>Started cn2.testlab.local</div></div><div><br>
</div><div> </div><div><br></div><div>... when I isolated cn1, it almost immediately fenced cn3.  About 30 seconds later cn2 promoted itself to DC, as it was the only surviving node with network connectivity, but of course cn3 was still coming back up after its reboot, so it wasn&#39;t participating yet.  At that point I had two nodes that each thought they were DC, neither with quorum.  That&#39;s why I changed no-quorum-policy to freeze: before the change, all services would shut down completely at this stage.  With freeze, at least the services on the surviving good node stay up.</div>
<div><br></div><div>Once cn3 finished booting and Pacemaker started, cn2 and cn3 formed a quorum, cn1 finally got fenced, and all resources were able to start on machines with network connectivity.  The outage in this case was of course quite a bit longer than in the previous one.</div>
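<div><br></div><div>To make the vote counting in that sequence concrete, here is a rough sketch of the arithmetic as I understand it. It assumes one vote per node and a simple-majority rule; has_quorum is a hypothetical helper for illustration only, not how Pacemaker or Heartbeat actually implement quorum.</div>
<pre>
```python
def has_quorum(partition, all_nodes):
    """Assumed rule: a partition is quorate if it holds a strict majority of votes."""
    return 2 * len(partition) > len(all_nodes)


# The three nodes from the status output above (short names for brevity).
nodes = {"cn1", "cn2", "cn3"}

# cn1 isolated and cn3 still rebooting: two single-node partitions.
print("cn1 alone:", has_quorum({"cn1"}, nodes))
print("cn2 alone:", has_quorum({"cn2"}, nodes))

# After cn3 finishes booting and rejoins cn2.
print("cn2 + cn3:", has_quorum({"cn2", "cn3"}, nodes))
```
</pre>
<div>Under that assumption neither single-node partition has a majority, which matches the window where both DCs were without quorum, and the cn2 + cn3 pair regains it once cn3 is back.</div>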
<div><br></div><div>Regards,</div><div>Mark</div></div>