<HTML>
<HEAD>
<TITLE>How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)</TITLE>
</HEAD>
<BODY>
<FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>All,<BR>
<BR>
I am running a two-node web cluster on OCFS2 (v1.5.0) via DRBD Primary/Primary (v8.3.8) and Pacemaker. Everything seems to be working great, except during testing of hard-boot scenarios.<BR>
<BR>
Whenever I hard-boot one of the nodes, the surviving node successfully fences it and marks it &#8220;Outdated&#8221;:<BR>
<BR>
* &lt;resource minor=&quot;0&quot; cs=&quot;WFConnection&quot; ro1=&quot;Primary&quot; ro2=&quot;Unknown&quot; ds1=&quot;UpToDate&quot; ds2=&quot;Outdated&quot; /&gt;<BR>
<BR>
However, this locks up I/O on the still-active node and prevents any operations within the cluster :( I have even forced DRBD into StandAlone mode while in this state, but that does not resolve the I/O lock either. Does anyone know whether it is possible with OCFS2 to keep the cluster active in Primary/Unknown once the other node has failed (whether forced, controlled, etc.)?<BR>
<BR>
I have been focusing on DRBD config, but I am starting to wonder if perhaps it&#8217;s something with my Pacemaker or OCFS2 setup that is forcing this I/O lock during a failure. &nbsp;Any thoughts?<BR>
<B><BR>
</B>-----------------------------<BR>
<B>crm_mon (crm_mon 1.0.9 for OpenAIS and Heartbeat):<BR>
</B><BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>============<BR>
Last updated: Mon Apr &nbsp;4 12:57:47 2011<BR>
Stack: openais<BR>
Current DC: ubu10a - partition with quorum<BR>
Version: 1.0.9-unknown<BR>
2 Nodes configured, 2 expected votes<BR>
4 Resources configured.<BR>
============<BR>
<BR>
Online: [ ubu10a ubu10b ]<BR>
<BR>
&nbsp;Master/Slave Set: msDRBD<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Masters: [ ubu10a ubu10b ]<BR>
&nbsp;Clone Set: cloneDLM<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started: [ ubu10a ubu10b ]<BR>
&nbsp;Clone Set: cloneO2CB<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started: [ ubu10a ubu10b ]<BR>
&nbsp;Clone Set: cloneFS<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started: [ ubu10a ubu10b ]<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
-----------------------------<BR>
<B>DRBD (v8.3.8):<BR>
</B></SPAN></FONT><BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
version: 8.3.8 (api:88/proto:86-94)<BR>
0:repdata &nbsp;Connected &nbsp;Primary/Primary &nbsp;UpToDate/UpToDate &nbsp;C &nbsp;/data &nbsp;&nbsp;&nbsp;ocfs2<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
-----------------------------<BR>
<B>DRBD Conf:<BR>
</B></SPAN></FONT><BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
global {<BR>
&nbsp;&nbsp;usage-count no;<BR>
}<BR>
common {<BR>
&nbsp;&nbsp;syncer { rate 10M; }<BR>
}<BR>
resource repdata {<BR>
&nbsp;&nbsp;protocol C;<BR>
<BR>
&nbsp;&nbsp;meta-disk internal;<BR>
&nbsp;&nbsp;device /dev/drbd0;<BR>
&nbsp;&nbsp;disk /dev/sda3;<BR>
<BR>
&nbsp;&nbsp;handlers {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;pri-on-incon-degr &quot;echo o &gt; /proc/sysrq-trigger ; halt -f&quot;;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;pri-lost-after-sb &quot;echo o &gt; /proc/sysrq-trigger ; halt -f&quot;;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;local-io-error &quot;echo o &gt; /proc/sysrq-trigger ; halt -f&quot;;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;split-brain &quot;/usr/lib/drbd/notify-split-brain.sh root&quot;;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;fence-peer &quot;/usr/lib/drbd/crm-fence-peer.sh&quot;;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;after-resync-target &quot;/usr/lib/drbd/crm-unfence-peer.sh&quot;;<BR>
&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;startup {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;degr-wfc-timeout 120; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# 120 = 2 minutes.<BR>
&nbsp;&nbsp;&nbsp;&nbsp;wfc-timeout 30;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;become-primary-on both;<BR>
&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;disk {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;fencing resource-only;<BR>
&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;syncer {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;rate 10M;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;al-extents 257;<BR>
&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;net {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;cram-hmac-alg &quot;sha1&quot;;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;shared-secret &quot;XXXXXXX&quot;;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;allow-two-primaries;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;after-sb-0pri discard-zero-changes;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;after-sb-1pri discard-secondary;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;after-sb-2pri disconnect;<BR>
&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;on ubu10a {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;address 192.168.0.66:7788;<BR>
&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;on ubu10b {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;address 192.168.0.67:7788;<BR>
&nbsp;&nbsp;}<BR>
}<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
<BR>
-----------------------------<BR>
<B>CIB.xml<BR>
</B></SPAN></FONT><BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
node ubu10a \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;attributes standby=&quot;off&quot;<BR>
node ubu10b \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;attributes standby=&quot;off&quot;<BR>
primitive resDLM ocf:pacemaker:controld \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op monitor interval=&quot;120s&quot;<BR>
primitive resDRBD ocf:linbit:drbd \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;params drbd_resource=&quot;repdata&quot; \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;operations $id=&quot;resDRBD-operations&quot; \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op monitor interval=&quot;20s&quot; role=&quot;Master&quot; timeout=&quot;120s&quot; \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op monitor interval=&quot;30s&quot; role=&quot;Slave&quot; timeout=&quot;120s&quot;<BR>
primitive resFS ocf:heartbeat:Filesystem \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;params device=&quot;/dev/drbd/by-res/repdata&quot; directory=&quot;/data&quot; fstype=&quot;ocfs2&quot; \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op monitor interval=&quot;120s&quot;<BR>
primitive resO2CB ocf:pacemaker:o2cb \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op monitor interval=&quot;120s&quot;<BR>
ms msDRBD resDRBD \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;meta resource-stickiness=&quot;100&quot; notify=&quot;true&quot; master-max=&quot;2&quot; interleave=&quot;true&quot;<BR>
clone cloneDLM resDLM \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;meta globally-unique=&quot;false&quot; interleave=&quot;true&quot;<BR>
clone cloneFS resFS \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;meta interleave=&quot;true&quot; ordered=&quot;true&quot;<BR>
clone cloneO2CB resO2CB \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;meta globally-unique=&quot;false&quot; interleave=&quot;true&quot;<BR>
colocation colDLMDRBD inf: cloneDLM msDRBD:Master<BR>
colocation colFSO2CB inf: cloneFS cloneO2CB<BR>
colocation colO2CBDLM inf: cloneO2CB cloneDLM<BR>
order ordDLMO2CB 0: cloneDLM cloneO2CB<BR>
order ordDRBDDLM 0: msDRBD:promote cloneDLM<BR>
order ordO2CBFS 0: cloneO2CB cloneFS<BR>
property $id=&quot;cib-bootstrap-options&quot; \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dc-version=&quot;1.0.9-unknown&quot; \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cluster-infrastructure=&quot;openais&quot; \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;stonith-enabled=&quot;false&quot; \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;no-quorum-policy=&quot;ignore&quot; \<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected-quorum-votes=&quot;2&quot;<BR>
<BR>
<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><B><BR>
</B>----------------------------- <BR>
</SPAN></FONT>
</BODY>
</HTML>