[Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)
Mike Reid
mbreid at thepei.com
Mon Apr 4 19:34:48 UTC 2011
All,
I am running a two-node web cluster on OCFS2 (v1.5.0) via DRBD
Primary/Primary (v8.3.8) and Pacemaker. Everything seems to be working
great, except during testing of hard-boot scenarios.
Whenever I hard-boot one of the nodes, the other node is successfully fenced
and marked "Outdated":
* <resource minor="0" cs="WFConnection" ro1="Primary"
ro2="Unknown" ds1="UpToDate" ds2="Outdated" />
However, this locks up I/O on the still-active node and prevents any
operations within the cluster :( I have even forced DRBD into StandAlone
mode while in this state, but that does not resolve the I/O lock
either. Does anyone know whether it is possible with OCFS2 to keep the
cluster active in Primary/Unknown after the other node fails (whether
forced, controlled, etc.)?
I have been focusing on DRBD config, but I am starting to wonder if perhaps
it's something with my Pacemaker or OCFS2 setup that is forcing this I/O
lock during a failure. Any thoughts?
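For anyone who wants to detect this state from a script, here is a minimal sketch. It assumes the status is available as a single `<resource .../>` element in the format quoted above. The function name `is_degraded_primary` is hypothetical and not part of DRBD or Pacemaker.

```python
import xml.etree.ElementTree as ET


def is_degraded_primary(resource_xml: str) -> bool:
    """Return True when the local side is Primary and UpToDate
    but the connection to the peer is not established."""
    elem = ET.fromstring(resource_xml)
    return (
        elem.get("ro1") == "Primary"          # local role
        and elem.get("ds1") == "UpToDate"     # local disk state
        and elem.get("cs") != "Connected"     # e.g. WFConnection, StandAlone
    )


# Status line as seen on the surviving node after the peer was hard-booted.
status = (
    '<resource minor="0" cs="WFConnection" ro1="Primary" '
    'ro2="Unknown" ds1="UpToDate" ds2="Outdated" />'
)
print(is_degraded_primary(status))
```

This only classifies the reported state; it does not explain or release the I/O lock.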
-----------------------------
crm_mon (crm_mon 1.0.9 for OpenAIS and Heartbeat):
> ============
> Last updated: Mon Apr 4 12:57:47 2011
> Stack: openais
> Current DC: ubu10a - partition with quorum
> Version: 1.0.9-unknown
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ ubu10a ubu10b ]
>
> Master/Slave Set: msDRBD
> Masters: [ ubu10a ubu10b ]
> Clone Set: cloneDLM
> Started: [ ubu10a ubu10b ]
> Clone Set: cloneO2CB
> Started: [ ubu10a ubu10b ]
> Clone Set: cloneFS
> Started: [ ubu10a ubu10b ]
-----------------------------
DRBD (v8.3.8):
>
> version: 8.3.8 (api:88/proto:86-94)
> 0:repdata Connected Primary/Primary UpToDate/UpToDate C /data ocfs2
-----------------------------
DRBD Conf:
>
> global {
> usage-count no;
> }
> common {
> syncer { rate 10M; }
> }
> resource repdata {
> protocol C;
>
> meta-disk internal;
> device /dev/drbd0;
> disk /dev/sda3;
>
> handlers {
> pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
> pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
> local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
> split-brain "/usr/lib/drbd/notify-split-brain.sh root";
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> }
> startup {
> degr-wfc-timeout 120; # 120 = 2 minutes.
> wfc-timeout 30;
> become-primary-on both;
> }
> disk {
> fencing resource-only;
> }
> syncer {
> rate 10M;
> al-extents 257;
> }
> net {
> cram-hmac-alg "sha1";
> shared-secret "XXXXXXX";
> allow-two-primaries;
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
> }
> on ubu10a {
> address 192.168.0.66:7788;
> }
> on ubu10b {
> address 192.168.0.67:7788;
> }
> }
-----------------------------
CIB.xml
>
> node ubu10a \
> attributes standby="off"
> node ubu10b \
> attributes standby="off"
> primitive resDLM ocf:pacemaker:controld \
> op monitor interval="120s"
> primitive resDRBD ocf:linbit:drbd \
> params drbd_resource="repdata" \
> operations $id="resDRBD-operations" \
> op monitor interval="20s" role="Master" timeout="120s" \
> op monitor interval="30s" role="Slave" timeout="120s"
> primitive resFS ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/repdata" directory="/data" fstype="ocfs2" \
> op monitor interval="120s"
> primitive resO2CB ocf:pacemaker:o2cb \
> op monitor interval="120s"
> ms msDRBD resDRBD \
> meta resource-stickines="100" notify="true" master-max="2" interleave="true"
> clone cloneDLM resDLM \
> meta globally-unique="false" interleave="true"
> clone cloneFS resFS \
> meta interleave="true" ordered="true"
> clone cloneO2CB resO2CB \
> meta globally-unique="false" interleave="true"
> colocation colDLMDRBD inf: cloneDLM msDRBD:Master
> colocation colFSO2CB inf: cloneFS cloneO2CB
> colocation colO2CBDLM inf: cloneO2CB cloneDLM
> order ordDLMO2CB 0: cloneDLM cloneO2CB
> order ordDRBDDLM 0: msDRBD:promote cloneDLM
> order ordO2CBFS 0: cloneO2CB cloneFS
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-unknown" \
> cluster-infrastructure="openais" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> expected-quorum-votes="2"
>
>
-----------------------------