[Pacemaker] Dual primary drbd resource not promoted on one host

Jake Smith jsmith at argotec.com
Tue Feb 5 23:00:20 UTC 2013


----- Original Message -----
> From: "Jürgen Herrmann" <Juergen.Herrmann at XLhost.de>
> To: "Jake Smith" <jsmith at argotec.com>, "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Tuesday, February 5, 2013 4:00:48 PM
> Subject: Re: [Pacemaker] Dual primary drbd resource not promoted on one host
> 
> On 05.02.2013 16:32, Jake Smith wrote:
> > ----- Original Message -----
> >> From: "Jürgen Herrmann" <Juergen.Herrmann at XLhost.de>
> >> To: pacemaker at oss.clusterlabs.org
> >> Sent: Tuesday, February 5, 2013 7:04:26 AM
> >> Subject: [Pacemaker] Dual primary drbd resource not promoted on one
> >> host
> >>
> >> Hi there!
> >>
> >> I have the following problem:
> >>
> >> I have a 2 node cluster with a dual primary drbd resource. On top
> >> of it sits an OCFS2 file system. nodes: app1a, app1b
> >>
> >> Now today I had the following scenario (occurred several times
> >> now):
> >> - crm node standby app1a
> >> - poweroff app1a for hdd replacement (hw raid controller)
> >> - poweron app1a
> >> - crm node online app1a
> >>
> >> All the other resources come back up as expected, except the
> >> master/slave set for the dual primary drbd.
> >>
> >> here's the relevant portion of my cluster config:
> >>
> >> node app1a.xlhost.de \
> >>          attributes standby="off"
> >> node app1b.xlhost.de \
> >>          attributes standby="off"
> >> primitive resDLM ocf:pacemaker:controld \
> >>          op start interval="0" timeout="90s" \
> >>          op stop interval="0" timeout="100s" \
> >>          op monitor interval="120s"
> >> primitive resDRBD0 ocf:linbit:drbd \
> >>          op monitor interval="23" role="Slave" timeout="30" \
> >>          op monitor interval="13" role="Master" timeout="20" \
> >>          op start interval="0" timeout="240s" \
> >>          op promote interval="0" timeout="240s" \
> >>          op demote interval="0" timeout="100s" \
> >>          op stop interval="0" timeout="100s" \
> >>          params drbd_resource="drbd0"
> >> primitive resFSDRBD0 ocf:heartbeat:Filesystem \
> >>          params device="/dev/drbd0" directory="/mnt/drbd0" \
> >>          fstype="ocfs2" options="noatime,intr,nodiratime,heartbeat=none" \
> >>          op monitor interval="120s" timeout="50s" \
> >>          op start interval="0" timeout="70s" \
> >>          op stop interval="0" timeout="70s"
> >> primitive resO2CB ocf:pacemaker:o2cb \
> >>          op start interval="0" timeout="90s" \
> >>          op stop interval="0" timeout="100s" \
> >>          op monitor interval="120s"
> >> ms msDRBD0 resDRBD0 \
> >>          meta master-max="2" master-node-max="1" clone-max="2" \
> >>          clone-node-max="1" notify="true" target-role="Master"
> >> clone cloneDLM resDLM \
> >>          meta globally-unique="false" interleave="true" \
> >>          target-role="Started"
> >> clone cloneFSDRBD0 resFSDRBD0 \
> >>          meta interleave="true" globally-unique="false" \
> >>          target-role="Started"
> >> clone cloneO2CB resO2CB \
> >>          meta globally-unique="false" interleave="true" \
> >>          target-role="Started"
> >> colocation colFSDRBD0_DRBD0 inf: cloneFSDRBD0 msDRBD0:Master
> >
> > ^^^ This colocation should be cloneDLM on msDRBD0:Master.
> >
> >> colocation colFSDRBD0_O2CB inf: cloneFSDRBD0 cloneO2CB
> >> colocation colO2CB_DLM inf: cloneO2CB cloneDLM
> >> order ordDLM_FSDRBD0 inf: cloneDLM cloneFSDRBD0
> >
> > ^^^ This order statement is not needed.
> >
> >> order ordDLM_O2CB inf: cloneDLM cloneO2CB
> >> order ordDRBD0_FSDRBD0 inf: msDRBD0:promote cloneFSDRBD0
> >
> > ^^^ This order should be msDRBD0:promote then cloneDLM:start
> >
> > If you explicitly define the action for the first resource in an
> > order statement, the same action is implied for the resources that
> > follow.  So your statement is going to try to promote cloneFSDRBD0.
> > You should define both actions explicitly like this:
> >
> > order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
> >
> >> order ordO2CB_FSDRBD0 inf: cloneO2CB cloneFSDRBD0
> >>
> >
> >
> >> If I take down both nodes and fire them up again, everything goes
> >> back to normal and msDRBD0 is promoted to master on both nodes.
> >>
> >> I suspect this has something to do with ordering or colocation
> >> constraints, but I'm not sure. I've stared at this problem dozens
> >> of times now, and a vast amount of googling did not turn up my
> >> specific problem either.
> >
> > I'm pretty sure you are correct.  I haven't used/tested OCFS on
> > Pacemaker in a while but I believe this is the correct
> > ordering/colocation you're looking for (same as my notes above):
> >
> > Order - DRBD:promote then DLM:start then O2CB:start then FS:start
> > Colocation - FS on O2CB on DLM on DRBD:master
> >
> 
> Hi Jake!
> 
> Thanks very much for your comments!
> 
> To sum it up, I rewrote all six order/colocation statements here:
> 
> colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
> colocation colO2CB_DLM inf: cloneO2CB cloneDLM
> colocation colFSDRBD0_O2CB inf: cloneFSDRBD0 cloneO2CB
> 
> order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
> order ordDLM_O2CB inf: cloneDLM cloneO2CB
> order ordO2CB_FSDRBD0 inf: cloneO2CB cloneFSDRBD0
> 
> I will try this during one of the upcoming nights and report back.
> In the meantime, could you have another look at the statements to
> double-check? Thanks in advance.

Looks good to me (assuming I recall correctly that dlm needs to start before o2cb).
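
For reference, here is a sketch of the same three order constraints with every action spelled out. It assumes the crm shell syntax already used in this thread and is untested here. When no action is given, "start" is the default for the first resource and the second resource inherits the first one's action, so the last two lines should behave the same as the shorter forms quoted above. Only the DRBD line really needs both actions written out.

```
# Sketch only: same constraints as quoted above, actions made explicit.
# Without ":start" on cloneDLM, the "promote" action would be implied for it.
order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
order ordDLM_O2CB inf: cloneDLM:start cloneO2CB:start
order ordO2CB_FSDRBD0 inf: cloneO2CB:start cloneFSDRBD0:start
```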

> 
> best regards,
> Jürgen Herrmann
> 
> --
> >> XLhost.de ® - Web hosting from supersmall to eXtra Large <<
> 
> XLhost.de GmbH
> Jürgen Herrmann, Managing Director
> Boelckestrasse 21, 93051 Regensburg, Germany
> 
> Managing Director: Jürgen Herrmann
> Registered under: HRB9918
> VAT identification number: DE245931218
> 
> Fon:  +49 (0)800 XLHOSTDE [0800 95467833]
> Fax:  +49 (0)800 95467830
> Web:  http://www.XLhost.de
> 
> 