[Pacemaker] Dual primary drbd resource not promoted on one host

Tue Feb 5 12:04:26 UTC 2013

Hi there!

I have the following problem:

I have a two-node cluster with a dual-primary DRBD resource. On top
of it sits an OCFS2 file system. The nodes are app1a and app1b.

Now today I had the following scenario (occurred several times now):
- crm node standby app1a
- poweroff app1a for hdd replacement (hw raid controller)
- poweron app1a
- crm node online app1a
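
For reference, here is the same sequence as a rough shell sketch, with the checks I would look at once the node is back online. The node and resource names come from my config; the `run` wrapper, the `DRY_RUN` switch and the two function names are only illustrative helpers, not part of any cluster tool. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the maintenance sequence described above.
# NODE and DRBD_RES match my setup; run(), DRY_RUN and the function
# names are hypothetical helpers made up for this example.
NODE="${NODE:-app1a}"
DRBD_RES="${DRBD_RES:-drbd0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: put the node into standby before powering it off.
maintenance_before() {
    run crm node standby "$NODE"
}

# Steps 3 and 4: after power-on, bring the node back and inspect the state.
maintenance_after() {
    run crm node online "$NODE"
    run crm_mon -1                 # one-shot cluster status
    run drbdadm role "$DRBD_RES"   # DRBD roles as seen from this node
    run crm_simulate -sL           # allocation/promotion scores from the live cluster
}
```

Calling `maintenance_before`, doing the hardware work, and then calling `maintenance_after` with `DRY_RUN=0` on a cluster node would run the real commands.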

All the other resources come back up as expected, except the
master/slave set for the dual-primary DRBD resource.

Here's the relevant portion of my cluster config:

node app1a.xlhost.de \
         attributes standby="off"
node app1b.xlhost.de \
         attributes standby="off"
primitive resDLM ocf:pacemaker:controld \
         op start interval="0" timeout="90s" \
         op stop interval="0" timeout="100s" \
         op monitor interval="120s"
primitive resDRBD0 ocf:linbit:drbd \
         op monitor interval="23" role="Slave" timeout="30" \
         op monitor interval="13" role="Master" timeout="20" \
         op start interval="0" timeout="240s" \
         op promote interval="0" timeout="240s" \
         op demote interval="0" timeout="100s" \
         op stop interval="0" timeout="100s" \
         params drbd_resource="drbd0"
primitive resFSDRBD0 ocf:heartbeat:Filesystem \
         params device="/dev/drbd0" directory="/mnt/drbd0" fstype="ocfs2" options="noatime,intr,nodiratime,heartbeat=none" \
         op monitor interval="120s" timeout="50s" \
         op start interval="0" timeout="70s" \
         op stop interval="0" timeout="70s"
primitive resO2CB ocf:pacemaker:o2cb \
         op start interval="0" timeout="90s" \
         op stop interval="0" timeout="100s" \
         op monitor interval="120s"
ms msDRBD0 resDRBD0 \
         meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
clone cloneDLM resDLM \
         meta globally-unique="false" interleave="true" target-role="Started"
clone cloneFSDRBD0 resFSDRBD0 \
         meta interleave="true" globally-unique="false" target-role="Started"
clone cloneO2CB resO2CB \
         meta globally-unique="false" interleave="true" target-role="Started"
colocation colFSDRBD0_DRBD0 inf: cloneFSDRBD0 msDRBD0:Master
colocation colFSDRBD0_O2CB inf: cloneFSDRBD0 cloneO2CB
colocation colO2CB_DLM inf: cloneO2CB cloneDLM
order ordDLM_FSDRBD0 inf: cloneDLM cloneFSDRBD0
order ordDLM_O2CB inf: cloneDLM cloneO2CB
order ordDRBD0_FSDRBD0 inf: msDRBD0:promote cloneFSDRBD0
order ordO2CB_FSDRBD0 inf: cloneO2CB cloneFSDRBD0

If I take down both nodes and fire them up again, everything goes back
to normal and msDRBD0 is promoted to master on both nodes.
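
To be precise about what "normal" means here: both sides report the Primary role. A small sketch for telling the states apart, assuming the `local/peer` role string that `drbdadm role drbd0` prints on DRBD 8.x (the function name `classify_roles` and its labels are made up for this example):

```shell
#!/bin/sh
# classify_roles: interpret a "local/peer" role string.
# Assumption: the input has the form printed by `drbdadm role <resource>`
# on DRBD 8.x, e.g. "Primary/Secondary". The labels are illustrative only.
classify_roles() {
    case "$1" in
        Primary/Primary)     echo "dual-primary" ;;
        Primary/Secondary)   echo "local-primary-only" ;;
        Secondary/Primary)   echo "peer-primary-only" ;;
        Secondary/Secondary) echo "no-primary" ;;
        *)                   echo "unknown" ;;
    esac
}

# Example usage on a cluster node (commented out so the sketch has no side effects):
# classify_roles "$(drbdadm role drbd0)"
```

In the good case after a full restart this would give "dual-primary" on both nodes; anything else after the standby/online cycle would be the state I am asking about.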

I suspect this has something to do with the ordering or colocation
constraints, but I'm not sure. I've stared at this problem dozens of
times now, and a vast amount of googling did not turn up my specific
problem either.

Does anybody have a clue? :) Any hint in the right direction as to
where to look etc. would really be appreciated.

Thanks in advance for your help and best regards,
Jürgen Herrmann