[Pacemaker] drbd under pacemaker - always get split brain
Nikola Ciprich
nikola.ciprich at linuxbox.cz
Mon Jul 2 21:49:09 UTC 2012
hello,
I'm trying to solve a rather mysterious problem here.
I've got a new cluster with a bunch of SAS disks for testing purposes.
I've configured DRBD resources (in a primary/primary configuration).
When I start drbd using drbdadm, it comes up nicely (both nodes
are Primary, connected).
However, when I start it using corosync, I always get a split brain, although
no data has been written and there has been no network disconnection or anything like that.
here's the pacemaker configuration for the drbd resource:
primitive drbd-sas0 ocf:linbit:drbd \
	params drbd_resource="drbd-sas0" \
	operations $id="drbd-sas0-operations" \
	op start interval="0" timeout="240s" \
	op stop interval="0" timeout="200s" \
	op promote interval="0" timeout="200s" \
	op demote interval="0" timeout="200s" \
	op monitor interval="179s" role="Master" timeout="150s" \
	op monitor interval="180s" role="Slave" timeout="150s"
ms ms-drbd-sas0 drbd-sas0 \
	meta clone-max="2" clone-node-max="1" master-max="2" master-node-max="1" notify="true" globally-unique="false" interleave="true" target-role="Started"
here's the dmesg output from when pacemaker tries to promote drbd, causing the split brain:
[ 157.646292] block drbd2: Starting worker thread (from drbdsetup [6892])
[ 157.646539] block drbd2: disk( Diskless -> Attaching )
[ 157.650364] block drbd2: Found 1 transactions (1 active extents) in activity log.
[ 157.650560] block drbd2: Method to ensure write ordering: drain
[ 157.650688] block drbd2: drbd_bm_resize called with capacity == 584667688
[ 157.653442] block drbd2: resync bitmap: bits=73083461 words=1141930 pages=2231
[ 157.653760] block drbd2: size = 279 GB (292333844 KB)
[ 157.671626] block drbd2: bitmap READ of 2231 pages took 18 jiffies
[ 157.673722] block drbd2: recounting of set bits took additional 2 jiffies
[ 157.673846] block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[ 157.673972] block drbd2: disk( Attaching -> UpToDate )
[ 157.674100] block drbd2: attached to UUIDs 0150944D23F16BAE:0000000000000000:8C175205284E3262:8C165205284E3263
[ 157.685539] block drbd2: conn( StandAlone -> Unconnected )
[ 157.685704] block drbd2: Starting receiver thread (from drbd2_worker [6893])
[ 157.685928] block drbd2: receiver (re)started
[ 157.686071] block drbd2: conn( Unconnected -> WFConnection )
[ 158.960577] block drbd2: role( Secondary -> Primary )
[ 158.960815] block drbd2: new current UUID 015E111F18D08945:0150944D23F16BAE:8C175205284E3262:8C165205284E3263
[ 162.686990] block drbd2: Handshake successful: Agreed network protocol version 96
[ 162.687183] block drbd2: conn( WFConnection -> WFReportParams )
[ 162.687404] block drbd2: Starting asender thread (from drbd2_receiver [6927])
[ 162.687741] block drbd2: data-integrity-alg: <not-used>
[ 162.687930] block drbd2: drbd_sync_handshake:
[ 162.688057] block drbd2: self 015E111F18D08945:0150944D23F16BAE:8C175205284E3262:8C165205284E3263 bits:0 flags:0
[ 162.688244] block drbd2: peer 7EC38CBFC3D28FFF:0150944D23F16BAF:8C175205284E3263:8C165205284E3263 bits:0 flags:0
[ 162.688428] block drbd2: uuid_compare()=100 by rule 90
[ 162.688544] block drbd2: helper command: /sbin/drbdadm initial-split-brain minor-2
[ 162.691332] block drbd2: helper command: /sbin/drbdadm initial-split-brain minor-2 exit code 0 (0x0)
To me it seems that pacemaker is promoting it too early (the role change is logged while
the connection is still in WFConnection), and I also wonder why the "new current UUID" line appears?
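
To make the UUID question more concrete, here is a minimal Python sketch (not part of any DRBD tooling) that compares the "self" and "peer" UUID sets from the drbd_sync_handshake lines above. The slot names (current, bitmap, two history entries) and the idea that the lowest bit of each UUID is a flag to be ignored when comparing are assumptions on my part, as are the function names:

```python
import re

# The two handshake lines copied verbatim from the dmesg output above.
LOG = """\
[ 162.688057] block drbd2: self 015E111F18D08945:0150944D23F16BAE:8C175205284E3262:8C165205284E3263 bits:0 flags:0
[ 162.688244] block drbd2: peer 7EC38CBFC3D28FFF:0150944D23F16BAF:8C175205284E3263:8C165205284E3263 bits:0 flags:0
"""

# Assumed meaning of the four colon-separated fields (labels are illustrative).
SLOTS = ("current", "bitmap", "history1", "history2")

LINE_RE = re.compile(
    r"block drbd\d+: (self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16}) "
)


def parse_handshake(text):
    """Return {'self': {slot: int}, 'peer': {slot: int}} from the log text."""
    result = {}
    for match in LINE_RE.finditer(text):
        side, uuids = match.groups()
        values = (int(u, 16) for u in uuids.split(":"))
        result[side] = dict(zip(SLOTS, values))
    return result


def same_generation(a, b):
    """Compare two UUIDs while ignoring the lowest bit (assumed to be a flag)."""
    return (a & ~1) == (b & ~1)


def compare(text):
    """Map each slot to True if self and peer agree on it, else False."""
    sides = parse_handshake(text)
    return {
        slot: same_generation(sides["self"][slot], sides["peer"][slot])
        for slot in SLOTS
    }


if __name__ == "__main__":
    for slot, equal in compare(LOG).items():
        print(f"{slot:9s} {'same' if equal else 'DIFFERENT'}")
```

Under those assumptions, only the first (current) slot differs between the two nodes, while the second slot on both sides holds the UUID that was current before the promotion (0150944D23F16BAE). That would be consistent with each node having generated its own new current UUID before the handshake took place.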
I'm using CentOS 6, kernel 3.0.36, drbd-8.3.13, pacemaker-1.1.6
Could anybody please advise me? I'm sure I'm doing something stupid, but I can't figure out what...
thanks a lot in advance
with best regards
nik
--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava
tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz
mobil servis: +420 737 238 656
email servis: servis at linuxbox.cz
-------------------------------------