[Pacemaker] drbd under pacemaker - always get split brain

Nikola Ciprich nikola.ciprich at linuxbox.cz
Wed Jul 11 05:38:52 EDT 2012


> Well, I'd expect that to be safer than your current configuration ...
> discard-zero-changes will never overwrite data automatically .... have
> you tried adding the start-delay to the DRBD start operation? I'm curious
> whether that is already sufficient for your problem.
Hi,

I tried
<op id="drbd-sas0-start-0" interval="0" name="start" start-delay="10s" timeout="240s"/>
(I hope that's the setting you meant; I'm not sure, as I haven't found any
documentation on the start-delay option)

but it didn't help.
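
For reference, here is a rough Python sketch of how I read the two drbd_sync_handshake lines from the dmesg output quoted further down. It is not taken from the DRBD source. It assumes that the lowest bit of each UUID is a flag bit that gets masked off before comparing, and that "rule 90" means both sides carry the same bitmap UUID while their current UUIDs differ. The function names are made up for illustration.

```python
# Sketch only: mimic the check that dmesg reports as
# "uuid_compare()=100 by rule 90", using the UUIDs from the log.
# Assumption: the lowest bit of a DRBD UUID is a flag and is ignored
# when two UUIDs are compared.

def parse_uuids(field):
    """Split 'CURRENT:BITMAP:HISTORY1:HISTORY2' into four integers."""
    return [int(part, 16) for part in field.split(":")]

def strip_flag(uuid):
    """Clear the lowest bit (assumed to be a flag, not part of the ID)."""
    return uuid & ~1

def looks_like_split_brain(self_field, peer_field):
    """True if the current UUIDs differ but the bitmap UUIDs match."""
    self_cur, self_bm, *_ = parse_uuids(self_field)
    peer_cur, peer_bm, *_ = parse_uuids(peer_field)
    different_current = strip_flag(self_cur) != strip_flag(peer_cur)
    same_bitmap = (strip_flag(self_bm) == strip_flag(peer_bm)
                   and strip_flag(self_bm) != 0)
    return different_current and same_bitmap

# "self" and "peer" values copied from the drbd_sync_handshake lines.
SELF = "015E111F18D08945:0150944D23F16BAE:8C175205284E3262:8C165205284E3263"
PEER = "7EC38CBFC3D28FFF:0150944D23F16BAF:8C175205284E3263:8C165205284E3263"

print(looks_like_split_brain(SELF, PEER))
```

If that reading is right, both nodes started from the same data generation (0150944D23F16BAE) and each created a new current UUID on its own. That would fit the log, where the promotion and "new current UUID" happen about four seconds before the handshake.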




> 
> Regards,
> Andreas
> 
> > 
> >>>
> >>> Best Regards,
> >>> Andreas
> >>>
> >>> --
> >>> Need help with Pacemaker?
> >>> http://www.hastexo.com/now
> >>>
> >>>>
> >>>> thanks for Your time.
> >>>> n.
> >>>>
> >>>>
> >>>>>
> >>>>> Regards,
> >>>>> Andreas
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> thanks a lot in advance
> >>>>>>
> >>>>>> nik
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Jul 08, 2012 at 12:47:16AM +0200, Andreas Kurz wrote:
> >>>>>>> On 07/02/2012 11:49 PM, Nikola Ciprich wrote:
> >>>>>>>> hello,
> >>>>>>>>
> >>>>>>>> I'm trying to solve a rather mysterious problem here.
> >>>>>>>> I've got a new cluster with a bunch of SAS disks for testing purposes.
> >>>>>>>> I've configured DRBD (in a primary/primary configuration).
> >>>>>>>>
> >>>>>>>> When I start DRBD using drbdadm, it comes up nicely (both nodes
> >>>>>>>> are Primary, connected).
> >>>>>>>> However, when I start it using corosync, I always get a split brain, although
> >>>>>>>> no data has been written and there has been no network disconnection or anything similar.
> >>>>>>>
> >>>>>>> Your full DRBD and Pacemaker configuration, please ... partial snippets
> >>>>>>> are very seldom helpful ...
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Andreas
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> here's drbd resource config:
> >>>>>>>> primitive drbd-sas0 ocf:linbit:drbd \
> >>>>>>>>     params drbd_resource="drbd-sas0" \
> >>>>>>>>     operations $id="drbd-sas0-operations" \
> >>>>>>>>     op start interval="0" timeout="240s" \
> >>>>>>>>     op stop interval="0" timeout="200s" \
> >>>>>>>>     op promote interval="0" timeout="200s" \
> >>>>>>>>     op demote interval="0" timeout="200s" \
> >>>>>>>>     op monitor interval="179s" role="Master" timeout="150s" \
> >>>>>>>>     op monitor interval="180s" role="Slave" timeout="150s"
> >>>>>>>>
> >>>>>>>> ms ms-drbd-sas0 drbd-sas0 \
> >>>>>>>>    meta clone-max="2" clone-node-max="1" master-max="2" master-node-max="1" notify="true" globally-unique="false" interleave="true" target-role="Started"
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> here's the dmesg output when pacemaker tries to promote drbd, causing the splitbrain:
> >>>>>>>> [  157.646292] block drbd2: Starting worker thread (from drbdsetup [6892])
> >>>>>>>> [  157.646539] block drbd2: disk( Diskless -> Attaching )
> >>>>>>>> [  157.650364] block drbd2: Found 1 transactions (1 active extents) in activity log.
> >>>>>>>> [  157.650560] block drbd2: Method to ensure write ordering: drain
> >>>>>>>> [  157.650688] block drbd2: drbd_bm_resize called with capacity == 584667688
> >>>>>>>> [  157.653442] block drbd2: resync bitmap: bits=73083461 words=1141930 pages=2231
> >>>>>>>> [  157.653760] block drbd2: size = 279 GB (292333844 KB)
> >>>>>>>> [  157.671626] block drbd2: bitmap READ of 2231 pages took 18 jiffies
> >>>>>>>> [  157.673722] block drbd2: recounting of set bits took additional 2 jiffies
> >>>>>>>> [  157.673846] block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> >>>>>>>> [  157.673972] block drbd2: disk( Attaching -> UpToDate )
> >>>>>>>> [  157.674100] block drbd2: attached to UUIDs 0150944D23F16BAE:0000000000000000:8C175205284E3262:8C165205284E3263
> >>>>>>>> [  157.685539] block drbd2: conn( StandAlone -> Unconnected )
> >>>>>>>> [  157.685704] block drbd2: Starting receiver thread (from drbd2_worker [6893])
> >>>>>>>> [  157.685928] block drbd2: receiver (re)started
> >>>>>>>> [  157.686071] block drbd2: conn( Unconnected -> WFConnection )
> >>>>>>>> [  158.960577] block drbd2: role( Secondary -> Primary )
> >>>>>>>> [  158.960815] block drbd2: new current UUID 015E111F18D08945:0150944D23F16BAE:8C175205284E3262:8C165205284E3263
> >>>>>>>> [  162.686990] block drbd2: Handshake successful: Agreed network protocol version 96
> >>>>>>>> [  162.687183] block drbd2: conn( WFConnection -> WFReportParams )
> >>>>>>>> [  162.687404] block drbd2: Starting asender thread (from drbd2_receiver [6927])
> >>>>>>>> [  162.687741] block drbd2: data-integrity-alg: <not-used>
> >>>>>>>> [  162.687930] block drbd2: drbd_sync_handshake:
> >>>>>>>> [  162.688057] block drbd2: self 015E111F18D08945:0150944D23F16BAE:8C175205284E3262:8C165205284E3263 bits:0 flags:0
> >>>>>>>> [  162.688244] block drbd2: peer 7EC38CBFC3D28FFF:0150944D23F16BAF:8C175205284E3263:8C165205284E3263 bits:0 flags:0
> >>>>>>>> [  162.688428] block drbd2: uuid_compare()=100 by rule 90
> >>>>>>>> [  162.688544] block drbd2: helper command: /sbin/drbdadm initial-split-brain minor-2
> >>>>>>>> [  162.691332] block drbd2: helper command: /sbin/drbdadm initial-split-brain minor-2 exit code 0 (0x0)
> >>>>>>>>
> >>>>>>>> To me it seems that it's being promoted too early, and I also wonder why there is the
> >>>>>>>> "new current UUID" line.
> >>>>>>>>
> >>>>>>>> I'm using CentOS 6, kernel 3.0.36, drbd-8.3.13, pacemaker-1.1.6
> >>>>>>>>
> >>>>>>>> Could anybody please advise me? I'm sure I'm doing something stupid, but I can't figure out what...
> >>>>>>>>
> >>>>>>>> thanks a lot in advance
> >>>>>>>>
> >>>>>>>> with best regards
> >>>>>>>>
> >>>>>>>> nik
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>>>>>>
> >>>>>>>> Project Home: http://www.clusterlabs.org
> >>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>>>>>> Bugs: http://bugs.clusterlabs.org
> >>>>>>>>
> >>>> --
> >>>> -------------------------------------
> >>>> Ing. Nikola CIPRICH
> >>>> LinuxBox.cz, s.r.o.
> >>>> 28.rijna 168, 709 00 Ostrava
> >>>>
> >>>> tel.:   +420 591 166 214
> >>>> fax:    +420 596 621 273
> >>>> mobil:  +420 777 093 799
> >>>> www.linuxbox.cz
> >>>>
> >>>> mobil servis: +420 737 238 656
> >>>> email servis: servis at linuxbox.cz
> >>>> -------------------------------------


-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis at linuxbox.cz
-------------------------------------