[Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd

Thu Jul 1 16:03:18 UTC 2010

Hi Bart,

Just some more thoughts:

Are you sure that DRBD was really stopped?
Does this error also happen after a clean restart (with DRBD not started by
its runlevel script), i.e. when "lsmod | grep drbd" returns nothing?
How long does it take to set up DRBD manually (attach, syncer, connect,
primary)?
What happens when you start openais on only one node?
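The first two checks could be scripted roughly like this. This is a minimal sketch, assuming a POSIX-style shell, the DRBD 8.3 drbdadm subcommands named above (attach, syncer, connect, primary) and the resource name r0 from your r0.res; drbd_module_loaded and drbd_timed_step are hypothetical helper names, not part of DRBD or Pacemaker.

```shell
# Hypothetical helpers for the manual checks above (a sketch, not part of
# DRBD or Pacemaker). Assumes DRBD 8.3 drbdadm and the resource "r0".

# Exit status 0 if the drbd kernel module appears in the modules list.
# Defaults to /proc/modules, the file that lsmod reads.
drbd_module_loaded() {
    local modules_file="${1:-/proc/modules}"
    grep -q '^drbd ' "$modules_file"
}

# Run a single drbdadm step (attach, syncer, connect or primary) for a
# resource and print how many whole seconds it took.
drbd_timed_step() {
    local step="$1" res="${2:-r0}" start end
    start=$(date +%s)
    if ! drbdadm "$step" "$res"; then
        echo "drbdadm $step $res failed" >&2
        return 1
    fi
    end=$(date +%s)
    echo "$step $res: $((end - start))s"
}
```

With openais stopped on both nodes, run drbd_timed_step for attach, syncer and connect on each node, then "drbd_timed_step primary r0" on storm only. If drbd_module_loaded still succeeds after "rcdrbd stop", DRBD may not have been fully stopped before openais started.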

The syncer rate (rate 120M) seems a bit high to me, see
http://www.drbd.org/users-guide/s-configure-syncer-rate.html#eq-syncer-rate-example1
but that should not be what causes the problem.
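As I read the guide, the rule of thumb is about 30% of the available replication bandwidth, i.e. of whichever is slower: the disk's sustained write throughput or the replication network. For example, with a disk that sustains 180 MB/s and Gigabit Ethernet at roughly 110 MB/s, the network is the bottleneck and the rule gives 110 x 0.3 = 33 MB/s. A minimal sketch of that arithmetic; suggested_syncer_rate is a hypothetical helper, and both throughputs are assumed to be measured in MB/s:

```shell
# Hypothetical helper: suggested DRBD syncer rate in MB/s, using the
# "about 30% of the available replication bandwidth" rule of thumb.
# Arguments: sustained disk write throughput and network throughput (MB/s).
suggested_syncer_rate() {
    local disk="$1" net="$2" bw
    # The slower of the two is the available replication bandwidth.
    if [ "$disk" -lt "$net" ]; then bw="$disk"; else bw="$net"; fi
    # Integer arithmetic: 30% of the bandwidth, rounded down.
    echo $((bw * 30 / 100))
}
```

"suggested_syncer_rate 180 110" prints 33, i.e. "rate 33M;" in the syncer section. By the same rule, rate 120M would only fit with roughly 400 MB/s of replication bandwidth.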

HTH,
Martin

"Bart Willems" <bart at atipa.com> wrote on 01.07.2010 16:42:26:

> Subject: Re: [Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd
> From: Bart Willems
> To: 'The Pacemaker cluster resource manager'
> Sent: 01.07.2010 16:46
> 
> Hi Martin,
> 
> No luck, I'm afraid. I first added a start-delay to the monitor operations,
> and when that didn't work I also added a start-delay to the start operation:
> 
> primitive drbd-storage ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="10" role="Master" timeout="60" start-delay="1m" \
>         op start interval="0" timeout="240s" start-delay="1m" \
>         op stop interval="0" timeout="100s" \
>         op monitor interval="20" role="Slave" timeout="60" start-delay="1m"
> 
> Thanks,
> Bart
> 
> -----Original Message-----
> From: martin.braun at icw.de [mailto:martin.braun at icw.de] 
> Sent: Thursday, July 01, 2010 3:37
> To: bart at atipa.com; The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd
> 
> Hi Bart,
> 
> my guess is that you forgot the start-delay attribute for the monitor
> operations; that's why you see the time-out error message.
> 
> Here is an example:
> 
>         op monitor interval="20" role="Slave" timeout="20" start-delay="1m" \
>         op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
>         op start interval="0" timeout="240s" \
>         op stop interval="0" timeout="100s" \
>         params drbd_resource="r0" drbdconf="/usr/local/etc/drbd.conf"
> 
> HTH,
> Martin
> 
> "Bart Willems" <bart at atipa.com> wrote on 30.06.2010 21:57:35:
> 
> > From: "Bart Willems" <bart at atipa.com>
> > Sent: 30.06.2010 21:56
> > To: <pacemaker at oss.clusterlabs.org>
> > Subject: [Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd
> > 
> > Hi All,
> > 
> > I am setting up SLES11 SP1 HA on 2 nodes and have configured a master/slave
> > DRBD resource. I can start DRBD, promote/demote hosts, and mount/use the file
> > system from the command line, but Pacemaker fails to start the DRBD service
> > properly. The 2 nodes are named storm (master) and storm-b (slave).
> > 
> > Details of my setup are:
> > 
> > **********
> > * storm: *
> > **********
> > 
> > eth0: 172.16.0.1/16 (static)
> > eth1: 172.20.168.239 (dhcp)
> > ipmi: 172.16.1.1/16 (static)
> > 
> > ************
> > * storm-b: *
> > ************
> > 
> > eth0: 172.16.0.2/16 (static)
> > eth1: 172.20.168.114 (dhcp)
> > ipmi: 172.16.1.2/16 (static)
> > 
> > ***********************
> > * drbd configuration: *
> > ***********************
> > 
> > storm:~ # cat /etc/drbd.conf 
> > #
> > # please have a a look at the example configuration file in
> > # /usr/share/doc/packages/drbd-utils/drbd.conf
> > #
> > # Note that you can use the YaST2 drbd module to configure this
> > # service!
> > #
> > include "drbd.d/global_common.conf";
> > include "drbd.d/*.res";
> > 
> > storm:~ # cat /etc/drbd.d/r0.res 
> > resource r0 {
> >         device /dev/drbd_r0 minor 0;
> >         meta-disk internal;
> >         on storm {
> >                 disk /dev/sdc1;
> >                 address 172.16.0.1:7811;
> >         }
> >         on storm-b {
> >                 disk /dev/sde1;
> >                 address 172.16.0.2:7811;
> >         }
> >         syncer  {
> >                 rate    120M;
> >         }
> > }
> > 
> > ***********************************
> > * Output of "crm configure show": *
> > ***********************************
> > 
> > storm:~ # crm configure show
> > node storm
> > node storm-b
> > primitive backupExec-ip ocf:heartbeat:IPaddr \
> >         params ip="172.16.0.10" cidr_netmask="16" nic="eth0" \
> >         op monitor interval="30s"
> > primitive drbd-storage ocf:linbit:drbd \
> >         params drbd_resource="r0" \
> >         op monitor interval="60" role="Master" timeout="60" \
> >         op start interval="0" timeout="240" \
> >         op stop interval="0" timeout="100" \
> >         op monitor interval="61" role="Slave" timeout="60"
> > primitive drbd-storage-fs ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd0" directory="/disk1" fstype="ext3"
> > primitive public-ip ocf:heartbeat:IPaddr \
> >         meta target-role="started" \
> >         operations $id="public-ip-operations" \
> >         op monitor interval="30s" \
> >         params ip="143.219.41.20" cidr_netmask="24" nic="eth1"
> > primitive storm-fencing stonith:external/ipmi \
> >         meta target-role="started" \
> >         operations $id="storm-fencing-operations" \
> >         op monitor interval="60" timeout="20" \
> >         op start interval="0" timeout="20" \
> >         params hostname="storm" ipaddr="172.16.1.1" userid="****" passwd="****" interface="lan"
> > ms drbd-storage-masterslave drbd-storage \
> >         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="started"
> > location drbd-storage-master-location drbd-storage-masterslave +inf: storm
> > location storm-fencing-location storm-fencing +inf: storm-b
> > colocation drbd-storage-fs-together inf: drbd-storage-fs drbd-storage-masterslave:Master
> > order drbd-storage-fs-startup-order inf: drbd-storage-masterslave:promote drbd-storage-fs:start
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="2" \
> >         no-quorum-policy="ignore" \
> >         last-lrm-refresh="1277922623" \
> >         node-health-strategy="only-green" \
> >         stonith-enabled="true" \
> >         stonith-action="poweroff"
> > op_defaults $id="op_defaults-options" \
> >         record-pending="false"
> > 
> > ************************************
> > * Output of "crm_mon -o" on storm: *
> > ************************************
> > 
> > storm:~ # crm_mon -o 
> > Attempting connection to the cluster...
> > ============
> > Last updated: Wed Jun 30 15:25:15 2010
> > Stack: openais
> > Current DC: storm - partition with quorum
> > Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> > 2 Nodes configured, 2 expected votes
> > 5 Resources configured.
> > ============
> > 
> > Online: [ storm storm-b ]
> > 
> > storm-fencing   (stonith:external/ipmi):        Started storm-b
> > backupExec-ip   (ocf::heartbeat:IPaddr):        Started storm
> > public-ip       (ocf::heartbeat:IPaddr):        Started storm
> > 
> > Operations:
> > * Node storm: 
> >    public-ip: migration-threshold=1000000
> >     + (8) start: rc=0 (ok)
> >     + (11) monitor: interval=30000ms rc=0 (ok)
> >    backupExec-ip: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (10) monitor: interval=30000ms rc=0 (ok)
> >    drbd-storage:0: migration-threshold=1000000 fail-count=1000000
> >     + (9) start: rc=-2 (unknown exec error)
> >     + (14) stop: rc=0 (ok)
> > * Node storm-b: 
> >    storm-fencing: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (9) monitor: interval=6)
> > 
> > ************************************** 
> > * Output of "crm_mon -o" on storm-b: *
> > **************************************
> > 
> > storm-b:~ # crm_mon -o
> > Attempting connection to the cluster...
> > ============
> > Last updated: Wed Jun 30 15:25:25 2010
> > Stack: openais
> > Current DC: storm - partition with quorum
> > Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> > 2 Nodes configured, 2 expected votes
> > 5 Resources configured.
> > ============
> > 
> > Online: [ storm storm-b ]
> > 
> > storm-fencing   (stonith:external/ipmi):        Started storm-b
> > backupExec-ip   (ocf::heartbeat:IPaddr):        Started storm
> > public-ip       (ocf::heartbeat:IPaddr):        Started storm
> > 
> > Operations:
> > * Node storm: 
> >    public-ip: migration-threshold=1000000
> >     + (8) start: rc=0 (ok)
> >     + (11) monitor: interval=30000ms rc=0 (ok)
> >    backupExec-ip: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (10) monitor: interval=30000ms rc=0 (ok)
> >    drbd-storage:0: migration-threshold=1000000 fail-count=1000000
> >     + (9) start: rc=-2 (unknown exec error)
> >     + (14) stop: rc=0 (ok)
> > * Node storm-b: 
> >    storm-fencing: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (9) monitor: interval=60000ms rc=0 (ok)
> >    drbd-storage:1: migration-threshold=1000000 fail-count=1000000
> >     + (8) start: rc=-2 (unknown exec error)
> >     + (12) stop: rc=0 (ok)
> > 
> > Failed actions:
> >     drbd-storage:0_start_0 (node=storm, call=9, rc=-2, status=Timed Out): unknown exec error
> >     drbd-storage:1_start_0 (node=storm-b, call=8, rc=-2, status=Timed Out): unknown exec error
> > 
> > 
> > ********************************************************
> > * Output of "rcdrbd status" on both storm and storm-b: *
> > ********************************************************
> > 
> > # rcdrbd status
> > drbd driver loaded OK; device status:
> > version: 8.3.7 (api:88/proto:86-91)
> > GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
> > m:res  cs          ro                 ds                 p mounted fstype
> > 0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r----
> > 
> > *********************************
> > * Part of the drbd log entries: *
> > *********************************
> > 
> > Jun 30 15:38:10 storm kernel: [ 3730.185457] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
> > Jun 30 15:38:10 storm kernel: [ 3730.185459] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
> > Jun 30 15:38:10 storm kernel: [ 3730.185460] drbd: registered as block device major 147
> > Jun 30 15:38:10 storm kernel: [ 3730.185462] drbd: minor_table @ 0xffff88035fc0ca80
> > Jun 30 15:38:10 storm kernel: [ 3730.188253] block drbd0: Starting worker thread (from cqueue [9510])
> > Jun 30 15:38:10 storm kernel: [ 3730.188312] block drbd0: disk( Diskless -> Attaching )
> > Jun 30 15:38:10 storm kernel: [ 3730.188866] block drbd0: Found 4 transactions (4 active extents) in activity log.
> > Jun 30 15:38:10 storm kernel: [ 3730.188868] block drbd0: Method to ensure write ordering: barrier
> > Jun 30 15:38:10 storm kernel: [ 3730.188870] block drbd0: max_segment_size ( = BIO size ) = 32768
> > Jun 30 15:38:10 storm kernel: [ 3730.188872] block drbd0: drbd_bm_resize called with capacity == 9765216
> > Jun 30 15:38:10 storm kernel: [ 3730.188907] block drbd0: resync bitmap: bits=1220652 words=19073
> > Jun 30 15:38:10 storm kernel: [ 3730.188910] block drbd0: size = 4768 MB (4882608 KB)
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stdout)
> > Jun 30 15:38:10 storm kernel: [ 3730.189263] block drbd0: recounting of set bits took additional 0 jiffies
> > Jun 30 15:38:10 storm kernel: [ 3730.189265] block drbd0: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
> > Jun 30 15:38:10 storm kernel: [ 3730.189269] block drbd0: disk( Attaching -> UpToDate )
> > Jun 30 15:38:10 storm kernel: [ 3730.191735] block drbd0: conn( StandAlone -> Unconnected )
> > Jun 30 15:38:10 storm kernel: [ 3730.191748] block drbd0: Starting receiver thread (from drbd0_worker [15487])
> > Jun 30 15:38:10 storm kernel: [ 3730.191780] block drbd0: receiver (re)started
> > Jun 30 15:38:10 storm kernel: [ 3730.191785] block drbd0: conn( Unconnected -> WFConnection )
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) 0: Failure: (124) Device is attached to a disk (use detach first)
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) Command 'drbdsetup 0 disk /dev/sdc1 /dev/sdc1 internal
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) --set-defaults --create-device' terminated with exit code 10
> > Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Called drbdadm -c /etc/drbd.conf --peer storm-b up r0
> > Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Exit code 1
> > Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Command output:
> > 
> > I made sure rcdrbd was stopped before starting rcopenais, so the failure
> > related to the device being attached arises during openais startup.
> > 
> > *************************
> > * Result of ocf-tester: *
> > *************************
> > 
> > storm:~ # ocf-tester -n drbd-storage -o drbd_resource="r0" /usr/lib/ocf/resource.d/linbit/drbd
> > Beginning tests for /usr/lib/ocf/resource.d/linbit/drbd...
> > * rc=6: Validation failed.  Did you supply enough options with -o ?
> > Aborting tests
> > 
> > According to "crm ra info ocf:linbit:drbd", the only required parameter is
> > drbd_resource, so no additional options should be needed to make
> > ocf-tester work.
> > 
> > 
> > Any suggestions for debugging and solutions would be most appreciated.
> > 
> > Thanks,
> > Bart
> > 
> > 
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> 
> 
> InterComponentWare AG:
> Executive Board: Peter Kirschbauer (Chairman), Jörg Stadler / Chairman of the
> Supervisory Board: Prof. Dr. Christof Hettich
> Registered office: 69190 Walldorf, Altrottstraße 31 / AG Mannheim HRB 351761 /
> VAT ID: DE 198388516
