[Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd
Bart Willems
bart at atipa.com
Thu Jul 1 14:42:26 UTC 2010
Hi Martin,
No luck I'm afraid. I first added a start-delay to the monitor operations,
and when that didn't work I also added a start-delay to the start operation:
primitive drbd-storage ocf:linbit:drbd \
  params drbd_resource="r0" \
  op monitor interval="10" role="Master" timeout="60" start-delay="1m" \
  op start interval="0" timeout="240s" start-delay="1m" \
  op stop interval="0" timeout="100s" \
  op monitor interval="20" role="Slave" timeout="60" start-delay="1m"
Thanks,
Bart
-----Original Message-----
From: martin.braun at icw.de [mailto:martin.braun at icw.de]
Sent: Thursday, July 01, 2010 3:37
To: bart at atipa.com; The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd
Hi Bart,
my guess is that you forgot the start-delay attribute for the monitor
operations; that's why you see the timeout error message.
Here is an example:
op monitor interval="20" role="Slave" timeout="20" start-delay="1m" \
op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="100s" \
params drbd_resource="r0" drbdconf="/usr/local/etc/drbd.conf"
HTH,
Martin
"Bart Willems" <bart at atipa.com> wrote on 30.06.2010 21:57:35:
> Hi All,
>
> I am setting up SLES11 SP1 HA on 2 nodes and have configured a master/slave
> drbd resource. I can start drbd, promote/demote hosts, and mount/use the
> file system from the command line, but pacemaker fails to properly start up
> the drbd service. The 2 nodes are named storm (master) and storm-b (slave).
>
> Details of my setup are:
>
> **********
> * storm: *
> **********
>
> eth0: 172.16.0.1/16 (static)
> eth1: 172.20.168.239 (dhcp)
> ipmi: 172.16.1.1/16 (static)
>
> ************
> * storm-b: *
> ************
>
> eth0: 172.16.0.2/16 (static)
> eth1: 172.20.168.114 (dhcp)
> ipmi: 172.16.1.2/16 (static)
>
> ***********************
> * drbd configuration: *
> ***********************
>
> storm:~ # cat /etc/drbd.conf
> #
> # please have a look at the example configuration file in
> # /usr/share/doc/packages/drbd-utils/drbd.conf
> #
> # Note that you can use the YaST2 drbd module to configure this
> # service!
> #
> include "drbd.d/global_common.conf";
> include "drbd.d/*.res";
>
> storm:~ # cat /etc/drbd.d/r0.res
> resource r0 {
>   device    /dev/drbd_r0 minor 0;
>   meta-disk internal;
>   on storm {
>     disk    /dev/sdc1;
>     address 172.16.0.1:7811;
>   }
>   on storm-b {
>     disk    /dev/sde1;
>     address 172.16.0.2:7811;
>   }
>   syncer {
>     rate 120M;
>   }
> }
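For reference, the manual bring-up that reportedly works outside Pacemaker would look roughly like the sketch below. This is my reconstruction from the r0.res shown above, not a command sequence given in the original message:

```shell
# Sketch: manual DRBD bring-up for r0, outside cluster control.
# Assumes the r0.res configuration above; run on both nodes unless noted.
drbdadm up r0            # attach the backing disk and connect to the peer
cat /proc/drbd           # verify connection state (cs:Connected expected)
drbdadm primary r0       # promote to Primary (on storm only)
mount /dev/drbd0 /disk1  # mount the replicated filesystem (primary only)
```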
>
> ***********************************
> * Output of "crm configure show": *
> ***********************************
>
> storm:~ # crm configure show
> node storm
> node storm-b
> primitive backupExec-ip ocf:heartbeat:IPaddr \
> params ip="172.16.0.10" cidr_netmask="16" nic="eth0" \
> op monitor interval="30s"
> primitive drbd-storage ocf:linbit:drbd \
> params drbd_resource="r0" \
> op monitor interval="60" role="Master" timeout="60" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100" \
> op monitor interval="61" role="Slave" timeout="60"
> primitive drbd-storage-fs ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/disk1" fstype="ext3"
> primitive public-ip ocf:heartbeat:IPaddr \
> meta target-role="started" \
> operations $id="public-ip-operations" \
> op monitor interval="30s" \
> params ip="143.219.41.20" cidr_netmask="24" nic="eth1"
> primitive storm-fencing stonith:external/ipmi \
> meta target-role="started" \
> operations $id="storm-fencing-operations" \
> op monitor interval="60" timeout="20" \
> op start interval="0" timeout="20" \
> params hostname="storm" ipaddr="172.16.1.1" userid="****" passwd="****" interface="lan"
> ms drbd-storage-masterslave drbd-storage \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
> notify="true" globally-unique="false" target-role="started"
> location drbd-storage-master-location drbd-storage-masterslave +inf: storm
> location storm-fencing-location storm-fencing +inf: storm-b
> colocation drbd-storage-fs-together inf: drbd-storage-fs drbd-storage-masterslave:Master
> order drbd-storage-fs-startup-order inf: drbd-storage-masterslave:promote drbd-storage-fs:start
> property $id="cib-bootstrap-options" \
> dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1277922623" \
> node-health-strategy="only-green" \
> stonith-enabled="true" \
> stonith-action="poweroff"
> op_defaults $id="op_defaults-options" \
> record-pending="false"
>
> ************************************
> * Output of "crm_mon -o" on storm: *
> ************************************
>
> storm:~ # crm_mon -o
> Attempting connection to the cluster...
> ============
> Last updated: Wed Jun 30 15:25:15 2010
> Stack: openais
> Current DC: storm - partition with quorum
> Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> 2 Nodes configured, 2 expected votes
> 5 Resources configured.
> ============
>
> Online: [ storm storm-b ]
>
> storm-fencing (stonith:external/ipmi): Started storm-b
> backupExec-ip (ocf::heartbeat:IPaddr): Started storm
> public-ip (ocf::heartbeat:IPaddr): Started storm
>
> Operations:
> * Node storm:
> public-ip: migration-threshold=1000000
> + (8) start: rc=0 (ok)
> + (11) monitor: interval=30000ms rc=0 (ok)
> backupExec-ip: migration-threshold=1000000
> + (7) start: rc=0 (ok)
> + (10) monitor: interval=30000ms rc=0 (ok)
> drbd-storage:0: migration-threshold=1000000 fail-count=1000000
> + (9) start: rc=-2 (unknown exec error)
> + (14) stop: rc=0 (ok)
> * Node storm-b:
> storm-fencing: migration-threshold=1000000
> + (7) start: rc=0 (ok)
> + (9) monitor: interval=60000ms rc=0 (ok)
>
> **************************************
> * Output of "crm_mon -o" on storm-b: *
> **************************************
>
> storm-b:~ # crm_mon -o
> Attempting connection to the cluster...
> ============
> Last updated: Wed Jun 30 15:25:25 2010
> Stack: openais
> Current DC: storm - partition with quorum
> Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> 2 Nodes configured, 2 expected votes
> 5 Resources configured.
> ============
>
> Online: [ storm storm-b ]
>
> storm-fencing (stonith:external/ipmi): Started storm-b
> backupExec-ip (ocf::heartbeat:IPaddr): Started storm
> public-ip (ocf::heartbeat:IPaddr): Started storm
>
> Operations:
> * Node storm:
> public-ip: migration-threshold=1000000
> + (8) start: rc=0 (ok)
> + (11) monitor: interval=30000ms rc=0 (ok)
> backupExec-ip: migration-threshold=1000000
> + (7) start: rc=0 (ok)
> + (10) monitor: interval=30000ms rc=0 (ok)
> drbd-storage:0: migration-threshold=1000000 fail-count=1000000
> + (9) start: rc=-2 (unknown exec error)
> + (14) stop: rc=0 (ok)
> * Node storm-b:
> storm-fencing: migration-threshold=1000000
> + (7) start: rc=0 (ok)
> + (9) monitor: interval=60000ms rc=0 (ok)
> drbd-storage:1: migration-threshold=1000000 fail-count=1000000
> + (8) start: rc=-2 (unknown exec error)
> + (12) stop: rc=0 (ok)
>
> Failed actions:
> drbd-storage:0_start_0 (node=storm, call=9, rc=-2, status=Timed Out): unknown exec error
> drbd-storage:1_start_0 (node=storm-b, call=8, rc=-2, status=Timed Out): unknown exec error
>
>
> ********************************************************
> * Output of "rcdrbd status" on both storm and storm-b: *
> ********************************************************
>
> # rcdrbd status
> drbd driver loaded OK; device status:
> version: 8.3.7 (api:88/proto:86-91)
> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
> m:res  cs          ro                 ds                 p  mounted  fstype
> 0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r----
>
> *********************************
> * Part of the drbd log entries: *
> *********************************
>
> Jun 30 15:38:10 storm kernel: [ 3730.185457] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
> Jun 30 15:38:10 storm kernel: [ 3730.185459] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
> Jun 30 15:38:10 storm kernel: [ 3730.185460] drbd: registered as block device major 147
> Jun 30 15:38:10 storm kernel: [ 3730.185462] drbd: minor_table @ 0xffff88035fc0ca80
> Jun 30 15:38:10 storm kernel: [ 3730.188253] block drbd0: Starting worker thread (from cqueue [9510])
> Jun 30 15:38:10 storm kernel: [ 3730.188312] block drbd0: disk( Diskless -> Attaching )
> Jun 30 15:38:10 storm kernel: [ 3730.188866] block drbd0: Found 4 transactions (4 active extents) in activity log.
> Jun 30 15:38:10 storm kernel: [ 3730.188868] block drbd0: Method to ensure write ordering: barrier
> Jun 30 15:38:10 storm kernel: [ 3730.188870] block drbd0: max_segment_size ( = BIO size ) = 32768
> Jun 30 15:38:10 storm kernel: [ 3730.188872] block drbd0: drbd_bm_resize called with capacity == 9765216
> Jun 30 15:38:10 storm kernel: [ 3730.188907] block drbd0: resync bitmap: bits=1220652 words=19073
> Jun 30 15:38:10 storm kernel: [ 3730.188910] block drbd0: size = 4768 MB (4882608 KB)
> Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stdout)
> Jun 30 15:38:10 storm kernel: [ 3730.189263] block drbd0: recounting of set bits took additional 0 jiffies
> Jun 30 15:38:10 storm kernel: [ 3730.189265] block drbd0: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
> Jun 30 15:38:10 storm kernel: [ 3730.189269] block drbd0: disk( Attaching -> UpToDate )
> Jun 30 15:38:10 storm kernel: [ 3730.191735] block drbd0: conn( StandAlone -> Unconnected )
> Jun 30 15:38:10 storm kernel: [ 3730.191748] block drbd0: Starting receiver thread (from drbd0_worker [15487])
> Jun 30 15:38:10 storm kernel: [ 3730.191780] block drbd0: receiver (re)started
> Jun 30 15:38:10 storm kernel: [ 3730.191785] block drbd0: conn( Unconnected -> WFConnection )
> Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) 0: Failure: (124) Device is attached to a disk (use detach first)
> Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) Command 'drbdsetup 0 disk /dev/sdc1 /dev/sdc1 internal
> Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) --set-defaults --create-device' terminated with exit code 10
> Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Called drbdadm -c /etc/drbd.conf --peer storm-b up r0
> Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Exit code 1
> Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Command output:
>
> I made sure rcdrbd was stopped before starting rcopenais, so the failure
> related to the device being attached arises during openais startup.
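Since the agent's "drbdadm up" fails with "Device is attached to a disk", one thing worth checking is whether any DRBD state survives into cluster startup. The sketch below is my suggestion, not a procedure from the original thread:

```shell
# Sketch: verify no stale DRBD state remains before starting openais.
# If the resource is still attached, tear it down manually first.
cat /proc/drbd     # should show no configured minors for r0
drbdadm down r0    # detach and disconnect r0 if it is still up
rcdrbd status      # confirm the init-script view agrees
rcopenais start    # then let Pacemaker/openais bring r0 up itself
```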
>
> *************************
> * Result of ocf-tester: *
> *************************
>
> storm:~ # ocf-tester -n drbd-storage -o drbd_resource="r0"
> /usr/lib/ocf/resource.d/linbit/drbd
> Beginning tests for /usr/lib/ocf/resource.d/linbit/drbd...
> * rc=6: Validation failed. Did you supply enough options with -o ?
> Aborting tests
>
> The only required parameter according to "crm ra info ocf:linbit:drbd" is
> drbd_resource, so there shouldn't be any additional options required to
> make ocf-tester work.
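Martin's example passes drbdconf explicitly, so it may also be worth re-running ocf-tester with that option, in case the agent's validate step cannot locate the configuration on its own. The extra -o is a guess, not a documented requirement:

```shell
# Sketch: re-run ocf-tester with drbdconf passed explicitly.
# drbdconf is optional per the RA metadata; supplying it rules out
# a config-path problem during the agent's validate step.
ocf-tester -n drbd-storage \
    -o drbd_resource="r0" \
    -o drbdconf="/etc/drbd.conf" \
    /usr/lib/ocf/resource.d/linbit/drbd
```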
>
>
> Any suggestions for debugging and solutions would be most appreciated.
>
> Thanks,
> Bart
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
More information about the Pacemaker mailing list