[Pacemaker] Problems getting DRBD started

Wed Mar 7 23:50:55 CET 2012

Hello,

On 03/07/2012 01:46 PM, Hans Bert wrote:
> Hello,
> 
> I am new to Pacemaker and Corosync, and for three days I have been trying to get my clustered DRBD nodes running, with only partial success.
> 
> First I got DRBD running on its own, and it worked pretty well: one node was the primary/master and the slave had a replicated, UpToDate partition. (I turned the automatic start off again via chkconfig.)
> 
> I also defined the bonding IP, and that works well too.
> The trouble started when I set up Pacemaker with Corosync to handle the switchover and the complete management of DRBD and the cluster IP.
> 
> With the configuration below, the bonding cluster IP is set correctly and DRBD is started by Pacemaker, but, as you can see, not correctly, and the drbd_r0_fs filesystem is not mounted as I would expect.
> 
> The cluster IP is also set correctly if I define only the three primitives plus stonith-enabled="false", no-quorum-policy="ignore" and resource-stickiness="1".
> 
> Does anyone have an idea what I have configured wrong?
> 
> Do I really need everything I have configured, or can I, for example, remove the 'location' part?

If you don't care where the Master is running, you can omit the location
rule.
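If you do drop it, here is a minimal sketch assuming the crm shell shipped
with Pacemaker 1.1.6. The constraint ID is the one from your configuration
below; the function name drop_master_location and the DRY_RUN switch are
hypothetical conveniences, not part of crm:

```shell
# Hypothetical helper: delete the location constraint that prefers
# testhost-3-1 for the DRBD Master role.
# With DRY_RUN=1 the crm command is only printed, not executed.
drop_master_location() {
    cmd="crm configure delete ms-master-on-testhost-3-1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}
```

Run "DRY_RUN=1 drop_master_location" first, then run it without DRY_RUN and
check the result with "crm configure show".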

> 
> A side comment: I have read that normally DRBD is the top primitive and all others derive from it (they are started only if DRBD came up correctly). Does it make sense to use the cluster IP primitive as this "top primitive" instead?
> 
> The OS is a standard Fedora 16 with the prebuilt pacemaker-1.1.6, drbd-heartbeat-8.3.11, heartbeat-3.0 and corosync-1.4.2 RPMs installed.

heartbeat?

> 
> 
> ============
> Last updated: Wed Mar  7 13:11:37 2012
> Last change: Wed Mar  7 11:43:19 2012 via cibadmin on testhost-3-1
> Stack: openais
> Current DC: testhost-3-1 - partition with quorum
> Version: 1.1.6-4.fc16-89678d4947c5bd466e2f31acd58ea4e1edb854d5
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
> 
> Online: [ testhost-3-1 testhost-3-2 ]
> 
>  bonding-cluster-ip	(ocf::heartbeat:IPaddr2):	Started testhost-3-1
> 
> Failed actions:
>     drbd_r0_monitor_0 (node=testhost-3-2, call=2, rc=6, status=complete): not configured
>     drbd_r0:0_start_0 (node=testhost-3-2, call=28, rc=1, status=complete): unknown error
>     drbd_r0_fs_start_0 (node=testhost-3-2, call=7, rc=5, status=complete): not installed

The resource agents should have logged entries to /var/log/messages that
give you valuable hints ... all errors occur on node testhost-3-2, so look
at that node's log. Don't forget to clean up the resources after fixing
the cause.
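To pull the relevant lines out of the log on testhost-3-2, a sketch like
the following may help. It assumes the resource agents log via syslog to
/var/log/messages (your corosync.conf also writes to
/var/log/cluster/corosync.log, which you can pass via LOG); show_rsc_log
and the LOG override are hypothetical names:

```shell
# Hypothetical helper: print log lines mentioning any of the given
# resource IDs. Note that "drbd_r0" also matches "drbd_r0_fs" and
# "drbd_r0:0", which is usually what you want here.
show_rsc_log() {
    log=${LOG:-/var/log/messages}
    # Join the resource IDs into one extended regex: id1|id2|...
    pattern=$(printf '%s|' "$@")
    pattern=${pattern%|}
    grep -E -- "$pattern" "$log"
}
```

For example, run "show_rsc_log drbd_r0" as root on testhost-3-2. Once the
cause is fixed, "crm resource cleanup ms-drbd_r0" and "crm resource cleanup
drbd_r0_fs" clear the failed actions so Pacemaker probes and retries.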

For the Filesystem RA, missing or wrong devices and missing mount point
directories are common errors.
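A quick pre-flight check for the Filesystem resource could look like this
sketch. check_fs_prereqs is a hypothetical helper, and passing it only
means the device node and the directory exist, not that DRBD is connected
and Primary on that node:

```shell
# Hypothetical pre-flight check for an ocf:heartbeat:Filesystem resource:
# the device must be a block device and the mount point a directory.
# Prints one message per problem on stderr; returns 1 if any check fails.
check_fs_prereqs() {
    dev=$1
    dir=$2
    rc=0
    if [ ! -b "$dev" ]; then
        echo "missing or not a block device: $dev" >&2
        rc=1
    fi
    if [ ! -d "$dir" ]; then
        echo "missing mount point directory: $dir" >&2
        rc=1
    fi
    return $rc
}
```

On testhost-3-2 this would be "check_fs_prereqs /dev/drbd0 /share".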

For DRBD, begin by validating the configuration with "drbdadm dump all" ...
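A sketch of that validation step, assuming drbdadm from DRBD 8.3 (as in
your drbd 8.3.11 packages); validate_drbd_conf is a hypothetical wrapper.
"drbdadm dump all" fails if the configuration cannot be parsed, and
"drbdadm -d adjust all" only prints the commands it would run:

```shell
# Hypothetical wrapper around the DRBD 8.3 validation steps.
# Returns 127 if drbdadm is not installed, otherwise drbdadm's exit status.
validate_drbd_conf() {
    if ! command -v drbdadm >/dev/null 2>&1; then
        echo "drbdadm not found" >&2
        return 127
    fi
    # Parse the whole configuration; errors are reported on stderr.
    drbdadm dump all >/dev/null || return
    # Dry run: show (but do not execute) what adjusting would do.
    drbdadm -d adjust all
}
```

Run it on both nodes, since each node reads its own copy of the
configuration.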

> 
> 
> 
> ---------------------------------------------------------------------------------------------------
> 
> 
> node testhost-3-1 \
> 	attributes standby="off"
> node testhost-3-2 \
> 	attributes standby="off"
> primitive bonding-cluster-ip ocf:heartbeat:IPaddr2 \
> 	params ip="10.10.6.14" broadcast="10.10.6.255" nic="bond0:1" cidr_netmask="24" \

Only define nic="bond0"; IPaddr2 expects the base interface, not an alias
like bond0:1. If you want to identify the VIP easily, define an interface
label, e.g. iflabel="VIP".
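A minimal sketch of that change, assuming the crm shell's "resource param"
command from Pacemaker 1.1.6; fix_vip_nic and the DRY_RUN switch are
hypothetical:

```shell
# Hypothetical helper: point the VIP at the base interface and give it an
# interface label instead of using the alias "bond0:1".
# With DRY_RUN=1 the crm commands are printed instead of executed.
fix_vip_nic() {
    rsc=bonding-cluster-ip
    for cmd in \
        "crm resource param $rsc set nic bond0" \
        "crm resource param $rsc set iflabel VIP"
    do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # Stop at the first failing command and return its status.
            $cmd || return
        fi
    done
}
```

IPaddr2 appends iflabel to the interface name, so "ip addr show bond0"
should then list 10.10.6.14 with the label bond0:VIP.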

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

PS: one note on your DRBD configuration below, inline

> 	op monitor interval="21s" timeout="5s"
> primitive drbd_r0 ocf:linbit:drbd \
> 	params drbd_resource="r0" \
> 	op monitor interval="59s" role="Master" timeout="30" \
> 	op monitor interval="60s" role="Slave" timeout="30"
> primitive drbd_r0_fs ocf:heartbeat:Filesystem \
> 	params device="/dev/drbd0" directory="/share/" fstype="ext3" \
> 	meta target-role="stopped"
> ms ms-drbd_r0 drbd_r0 \
> 	meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> location ms-master-on-testhost-3-1 ms-drbd_r0 \
> 	rule $id="ms-master-on-testhost-3-1-rule" $role="master" 100: #uname eq testhost-3-1
> colocation drbd_ro-fs-on-drbd_r0 inf: drbd_r0_fs ms-drbd_r0:Master
> order ms-drbd_r0-before-drbd_r0-fs inf: ms-drbd_r0:promote drbd_r0_fs:start
> property $id="cib-bootstrap-options" \
> 	dc-version="1.1.6-4.fc16-89678d4947c5bd466e2f31acd58ea4e1edb854d5" \
> 	cluster-infrastructure="openais" \
> 	expected-quorum-votes="2" \
> 	stonith-enabled="false" \
> 	no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
> 	resource-stickiness="1"
> 
> 
> ---------------------------------------------------------------------------------------------------
> 
> 
> 
> resource r0 {
> 
>   startup {
>     wfc-timeout  90;
>     degr-wfc-timeout 120;    # 2 minutes.
>   }
> 
>   disk {
>     on-io-error   detach;
>   }
> 
>   net {
>     timeout       60;    #  6 seconds  (unit = 0.1 seconds)
>     connect-int   10;    # 10 seconds  (unit = 1 second)
>     ping-int      10;    # 10 seconds  (unit = 1 second)
>     
>     after-sb-0pri discard-younger-primary;
>     after-sb-1pri consensus;
>     after-sb-2pri violently-as0p;
>     
>   }
> 
>   syncer {
>     csums-alg md5;
>     rate 10M;
>     after "r0";

after itself? "after" makes the resync of this resource wait for the named
resource, so after "r0" inside r0 makes no sense; remove it.
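A rough awk-based sketch to spot resources that sync after themselves;
find_self_after is hypothetical and assumes one "resource NAME {" and one
"after NAME;" statement per line, as in your file:

```shell
# Hypothetical lint for DRBD 8.3 configs: report resources whose syncer
# "after" option names the resource itself.
find_self_after() {
    awk '
        # Remember the name of the resource we are inside.
        $1 == "resource" { res = $2; gsub(/"/, "", res) }
        $1 == "after" {
            dep = $2
            gsub(/[";]/, "", dep)
            if (dep == res)
                print FILENAME ":" FNR ": resource " res " syncs after itself"
        }
    ' "$@"
}
```

For example, "find_self_after /etc/drbd.conf" on each node; afterwards,
"drbdadm dump all" should parse the file cleanly.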

>     al-extents 257;
>   }
> 
>   on testhost-3-1 {
>     device     /dev/drbd0;
>     disk       /dev/sda3;
>     address    10.10.6.12:7788;
>     meta-disk  internal;
>   }
> 
>   on testhost-3-2 {
>     device     /dev/drbd0;
>     disk       /dev/sda3;
>     address    10.10.6.13:7788;
>     meta-disk internal;
>   }
> }
> 
> 
> 
> ---------------------------------------------------------------------------------------------------
> 
> 
> 
> # Please read the corosync.conf.5 manual page
> compatibility: whitetank
> 
> totem {
> 	version: 2
> 	secauth: off
> 	threads: 0
> 	interface {
> 		ringnumber: 0
> 		bindnetaddr: 10.10.6.0
> 		mcastaddr: 226.94.1.1
> 		mcastport: 5405
> 		ttl: 1
> 	}
> }
> 
> logging {
> 	fileline: off
> 	to_stderr: yes
> 	to_logfile: yes
> 	to_syslog: yes
> 	logfile: /var/log/cluster/corosync.log
> 	debug: on
> 	timestamp: on
> 	logger_subsys {
> 		subsys: AMF
> 		debug: off
> 	}
> }
> 
> service {
> 	ver: 1
> 	name: pacemaker
> }
> 
> amf {
> 	mode: disabled
> }
> 
> 
> ---------------------------------------------------------------------------------------------------
> 
> 
> drbd driver loaded OK; device status:
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: 21CA73FE6D7D9C67B0C6AB2 
> m:res  cs            ro  ds  p  mounted  fstype
> 0:r0   Unconfigured
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
