[Pacemaker] Problems getting DRBD started
Andreas Kurz
andreas at hastexo.com
Wed Mar 7 22:50:55 UTC 2012
Hello,
On 03/07/2012 01:46 PM, Hans Bert wrote:
> Hello,
>
> I am new to Pacemaker and Corosync, and for three days I have been trying to get my DRBD cluster nodes running, with only partial success.
>
> First I got DRBD running on its own, and it worked well: one node was primary and the other held a replicated, UpToDate partition. (I then turned automatic start off again via chkconfig.)
>
> I also defined the bonding IP, and that works well too.
> The trouble started when I set up Pacemaker with Corosync to handle the switchover and to manage DRBD and the cluster IP.
>
> With the configuration below, the bonding cluster IP is set correctly and DRBD is started by Pacemaker, but, as you can see, not correctly, and drbd_fs is not mounted as I would expect.
>
> The cluster IP is set correctly even if I define only the three primitives and the values stonith-enabled="false", no-quorum-policy="ignore" and resource-stickiness="1".
>
> Does anyone have an idea what I have configured wrong?
>
> Do I really need everything I have configured, or can I, for example, remove the 'location' part?
If you don't care where the Master is running, you can omit the location
rule.
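
For illustration, these are the two constraints from your configuration (quoted further down) that remain once the location rule is dropped. They are copied unchanged and are what ties the filesystem to the DRBD Master:

```
colocation drbd_ro-fs-on-drbd_r0 inf: drbd_r0_fs ms-drbd_r0:Master
order ms-drbd_r0-before-drbd_r0-fs inf: ms-drbd_r0:promote drbd_r0_fs:start
```

Without a location rule the cluster is free to promote either node.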
>
> A side comment: I have read that DRBD is normally the top primitive and all others depend on it (they are started once DRBD has come up correctly). Does it make sense to use the cluster IP primitive as this "top primitive" instead?
>
> The OS is a stock Fedora 16 with the prebuilt pacemaker-1.1.6, drbd-heartbeat-8.3.11, heartbeat-3.0 and corosync-1.4.2 RPMs installed.
heartbeat?
>
>
> ============
> Last updated: Wed Mar 7 13:11:37 2012
> Last change: Wed Mar 7 11:43:19 2012 via cibadmin on testhost-3-1
> Stack: openais
> Current DC: testhost-3-1 - partition with quorum
> Version: 1.1.6-4.fc16-89678d4947c5bd466e2f31acd58ea4e1edb854d5
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ testhost-3-1 testhost-3-2 ]
>
> bonding-cluster-ip (ocf::heartbeat:IPaddr2): Started testhost-3-1
>
> Failed actions:
> drbd_r0_monitor_0 (node=testhost-3-2, call=2, rc=6, status=complete): not configured
> drbd_r0:0_start_0 (node=testhost-3-2, call=28, rc=1, status=complete): unknown error
> drbd_r0_fs_start_0 (node=testhost-3-2, call=7, rc=5, status=complete): not installed
You should find log entries from the resource agents in your
/var/log/messages file that give you valuable hints. All errors
occur on node testhost-3-2. Don't forget to clean up the
resources after fixing the cause.
For the Filesystem RA, missing or wrong devices or mount point
directories are common errors.
For DRBD, begin by validating the config with "drbdadm dump all".
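
As a reading aid for the "Failed actions" list quoted above: the rc values are OCF resource agent exit codes, and the small helper below maps them to their standard names. This is only a sketch. The function names `ocf_rc_name` and `diagnose` are made up for illustration, the grep pattern is a guess at how the agents tag their log lines, and the resource names passed to `crm resource cleanup` are taken from the configuration quoted further down.

```shell
# Map an OCF resource agent exit code to its standard name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "unknown rc $1" ;;
    esac
}

# Sketch of the checks suggested above; run as root on testhost-3-2.
# Wrapped in a function so nothing runs until it is called explicitly.
diagnose() {
    # Recent resource agent messages (pattern is an assumption).
    grep -E 'drbd|Filesystem' /var/log/messages | tail -n 50
    # Let drbdadm parse and print the DRBD configuration.
    drbdadm dump all
    # After fixing the cause, clear the failed actions.
    crm resource cleanup ms-drbd_r0
    crm resource cleanup drbd_r0_fs
}
```

Read this way, rc=6 on drbd_r0 points at a configuration problem and rc=5 on drbd_r0_fs at something missing on that node, which fits the hints above.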
>
>
>
> ---------------------------------------------------------------------------------------------------
>
>
> node testhost-3-1 \
> attributes standby="off"
> node testhost-3-2 \
> attributes standby="off"
> primitive bonding-cluster-ip ocf:heartbeat:IPaddr2 \
> params ip="10.10.6.14" broadcast="10.10.6.255" nic="bond0:1" cidr_netmask="24" \
Only define nic="bond0". If you want to identify the VIP easily,
define a label, e.g. iflabel="VIP".
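
A sketch of the primitive with both suggestions applied. Note that the IPaddr2 agent spells the label parameter iflabel; the label text "VIP" is only an example, and all other values are copied from your configuration:

```
primitive bonding-cluster-ip ocf:heartbeat:IPaddr2 \
        params ip="10.10.6.14" broadcast="10.10.6.255" nic="bond0" cidr_netmask="24" iflabel="VIP" \
        op monitor interval="21s" timeout="5s"
```

The address should then be listed with that label in the output of "ip addr show bond0".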
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
PS: one note on your DRBD configuration below, inline.
> op monitor interval="21s" timeout="5s"
> primitive drbd_r0 ocf:linbit:drbd \
> params drbd_resource="r0" \
> op monitor interval="59s" role="Master" timeout="30" \
> op monitor interval="60s" role="Slave" timeout="30"
> primitive drbd_r0_fs ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/share/" fstype="ext3" \
> meta target-role="stopped"
> ms ms-drbd_r0 drbd_r0 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> location ms-master-on-testhost-3-1 ms-drbd_r0 \
> rule $id="ms-master-on-testhost-3-1-rule" $role="master" 100: #uname eq testhost-3-1
> colocation drbd_ro-fs-on-drbd_r0 inf: drbd_r0_fs ms-drbd_r0:Master
> order ms-drbd_r0-before-drbd_r0-fs inf: ms-drbd_r0:promote drbd_r0_fs:start
> property $id="cib-bootstrap-options" \
> dc-version="1.1.6-4.fc16-89678d4947c5bd466e2f31acd58ea4e1edb854d5" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="1"
>
>
> ---------------------------------------------------------------------------------------------------
>
>
>
> resource r0 {
>
> startup {
> wfc-timeout 90;
> degr-wfc-timeout 120; # 2 minutes.
> }
>
> disk {
> on-io-error detach;
> }
>
> net {
> timeout 60; # 6 seconds (unit = 0.1 seconds)
> connect-int 10; # 10 seconds (unit = 1 second)
> ping-int 10; # 10 seconds (unit = 1 second)
>
> after-sb-0pri discard-younger-primary;
> after-sb-1pri consensus;
> after-sb-2pri violently-as0p;
>
> }
>
> syncer {
> csums-alg md5;
> rate 10M;
> after "r0";
After itself? That makes no sense; remove it.
> al-extents 257;
> }
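
With the self-referencing "after" removed, the syncer section quoted above would read as follows. The remaining values are unchanged from your configuration and not otherwise reviewed:

```
syncer {
    csums-alg md5;
    rate 10M;
    al-extents 257;
}
```

"drbdadm dump all" should then parse the resource without complaining about this section.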
>
> on testhost-3-1 {
> device /dev/drbd0;
> disk /dev/sda3;
> address 10.10.6.12:7788;
> meta-disk internal;
> }
>
> on testhost-3-2 {
> device /dev/drbd0;
> disk /dev/sda3;
> address 10.10.6.13:7788;
> meta-disk internal;
> }
> }
>
>
>
> ---------------------------------------------------------------------------------------------------
>
>
>
> # Please read the corosync.conf.5 manual page
> compatibility: whitetank
>
> totem {
> version: 2
> secauth: off
> threads: 0
> interface {
> ringnumber: 0
> bindnetaddr: 10.10.6.0
> mcastaddr: 226.94.1.1
> mcastport: 5405
> ttl: 1
> }
> }
>
> logging {
> fileline: off
> to_stderr: yes
> to_logfile: yes
> to_syslog: yes
> logfile: /var/log/cluster/corosync.log
> debug: on
> timestamp: on
> logger_subsys {
> subsys: AMF
> debug: off
> }
> }
>
> service {
> ver: 1
> name: pacemaker
> }
>
> amf {
> mode: disabled
> }
>
>
> ---------------------------------------------------------------------------------------------------
>
>
> drbd driver loaded OK; device status:
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: 21CA73FE6D7D9C67B0C6AB2
> m:res cs ro ds p mounted fstype
> 0:r0 Unconfigured
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org