[Pacemaker] Master/Slave DRBD switch caused some problems

Wed Jan 19 08:32:59 UTC 2011

On Tue, Dec 21, 2010 at 6:35 PM, Marc Wilmots <desjter at gmail.com> wrote:
> Hi,
>
> I have two nodes, rspa and rspa2 (both CentOS 5.3, 32-bit), with the following
> packages:
>
> drbd83-8.3.8-1.el5.centos
> heartbeat-3.0.3-2.3.el5
> pacemaker-1.0.10-1.4.el5
>
> rspa is stopped, and rspa2 has all the resources (IP, Filesystem, MySQL,
> Apache and DRBD Master).
> When I start heartbeat on rspa, for some reason (I don't have any
> location constraints specified) it tries to move all resources to that node,

I'm guessing that drbd wants to be promoted there - this would result
in the group moving too due to the colocation constraint.
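
As a rough sketch of how one might discourage that move (an assumption,
not a confirmed fix): a default resource-stickiness makes resources prefer
the node they are already running on. The id "rsc-options" and the value
100 below are illustrative only.

```
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
```

Whether this is enough may depend on the master scores the drbd agent
sets on each node.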

Perhaps the drbd guys can comment on why this is or why the partition
becomes unresponsive.
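
If it helps narrow this down, the DRBD state on each node could be
captured while the partition hangs. A minimal sketch, assuming the
resource name r0 from the configuration quoted below:

```
cat /proc/drbd        # overall connection, role, and disk state
drbdadm role r0       # local/peer role, e.g. Primary/Secondary
drbdadm cstate r0     # connection state
drbdadm dstate r0     # local/peer disk state
```

Running these on both nodes should show whether the two sides agree
about the roles and the connection.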

> but when it tries to demote drbd on rspa2 (node2) and promote drbd on rspa
> (node1), something goes wrong, because my DRBD partition (used by MySQL)
> becomes unresponsive.
>
> Next it stops Apache (which works) and tries to stop MySQL, which fails
> because it uses the unresponsive partition.
> As a result, my high availability cluster ends up in limbo: it migrates
> to neither node1 nor node2.
>
> Any help is welcome here...
>
> [root@rspa2 ~]# crm status
> ============
> Last updated: Tue Dec 21 18:12:47 2010
> Stack: Heartbeat
> Current DC: rspa2.sadiel.es (2680c85b-7e6c-4610-88b2-510feb60c4b4) - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ rspa2.domain rspa.domain ]
>
>  Resource Group: mysql
>      fs_mysql    (ocf::heartbeat:Filesystem):    Started rspa2.domain
>      ip_mysql    (ocf::heartbeat:IPaddr2):    Started rspa2.domain
>      mysqld    (lsb:mysqld):    Started rspa2.domain (unmanaged) FAILED
>      apache    (lsb:httpd):    Stopped
>  Master/Slave Set: ms_drbd_mysql
>      Masters: [ rspa2.domain ]
>      Slaves: [ rspa.domain ]
>
> Failed actions:
>     mysqld_stop_0 (node=rspa2.domain, call=18, rc=-2, status=Timed Out): unknown exec error
>
> Please see my Pacemaker config:
>
> node $id="2680c85b-7e6c-4610-88b2-510feb60c4b4" rspa2.domain \
>     attributes standby="off"
> node $id="f9be4a80-ec2a-42e3-8d86-62dd050b437b" rspa.domain \
>     attributes standby="off"
> primitive apache lsb:httpd
> primitive drbd_mysql ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="15s" \
>     op monitor interval="16s" role="Master"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/opt/drbd/" fstype="xfs"
> primitive ip_mysql ocf:heartbeat:IPaddr2 \
>     params ip="172.18.2.150" nic="eth0:1"
> primitive mysqld lsb:mysqld
> group mysql fs_mysql ip_mysql mysqld apache
> ms ms_drbd_mysql drbd_mysql \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
>     notify="true" target-role="Started" is-managed="true"
> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> property $id="cib-bootstrap-options" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     expected-quorum-votes="2" \
>     dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>     cluster-infrastructure="Heartbeat"
>
> This is what's printed in /var/log/messages: http://pastebin.com/W68jPQKJ
> And /var/log/ha.log : http://pastebin.com/SBQz1gU3
>
> My DRBD partition (/dev/drbd0) is mounted on /opt/drbd, and when I run "ls"
> there it just hangs.
> In case it's useful, here is the lsof output:
>
> [root@rspa2 ~]# lsof | grep drbd
> drbd0_wor  3422      root  cwd       DIR        8,2     4096          2 /
> drbd0_wor  3422      root  rtd       DIR        8,2     4096          2 /
> drbd0_wor  3422      root  txt   unknown                                /proc/3422/exe
> drbd0_rec  3425      root  cwd       DIR        8,2     4096          2 /
> drbd0_rec  3425      root  rtd       DIR        8,2     4096          2 /
> drbd0_rec  3425      root  txt   unknown                                /proc/3425/exe
> drbd0_ase  4876      root  cwd       DIR        8,2     4096          2 /
> drbd0_ase  4876      root  rtd       DIR        8,2     4096          2 /
> drbd0_ase  4876      root  txt   unknown                                /proc/4876/exe
> mysqld    12322     mysql  cwd       DIR      147,0       96        131 /opt/drbd/mysql
> mysqld    12322     mysql    3uW     REG      147,0 10485760        135 /opt/drbd/mysql/ibdata1
> mysqld    12322     mysql    8uW     REG      147,0  5242880        133 /opt/drbd/mysql/ib_logfile0
> mysqld    12322     mysql    9uW     REG      147,0  5242880        134 /opt/drbd/mysql/ib_logfile1
> ls        12729      root    3r      DIR      147,0       51        128 /opt/drbd
> bash      12889      root    3r      DIR      147,0       51        128 /opt/drbd
> ls        13117      root    3r      DIR      147,0       51        128 /opt/drbd
>
> Heartbeat configuration file:
> [root@rspa2 ~]# cat /etc/ha.d/ha.cf
> use_logd no
> logfile /var/log/ha.log
> autojoin none
> warntime 5
> deadtime 15
> initdead 30
> ucast eth0 172.18.2.137
> node rspa.domain rspa2.domain
> crm yes
>
> And last but not least, my DRBD configuration on both nodes:
>
> global {
>   usage-count yes;
> }
> common {
>   protocol C;
>   syncer {
>     rate 10M;
>   }
> }
> resource r0 {
>   net {
>         data-integrity-alg md5;
>   }
>   on rspa.domain {
>     device    /dev/drbd0;
>     disk      /dev/sda4;
>     address   IP:7789;
>     meta-disk internal;
>   }
>   on rspa2.domain {
>     device    /dev/drbd0;
>     disk      /dev/sda4;
>     address   IP:7789;
>     meta-disk internal;
>   }
> }