[Pacemaker] Failover on the 3rd service failure.

Sat Jul 26 18:42:23 CEST 2014

Hi all,

I'm trying to figure out why this doesn't work. The main objective is that,
when the mysql service fails for the 3rd time on the active node, all the
resources should fail over to the other node. Here's my configuration:
[root@modb2 ~]# crm configure show
node modb1.domain.tld
node modb2.domain.tld
primitive p_drbd_mysql ocf:linbit:drbd \
    params drbd_resource=data \
    op start timeout=90s interval=0 \
    op stop timeout=180s interval=0 \
    op promote timeout=180s interval=0 \
    op demote timeout=180s interval=0 \
    op monitor interval=30s role=Slave \
    op monitor interval=29s role=Master
primitive p_fs_mysql Filesystem \
    params device="/dev/drbd0" directory="/mysql" fstype=ext4 options=noatime \
    op start timeout=60s interval=0 \
    op stop timeout=180s interval=0 \
    op monitor interval=60s timeout=60s
primitive p_ip_mysql IPaddr2 \
    params ip=172.16.45.113 cidr_netmask=24 \
    op monitor interval=30s \
    meta target-role=Started
primitive p_mysql lsb:mysql \
    meta migration-threshold=2 \
    op monitor interval=20s timeout=10s \
    op start timeout=120s interval=0 \
    op stop timeout=120s interval=0
group g_mysql p_fs_mysql p_ip_mysql p_mysql \
    meta migration-threshold=5
ms ms_drbd_mysql p_drbd_mysql \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
property cib-bootstrap-options: \
    dc-version=1.1.10-14.el6-368c726 \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    no-quorum-policy=ignore \
    pe-warn-series-max=1000 \
    pe-input-series-max=1000 \
    pe-error-series-max=1000 \
    cluster-recheck-interval=5min \
    stonith-enabled=false \
    default-action-timeout=180s \
    start-failure-is-fatal=false
rsc_defaults rsc-options: \
    resource-stickiness=100
[root@modb2 ~]#

Also, with the above configuration, if modb1 is the active node and I shut
it down, here's what I see on modb2 (note that p_mysql stays stopped):
[root@modb2 ~]# crm_mon -1
Last updated: Sun Jul 27 00:00:57 2014
Last change: Sat Jul 26 23:45:15 2014 via cibadmin on modb1.domain.tld
Stack: classic openais (with plugin)
Current DC: modb2.domain.tld - partition WITHOUT quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
5 Resources configured

Online: [ modb2.domain.tld ]
OFFLINE: [ modb1.domain.tld ]

 Resource Group: g_mysql
     p_fs_mysql    (ocf::heartbeat:Filesystem):    Started modb2.domain.tld
     p_ip_mysql    (ocf::heartbeat:IPaddr2):    Started modb2.domain.tld
     p_mysql    (lsb:mysql):    Stopped
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ modb2.domain.tld ]
     Stopped: [ modb1.domain.tld ]
[root@modb2 ~]#

And when modb1 comes back online, it takes all the resources back and
starts them:
[root@modb2 ~]# crm_mon -1
Last updated: Sun Jul 27 00:04:38 2014
Last change: Sat Jul 26 23:45:15 2014 via cibadmin on modb1.domain.tld
Stack: classic openais (with plugin)
Current DC: modb2.domain.tld - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
5 Resources configured

Online: [ modb1.domain.tld modb2.domain.tld ]

 Resource Group: g_mysql
     p_fs_mysql    (ocf::heartbeat:Filesystem):    Started modb1.domain.tld
     p_ip_mysql    (ocf::heartbeat:IPaddr2):    Started modb1.domain.tld
     p_mysql    (lsb:mysql):    Stopped
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ modb1.domain.tld ]
     Slaves: [ modb2.domain.tld ]
[root@modb2 ~]# crm_mon -1
Last updated: Sun Jul 27 00:04:57 2014
Last change: Sat Jul 26 23:45:15 2014 via cibadmin on modb1.domain.tld
Stack: classic openais (with plugin)
Current DC: modb2.domain.tld - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
5 Resources configured

Online: [ modb1.domain.tld modb2.domain.tld ]

 Resource Group: g_mysql
     p_fs_mysql    (ocf::heartbeat:Filesystem):    Started modb1.domain.tld
     p_ip_mysql    (ocf::heartbeat:IPaddr2):    Started modb1.domain.tld
     p_mysql    (lsb:mysql):    Started modb1.domain.tld
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ modb1.domain.tld ]
     Slaves: [ modb2.domain.tld ]
[root@modb2 ~]#

Then, when I check the configuration, I see that "target-role=Started" has
been added to p_mysql automatically:
[root@modb2 ~]# crm configure show
node modb1.domain.tld
node modb2.domain.tld
primitive p_drbd_mysql ocf:linbit:drbd \
    params drbd_resource=data \
    op start timeout=90s interval=0 \
    op stop timeout=180s interval=0 \
    op promote timeout=180s interval=0 \
    op demote timeout=180s interval=0 \
    op monitor interval=30s role=Slave \
    op monitor interval=29s role=Master
primitive p_fs_mysql Filesystem \
    params device="/dev/drbd0" directory="/mysql" fstype=ext4 options=noatime \
    op start timeout=60s interval=0 \
    op stop timeout=180s interval=0 \
    op monitor interval=60s timeout=60s
primitive p_ip_mysql IPaddr2 \
    params ip=172.16.45.113 cidr_netmask=24 \
    op monitor interval=30s \
    meta target-role=Started
primitive p_mysql lsb:mysql \
    meta migration-threshold=3 target-role=Started \
    op monitor interval=20s timeout=10s \
    op start timeout=120s interval=0 \
    op stop timeout=120s interval=0
group g_mysql p_fs_mysql p_ip_mysql p_mysql \
    meta migration-threshold=5
ms ms_drbd_mysql p_drbd_mysql \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
property cib-bootstrap-options: \
    dc-version=1.1.10-14.el6-368c726 \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    no-quorum-policy=ignore \
    pe-warn-series-max=1000 \
    pe-input-series-max=1000 \
    pe-error-series-max=1000 \
    cluster-recheck-interval=5min \
    stonith-enabled=false \
    default-action-timeout=180s \
    start-failure-is-fatal=false
rsc_defaults rsc-options: \
    resource-stickiness=100
[root@modb2 ~]#

Please advise on which commands or configuration changes I need in order to
achieve my goal: failing the cluster over to the other node on the 3rd mysql
service failure.
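The goal described above can be illustrated with a toy shell model of how migration-threshold is documented to behave: each failure on a node raises the resource's fail count there, and once the fail count reaches migration-threshold, that node is no longer eligible to run the resource. The failover_decision function is hypothetical, and the model deliberately ignores resource-stickiness, group membership and the colocation with the DRBD master, all of which influence where Pacemaker actually places g_mysql.

```shell
#!/bin/sh
# Toy model (NOT Pacemaker code) of the documented migration-threshold
# behaviour. failover_decision is a hypothetical helper for illustration.
#   $1 = migration-threshold (positive integer)
#   $2 = current fail count of the resource on this node
# Prints "restart" while the fail count is below the threshold (the
# resource is recovered in place) and "move" once the fail count reaches
# the threshold (the node is no longer eligible for the resource).
failover_decision() {
    if [ "$2" -ge "$1" ]; then
        echo "move"
    else
        echo "restart"
    fi
}

# Walk through three consecutive mysql failures with migration-threshold=3.
for n in 1 2 3; do
    echo "failure #$n: $(failover_decision 3 "$n")"
done
```

With migration-threshold=3 the model restarts mysql in place after the 1st and 2nd failures and only moves it on the 3rd; with migration-threshold=2 (as in the first configuration above) the move would already happen on the 2nd failure.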

Thank you.
Jef