[Pacemaker] Failover on the 3rd service failure.
Cayab, Jefrey E.
jcayab at gmail.com
Sat Jul 26 17:07:04 UTC 2014
After clearing "meta target-role=Started" from p_ip_mysql and restarted
both nodes, it was able to give me what I want. Final config:
[root at modb2 ~]# crm configure show
node modb1.domain.tld
node modb2.domain.tld
primitive p_drbd_mysql ocf:linbit:drbd \
params drbd_resource=data \
op start timeout=90s interval=0 \
op stop timeout=180s interval=0 \
op promote timeout=180s interval=0 \
op demote timeout=180s interval=0 \
op monitor interval=30s role=Slave \
op monitor interval=29s role=Master
primitive p_fs_mysql Filesystem \
params device="/dev/drbd0" directory="/mysql" fstype=ext4 options=noatime \
op start timeout=60s interval=0 \
op stop timeout=180s interval=0 \
op monitor interval=60s timeout=60s
primitive p_ip_mysql IPaddr2 \
params ip=172.16.45.113 cidr_netmask=24 \
op monitor interval=30s
primitive p_mysql lsb:mysql \
meta migration-threshold=3 \
op monitor interval=20s timeout=10s \
op start timeout=120s interval=0 \
op stop timeout=120s interval=0
group g_mysql p_fs_mysql p_ip_mysql p_mysql \
meta migration-threshold=5
ms ms_drbd_mysql p_drbd_mysql \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
property cib-bootstrap-options: \
dc-version=1.1.10-14.el6-368c726 \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
no-quorum-policy=ignore \
pe-warn-series-max=1000 \
pe-input-series-max=1000 \
pe-error-series-max=1000 \
cluster-recheck-interval=5min \
stonith-enabled=false \
default-action-timeout=180s \
start-failure-is-fatal=false \
last-lrm-refresh=1406392123
rsc_defaults rsc-options: \
resource-stickiness=100
[root at modb2 ~]#
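
In case it is useful to anyone testing the same behaviour: the per-node fail
count that migration-threshold=3 acts on can be inspected and reset with the
standard Pacemaker/crmsh tools, roughly like this (exact syntax may differ
slightly between versions):

# show fail counts alongside the resource status
crm_mon -1f
# show the fail count of p_mysql on one node
crm resource failcount p_mysql show modb1.domain.tld
# clear the count (and the failed actions) once the underlying problem is fixed
crm resource cleanup p_mysql

Once the count on the active node reaches the threshold, p_mysql is banned
from that node and the group (plus the colocated DRBD master) should move to
the other node.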
May I know why those "target-role=Started" meta attributes get added automatically?
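
For reference, the attribute can be cleared with something like this (crmsh
syntax; it may vary a little between versions):

# remove the stray meta attribute from the IP resource
crm resource meta p_ip_mysql delete target-role
# or the same thing via the lower-level CLI
crm_resource --resource p_ip_mysql --meta --delete-parameter target-role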
Thanks,
Jef
On Sun, Jul 27, 2014 at 12:42 AM, Cayab, Jefrey E. <jcayab at gmail.com> wrote:
> Hi all,
>
> I'm trying to figure out why this doesn't work. The main objective is that
> when the mysql service fails for the 3rd time on the active node, all the
> resources should fail over to the other node. Here's my configuration:
> [root at modb2 ~]# crm configure show
> node modb1.domain.tld
> node modb2.domain.tld
> primitive p_drbd_mysql ocf:linbit:drbd \
> params drbd_resource=data \
> op start timeout=90s interval=0 \
> op stop timeout=180s interval=0 \
> op promote timeout=180s interval=0 \
> op demote timeout=180s interval=0 \
> op monitor interval=30s role=Slave \
> op monitor interval=29s role=Master
> primitive p_fs_mysql Filesystem \
> params device="/dev/drbd0" directory="/mysql" fstype=ext4 options=noatime \
> op start timeout=60s interval=0 \
> op stop timeout=180s interval=0 \
> op monitor interval=60s timeout=60s
> primitive p_ip_mysql IPaddr2 \
> params ip=172.16.45.113 cidr_netmask=24 \
> op monitor interval=30s \
> meta target-role=Started
> primitive p_mysql lsb:mysql \
> meta migration-threshold=2 \
> op monitor interval=20s timeout=10s \
> op start timeout=120s interval=0 \
> op stop timeout=120s interval=0
> group g_mysql p_fs_mysql p_ip_mysql p_mysql \
> meta migration-threshold=5
> ms ms_drbd_mysql p_drbd_mysql \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
> order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
> property cib-bootstrap-options: \
> dc-version=1.1.10-14.el6-368c726 \
> cluster-infrastructure="classic openais (with plugin)" \
> expected-quorum-votes=2 \
> no-quorum-policy=ignore \
> pe-warn-series-max=1000 \
> pe-input-series-max=1000 \
> pe-error-series-max=1000 \
> cluster-recheck-interval=5min \
> stonith-enabled=false \
> default-action-timeout=180s \
> start-failure-is-fatal=false
> rsc_defaults rsc-options: \
> resource-stickiness=100
> [root at modb2 ~]#
>
> Also, with the above configuration, if the active node is modb1 and I shut it
> down, here's what I see on modb2:
> [root at modb2 ~]# crm_mon -1
> Last updated: Sun Jul 27 00:00:57 2014
> Last change: Sat Jul 26 23:45:15 2014 via cibadmin on modb1.domain.tld
> Stack: classic openais (with plugin)
> Current DC: modb2.domain.tld - partition WITHOUT quorum
> Version: 1.1.10-14.el6-368c726
> 2 Nodes configured, 2 expected votes
> 5 Resources configured
>
>
> Online: [ modb2.domain.tld ]
> OFFLINE: [ modb1.domain.tld ]
>
> Resource Group: g_mysql
> p_fs_mysql (ocf::heartbeat:Filesystem): Started modb2.domain.tld
> p_ip_mysql (ocf::heartbeat:IPaddr2): Started modb2.domain.tld
> *p_mysql (lsb:mysql): Stopped*
> Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
> Masters: [ modb2.domain.tld ]
> Stopped: [ modb1.domain.tld ]
> [root at modb2 ~]#
>
> And if modb1 is back online, it takes back all the resources and starts
> them:
> [root at modb2 ~]# crm_mon -1
> Last updated: Sun Jul 27 00:04:38 2014
> Last change: Sat Jul 26 23:45:15 2014 via cibadmin on modb1.domain.tld
> Stack: classic openais (with plugin)
> Current DC: modb2.domain.tld - partition with quorum
> Version: 1.1.10-14.el6-368c726
> 2 Nodes configured, 2 expected votes
> 5 Resources configured
>
>
> Online: [ modb1.domain.tld modb2.domain.tld ]
>
> Resource Group: g_mysql
> p_fs_mysql (ocf::heartbeat:Filesystem): Started modb1.domain.tld
> p_ip_mysql (ocf::heartbeat:IPaddr2): Started modb1.domain.tld
> p_mysql (lsb:mysql): Stopped
> Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
> Masters: [ modb1.domain.tld ]
> Slaves: [ modb2.domain.tld ]
> [root at modb2 ~]#
> [root at modb2 ~]#
> [root at modb2 ~]# crm_mon -1
> Last updated: Sun Jul 27 00:04:57 2014
> Last change: Sat Jul 26 23:45:15 2014 via cibadmin on modb1.domain.tld
> Stack: classic openais (with plugin)
> Current DC: modb2.domain.tld - partition with quorum
> Version: 1.1.10-14.el6-368c726
> 2 Nodes configured, 2 expected votes
> 5 Resources configured
>
>
> Online: [ modb1.domain.tld modb2.domain.tld ]
>
> Resource Group: g_mysql
> p_fs_mysql (ocf::heartbeat:Filesystem): Started modb1.domain.tld
> p_ip_mysql (ocf::heartbeat:IPaddr2): Started modb1.domain.tld
> p_mysql (lsb:mysql): Started modb1.domain.tld
> Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
> Masters: [ modb1.domain.tld ]
> Slaves: [ modb2.domain.tld ]
> [root at modb2 ~]#
>
> Then when I check the configuration, the "target-role=Started" is added
> automatically:
> [root at modb2 ~]# crm configure show
> node modb1.domain.tld
> node modb2.domain.tld
> primitive p_drbd_mysql ocf:linbit:drbd \
> params drbd_resource=data \
> op start timeout=90s interval=0 \
> op stop timeout=180s interval=0 \
> op promote timeout=180s interval=0 \
> op demote timeout=180s interval=0 \
> op monitor interval=30s role=Slave \
> op monitor interval=29s role=Master
> primitive p_fs_mysql Filesystem \
> params device="/dev/drbd0" directory="/mysql" fstype=ext4 options=noatime \
> op start timeout=60s interval=0 \
> op stop timeout=180s interval=0 \
> op monitor interval=60s timeout=60s
> primitive p_ip_mysql IPaddr2 \
> params ip=172.16.45.113 cidr_netmask=24 \
> op monitor interval=30s \
> meta target-role=Started
> primitive p_mysql lsb:mysql \
> meta migration-threshold=3 *target-role=Started* \
> op monitor interval=20s timeout=10s \
> op start timeout=120s interval=0 \
> op stop timeout=120s interval=0
> group g_mysql p_fs_mysql p_ip_mysql p_mysql \
> meta migration-threshold=5
> ms ms_drbd_mysql p_drbd_mysql \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
> order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
> property cib-bootstrap-options: \
> dc-version=1.1.10-14.el6-368c726 \
> cluster-infrastructure="classic openais (with plugin)" \
> expected-quorum-votes=2 \
> no-quorum-policy=ignore \
> pe-warn-series-max=1000 \
> pe-input-series-max=1000 \
> pe-error-series-max=1000 \
> cluster-recheck-interval=5min \
> stonith-enabled=false \
> default-action-timeout=180s \
> start-failure-is-fatal=false
> rsc_defaults rsc-options: \
> resource-stickiness=100
> [root at modb2 ~]#
>
>
> Please advise on what commands/configuration I need to execute to achieve my
> goal: fail over the cluster to the other node on the 3rd mysql service
> failure.
>
> Thank you.
> Jef
>