[Pacemaker] Failover when storage fails

Wed May 11 16:28:58 UTC 2011

After further testing, even with stonith enabled, the cluster still gets stuck in this state, presumably waiting on I/O. I can get around it by setting "on-fail=fence" on the LVM resources, but shouldn't Pacemaker be smart enough to realise the host is effectively offline? Or am I missing some timeout value that would fix this situation?

pacemaker-1.1.2-7.el6.x86_64
corosync-1.2.3-21.el6.x86_64
RHEL 6.0

Config:

node host001.domain \
        attributes standby="off"
node host002.domain \
        attributes standby="off"
primitive MyApp_IP ocf:heartbeat:IPaddr \
        params ip="192.168.104.26" \
        op monitor interval="10s"
primitive MyApp_fs_graph ocf:heartbeat:Filesystem \
        params device="/dev/VolGroupB00/AppLV2" directory="/naab1" fstype="ext4" \
        op monitor interval="10" timeout="10"
primitive MyApp_fs_landing ocf:heartbeat:Filesystem \
        params device="/dev/VolGroupB01/AppLV1" directory="/naab2" fstype="ext4" \
        op monitor interval="10" timeout="10"
primitive MyApp_lvm_graph ocf:heartbeat:LVM \
        params volgrpname="VolGroupB00" exclusive="yes" \
        op monitor interval="10" timeout="10" on-fail="fence" depth="0"
primitive MyApp_lvm_landing ocf:heartbeat:LVM \
        params volgrpname="VolGroupB01" exclusive="yes" \
        op monitor interval="10" timeout="10" on-fail="fence" depth="0"
primitive MyApp_scsi_reservation ocf:heartbeat:sg_persist \
        params sg_persist_resource="scsi_reservation0" devs="/dev/dm-6 /dev/dm-7" required_devs_nof="2" reservation_type="1"
primitive MyApp_init_script lsb:AppInitScript \
        op monitor interval="10" timeout="10"
primitive fence_host001.domain stonith:fence_ipmilan \
        params ipaddr="192.168.16.148" passwd="password" login="root" pcmk_host_list="host001.domain" pcmk_host_check="static-list" \
        meta target-role="Started"
primitive fence_host002.domain stonith:fence_ipmilan \
        params ipaddr="192.168.16.149" passwd="password" login="root" pcmk_host_list="host002.domain" pcmk_host_check="static-list" \
        meta target-role="Started"
group MyApp_group MyApp_lvm_graph MyApp_lvm_landing MyApp_fs_graph MyApp_fs_landing MyApp_IP MyApp_init_script \
        meta target-role="Started" migration-threshold="2" on-fail="restart" failure-timeout="300s"
ms ms_MyApp_scsi_reservation MyApp_scsi_reservation \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation MyApp_group_on_scsi_reservation inf: MyApp_group ms_MyApp_scsi_reservation:Master
order MyApp_group_after_scsi_reservation inf: ms_MyApp_scsi_reservation:promote MyApp_group:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="true" \
        last-lrm-refresh="1305129673"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1"

From: Max Williams [mailto:Max.Williams at betfair.com]
Sent: 11 May 2011 13:55
To: The Pacemaker cluster resource manager (pacemaker at oss.clusterlabs.org)
Subject: [Pacemaker] Failover when storage fails

Hi,
I want to configure Pacemaker to fail over a group of resources and sg_persist (master/slave) when there is a problem with the storage. However, when I make the iSCSI LUN disappear to simulate a failure, the cluster always gets stuck in this state:

Last updated: Wed May 11 10:52:43 2011
Stack: openais
Current DC: host001.domain - partition with quorum
Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ host002.domain host001.domain ]

fence_host002.domain     (stonith:fence_ipmilan):        Started host001.domain
fence_host001.domain     (stonith:fence_ipmilan):        Started host001.domain
Resource Group: MyApp_group
     MyApp_lvm_graph    (ocf::heartbeat:LVM):   Started host002.domain FAILED
     MyApp_lvm_landing  (ocf::heartbeat:LVM):   Started host002.domain FAILED
     MyApp_fs_graph     (ocf::heartbeat:Filesystem):    Started host002.domain
     MyApp_fs_landing   (ocf::heartbeat:Filesystem):    Started host002.domain
     MyApp_IP   (ocf::heartbeat:IPaddr):        Stopped
     MyApp_init_script   (lsb:abworkload):              Stopped
Master/Slave Set: ms_MyApp_scsi_reservation
     Masters: [ host002.domain ]
     Slaves: [ host001.domain ]

Failed actions:
    MyApp_lvm_graph_monitor_10000 (node=host002.domain, call=129, rc=-2, status=Timed Out): unknown exec error
    MyApp_lvm_landing_monitor_10000 (node=host002.domain, call=130, rc=-2, status=Timed Out): unknown exec error

This is printed over and over in the logs:

May 11 12:34:56 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
May 11 12:34:56 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
May 11 12:34:57 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
May 11 12:34:57 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
May 11 12:34:58 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
May 11 12:34:58 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
May 11 12:34:58 host002 lrmd: [2561]: WARN: MyApp_lvm_graph:monitor process (PID 1938) timed out (try 1).  Killing with signal SIGTERM (15).
May 11 12:34:58 host002 lrmd: [2561]: WARN: MyApp_lvm_landing:monitor process (PID 1939) timed out (try 1).  Killing with signal SIGTERM (15).
May 11 12:34:58 host002 lrmd: [2561]: WARN: operation monitor[190] on ocf::LVM::MyApp_lvm_graph for client 31850, its parameters: CRM_meta_depth=[0] depth=[0] exclusive=[yes] crm_feature_set=[3.0.2] volgrpname=[VolGroupB00] CRM_meta_on_fail=[standby] CRM_meta_name=[monitor] CRM_meta_interval=[10000] CRM_meta_timeout=[10000] : pid [1938] timed out
May 11 12:34:58 host002 lrmd: [2561]: WARN: operation monitor[191] on ocf::LVM::MyApp_lvm_landing for client 31850, its parameters: CRM_meta_depth=[0] depth=[0] exclusive=[yes] crm_feature_set=[3.0.2] volgrpname=[VolGroupB01] CRM_meta_on_fail=[standby] CRM_meta_name=[monitor] CRM_meta_interval=[10000] CRM_meta_timeout=[10000] : pid [1939] timed out
May 11 12:34:58 host002 crmd: [31850]: ERROR: process_lrm_event: LRM operation MyApp_lvm_graph_monitor_10000 (190) Timed Out (timeout=10000ms)
May 11 12:34:58 host002 crmd: [31850]: ERROR: process_lrm_event: LRM operation MyApp_lvm_landing_monitor_10000 (191) Timed Out (timeout=10000ms)
May 11 12:34:59 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
May 11 12:34:59 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms

I also noticed that there are about 100 vgdisplay processes stuck in D state (uninterruptible sleep).
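
For anyone wanting to check for the same symptom, here is a minimal sketch of one way to count processes in D state per command name. It assumes a Linux host with a procps-style `ps` that accepts `-eo pid,stat,comm`; the function names `count_d_state` and `d_state_counts_now` are hypothetical helpers made up for this illustration and are not part of Pacemaker or LVM.

```python
import subprocess
from collections import Counter


def count_d_state(ps_output: str) -> Counter:
    """Count processes in state D (uninterruptible sleep) per command name.

    Expects the text printed by `ps -eo pid,stat,comm`, header line included.
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)  # pid, stat, command name
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        _pid, stat, comm = fields
        # The first character of STAT is the process state; the rest are flags.
        if stat.startswith("D"):
            counts[comm.strip()] += 1
    return counts


def d_state_counts_now() -> Counter:
    """Run ps on the local host and count D-state processes (assumes procps ps)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_d_state(out)
```

Calling `d_state_counts_now().most_common()` on the affected node should list the commands with the most blocked processes first; on a node in the state described above, vgdisplay would be expected near the top.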

How can I configure Pacemaker so the other host forces sg_persist to be a master and then just takes the whole resource group without fencing?

I've tried "on-fail=standby" and "migration-threshold=0", but it always gets stuck in this state. If I reconnect the LUN, everything resumes and it instantly fails over, but this is less than ideal.

Thanks,
Max

________________________________________________________________________
In order to protect our email recipients, Betfair Group use SkyScan from
MessageLabs to scan all Incoming and Outgoing mail for viruses.

________________________________________________________________________
