[Pacemaker] Failover when storage fails

Thu Jun 2 16:47:17 UTC 2011

Just to update the list with the outcome of this issue: it's resolved in Pacemaker 1.1.5.
Cheers,
Max

-----Original Message-----
From: Max Williams [mailto:Max.Williams at betfair.com] 
Sent: 13 May 2011 09:55
To: The Pacemaker cluster resource manager (pacemaker at oss.clusterlabs.org)
Subject: Re: [Pacemaker] Failover when storage fails

Well, this is not what I am seeing here. Perhaps a bug?
I also tried adding "op stop interval=0 timeout=10" to the LVM resources, but when the storage disappears the cluster still just stops where it is and those log entries (below) are printed in a loop.
Cheers,
Max

-----Original Message-----
From: Tim Serong [mailto:tserong at novell.com]
Sent: 13 May 2011 04:22
To: The Pacemaker cluster resource manager (pacemaker at oss.clusterlabs.org)
Subject: Re: [Pacemaker] Failover when storage fails

On 5/12/2011 at 02:28 AM, Max Williams <Max.Williams at betfair.com> wrote: 
> After further testing, even with stonith enabled, the cluster still
> gets stuck in this state, presumably waiting on IO. I can get around
> it by setting "on-fail=fence" on the LVM resources, but shouldn't
> Pacemaker be smart enough to realise the host is effectively offline?

If you've got STONITH enabled, nodes should just get fenced when this occurs, without your having to specify on-fail=fence for the monitor op.
What *should* happen is, the monitor fails or times out, then pacemaker will try to stop the resource.  If the stop also fails or times out, the node will be fenced.  See:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-operations.html

Also, http://ourobengr.com/ha#causes is relevant here.
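
As a rough illustration of that escalation path (a sketch, not a tested configuration), the LVM primitive from the configuration quoted below could carry an explicit stop operation. The 30s stop timeout is an assumed value chosen only for this example. on-fail="fence" is spelled out for clarity, although with stonith-enabled="true" it should already be the default for stop operations.

```
# Sketch only: the stop timeout of 30s is an assumption, not a recommendation.
# on-fail="fence" on stop is written out explicitly; it is the documented
# default for stop operations when STONITH is enabled.
primitive MyApp_lvm_graph ocf:heartbeat:LVM \
        params volgrpname="VolGroupB00" exclusive="yes" \
        op monitor interval="10" timeout="10" \
        op stop interval="0" timeout="30" on-fail="fence"
property stonith-enabled="true"
```

With something like this in place, a monitor timeout is expected to trigger a stop attempt, and a stop that fails or exceeds its timeout is expected to get the node fenced, so the monitor op itself should not need on-fail="fence".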

Regards,

Tim

> Or am I missing some timeout
> value that would fix this situation? 
>  
> pacemaker-1.1.2-7.el6.x86_64
> corosync-1.2.3-21.el6.x86_64
> RHEL 6.0
>  
> Config: 
>  
> node host001.domain \ 
>         attributes standby="off" 
> node host002.domain \ 
>         attributes standby="off" 
> primitive MyApp_IP ocf:heartbeat:IPaddr \ 
>         params ip="192.168.104.26" \ 
>         op monitor interval="10s" 
> primitive MyApp_fs_graph ocf:heartbeat:Filesystem \
>         params device="/dev/VolGroupB00/AppLV2" directory="/naab1" fstype="ext4" \
>         op monitor interval="10" timeout="10"
> primitive MyApp_fs_landing ocf:heartbeat:Filesystem \
>         params device="/dev/VolGroupB01/AppLV1" directory="/naab2" fstype="ext4" \
>         op monitor interval="10" timeout="10"
> primitive MyApp_lvm_graph ocf:heartbeat:LVM \ 
>         params volgrpname="VolGroupB00" exclusive="yes" \ 
>         op monitor interval="10" timeout="10" on-fail="fence" depth="0" 
> primitive MyApp_lvm_landing ocf:heartbeat:LVM \ 
>         params volgrpname="VolGroupB01" exclusive="yes" \ 
>         op monitor interval="10" timeout="10" on-fail="fence" depth="0" 
> primitive MyApp_scsi_reservation ocf:heartbeat:sg_persist \
>         params sg_persist_resource="scsi_reservation0" devs="/dev/dm-6 /dev/dm-7" required_devs_nof="2" reservation_type="1"
> primitive MyApp_init_script lsb:AppInitScript \ 
>         op monitor interval="10" timeout="10" 
> primitive fence_host001.domain stonith:fence_ipmilan \
>         params ipaddr="192.168.16.148" passwd="password" login="root" pcmk_host_list="host001.domain" pcmk_host_check="static-list" \
>         meta target-role="Started"
> primitive fence_host002.domain stonith:fence_ipmilan \
>         params ipaddr="192.168.16.149" passwd="password" login="root" pcmk_host_list="host002.domain" pcmk_host_check="static-list" \
>         meta target-role="Started"
> group MyApp_group MyApp_lvm_graph MyApp_lvm_landing MyApp_fs_graph MyApp_fs_landing MyApp_IP MyApp_init_script \
>         meta target-role="Started" migration-threshold="2" on-fail="restart" failure-timeout="300s"
> ms ms_MyApp_scsi_reservation MyApp_scsi_reservation \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation MyApp_group_on_scsi_reservation inf: MyApp_group ms_MyApp_scsi_reservation:Master
> order MyApp_group_after_scsi_reservation inf: ms_MyApp_scsi_reservation:promote MyApp_group:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \ 
>         cluster-infrastructure="openais" \ 
>         expected-quorum-votes="2" \ 
>         no-quorum-policy="ignore" \ 
>         stonith-enabled="true" \ 
>         last-lrm-refresh="1305129673" 
> rsc_defaults $id="rsc-options" \ 
>         resource-stickiness="1" 
>  
> From: Max Williams [mailto:Max.Williams at betfair.com]
> Sent: 11 May 2011 13:55
> To: The Pacemaker cluster resource manager
> (pacemaker at oss.clusterlabs.org)
> Subject: [Pacemaker] Failover when storage fails
>  
> Hi,
> I want to configure Pacemaker to fail over a group of resources and
> sg_persist (master/slave) when there is a problem with the storage,
> but when I make the iSCSI LUN disappear to simulate a failure, the
> cluster always gets stuck in this state:
>  
> Last updated: Wed May 11 10:52:43 2011
> Stack: openais
> Current DC: host001.domain - partition with quorum
> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
> 2 Nodes configured, 2 expected votes
> 4 Resources configured. 
> ============
>  
> Online: [ host002.domain host001.domain ]
>  
> fence_host002.domain     (stonith:fence_ipmilan):        Started host001.domain
> fence_host001.domain     (stonith:fence_ipmilan):        Started host001.domain
> Resource Group: MyApp_group
>      MyApp_lvm_graph    (ocf::heartbeat:LVM):   Started host002.domain FAILED
>      MyApp_lvm_landing  (ocf::heartbeat:LVM):   Started host002.domain FAILED
>      MyApp_fs_graph     (ocf::heartbeat:Filesystem):    Started host002.domain
>      MyApp_fs_landing   (ocf::heartbeat:Filesystem):    Started host002.domain
>      MyApp_IP   (ocf::heartbeat:IPaddr):        Stopped
>      MyApp_init_script   (lsb:abworkload):              Stopped
> Master/Slave Set: ms_MyApp_scsi_reservation 
>      Masters: [ host002.domain ] 
>      Slaves: [ host001.domain ]
>  
> Failed actions: 
>     MyApp_lvm_graph_monitor_10000 (node=host002.domain, call=129, rc=-2, status=Timed Out): unknown exec error
>     MyApp_lvm_landing_monitor_10000 (node=host002.domain, call=130, rc=-2, status=Timed Out): unknown exec error
>  
> This is printed over and over in the logs: 
>  
> May 11 12:34:56 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
> May 11 12:34:56 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
> May 11 12:34:57 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
> May 11 12:34:57 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
> May 11 12:34:58 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
> May 11 12:34:58 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
> May 11 12:34:58 host002 lrmd: [2561]: WARN: MyApp_lvm_graph:monitor process (PID 1938) timed out (try 1).  Killing with signal SIGTERM (15).
> May 11 12:34:58 host002 lrmd: [2561]: WARN: MyApp_lvm_landing:monitor process (PID 1939) timed out (try 1).  Killing with signal SIGTERM (15).
> May 11 12:34:58 host002 lrmd: [2561]: WARN: operation monitor[190] on ocf::LVM::MyApp_lvm_graph for client 31850, its parameters: CRM_meta_depth=[0] depth=[0] exclusive=[yes] crm_feature_set=[3.0.2] volgrpname=[VolGroupB00] CRM_meta_on_fail=[standby] CRM_meta_name=[monitor] CRM_meta_interval=[10000] CRM_meta_timeout=[10000] : pid [1938] timed out
> May 11 12:34:58 host002 lrmd: [2561]: WARN: operation monitor[191] on ocf::LVM::MyApp_lvm_landing for client 31850, its parameters: CRM_meta_depth=[0] depth=[0] exclusive=[yes] crm_feature_set=[3.0.2] volgrpname=[VolGroupB01] CRM_meta_on_fail=[standby] CRM_meta_name=[monitor] CRM_meta_interval=[10000] CRM_meta_timeout=[10000] : pid [1939] timed out
> May 11 12:34:58 host002 crmd: [31850]: ERROR: process_lrm_event: LRM operation MyApp_lvm_graph_monitor_10000 (190) Timed Out (timeout=10000ms)
> May 11 12:34:58 host002 crmd: [31850]: ERROR: process_lrm_event: LRM operation MyApp_lvm_landing_monitor_10000 (191) Timed Out (timeout=10000ms)
> May 11 12:34:59 host002 lrmd: [2561]: info: perform_op:2884: operation stop[202] on ocf::Filesystem::MyApp_fs_graph for client 31850, its parameters: fstype=[ext4] crm_feature_set=[3.0.2] device=[/dev/VolGroupB00/abb_graph] CRM_meta_timeout=[20000] directory=[/naab1]  for rsc is already running.
> May 11 12:34:59 host002 lrmd: [2561]: info: perform_op:2894: postponing all ops on resource MyApp_fs_graph by 1000 ms
>  
> And I noticed there are about 100 vgdisplay processes running in D state. 
>  
> How can I configure Pacemaker so that the other host forces sg_persist
> to become master and then just takes over the whole resource group
> without fencing?
>
> I've tried "on-fail=standby" and "migration-threshold=0", but the
> cluster always gets stuck in this state. If I reconnect the LUN,
> everything resumes and it fails over instantly, but this is less
> than ideal.
>  
> Thanks,
> Max
>  
> ________________________________________________________________________
> In order to protect our email recipients, Betfair Group use SkyScan
> from MessageLabs to scan all Incoming and Outgoing mail for viruses.
>
> ________________________________________________________________________

--
Tim Serong <tserong at novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.

_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
