[Pacemaker] Resource failover error

cfk at itri.org.tw cfk at itri.org.tw
Mon Mar 21 03:25:09 EDT 2011


Hello,

Many thanks.

BR,

CFK

-----Original Message-----
From: Andreas Kurz [mailto:andreas.kurz at linbit.com] 
Sent: Friday, March 18, 2011 4:22 PM
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] Resource failover error

hello,

On 2011-03-18 08:37, cfk at itri.org.tw wrote:
> Dear all,
> 
> I am a new member of this mailing list. Please let me know if the
> explanation is not clear enough.
> 
> I set up a CentOS 5.4 cluster environment (2 nodes, alpha1 and alpha2)
> with the following software:
> 
> Corosync 1.3.0
> Pacemaker 1.0.10
> DRBD 8.3.9
> 
> The environment is built as an Active/Passive cluster based on
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf.
> 
> I set up four resources (IP, DRBD, Filesystem, Apache) and want to test
> different failover situations.
> 
> When I kill the corosync process on the active host, Pacemaker seems
> to fail to move DRBD:Master to the originally passive host, alpha2.

Is there a log entry like 'Multiple primaries not allowed by config'?
If you only kill corosync, DRBD is still connected and running fine, so it
will refuse to be promoted on both sides unless it is configured to allow that.
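
One way to check for that entry is sketched below. The helper name and the default log path /var/log/messages are assumptions, not something taken from this thread; DRBD messages may end up in a different file depending on the syslog setup.

```shell
# Hypothetical helper (not part of DRBD or Pacemaker): report whether a log
# file shows DRBD refusing a second primary.
# $1: log file to search; /var/log/messages is only an assumed default.
drbd_refused_second_primary() {
    logfile="${1:-/var/log/messages}"
    if grep -q 'Multiple primaries not allowed by config' "$logfile"; then
        echo "refused"
    else
        echo "not found"
    fi
}
```

Running `drbd_refused_second_primary` on alpha2 right after the failed promote, together with a look at /proc/drbd on both nodes, should show whether DRBD was still connected and Primary on alpha1 at that time.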

And yes, STONITH would solve this problem.
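
A minimal sketch of what enabling STONITH could look like in the crm shell, reusing the node names from the configuration quoted below. It assumes both nodes have IPMI-capable management boards that the external/ipmi plugin can reach; the resource names, management addresses, and credentials are placeholders, not values from this thread.

```
# Hypothetical fencing devices: one per node, each kept off the node it fences.
# ipaddr, userid and passwd are placeholders for each host's IPMI board.
primitive st-alpha1 stonith:external/ipmi \
        params hostname="alpha1" ipaddr="192.0.2.11" userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
primitive st-alpha2 stonith:external/ipmi \
        params hostname="alpha2" ipaddr="192.0.2.12" userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
location st-alpha1-not-on-alpha1 st-alpha1 -inf: alpha1
location st-alpha2-not-on-alpha2 st-alpha2 -inf: alpha2
property stonith-enabled="true"
```

With fencing active, alpha2 should power off or reset alpha1 once alpha1 drops out of the corosync membership, so DRBD is no longer Primary on alpha1 when the promotion on alpha2 is attempted.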

Regards,
Andreas

> 
> Corosync and DRBD configuration files are attached to this mail, and the
> crm configuration is listed below:
> 
> =====================================================================================
> node alpha1
> node alpha2
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>         params ip="192.168.75.10" cidr_netmask="32" \
>         op monitor interval="10s"
> primitive Disk ocf:linbit:drbd \
>         params drbd_resource="ccmadata" \
>         op monitor interval="60s"
> primitive FS ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/var/www/html" fstype="ext3"
> primitive WebSite ocf:heartbeat:apache \
>         params configfile="/etc/httpd/conf/httpd.conf" \
>         op monitor interval="1min"
> ms DiskClone Disk \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation drbd-with-ip inf: ClusterIP DiskClone:Master
> colocation fs-on-drbd inf: FS DiskClone:Master
> colocation website-with-fs inf: WebSite FS
> order DiskClone-after-IP inf: DiskClone:promote ClusterIP:start
> order FS-after-DiskClone inf: DiskClone:promote FS:start
> order WebSite-after-FS inf: FS:start WebSite:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
> =====================================================================================
> 
> The first abnormal output from the crm_mon command is:
> 
> =====================================================================================
> Last updated: Thu Mar 17 18:19:04 2011
> Stack: openais
> Current DC: alpha2 - partition WITHOUT quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
> 
> Online: [ alpha2 ]
> OFFLINE: [ alpha1 ]
> 
>  Master/Slave Set: DiskClone
>      Slaves: [ alpha2 ]
>      Stopped: [ Disk:0 ]
> =====================================================================================
> 
> The last abnormal output is:
> 
> =====================================================================================
> ============
> Last updated: Thu Mar 17 18:20:01 2011
> Stack: openais
> Current DC: alpha2 - partition WITHOUT quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
> 
> Online: [ alpha2 ]
> OFFLINE: [ alpha1 ]
> 
>  Master/Slave Set: DiskClone
>      Slaves: [ alpha2 ]
>      Stopped: [ Disk:1 ]
> 
> Failed actions:
>     Disk:1_promote_0 (node=alpha2, call=12, rc=-2, status=Timed Out): unknown exec error
>     Disk:0_promote_0 (node=alpha2, call=22, rc=-2, status=Timed Out): unknown exec error
> =====================================================================================
> 
> The Corosync log on host alpha1 is drbd_test_alpha1.log, and the one on
> host alpha2 is drbd_test_alpha2.log.
> 
> My questions are:
> 
> 1) How can I solve this issue? Am I missing some crm configuration for
> this situation?
> 
> 2) According to the corosync log on host alpha2, Pacemaker wants to
> promote 2 DRBD masters (please correct me if I am wrong). The action
> fails because the operation mode is Active/Passive and only
> 1 DRBD master is allowed to exist. Should I add additional crm or
> drbd.conf configuration?
> 
> 3) I am still studying STONITH. Is my problem a split-brain issue?
> 
> Thanks for your help.
> 
> BR,
> 
> Chia-Feng Kang
> 
> This email may contain confidential information. Please do not use or
> disclose it in any way and delete it if you are not the intended recipient.
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


-- 
: Andreas Kurz					
: LINBIT | Your Way to High Availability
: Tel +43-1-8178292-64, Fax +43-1-8178292-82
:
: http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

This e-mail is solely for use by the intended recipient(s). Information
contained in this e-mail and its attachments may be confidential,
privileged or copyrighted. If you are not the intended recipient you are
hereby formally notified that any use, copying, disclosure or
distribution of the contents of this e-mail, in whole or in part, is
prohibited. Also please notify immediately the sender by return e-mail
and delete this e-mail from your system. Thank you for your co-operation.


