[Pacemaker] Resource failover error

Fri Mar 18 03:37:06 EDT 2011

Dear all,

I am new to this mailing list. Please let me know if my explanation is not clear enough.

I set up a CentOS 5.4 cluster environment (2 nodes, alpha1 and alpha2) with the following software:
Corosync 1.3.0
Pacemaker 1.0.10
DRBD 8.3.9

The environment is configured as an Active/Passive cluster, based on http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf.

I set up four resources (IP, DRBD, Filesystem, Apache) and want to test different failover situations.

When I kill the corosync process on the Active host (alpha1), Pacemaker seems to fail to move DRBD:Master to the original Passive host, alpha2.

The Corosync and DRBD configuration files are attached to this mail, and the crm configuration is listed below:
=====================================================================================
node alpha1
node alpha2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.75.10" cidr_netmask="32" \
        op monitor interval="10s"
primitive Disk ocf:linbit:drbd \
        params drbd_resource="ccmadata" \
        op monitor interval="60s"
primitive FS ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/var/www/html" fstype="ext3"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
ms DiskClone Disk \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation drbd-with-ip inf: ClusterIP DiskClone:Master
colocation fs-on-drbd inf: FS DiskClone:Master
colocation website-with-fs inf: WebSite FS
order DiskClone-after-IP inf: DiskClone:promote ClusterIP:start
order FS-after-DiskClone inf: DiskClone:promote FS:start
order WebSite-after-FS inf: FS:start WebSite:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
=====================================================================================
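
To make the ordering in the configuration above explicit, here is a small Python sketch that reads the three order constraints and prints which action has to complete before which. It is only my own illustration, not a Pacemaker or crm shell tool; the names parse_orders and CRM_ORDER_LINES are hypothetical. It assumes the usual crm shell reading of "order <id> <score>: <first> <then>", where the first action is performed before the second.

```python
import re

# The three order constraints, copied from the crm configuration above.
CRM_ORDER_LINES = """\
order DiskClone-after-IP inf: DiskClone:promote ClusterIP:start
order FS-after-DiskClone inf: DiskClone:promote FS:start
order WebSite-after-FS inf: FS:start WebSite:start
"""

# order <id> <score>: <first action> <then action>
ORDER_RE = re.compile(r"^order\s+(\S+)\s+(\S+):\s+(\S+)\s+(\S+)\s*$")


def parse_orders(text):
    """Return (constraint_id, first_action, then_action) tuples."""
    orders = []
    for line in text.splitlines():
        match = ORDER_RE.match(line.strip())
        if match:
            constraint_id, _score, first, then = match.groups()
            orders.append((constraint_id, first, then))
    return orders


if __name__ == "__main__":
    for constraint_id, first, then in parse_orders(CRM_ORDER_LINES):
        print(f"{constraint_id}: {first} before {then}")
```

Read this way, both ClusterIP:start and FS:start wait for DiskClone:promote, so a promote that does not complete would also keep the IP, the file system and Apache from starting.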

The first abnormal output from the crm_mon command is:
=====================================================================================
Last updated: Thu Mar 17 18:19:04 2011
Stack: openais
Current DC: alpha2 - partition WITHOUT quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ alpha2 ]
OFFLINE: [ alpha1 ]

 Master/Slave Set: DiskClone
     Slaves: [ alpha2 ]
     Stopped: [ Disk:0 ]
=====================================================================================

The last abnormal output from crm_mon is:
=====================================================================================
============
Last updated: Thu Mar 17 18:20:01 2011
Stack: openais
Current DC: alpha2 - partition WITHOUT quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ alpha2 ]
OFFLINE: [ alpha1 ]

 Master/Slave Set: DiskClone
     Slaves: [ alpha2 ]
     Stopped: [ Disk:1 ]

Failed actions:
    Disk:1_promote_0 (node=alpha2, call=12, rc=-2, status=Timed Out): unknown exec error
    Disk:0_promote_0 (node=alpha2, call=22, rc=-2, status=Timed Out): unknown exec error
=====================================================================================
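
For anyone who prefers to read the failure list programmatically, here is a minimal Python sketch that splits the "Failed actions" entries from the output above into fields and groups them by node. It is my own helper, not part of Pacemaker; the names parse_failed_actions and failures_by_node are hypothetical, and the regular expression only assumes the line layout shown in this particular crm_mon output.

```python
import re
from collections import defaultdict

# Failed-action lines copied from the crm_mon output above, with the
# wrapped "unknown exec error" text rejoined onto one line.
CRM_MON_FAILED = """\
Failed actions:
    Disk:1_promote_0 (node=alpha2, call=12, rc=-2, status=Timed Out): unknown exec error
    Disk:0_promote_0 (node=alpha2, call=22, rc=-2, status=Timed Out): unknown exec error
"""

# <resource>_<operation>_<interval> (node=..., call=..., rc=..., status=...): <reason>
FAILED_RE = re.compile(
    r"^\s*(?P<resource>\S+?)_(?P<operation>[a-z]+)_(?P<interval>\d+)\s+"
    r"\(node=(?P<node>[^,]+), call=(?P<call>-?\d+), rc=(?P<rc>-?\d+), "
    r"status=(?P<status>[^)]+)\):\s*(?P<reason>.*)$"
)


def parse_failed_actions(text):
    """Return one dict per failed-action line found in text."""
    failures = []
    for line in text.splitlines():
        match = FAILED_RE.match(line)
        if match:
            failures.append(match.groupdict())
    return failures


def failures_by_node(failures):
    """Group 'resource operation (status)' summaries by node name."""
    grouped = defaultdict(list)
    for item in failures:
        grouped[item["node"]].append(
            f'{item["resource"]} {item["operation"]} ({item["status"]})'
        )
    return dict(grouped)


if __name__ == "__main__":
    grouped = failures_by_node(parse_failed_actions(CRM_MON_FAILED))
    for node, items in grouped.items():
        print(node, items)
```

For the output above, both entries end up under alpha2: one promote of Disk:1 and one of Disk:0, each with status Timed Out. The sketch only summarizes what crm_mon printed; it does not say why the promote timed out.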

The Corosync log from host alpha1 is drbd_test_alpha1.log, and the one from host alpha2 is drbd_test_alpha2.log.

My questions are:
1)     How can I solve this issue? Am I missing some crm configuration for this situation?
2)     According to the corosync log on host alpha2, Pacemaker wants to promote 2 DRBD masters (please correct me if I am wrong). As I understand it, the action fails because the cluster runs in Active/Passive mode and only 1 DRBD master is allowed to exist. Should I add additional crm or drbd.conf configuration?
3)     I am still studying STONITH. Is my problem a split-brain issue?

Thanks for your help.

BR,
Chia-Feng Kang
