[Pacemaker] strange drbd migration fail

Matthew O'Connor matt at ecsorl.com
Mon Jul 15 17:25:26 EDT 2013


I have run into a strange problem with a DRBD resource migrating master
role from one node to the other.  3-node cluster, Pacemaker v1.1.9,
Corosync v1.4.5, DRBD 8.3.11.  Both Pacemaker and Corosync are built
from source.  Two of the nodes are running DRBD resources between them
in simple single-master relationships. 

The nodes are called c3, c4, and c5; c5 is location-constrained to never
receive DRBD resource clones.  I have a drbd resource called
ms_drbd-aoe1, and a dummy resource called p_dummy1.  The dummy resource
is colocated with the drbd master, and ordered such that the master is
promoted before the dummy resource is started.  The config generally
follows accepted online examples (see below).
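
For reference, a location constraint that keeps the DRBD clones off c5
could look roughly like the fragment below.  This is a sketch only: the
constraint ID l_no_drbd_c5 is a placeholder, and the constraint actually
used in this cluster is not shown in this post.

```
location l_no_drbd_c5 ms_drbd-aoe1 -inf: c5
```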

When c3 is master, I attempt to migrate to c4 by issuing "resource
migrate p_dummy1".  I see a fleeting FAILED notice in crm_mon for c3,
and the cluster then starts into a cycle where it keeps bringing the
relevant DRBD resource up and down very quickly.  Syslog shows the
connection being set up and torn down over and over.  The up-down cycle
is broken by issuing "resource unmigrate p_dummy1", after which c4
becomes DRBD master and c3 its slave.  That is to say, the migration
works, but only after the unmigrate is issued.  Migrating from c4 to c3
works every time.
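
Written out as crm shell commands, the sequence described above is
roughly the following.  This is a sketch of the steps, not a captured
session; the "configure show" step is an added suggestion (not part of
the original report) for spotting any location constraint on p_dummy1
that the migrate leaves behind.

```
# Run on any cluster node while c3 holds the DRBD master role
crm resource migrate p_dummy1      # no target node given
crm_mon -1                         # one-shot status, to check for the FAILED notice
crm configure show                 # inspect constraints that reference p_dummy1
crm resource unmigrate p_dummy1    # remove the constraint created by migrate
```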

Putting either node into standby works fine; resources migrate without
issue in that case.  I have fencing enabled and tested, but it's not
being called into action here.  I also tried re-creating my DRBD
resource and resyncing, with no change to the results.  I can manually
shift either node to primary using drbdadm while the resource is
unmanaged by Pacemaker.  I have also duplicated this behavior with one
of my other DRBD resources and a second dummy resource.  Finally, I
reproduced this with a new drbd and dummy resource set running between
c4 and c5 (where the c4->c5 transition fails until unmigrate is issued,
but the c5->c4 migration works fine).

An attempt to manually demote ms_drbd-aoe1 resulted in Pacemaker
reporting a failure, even though /proc/drbd subsequently showed both
nodes in Secondary.

This syslog fragment shows the attempt, failure, unmigrate and the
eventual success of migration: http://pastebin.com/tBtydG1f

Key configuration elements:

primitive p_drbd-aoe1 ocf:linbit:drbd \
        params drbd_resource="aoe1" \
        op start interval="0" timeout="5m" \
        op promote interval="0" timeout="90s" \
        op demote interval="0" timeout="90s" \
        op stop interval="0" timeout="3m" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20"

primitive p_dummy1 ocf:heartbeat:Dummy

ms ms_drbd-aoe1 p_drbd-aoe1 \
        meta master-max="1" notify="true" clone-max="2" \
        master-node-max="1" clone-node-max="1" target-role="Started" \
        is-managed="true"

colocation colo_dummy inf: p_dummy1 ms_drbd-aoe1:Master
order o_dummy inf: ms_drbd-aoe1:promote p_dummy1:start

Any ideas?

Thanks!!



-- 
Thank you!
  Matthew O'Connor
  (GPG Key ID: 55F981C4)


