[Pacemaker] strange drbd migration fail

Matthew O'Connor matt at ecsorl.com
Tue Jul 16 17:36:37 EDT 2013


Hi,

OK, so I was a little hasty in announcing success, because the issue
only occurs when all three of my nodes are active.

With DRBD resource A managed by nodes c3 and c4, and DRBD resource B
managed by nodes c4 and c5, I can reliably cause a never-ending flapping
failure by performing the steps I outlined in my first email when all
three nodes are active.  If I put c5 into standby, I can migrate A back
and forth no problem (did this 100 times with a script, even).  If I put
c3 into standby, B can be migrated.  With all three nodes active, A and
B only respond correctly to a migrate of their related dummy resource
when migrating in one direction.  Whichever nodes are involved, an
active migration failure cycle is broken by issuing "resource unmigrate
<target>".

The order of the nodes does not seem to matter. 
  c4->c3 works while c3->c4 fails.
  c5->c4 works while c4->c5 fails. 
  c3->c5 works while c5->c3 fails. 

Note again that once the unmigrate command is issued, the target node in
each failure case (c4, c5, and c3 respectively) does in fact ultimately
become promoted to Master.  For the sake of completeness, here are the
exact steps to duplicate (and hopefully someone can):

1. Create 3 nodes, stack = classic openais (with plugin).
2. Create a DRBD resource between nodes 1 and 2.
3. Set location constraints to forbid node 3 from ever receiving the
   DRBD resource.
4. Create a dummy resource that is colocated with the DRBD master.
5. Migrate the dummy resource back and forth using "resource migrate"
   and "resource unmigrate".
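
For anyone trying to reproduce this, the back-and-forth in the last
step can be scripted roughly as below.  This is only a sketch, not the
script I actually used: it assumes the crm shell is on the PATH and that
"crm resource migrate <rsc>" and "crm resource unmigrate <rsc>" are the
commands in play.  The function names, the settle delay, and the
dry-run default are made up for illustration.

```python
import subprocess
import time


def build_plan(resource, cycles):
    """Return the crm commands for `cycles` migrate/unmigrate round trips."""
    plan = []
    for _ in range(cycles):
        # Assumed command form: migrate with no target node, as in the
        # original report, then clear the resulting constraint.
        plan.append(["crm", "resource", "migrate", resource])
        plan.append(["crm", "resource", "unmigrate", resource])
    return plan


def run_plan(plan, settle_seconds=30, dry_run=True):
    """Print the plan, or execute it with a pause after each command."""
    for argv in plan:
        if dry_run:
            # Dry run: show what would be executed, touch nothing.
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
            # Hypothetical delay to let the cluster settle (or flap).
            time.sleep(settle_seconds)


if __name__ == "__main__":
    # p_dummy1 is the dummy resource name from the quoted configuration.
    run_plan(build_plan("p_dummy1", cycles=2))
```

With dry_run left at True the script only prints the commands; set it
to False on a test cluster and watch crm_mon between steps to see
whether the flapping starts after the migrate.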

-- Matt



On 07/16/2013 12:29 PM, Matthew O'Connor wrote:
> Hi,
>
> Probably safe to disregard this issue...  I found I was somehow not
> building the latest 1.1.9.  After building and installing
> 1.1.9-cad5efc the problem appears to have gone away.
>
> On 07/15/2013 05:25 PM, Matthew O'Connor wrote:
>> I have run into a strange problem with a DRBD resource migrating
>> master role from one node to the other.  3-node cluster, Pacemaker
>> v1.1.9, Corosync v1.4.5, DRBD 8.3.11.  Both Pacemaker and Corosync
>> are built from source.  Two of the nodes are running DRBD resources
>> between them in simple single-master relationships. 
>>
>> The nodes are called c3, c4, and c5; c5 is location-constrained to
>> never receive DRBD resource clones.  I have a drbd resource called
>> ms_drbd-p_dummy1, and a dummy resource called p_dummy1.  The resource
>> is colocated with the drbd master, and ordered such that master is
>> promoted before the resource is started.  The config generally
>> follows accepted online examples (see below).
>>
>> When c3 is master, I attempt to migrate to c4 by issuing "resource
>> migrate p_dummy1".  I see a fleeting FAILED notice in crm_mon for c3,
>> and the cluster then starts into a cycle where it keeps bringing
>> the relevant DRBD resource up and down very quickly.  Syslog shows
>> the connection being set up and torn down over and over.  The up-down
>> cycle is broken by issuing "resource unmigrate p_dummy1" after which
>> c4 becomes DRBD master and c3 its slave.  That is to say, the
>> migration works but only after the unmigrate is issued.  Migrating
>> from c4 to c3 works every time.
>>
>> Putting either node into standby works fine; resources migrate
>> without issue in that case.  I have fencing enabled and tested, but
>> it's not being called into action here.  I also tried re-creating my
>> DRBD resource and resyncing, with no change to the results.  I can
>> manually shift either node to primary using drbdadm while the
>> resource is unmanaged by Pacemaker.  I have also duplicated this
>> behavior with one of my other DRBD resources and a second dummy
>> resource.  Finally, I confirmed this between a new drbd and dummy
>> resource set between c4 and c5 (where c4->c5 transition fails until
>> unmigrate is issued, but c5->c4 migrate works fine).
>>
>> An attempt to manually demote ms_drbd-aoe1 resulted in Pacemaker
>> reporting a failure, even though /proc/drbd subsequently showed both
>> nodes in Secondary.
>>
>> This syslog fragment shows the attempt, failure, unmigrate and the
>> eventual success of migration: http://pastebin.com/tBtydG1f
>>
>> Key configuration elements:
>>
>> primitive p_drbd-aoe1 ocf:linbit:drbd \
>>         params drbd_resource="aoe1" \
>>         op start interval="0" timeout="5m" \
>>         op promote interval="0" timeout="90s" \
>>         op demote interval="0" timeout="90s" \
>>         op stop interval="0" timeout="3m" \
>>         op monitor interval="20" role="Slave" timeout="20" \
>>         op monitor interval="10" role="Master" timeout="20"
>>
>> primitive p_dummy1 ocf:heartbeat:Dummy
>>
>> ms ms_drbd-aoe1 p_drbd-aoe1 \
>>         meta master-max="1" notify="true" clone-max="2" \
>>         master-node-max="1" clone-node-max="1" target-role="Started" \
>>         is-managed="true"
>>
>> colocation colo_dummy inf: p_dummy1 ms_drbd-aoe1:Master
>> order o_dummy inf: ms_drbd-aoe1:promote p_dummy1:start
>>
>> Any ideas?
>>
>> Thanks!!
>>
>>
>>
>> -- 
>> Thank you!
>>   Matthew O'Connor
>>   (GPG Key ID: 55F981C4)
>>
>>
>> CONFIDENTIAL NOTICE: The information contained in this electronic message is legally privileged, confidential and exempt from disclosure under applicable law. It is intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender immediately by return e-mail and delete the original message and any copies of it from your computer system. Thank you.
>>  
>> EXPORT CONTROL WARNING:  This document may contain technical data that is subject to the International Traffic in Arms Regulations (ITAR) controls and may not be exported or otherwise disclosed to any foreign person or firm, whether in the US or abroad, without first complying with all requirements of the ITAR, 22 CFR 120-130, including the requirement for obtaining an export license if applicable. In addition, this document may contain technology that is subject to the Export Administration Regulations (EAR) and may not be exported or otherwise disclosed to any non-U.S. person, whether in the US or abroad, without first complying with all requirements of the EAR, 15 CFR 730-774, including the requirement for obtaining an export license if applicable. Violation of these export laws is subject to severe criminal penalties.
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org

-- 
Thank you!
  Matthew O'Connor
  (GPG Key ID: 55F981C4)


