[Pacemaker] [DRBD-user] examples of dual primary DRBD

Tue Oct 11 07:12:19 UTC 2011

On 10/11/11 04:35, Andrew Beekhof wrote:
> On Mon, Oct 10, 2011 at 9:12 PM, Florian Haas <florian at hastexo.com> wrote:
>> On 2011-10-08 15:55, Bart Coninckx wrote:
>>> On 10/08/11 00:25, Lars Ellenberg wrote:
>>>> On Fri, Oct 07, 2011 at 10:21:08PM +0200, Bart Coninckx wrote:
>>>>> On 10/06/11 22:03, Florian Haas wrote:
>>>>>> On 2011-10-06 21:43, Bart Coninckx wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> would you mind sending me examples of your crm config for a dual
>>>>>>> primary DRBD resource?
>>>>>>>
>>>>>>> I used the one on
>>>>>>>
>>>>>>> http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
>>>>>>>
>>>>>>> and on
>>>>>>>
>>>>>>> http://www.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
>>>>>>>
>>>>>>> and they both result in split brain, except when I start drbd
>>>>>>> manually first.
>>>>>>
>>>>>> They clearly should not. Rather than soliciting other people's
>>>>>> configurations and then trying to adapt yours based on that, why
>>>>>> don't you upload _your_ CIB (not just a "crm configure dump", but a
>>>>>> full "cibadmin -Q") and your DRBD configuration to your
>>>>>> pastebin/pastie/fpaste and let people tell you where your problem is?
>>>>>
>>>>> OK, I posted the drbd.conf on http://pastebin.com/SQe9YxhY
>>>>>
>>>>> cibadmin -Q is on http://pastebin.com/gTZqsACq
>>>>>
>>>>> The split brain logging is on http://pastebin.com/7unKKkdi .
>>>>
>>>> I somehow think you added some "--force" or "--overwrite-data-of-peer"
>>>> to some drbdadm/drbdsetup primary invocation?
>>>>
>>>>> Could this be some sort of timing issue? Manually things are fine,
>>>>> but there are some seconds in between the primary promotions.
>>>>
>>>
>>> OK, seems to be some sort of timing issue. I "fixed" this by adding a
>>> "sleep 1" in the RA right before the "do_drbdadm primary $DRBD_RESOURCE"
>>> line.
>>>
>>> I'm surprised though that I'm the first one to run into this.
>>
>> Er, wait. I'm cross-posting this to the Pacemaker list on a hunch.
>>
>> Andrew, in Boston last year you mentioned you were planning to implement
>> a change to Master/Slave sets in which, iirc, startup and promotion
>> would happen in one fell swoop (I believe the NTT folks made a
>> compelling case for this). Has that change ever been implemented?
>
> Alas no.
> I still have intentions of doing so, but I was consumed with Matahari
> for most of this year and have been playing catch-up ever since.
>
> If you were inclined, you could (re)create a bug for this in
> http://bugs.clusterlabs.org
>
>> And if
>> so, at which Pacemaker version? Is there a configuration option to
>> revert back to the old behavior where the resource would be started
>> first, and then promotion would occur some time after that?
>>
>> Cheers,
>> Florian
>>
>> --
>> Need help with High Availability?
>> http://www.hastexo.com/now
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

Florian,

Does this mean you thought this problem could have been the result of
changes made by Andrew to the DRBD RA? But since he hasn't made them
yet, it isn't?

thx,

B.