[Pacemaker] [DRBD-user] examples of dual primary DRBD

Tue Oct 11 02:35:22 UTC 2011

On Mon, Oct 10, 2011 at 9:12 PM, Florian Haas <florian at hastexo.com> wrote:
> On 2011-10-08 15:55, Bart Coninckx wrote:
>> On 10/08/11 00:25, Lars Ellenberg wrote:
>>> On Fri, Oct 07, 2011 at 10:21:08PM +0200, Bart Coninckx wrote:
>>>> On 10/06/11 22:03, Florian Haas wrote:
>>>>> On 2011-10-06 21:43, Bart Coninckx wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Would you mind sending me examples of your crm config for a
>>>>>> dual-primary DRBD resource?
>>>>>>
>>>>>> I used the one on
>>>>>>
>>>>>> http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
>>>>>>
>>>>>> and on
>>>>>>
>>>>>> http://www.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
>>>>>>
>>>>>> and they both result in split brain, except when I start DRBD
>>>>>> manually first.
>>>>>
>>>>> They clearly should not. Rather than soliciting other people's
>>>>> configurations and then trying to adapt yours based on them, why don't
>>>>> you upload _your_ CIB (not just a "crm configure dump", but a full
>>>>> "cibadmin -Q") and your DRBD configuration to a pastebin/pastie/fpaste
>>>>> and let people tell you where your problem is?
>>>>
>>>> OK, I posted the drbd.conf at http://pastebin.com/SQe9YxhY
>>>>
>>>> cibadmin -Q is at http://pastebin.com/gTZqsACq
>>>>
>>>> The split-brain log is at http://pastebin.com/7unKKkdi
>>>
>>> I suspect you added "--force" or "--overwrite-data-of-peer" to a
>>> drbdadm/drbdsetup primary invocation somewhere?
>>>
>>>> Could this be some sort of timing issue? Done manually, things are fine,
>>>> but there are a few seconds between the two primary promotions.
>>>
>>
>> OK, it seems to be some sort of timing issue. I "fixed" this by adding a
>> "sleep 1" in the RA right before the "do_drbdadm primary $DRBD_RESOURCE"
>> line.
>>
>> I'm surprised though that I'm the first one to run into this.
>
> Er, wait. I'm cross-posting this to the Pacemaker list on a hunch.
>
> Andrew, in Boston last year you mentioned you were planning to implement
> a change to Master/Slave sets in which, iirc, startup and promotion
> would happen in one fell swoop (I believe the NTT folks made a
> compelling case for this). Has that change ever been implemented?

Alas no.
I still have intentions of doing so, but I was consumed with Matahari
for most of this year and have been playing catch-up ever since.

If you are inclined, you could (re)create a bug for this at
http://bugs.clusterlabs.org

> And if
> so, at which Pacemaker version? Is there a configuration option to
> revert to the old behavior where the resource would be started
> first, and then promotion would occur some time after that?
>
> Cheers,
> Florian
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>