[Pacemaker] DRBD Recovery Policies

Menno Luiten mluiten at artifix.net
Fri Mar 12 10:40:27 UTC 2010


On 12-03-10 11:26, Darren.Mansell at opengi.co.uk wrote:
> Fairly standard, but I don't really want it to be fenced, as I want to
> keep the data that has been updated on the single remaining NodeB while
> NodeA was being repaired:

That is exactly what fencing is all about: preventing any node from 
taking over the primary/master role with outdated data. So I'm not sure 
what you mean by not wanting it to be fenced.

Anyway, resource-level fencing would be enabled by adding the following 
lines to the handlers section of your drbd.conf (adjust the script paths 
to match your DRBD installation). Try it out and see if it fits your needs.

>
> global {
>    dialog-refresh       1;
>    minor-count  5;
> }
> common {
>    syncer { rate 10M; }
> }
> resource cluster_disk {
>    protocol  C;
>    disk {
>       on-io-error       pass_on;
>    }
>    syncer {
>    }
>    handlers {

       fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
       after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";

>       split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>    }
>    net {
>       after-sb-1pri discard-secondary;
>    }
>    startup {
>       wait-after-sb;
>    }
>    on cluster1 {
>       device    /dev/drbd0;
>       address   12.0.0.1:7789;
>       meta-disk internal;
>       disk      /dev/sdb1;
>    }
>    on cluster2 {
>       device    /dev/drbd0;
>       address   12.0.0.2:7789;
>       meta-disk internal;
>       disk      /dev/sdb1;
>    }
> }
>
>
>
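Note that, per the users guide page referenced further down in this
thread, the fence-peer handler only takes effect if resource-level
fencing is also switched on in the disk section. A minimal sketch of the
complete stanza, assuming the stock DRBD 8.3 scripts under /usr/lib/drbd:

   resource cluster_disk {
      disk {
         fencing resource-only;   # fence at the resource level, no STONITH
      }
      handlers {
         # place a constraint that blocks promotion of the outdated peer
         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
         # lift that constraint again once the resync has finished
         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
   }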
> -----Original Message-----
> From: Menno Luiten [mailto:mluiten at artifix.net]
> Sent: 12 March 2010 10:05
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] DRBD Recovery Policies
>
> Are you absolutely sure you set the resource-fencing parameters
> correctly in your drbd.conf (you can post your drbd.conf if unsure) and
> reloaded the configuration?
>
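A minimal sketch of reloading a changed drbd.conf on a running resource;
the resource name is taken from the config above and the commands need
to be run on both nodes:

   drbdadm adjust cluster_disk   # re-read drbd.conf and apply the changes
   cat /proc/drbd                # verify connection state, roles and disk states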
> On 12-03-10 10:48, Darren.Mansell at opengi.co.uk wrote:
>> The odd thing is - it didn't. From my test, it failed back,
>> re-promoted NodeA to be the DRBD master and failed all grouped
>> resources back too.
>>
>> Everything was working with the ~7GB of data I had put onto NodeB
>> while NodeA was down, now available on NodeA...
>>
>> /proc/drbd on the slave said Secondary/Primary UpToDate/Inconsistent
>> while it was syncing data back - so it was able to mount the
>> inconsistent data on the primary node and access the files that hadn't
>> yet sync'd over?! I mounted a 4GB ISO that shouldn't have been able to
>> be there yet and was able to access data inside it..
>>
>> Is my understanding of DRBD limited and it's actually able to provide
>> access to not fully sync'd files over the network link or something?
>>
>> If so - wow.
>>
>> I'm confused ;)
>>
>>
>> -----Original Message-----
>> From: Menno Luiten [mailto:mluiten at artifix.net]
>> Sent: 11 March 2010 19:35
>> To: pacemaker at oss.clusterlabs.org
>> Subject: Re: [Pacemaker] DRBD Recovery Policies
>>
>> Hi Darren,
>>
>> I believe that this is handled by DRBD fencing the Master/Slave
>> resource during resync, using Pacemaker. See
>> http://www.drbd.org/users-guide/s-pacemaker-fencing.html. This would
>> prevent Node A from promoting/starting services with outdated data
>> (fence-peer), and force it to wait with the takeover until the
>> resync is completed (after-resync-target).
>>
>> Regards,
>> Menno
>>
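For the record, what crm-fence-peer.sh does is, roughly, insert a
location constraint into the CIB that bans the Master role on every node
except the one that still has up-to-date data; crm-unfence-peer.sh
removes it again after the resync. The constraint looks roughly like
this (resource id, constraint id and node name below are illustrative):

   <rsc_location rsc="ms_drbd_cluster_disk"
                 id="drbd-fence-by-handler-ms_drbd_cluster_disk">
     <rule role="Master" score="-INFINITY"
           id="drbd-fence-by-handler-rule-ms_drbd_cluster_disk">
       <expression attribute="#uname" operation="ne" value="cluster2"
                   id="drbd-fence-by-handler-expr-ms_drbd_cluster_disk"/>
     </rule>
   </rsc_location>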
>> On 11-3-2010 15:52, Darren.Mansell at opengi.co.uk wrote:
>>> I've been reading the DRBD Pacemaker guide on the DRBD.org site and
>>> I'm not sure I can find the answer to my question.
>>>
>>> Imagine a scenario:
>>>
>>> (NodeA
>>>
>>> NodeB
>>>
>>> Order and group:
>>>
>>> M/S DRBD Promote/Demote
>>>
>>> FS Mount
>>>
>>> Other resource that depends on the F/S mount
>>>
>>> DRBD master location score of 100 on NodeA)
>>>
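For illustration, the setup sketched above might look roughly like this
in crm shell syntax; resource ids, mount point and filesystem type are
made up, and the group would also contain whatever depends on the mount:

   primitive p_drbd ocf:linbit:drbd \
           params drbd_resource="cluster_disk" \
           op monitor interval="15s" role="Master" \
           op monitor interval="30s" role="Slave"
   ms ms_drbd p_drbd \
           meta master-max="1" master-node-max="1" clone-max="2" \
                clone-node-max="1" notify="true"
   primitive p_fs ocf:heartbeat:Filesystem \
           params device="/dev/drbd0" directory="/srv/data" fstype="ext3"
   group g_services p_fs
   colocation col_services_on_master inf: g_services ms_drbd:Master
   order ord_drbd_before_services inf: ms_drbd:promote g_services:start
   location loc_prefer_nodea ms_drbd rule $role="Master" 100: #uname eq NodeA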
>>> NodeA is down, resources fail over to NodeB and everything happily
>>> runs for days. When NodeA is brought back online it isn't treated as
>>> split-brain, since a normal demote/promote would happen. But the data
>>> on NodeA would be very old and possibly take a long time to sync from
>>> NodeB.
>>>
>>> What would happen in this scenario? Would the RA defer the promote
>>> until the sync is completed? Would the inability to promote cause the
>>> failback not to happen, so that a resource cleanup is required once
>>> the sync has completed?
>>>
>>> I guess this is really down to how advanced the Linbit DRBD RA is?
>>>
>>> Thanks
>>>
>>> Darren
>>>
>>>
>>>