[Pacemaker] DRBD Recovery Policies

Darren.Mansell at opengi.co.uk Darren.Mansell at opengi.co.uk
Fri Mar 12 09:48:57 UTC 2010


The odd thing is - it didn't. From my test, it failed back, re-promoted
NodeA to be the DRBD master and failed all grouped resources back too.

Everything was working, and the ~7GB of data I had put onto NodeB while
NodeA was down was now available on NodeA...

/proc/drbd on the slave said Secondary/Primary UpToDate/Inconsistent
while it was syncing data back - so it was able to mount the
inconsistent data on the primary node and access the files that hadn't
yet sync'd over?! I mounted a 4GB ISO that shouldn't have been able to
be there yet and was able to access data inside it...

Is my understanding of DRBD limited, and is it actually able to provide
access to not-yet-synced files over the network link or something?

If so - wow.

I'm confused ;)


-----Original Message-----
From: Menno Luiten [mailto:mluiten at artifix.net] 
Sent: 11 March 2010 19:35
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] DRBD Recovery Policies

Hi Darren,

I believe this is handled by DRBD fencing the Master/Slave
resource during resync via Pacemaker. See
http://www.drbd.org/users-guide/s-pacemaker-fencing.html. This would
prevent Node A from promoting/starting services with outdated data
(fence-peer), and it would be forced to wait with the takeover until the
resync is completed (after-resync-target).
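
Roughly, the relevant bits in drbd.conf look like this (a sketch based
on that guide - "r0" is just an example resource name and the handler
script paths may differ on your installation):

  resource r0 {
    disk {
      # fence at the resource level by calling the fence-peer handler
      fencing resource-only;
    }
    handlers {
      # adds a constraint that keeps the outdated peer from being promoted
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      # removes that constraint again once the peer is UpToDate after resync
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }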

Regards,
Menno

On 11-3-2010 15:52, Darren.Mansell at opengi.co.uk wrote:
> I've been reading the DRBD Pacemaker guide on the DRBD.org site and I'm
> not sure I can find the answer to my question.
>
> Imagine a scenario:
>
> (NodeA
>
> NodeB
>
> Order and group:
>
> M/S DRBD Promote/Demote
>
> FS Mount
>
> Other resource that depends on the F/S mount
>
> DRBD master location score of 100 on NodeA)
>
> NodeA is down, resources fail over to NodeB and everything happily runs
> for days. When NodeA is brought back online it isn't treated as
> split-brain, as a normal demote/promote would happen. But the data on
> NodeA would be very old and possibly take a long time to sync from
> NodeB.
>
> What would happen in this scenario? Would the RA defer the promote until
> the sync is completed? Would the inability to promote cause the failback
> to not happen, so that a resource cleanup is required once the sync has
> completed?
>
> I guess this is really down to how advanced the Linbit DRBD RA is?
>
> Thanks
>
> Darren
>
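
For reference, a configuration like the one described in the quoted
message could look roughly like this in crm shell syntax (the resource
names, device, mount point and the dependent service are illustrative
placeholders, not taken from the real configuration):

  primitive p_drbd_r0 ocf:linbit:drbd \
          params drbd_resource="r0" \
          op monitor interval="15s" role="Master" \
          op monitor interval="30s" role="Slave"
  ms ms_drbd_r0 p_drbd_r0 \
          meta master-max="1" clone-max="2" notify="true"
  primitive p_fs ocf:heartbeat:Filesystem \
          params device="/dev/drbd0" directory="/data" fstype="ext3"
  # stand-in for whatever actually depends on the mount
  primitive p_app ocf:pacemaker:Dummy
  group g_services p_fs p_app
  colocation col_services_on_drbd inf: g_services ms_drbd_r0:Master
  order ord_drbd_before_services inf: ms_drbd_r0:promote g_services:start
  location loc_master_on_nodea ms_drbd_r0 \
          rule $role="Master" 100: #uname eq NodeA

With the resource-level fencing handlers from the earlier reply in
place, the constraint added by crm-fence-peer.sh should outweigh the
location score of 100 here, so the Master role would stay on NodeB until
NodeA's resync has finished.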

_______________________________________________
Pacemaker mailing list
Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker



