[Pacemaker] [DRBD-user] drbd on heartbeat links

Pavlos Parissis pavlos.parissis at gmail.com
Wed Nov 3 10:14:21 UTC 2010


On 2 November 2010 22:57, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> On Tue, Nov 02, 2010 at 10:07:17PM +0100, Pavlos Parissis wrote:
>> On 2 November 2010 16:15, Dan Frincu <dfrincu at streamwide.ro> wrote:
>> > Hi,
>> >
>> > Pavlos Parissis wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am trying to figure out how I can resolve the following scenario
>> >>
>> >> Facts
>> >> 3 nodes
>> >> 2 DRBD ms resource
>> >> 2 group resource
>> >> by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node-02
>> >> drbd1/group1 can only run on node-01 and node-03
>> >> drbd2/group2 can only run on node-02 and node-03
>> >> DRBD fencing policy is resource-only [1]
>> >> 2 heartbeat links, one of which is used for the DRBD communication
>> >>
>> >> Scenario
>> >> 1) node-01 loses both heartbeat links
>> >> 2) the DRBD monitor detects the loss of the DRBD communication first
>> >> and does resource fencing by adding a location constraint which
>> >> prevents drbd1 from running on node-03
>> >> 3) pacemaker fencing kicks in and kills node-01
>> >>
>> >> due to the location constraint created at step 2, drbd1/group1 can't
>> >> run anywhere in the cluster
>> >>
>> >>
>> >
>> > I don't understand exactly what you mean by this. Resource-only fencing
>> > would create a -inf score on node-01 when the node loses the DRBD
>> > communication channel (the only one DRBD uses),
>> Because node-01 is the primary at the moment of the failure,
>> resource fencing will create a -inf score for node-03.
>>
>> > however you could still have
>> > heartbeat communication available via the secondary link, then you shouldn't
>> As I wrote, none of the heartbeat links is available.
>> After I sent the mail, I realized that node-03 will not see the
>> location constraint created by node-01 because there is no heartbeat
>> communication!
>> Thus I think my scenario has a flaw, since none of the heartbeat links
>> is available on node-01.
>> Resource fencing from DRBD will be triggered but without any effect;
>> node-03 or node-02 will fence node-01, and node-03 will become the
>> primary for drbd1.
>>
>> > fence the entire node, the resource-only fencing does that for you, the only
>> > thing you need to do is to add the drbd fence handlers in /etc/drbd.conf.
>> >       handlers {
>> >               fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>> >               after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>> >       }
>> >
>> > Is this what you meant?
>>
>> No.
>> Dan, thanks for your mail.
>>
>>
>> Since there is a flaw in that scenario, let's define a similar one.
>>
>> status
>> node-01 primary for drbd1 and group1 runs on it
>> node-02 primary for drbd2 and group2 runs on it
>> node-03 secondary for drbd1 and drbd2
>>
>> 2 heartbeat links, one of which also carries the DRBD communication
>>
>> here is the scenario
>> 1) on node-01, the heartbeat link which also carries the DRBD communication is lost
>> 2) node-01 does resource fencing and places a -inf score for drbd1 on node-03
>> 3) on node-01, the second heartbeat link is lost
>> 4) node-01 is fenced by one of the other cluster members
>> 5) drbd1 can't run on node-03 due to the location constraint created at step 2
>>
>> The problem here is that the location constraint will still be active
>> even after node-01 is fenced.
>
> Which is good, and intended behaviour, as it protects you from
> going online with stale data (changes between 1) and 4) would be lost).
>
>> Any ideas?
>
> The drbd setting "resource-and-stonith" simply tells DRBD
> that you have stonith configured in your cluster.
> It does not by itself trigger any stonith action.
>
> So if you have stonith enabled, and you want to protect against being
> shot while modifying data, you should say "resource-and-stonith".

I do have stonith enabled in my cluster, but I don't quite understand
what you wrote.
The resource-and-stonith setting will add the location constraint, just
as resource-only fencing does, and it will also prevent a node in the
primary role from being fenced, am I right?
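
(To make sure we are talking about the same constraint: I mean the one
crm-fence-peer.sh creates. In crm shell syntax it should look roughly
like the sketch below; the constraint id and the master/slave resource
name ms-drbd1 are illustrative, based on my scenario, not copied from a
live CIB:

        location drbd-fence-by-handler-drbd1 ms-drbd1 \
                rule $role="Master" -inf: #uname ne node-01

i.e. only node-01 may run the master role, which is why node-03 cannot
take over while the constraint exists.)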
So, what happens when the cluster sends a fence event?

Initially, I thought this setting would trigger a fence event, and I
didn't use it because I wanted to avoid having a node which is secondary
for drbd1 and primary for drbd2 fenced just because the replication link
for drbd1 was lost.

I think I need to experiment with this setting in order to understand it.
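
For the archive, this is the minimal drbd.conf sketch I intend to test
with, combining that policy with the handlers Dan quoted above. The
resource name drbd1 matches my scenario, and putting the fencing keyword
in the disk section is my reading of the 8.3 docs, so treat this as an
untested sketch:

        resource drbd1 {
                disk {
                        fencing resource-and-stonith;
                }
                handlers {
                        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
                }
        }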


>
> What exactly do you want to solve?
>
> Either you want to avoid going online with stale data,
> so you place that constraint, or use dopd, or some similar mechanism.
>
> Or you don't care, so you don't use those fencing scripts.
>
> Or you usually are in a situation where you do not want to use stale
> data, but suddenly your primary data copy is catastrophically lost, and
> the (slightly?) stale other copy is the best you have.
>
> Then you remove the constraint or force drbd primary, or both.
> This should not be automated, as it involves knowledge the cluster
> cannot have, and thus cannot base decisions on.
>
> So again,
>
> What is it you are trying to solve?

Manual intervention to do what you wrote in the last paragraph.
Looking at the settings for split-brain recovery, I thought it would be
useful to have something similar for these scenarios.
I have been reading several posts related to this topic, and the more
posts I read the more I realize that any automatic resolution would
basically undo the work that has been done in DRBD to avoid data
corruption.
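
For the archive, this is how I understand the manual intervention you
describe, sketched as commands. The constraint id and resource name are
illustrative, and the forced promotion deliberately accepts the stale
data, so something like this should only be run after deciding that the
stale copy is really the best one left:

        # remove the constraint left behind by crm-fence-peer.sh
        crm configure delete drbd-fence-by-handler-drbd1
        # and/or force the surviving, possibly stale, copy of drbd1 primary
        drbdadm -- --overwrite-data-of-peer primary drbd1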


Lars, thanks for your mail,
Pavlos


