[Pacemaker] drbd on heartbeat links
Pavlos Parissis
pavlos.parissis at gmail.com
Tue Nov 2 21:07:17 UTC 2010
On 2 November 2010 16:15, Dan Frincu <dfrincu at streamwide.ro> wrote:
> Hi,
>
> Pavlos Parissis wrote:
>>
>> Hi,
>>
>> I am trying to figure out how I can resolve the following scenario
>>
>> Facts
>> 3 nodes
>> 2 DRBD ms resources
>> 2 group resources
>> by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node-02
>> drbd1/group1 can only run on node-01 and node-03
>> drbd2/group2 can only run on node-02 and node-03
>> DRBD fencing_policy is resource-only [1]
>> 2 heartbeat links and one of them used by DRBD communication
>>
>> Scenario
>> 1) node-01 loses both heartbeat links
>> 2) DRBD detects the absence of the DRBD communication first and does
>> resource fencing by adding a location constraint which prevents
>> drbd1 from running on node-03
>> 3) pacemaker fencing kicks in and kills node-01
>>
>> due to the location constraint created at step 2, drbd1/group1 can't run
>> anywhere in the cluster
>>
>>
>
> I don't understand exactly what you mean by this. Resource-only fencing
> would create a -inf score on node1 when the node loses the drbd
> communication channel (the only one drbd uses),
Because node-01 is the primary at the moment of the failure,
resource fencing will create a -inf score for node-03.
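For reference, the constraint that crm-fence-peer.sh adds looks roughly
like this in crm syntax (the ms resource name ms-drbd1 and the
constraint id are only illustrative, the real ones are generated by the
script):

    location drbd-fence-by-handler-ms-drbd1 ms-drbd1 \
            rule $role="Master" -inf: #uname ne node-01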
> however you could still have
> heartbeat communication available via the secondary link, then you shouldn't
As I wrote, neither of the heartbeat links is available.
After I sent the mail, I realized that node-03 will not see the
location constraint created by node-01, because there is no heartbeat
communication!
Thus I think my scenario has a flaw, since none of the heartbeat links
are available on node-01.
Resource fencing from DRBD will be triggered but have no effect,
node-02 or node-03 will fence node-01, and node-03 will become
the primary for drbd1.
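Whether the constraint ever made it into the CIB can be checked on
node-03 with something like the following (assuming the crm shell is
available):

    crm configure show | grep drbd-fence-by-handler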
> fence the entire node, the resource-only fencing does that for you, the only
> thing you need to do is to add the drbd fence handlers in /etc/drbd.conf.
> handlers {
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> }
>
> Is this what you meant?
No.
Dan, thanks for your mail.
Since there is a flaw in that scenario, let's define a similar one.
Status:
node-01 is primary for drbd1, and group1 runs on it
node-02 is primary for drbd2, and group2 runs on it
node-03 is secondary for drbd1 and drbd2
2 heartbeat links, one of them also used for DRBD communication
Here is the scenario:
1) on node-01, the heartbeat link which also carries the DRBD
communication is lost
2) node-01 does resource fencing and places a -inf score for drbd1 on node-03
3) on node-01, the second heartbeat link is lost
4) node-01 is fenced by one of the other cluster members
5) drbd1 can't run on node-03 due to the location constraint created at step 2
The problem here is that the location constraint will still be active
even after node-01 is fenced.
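Presumably the stale constraint then has to be removed by hand before
drbd1 can be promoted on node-03, with something like the following
(the constraint id is only an example, the real one is generated by
crm-fence-peer.sh):

    crm configure delete drbd-fence-by-handler-ms-drbd1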
Any ideas?
Pavlos
drbd.conf
global {
    usage-count yes;
}

common {
    protocol C;

    syncer {
        csums-alg sha1;
        verify-alg sha1;
        rate 10M;
    }

    net {
        data-integrity-alg sha1;
        max-buffers 20480;
        max-epoch-size 16384;
    }

    disk {
        on-io-error detach;
        ### Only when DRBD is under cluster ###
        fencing resource-only;
        ### --- ###
    }

    startup {
        wfc-timeout 60;
        degr-wfc-timeout 30;
        outdated-wfc-timeout 15;
    }

    ### Only when DRBD is under cluster ###
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    ### --- ###
}

resource drbd_resource_01 {
    on node-01 {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   10.10.10.129:7789;
        meta-disk internal;
    }
    on node-03 {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   10.10.10.131:7789;
        meta-disk internal;
    }
    syncer {
        cpu-mask 2;
    }
}

resource drbd_resource_02 {
    on node-02 {
        device    /dev/drbd2;
        disk      /dev/sdb1;
        address   10.10.10.130:7790;
        meta-disk internal;
    }
    on node-03 {
        device    /dev/drbd2;
        disk      /dev/sdc1;
        address   10.10.10.131:7790;
        meta-disk internal;
    }
    syncer {
        cpu-mask 1;
    }
}
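For completeness, the Pacemaker side of drbd1/group1 looks roughly like
this in crm shell syntax (the primitive, ms and group member names are
placeholders, not the exact ones from my CIB):

    primitive drbd1 ocf:linbit:drbd \
            params drbd_resource="drbd_resource_01" \
            op monitor interval="31s" role="Slave" \
            op monitor interval="29s" role="Master"
    ms ms-drbd1 drbd1 \
            meta master-max="1" master-node-max="1" clone-max="2" \
            clone-node-max="1" notify="true"
    group group1 fs1 ip1 app1
    colocation group1-with-drbd1 inf: group1 ms-drbd1:Master
    order group1-after-drbd1 inf: ms-drbd1:promote group1:start
    location ms-drbd1-placement ms-drbd1 \
            rule -inf: #uname ne node-01 and #uname ne node-03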