[Pacemaker] drbd connection

Digimer lists at alteeve.ca
Mon Jun 17 13:16:33 EDT 2013


On 06/17/2013 12:30 PM, Elmar Marschke wrote:
>
> On 17.06.2013 15:59, Digimer wrote:
>> On 06/17/2013 09:53 AM, andreas graeper wrote:
>>> hi,
>>> i will not have a stonith device. i can test an 'expert power
>>> control 8212' for a day, but in the end i will stay without.
>>
>> This is an extremely flawed approach. Clustering with shared storage and
>> without stonith will certainly cause data loss or corruption eventually.
>> I can not stress this enough.
>
> hi all,
>
> just an idea, or moreover a question: what about using drbd's abilities
> to automatically handle split brain situations instead of "real
> stonithing" ; maybe like this (global_common.conf):
>
> handlers {
>          split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>          pri-lost-after-sb "/usr/local/sbin/reboot.sh";
>      }
>
>      net {
>          after-sb-0pri discard-least-changes;
>          after-sb-1pri call-pri-lost-after-sb;
>          after-sb-2pri call-pri-lost-after-sb;
>      }
>
> Couldn't this work like a "poor man's stonith device"?
> (Of course this reboots the whole node with all resources and discards
> the node with the least changes, but maybe there are situations where
> this doesn't matter?)
>
> regards
>
> Elmar

There are two issues here.

First, Pacemaker/corosync needs fencing anyway, and it supports a very
large array of fence devices. These are very well tested in the field.

Second, if you put fencing into DRBD directly, you are duplicating
effort and configs. The 'crm-fence-peer.sh' script was written to "hook"
DRBD's fencing into the existing pacemaker fencing. This way, you have
one place to configure and maintain, rather than two.
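
As a rough sketch of that hook, the DRBD side might look something like
the fragment below. This assumes DRBD 8.4 syntax and the script paths
that the DRBD packages usually install under /usr/lib/drbd; the resource
name 'r0' is only a placeholder, so adjust all of it to your own setup:

    resource r0 {                # 'r0' is a placeholder resource name
        disk {
            # Block I/O and call the fence-peer handler when the peer
            # is lost, instead of carrying on alone.
            fencing resource-and-stonith;
        }
        handlers {
            # Ask pacemaker to deal with the peer, rather than
            # defining a second fence method inside DRBD.
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # Remove the constraint again once the peer has resynced.
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }

The fence devices themselves are still defined only in pacemaker; DRBD
just calls out to it through the handler.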

Back to this specific case:

Andreas tested by failing corosync. This would trigger pacemaker to see
the node as failed and try to recover the services on the backup node.
All of this happens without DRBD directly knowing what was going on. Had
Andreas configured fencing, as soon as pacemaker called its fence
against the peer, the peer would have been shut down, and DRBD would
then have known something was wrong (and blocked) before a split-brain
could occur.

It would also mean that pacemaker would not have recovered/promoted the
surviving node until the peer was off, which also protects against a
split-brain.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



