[Pacemaker] drbd connection

Elmar Marschke elmar.marschke at schenker.at
Tue Jun 18 04:35:06 EDT 2013


On 17.06.2013 19:16, Digimer wrote:
> On 06/17/2013 12:30 PM, Elmar Marschke wrote:
>>
>> On 17.06.2013 15:59, Digimer wrote:
>>> On 06/17/2013 09:53 AM, andreas graeper wrote:
>>>> hi,
>>>> i will not have a stonith device. i can test an 'expert power
>>>> control 8212' for a day, but in the end i will stay without one.
>>>
>>> This is an extremely flawed approach. Clustering with shared storage and
>>> without stonith will certainly cause data loss or corruption eventually.
>>> I cannot stress this enough.
>>
>> hi all,
>>
>> just an idea, or rather a question: what about using drbd's ability to
>> handle split-brain situations automatically instead of "real"
>> stonithing; maybe like this (global_common.conf):
>>
>>     handlers {
>>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>         pri-lost-after-sb "/usr/local/sbin/reboot.sh";
>>     }
>>
>>     net {
>>         after-sb-0pri discard-least-changes;
>>         after-sb-1pri call-pri-lost-after-sb;
>>         after-sb-2pri call-pri-lost-after-sb;
>>     }
>>
>> Couldn't this work like a "poor man's stonith device"?
>> (Of course this reboots the whole node with all resources and discards
>> the node with the fewest changes, but maybe there are situations where
>> this doesn't matter?)
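>>
>> For illustration, reboot.sh above is only a local helper, not a script
>> DRBD ships; a minimal sketch of such a pri-lost-after-sb handler
>> (relying on the DRBD_RESOURCE variable DRBD exports to its handlers)
>> could look like this:
>>
>>     #!/bin/sh
>>     # sketch of a pri-lost-after-sb handler: this node lost the
>>     # automatic split-brain decision, so log and go down hard before
>>     # it can keep writing to its copy of the data
>>     logger -t drbd-handler "pri-lost-after-sb on ${DRBD_RESOURCE:-?}: rebooting"
>>     echo b > /proc/sysrq-trigger   # immediate reboot, no sync/umount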
>>
>> regards
>>
>> Elmar
>
> There are two issues here.
>
> First; Pacemaker/corosync needs fencing anyway, and it has a very large
> array of supported fence devices. These are very well tested in the field.
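>
> For example, with IPMI-capable management boards a fence device is only
> a few lines of cluster configuration. A sketch using the crm shell and
> the external/ipmi agent (parameter names and values depend on the agent
> version and the actual hardware):
>
>   primitive fence-node1 stonith:external/ipmi \
>       params hostname=node1 ipaddr=192.168.1.10 userid=admin \
>              passwd=secret interface=lan \
>       op monitor interval=60s
>   location loc-fence-node1 fence-node1 -inf: node1
>
> (The location constraint keeps the node from running its own fence
> device.)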
>
> Second; If you put fencing into DRBD directly, you are duplicating
> effort and configs. The 'crm-fence-peer.sh' script was written to "hook"
> DRBD's fencing into the existing pacemaker fencing. This way, you have
> one place to configure and maintain, rather than two.
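>
> Hooking DRBD into pacemaker that way takes only a few lines in the DRBD
> resource configuration. A sketch using the script paths DRBD normally
> installs (verify them on your distribution):
>
>   disk {
>       fencing resource-only;
>   }
>   handlers {
>       fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>       after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>   }
>
> With this, DRBD calls into pacemaker (via a constraint against the
> peer) when replication is interrupted, instead of resolving the
> situation on its own.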
>
> Back to this specific case;
>
> Andreas tested by failing corosync. This would trigger pacemaker to see
> the node as failed and try to recover the services on the backup node.
> All of this happens without DRBD directly knowing what is going on. Had
> Andreas configured fencing, as soon as pacemaker called its fence
> against the peer, the peer would have been shut down and DRBD would
> have known something was wrong (and blocked) before a split-brain could
> occur.
>
> It would also mean that pacemaker would not have recovered/promoted the
> surviving node until the peer was confirmed off, which again protects
> against a split-brain.
>


Thanks, Digimer, for pointing out the differences between these
approaches... surely a good way to find a cleaner solution.

best regards
e.




