[Pacemaker] Is IPMI reliable to avoid DRBD SplitBrain?
Xiaomin Zhang
zhangxiaomin at gmail.com
Mon Sep 2 16:20:50 UTC 2013
Hi, Digimer:
Below is the output of drbdadm dump:
# /etc/drbd.conf
common {
protocol C;
net {
after-sb-0pri discard-zero-changes;
after-sb-1pri consensus;
after-sb-2pri disconnect;
cram-hmac-alg sha512;
shared-secret acde;
}
disk {
on-io-error detach;
fencing resource-and-stonith;
}
syncer {
rate 33M;
}
startup {
wfc-timeout 120;
}
handlers {
fence-peer /usr/lib/drbd/crm-fence-peer.sh;
after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
}
}
# resource r0 on suse4: not ignored, not stacked
resource r0 {
on suse2 {
device /dev/drbd0 minor 0;
disk /dev/sdc1;
address ipv4 XXX:7789;
meta-disk internal;
}
on suse4 {
device /dev/drbd0 minor 0;
disk /dev/sdc1;
address ipv4 YYY:7789;
meta-disk internal;
}
}
And here is the output of 'crm configure show':
primitive drbd1 ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="15s"
primitive fs1 ocf:heartbeat:Filesystem \
op monitor interval="15s" \
params device="/dev/drbd0" directory="/opt/drbd" fstype="ext3" \
meta target-role="Started"
primitive suse2-stonith stonith:external/ipmi \
params hostname="suse2" ipaddr="XXX" userid="admin" passwd="xxx" interface="lan"
primitive suse4-stonith stonith:external/ipmi \
params hostname="suse4" ipaddr="YYY" userid="admin" passwd="yyy" interface="lan"
ms ms_drbd1 drbd1 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
location drbd-fence-by-handler-ms_drbd1 ms_drbd1 \
rule $id="drbd-fence-by-handler-rule-ms_drbd1" $role="Master" -inf: #uname ne suse4
location st-suse2 suse2-stonith -inf: suse2
location st-suse4 suse4-stonith -inf: suse4
colocation fs_on_drbd inf: fs1 ms_drbd1:Master
dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
cluster-infrastructure="openais" \
expected-quorum-votes="3" \
stonith-enabled="true" \
last-lrm-refresh="1378051434"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
I think the drbd-fence-by-handler-rule-ms_drbd1 rule was generated by
crm-fence-peer.sh. It still exists because crm-unfence-peer.sh has not
been called since the last failover.
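To check whether the handler's constraint is still in place, something like the following could be used. It is only a sketch: the function name leftover_fence_constraints is made up for illustration, and it assumes the constraint id keeps the drbd-fence-by-handler- prefix seen in the configuration above.

```shell
#!/bin/sh
# Sketch: list location constraints left behind by crm-fence-peer.sh.
# Reads 'crm configure show' output on stdin.
# The function name is hypothetical, not part of DRBD or Pacemaker.
leftover_fence_constraints() {
    # Print the id of every location constraint whose id starts with
    # the "drbd-fence-by-handler-" prefix.
    awk '$1 == "location" && $2 ~ /^drbd-fence-by-handler-/ { print $2 }'
}

# Typical use on a cluster node (requires the crm shell):
#   crm configure show | leftover_fence_constraints
# If an id is still printed after both nodes are Connected and UpToDate
# (check with 'cat /proc/drbd'), it could be removed by hand, e.g.:
#   crm configure delete drbd-fence-by-handler-ms_drbd1
```

An empty result would mean that no fence constraint from the handler is left in the CIB.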
What's wrong with my configuration?
Thanks.
On Mon, Sep 2, 2013 at 9:42 PM, Digimer <lists at alteeve.ca> wrote:
> On 02/09/13 08:55, Xiaomin Zhang wrote:
>
>> Hi, guy:
>> I followed the standard way to enable the IPMI based STONITH for a
>> service which relies on DRBD primary-secondary replication.
>> Besides the pacemaker configuration below (of course, STONITH is enabled for
>> pacemaker):
>>
>> primitive suse2-stonith stonith:external/ipmi \
>> params hostname="suse2" ipaddr="XXX" userid="admin"
>> passwd="xxx" interface="lan"
>> primitive suse4-stonith stonith:external/ipmi \
>> params hostname="suse4" ipaddr="YYY" userid="admin"
>> passwd="yyy" interface="lan"
>> location st-suse2 suse2-stonith -inf: suse2
>> location st-suse4 suse4-stonith -inf: suse4
>>
>> I also use 'resource-and-stonith' as the fencing policy in the DRBD global configuration.
>> This configuration has worked many times with the following failure tests:
>> 1. iptables -A INPUT -j DROP
>> 2. echo c > /proc/sysrq-trigger
>> 3. /etc/init.d/network stop
>> 4. reboot
>> The failed node is power cycled by its counterpart via an IPMI command.
>> However, I still get a DRBD split-brain sometimes. Does that mean
>> IPMI is still not reliable enough for data integrity?
>>
>> I was also confused that, many times, crm-unfence-peer.sh is
>> not called after crm-fence-peer.sh. Does this imply that I have
>> something misconfigured?
>> Your advice is really appreciated.
>> Thanks in advance.
>>
>
> I don't think that using the firewall to block traffic is a good way to
> test. That said, if the failure triggers a reboot, then it's working.
>
> Did you set up the fence-handler in DRBD to use 'crm-fence-peer.sh'?
>
> Please share your 'crm configure show' and 'drbdadm dump'.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>