[Pacemaker] Multiple split-brain problem

Wed Jun 27 10:17:37 CEST 2012

Thanks for the link, Emmanuel. It looks like a solution to my problem; I
will test it!

2012/6/26 emmanuel segura <emi2fast at gmail.com>

> Look here
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html
>
> :-)
>
> 2012/6/26 coma <coma.inf at gmail.com>
>
>>  Hello,
>>
>> I am running a 2-node cluster with corosync & DRBD in active/passive mode
>> for MySQL high availability.
>>
>> The cluster is working fine (failover/failback & replication OK) and I have
>> no network outages (the network is monitored and I have not seen any
>> failure), but split-brain occurs very often and I don't understand why.
>> Maybe you can help me?
>>
>> I'm a new Pacemaker/corosync/DRBD user, so my cluster and DRBD
>> configuration are probably not optimal. If you have any comments, tips,
>> or examples, I would be very grateful!
>>
>> Here is an example of the corosync log when a split-brain occurs (1 hour of
>> log, to show before/after the split-brain):
>>
>> http://pastebin.com/3DprkcTA
>>
>> Thank you in advance for any help!
>>
>>
>> More details about my configuration:
>>
>> I have:
>> One preferred "master" node (node1) on a virtual server, and one "slave"
>> node on a physical server.
>> On each server:
>> eth0 is connected to my main LAN for client/server communication (with
>> the cluster VIP)
>> eth1 is connected to a dedicated VLAN for corosync communication
>> (network: 192.168.3.0/30)
>> eth2 is connected to a dedicated VLAN for DRBD replication (network:
>> 192.168.2.0/30)
>>
>> Here is my drbd configuration:
>>
>>
>> resource drbd-mysql {
>>     protocol C;
>>     disk {
>>         on-io-error detach;
>>     }
>>     handlers {
>>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>     }
>>     net {
>>         cram-hmac-alg sha1;
>>         shared-secret "secret";
>>         after-sb-0pri discard-younger-primary;
>>         after-sb-1pri discard-secondary;
>>         after-sb-2pri call-pri-lost-after-sb;
>>     }
>>     startup {
>>         wfc-timeout  1;
>>         degr-wfc-timeout 1;
>>     }
>>     on node1 {
>>         device /dev/drbd1;
>>         address 192.168.2.1:7801;
>>         disk /dev/sdb;
>>         meta-disk internal;
>>     }
>>     on node2 {
>>         device /dev/drbd1;
>>         address 192.168.2.2:7801;
>>         disk /dev/sdb;
>>         meta-disk internal;
>>     }
>> }
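
One point that may be worth checking in the DRBD resource quoted above, offered as a sketch and not as a confirmed diagnosis: it defines a fence-peer handler but sets no fencing policy. Assuming DRBD 8.3/8.4 syntax (the version is not stated in this thread), the policy goes in the disk section and defaults to dont-care, in which case the fence-peer handler is not called. The fragment below shows only the affected sections; the resource name and handler paths are copied from the configuration above, and the fencing line is the only addition.

```
resource drbd-mysql {
    disk {
        on-io-error detach;
        # Assumption: DRBD 8.3/8.4, where "fencing" is a disk-section option.
        # The default (dont-care) takes no fencing action, so the
        # fence-peer handler below would never run without this line.
        fencing resource-only;
    }
    handlers {
        # Unchanged from the original configuration.
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
}
```

With resource-only, DRBD asks the handler to place a constraint in the cluster configuration so that the outdated peer cannot be promoted until it has resynchronized. Whether this accounts for the split-brains reported here cannot be told from the configuration alone.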
>>
>>
>> Here is my cluster config:
>>
>> node node1 \
>>         attributes standby="off"
>> node node2 \
>>         attributes standby="off"
>> primitive Cluster-VIP ocf:heartbeat:IPaddr2 \
>>         params ip="10.1.0.130" broadcast="10.1.7.255" nic="eth0" \
>>         cidr_netmask="21" iflabel="VIP1" \
>>         op monitor interval="10s" timeout="20s" \
>>         meta is-managed="true"
>> primitive cluster_status_page ocf:heartbeat:ClusterMon \
>>         params pidfile="/var/run/crm_mon.pid" \
>>         htmlfile="/var/www/html/cluster_status.html" \
>>         op monitor interval="4s" timeout="20s"
>> primitive datavg ocf:heartbeat:LVM \
>>         params volgrpname="datavg" exclusive="true" \
>>         op start interval="0" timeout="30" \
>>         op stop interval="0" timeout="30"
>> primitive drbd_mysql ocf:linbit:drbd \
>>         params drbd_resource="drbd-mysql" \
>>         op monitor interval="15s"
>> primitive fs_mysql ocf:heartbeat:Filesystem \
>>         params device="/dev/datavg/data" directory="/data" fstype="ext4"
>> primitive mail_alert ocf:heartbeat:MailTo \
>>         params email="myemail at test.com" \
>>         op monitor interval="10" timeout="10" depth="0"
>> primitive mysqld ocf:heartbeat:mysql \
>>         params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
>>         datadir="/data/mysql/databases" user="mysql" \
>>         pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" \
>>         test_passwd="cluster_test" test_table="Cluster_Test.dbcheck" \
>>         test_user="cluster_test" \
>>         op start interval="0" timeout="120" \
>>         op stop interval="0" timeout="120" \
>>         op monitor interval="30s" timeout="30s" OCF_CHECK_LEVEL="1" \
>>         target-role="Started"
>> group mysql datavg fs_mysql Cluster-VIP mysqld cluster_status_page \
>>         mail_alert
>> ms ms_drbd_mysql drbd_mysql \
>>         meta master-max="1" master-node-max="1" clone-max="2" \
>>         clone-node-max="1" notify="true"
>> location mysql-preferred-node mysql inf: node1
>> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
>> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="2" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="ignore" \
>>         last-lrm-refresh="1340701656"
>> rsc_defaults $id="rsc-options" \
>>         resource-stickiness="100" \
>>         migration-threshold="2" \
>>         failure-timeout="30s"
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>>
>
>
> --
> this is my life and I live it for as long as God wills
>
>