[Pacemaker] Multiple split-brain problem

emmanuel segura emi2fast at gmail.com
Tue Jun 26 16:23:45 CEST 2012


Look here
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html

:-)

2012/6/26 coma <coma.inf at gmail.com>

> Hello,
>
> I'm running a 2-node cluster with Corosync and DRBD in active/passive mode
> for MySQL high availability.
>
> The cluster is working fine (failover/failback and replication are OK) and I
> have no network outages (the network is monitored and I haven't seen any
> failure), but split-brain occurs very often and I don't understand why.
> Maybe you can help me?
>
> I'm a new Pacemaker/Corosync/DRBD user, so my cluster and DRBD configuration
> are probably not optimal. If you have any comments, tips or examples, I
> would be very grateful!
>
> Here is an example of the Corosync log when a split-brain occurs (one hour of
> log, to see before and after the split-brain):
>
> http://pastebin.com/3DprkcTA
>
> Thank you in advance for any help!
>
>
> More details about my configuration:
>
> I have:
> One preferred "master" node (node1) on a virtual server, and one "slave"
> node (node2) on a physical server.
> On each server:
> eth0 is connected to my main LAN for client/server communication (with the
> cluster VIP).
> eth1 is connected to a dedicated VLAN for Corosync communication (network:
> 192.168.3.0/30).
> eth2 is connected to a dedicated VLAN for DRBD replication (network:
> 192.168.2.0/30).
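A quick arithmetic check of the addressing above: a /30 network holds exactly two usable host addresses. That means each dedicated VLAN has room for the two cluster nodes and nothing else. This is a minimal sketch using Python's standard `ipaddress` module, which is not part of the original setup. Only the two networks come from the message.

```python
import ipaddress

# Dedicated VLANs from the message: eth1 carries Corosync, eth2 carries DRBD.
VLANS = {
    "corosync (eth1)": ipaddress.ip_network("192.168.3.0/30"),
    "drbd (eth2)": ipaddress.ip_network("192.168.2.0/30"),
}


def usable_hosts(net):
    """Host addresses of a network, excluding network and broadcast."""
    return [str(host) for host in net.hosts()]


for name, net in VLANS.items():
    # A /30 has 4 addresses: network, two hosts, broadcast.
    print(f"{name}: {net} hosts={usable_hosts(net)} broadcast={net.broadcast_address}")
```

For 192.168.2.0/30 this yields 192.168.2.1 and 192.168.2.2. These are exactly the two `address` values in the DRBD resource.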
>
> Here is my drbd configuration:
>
>
> resource drbd-mysql {
>     protocol C;
>     disk {
>         on-io-error detach;
>     }
>     handlers {
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>     }
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "secret";
>         after-sb-0pri discard-younger-primary;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri call-pri-lost-after-sb;
>     }
>     startup {
>         wfc-timeout  1;
>         degr-wfc-timeout 1;
>     }
>     on node1 {
>         device /dev/drbd1;
>         address 192.168.2.1:7801;
>         disk /dev/sdb;
>         meta-disk internal;
>     }
>     on node2 {
>         device /dev/drbd1;
>         address 192.168.2.2:7801;
>         disk /dev/sdb;
>         meta-disk internal;
>     }
> }
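eth2 carries only DRBD traffic, so the `address` lines in the two `on` sections should both fall inside 192.168.2.0/30 and differ from each other. The sketch below is a hypothetical helper that extracts those addresses with a regular expression. The names `parse_addresses` and `check_peers` are illustrative, not DRBD tooling. The helper assumes the simple one-statement-per-line layout used in this resource and is not a full drbd.conf parser.

```python
import ipaddress
import re

# The two "on <host>" sections of the drbd-mysql resource (copied from the message).
DRBD_ON_SECTIONS = """
on node1 {
    device /dev/drbd1;
    address 192.168.2.1:7801;
}
on node2 {
    device /dev/drbd1;
    address 192.168.2.2:7801;
}
"""

# eth2 replication VLAN from the message.
REPLICATION_NET = ipaddress.ip_network("192.168.2.0/30")

# Matches "on <host> { ... address <ip>:<port>; ..." within a single section.
ON_ADDRESS = re.compile(r"\bon\s+(\S+)\s*\{[^}]*?address\s+([\d.]+):(\d+);")


def parse_addresses(text):
    """Hypothetical helper: map each host name to its (ip, port)."""
    return {host: (ipaddress.ip_address(ip), int(port))
            for host, ip, port in ON_ADDRESS.findall(text)}


def check_peers(addrs, net):
    """Return a list of problems; an empty list means the peers look sane."""
    problems = []
    for host, (ip, _port) in addrs.items():
        if ip not in net:
            problems.append(f"{host}: {ip} is outside {net}")
    if len({ip for ip, _port in addrs.values()}) != len(addrs):
        problems.append("two peers share the same address")
    return problems


addrs = parse_addresses(DRBD_ON_SECTIONS)
print(addrs)
print(check_peers(addrs, REPLICATION_NET) or "addresses OK")
```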
>
>
> Here is my cluster config:
>
> node node1 \
>         attributes standby="off"
> node node2 \
>         attributes standby="off"
> primitive Cluster-VIP ocf:heartbeat:IPaddr2 \
>         params ip="10.1.0.130" broadcast="10.1.7.255" nic="eth0" \
>             cidr_netmask="21" iflabel="VIP1" \
>         op monitor interval="10s" timeout="20s" \
>         meta is-managed="true"
> primitive cluster_status_page ocf:heartbeat:ClusterMon \
>         params pidfile="/var/run/crm_mon.pid" \
>             htmlfile="/var/www/html/cluster_status.html" \
>         op monitor interval="4s" timeout="20s"
> primitive datavg ocf:heartbeat:LVM \
>         params volgrpname="datavg" exclusive="true" \
>         op start interval="0" timeout="30" \
>         op stop interval="0" timeout="30"
> primitive drbd_mysql ocf:linbit:drbd \
>         params drbd_resource="drbd-mysql" \
>         op monitor interval="15s"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>         params device="/dev/datavg/data" directory="/data" fstype="ext4"
> primitive mail_alert ocf:heartbeat:MailTo \
>         params email="myemail at test.com" \
>         op monitor interval="10" timeout="10" depth="0"
> primitive mysqld ocf:heartbeat:mysql \
>         params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
>             datadir="/data/mysql/databases" user="mysql" \
>             pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" \
>             test_passwd="cluster_test" test_table="Cluster_Test.dbcheck" \
>             test_user="cluster_test" \
>         op start interval="0" timeout="120" \
>         op stop interval="0" timeout="120" \
>         op monitor interval="30s" timeout="30s" OCF_CHECK_LEVEL="1" target-role="Started"
> group mysql datavg fs_mysql Cluster-VIP mysqld cluster_status_page \
>         mail_alert
> ms ms_drbd_mysql drbd_mysql \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>             clone-node-max="1" notify="true"
> location mysql-preferred-node mysql inf: node1
> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1340701656"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100" \
>         migration-threshold="2" \
>         failure-timeout="30s"
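Two things in this configuration can be checked mechanically:

- The VIP's `broadcast` value should match the /21 network that contains 10.1.0.130.
- The cluster properties combine stonith-enabled="false" with no-quorum-policy="ignore" on a two-node cluster. Neither node can then fence the other, and after a loss of contact each side may treat the other as gone and run the resources itself.

The sketch below is a hypothetical lint written for this message, not a Pacemaker tool. The property values are copied from the configuration.

```python
import ipaddress

# Values copied from the Cluster-VIP primitive.
vip = ipaddress.ip_interface("10.1.0.130/21")
declared_broadcast = ipaddress.ip_address("10.1.7.255")

# Values copied from the cib-bootstrap-options property set.
PROPERTIES = {
    "expected-quorum-votes": "2",
    "stonith-enabled": "false",
    "no-quorum-policy": "ignore",
}


def split_brain_risks(props):
    """Hypothetical lint: flag property combinations that leave a two-node
    cluster with no way to stop both nodes running the same resources
    after a communication failure."""
    risks = []
    no_fencing = props.get("stonith-enabled") == "false"
    if no_fencing:
        risks.append("stonith-enabled=false: a lost peer is never fenced")
    if no_fencing and props.get("no-quorum-policy") == "ignore":
        risks.append("no-quorum-policy=ignore without fencing: "
                     "each partition may promote DRBD and start MySQL")
    return risks


# 10.1.0.130/21 lies in 10.1.0.0/21, whose broadcast address is 10.1.7.255.
print(vip.network, vip.network.broadcast_address == declared_broadcast)
for risk in split_brain_risks(PROPERTIES):
    print("WARNING:", risk)
```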
>


-- 
this is my life and I'll live it as long as God wills