Hello,<br><br>I am running a two-node cluster with Corosync &amp; DRBD in active/passive mode for MySQL high availability.<br><br>The cluster works fine (failover/failback &amp; replication OK) and I have no network outages (the network is monitored and I have not seen any failures), but split-brain occurs very often and I don&#39;t understand why. Maybe you can help me?<br>

<br>I&#39;m a new Pacemaker/Corosync/DRBD user, so my cluster and DRBD configurations are probably not optimal; if you have any comments, tips, or examples I would be very grateful!<br><br>Here is an example of the Corosync log when a split-brain occurs (one hour of logs, to show before/after the split-brain):<br>

<br><a href="http://pastebin.com/3DprkcTA">http://pastebin.com/3DprkcTA</a><br><br>Thank you in advance for any help!<br><br><br>More details about my configuration:<br><br>I have:<br>One preferred &quot;master&quot; node (node1) on a virtual server, and one &quot;slave&quot; node on a physical server.<br>

On each server:<br>eth0 is connected to my main LAN for client/server communication (with the cluster VIP)<br>eth1 is connected to a dedicated VLAN for Corosync communication (network: 192.168.3.0/30)<br>eth2 is connected to a dedicated VLAN for DRBD replication (network: <a href="http://192.168.2.0/30">192.168.2.0/30</a>)<br>
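For reference, the relevant part of my corosync.conf looks roughly like this (paraphrased from the host, and the multicast address/port are just the example values I used, so treat this as a sketch rather than the exact file):

```
totem {
        version: 2
        # ring0 bound to the dedicated corosync VLAN on eth1
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.3.0   # matches the 192.168.3.0/30 network above
                mcastaddr: 226.94.1.1      # example multicast address
                mcastport: 5405            # example port
        }
}
```

There is only a single ring; I have not configured a redundant ring (rrp_mode) over eth0.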

<br>Here is my DRBD configuration:<br><br><br>resource drbd-mysql {<br>protocol C;<br>    disk {<br>        on-io-error detach;<br>    }<br>    handlers {<br>        fence-peer &quot;/usr/lib/drbd/crm-fence-peer.sh&quot;;<br>

        after-resync-target &quot;/usr/lib/drbd/crm-unfence-peer.sh&quot;;<br>        split-brain &quot;/usr/lib/drbd/notify-split-brain.sh root&quot;;<br>    }<br>    net {<br>        cram-hmac-alg sha1;<br>        shared-secret &quot;secret&quot;;<br>

        after-sb-0pri discard-younger-primary;<br>        after-sb-1pri discard-secondary;<br>        after-sb-2pri call-pri-lost-after-sb;<br>    }<br>    startup {<br>        wfc-timeout  1;<br>        degr-wfc-timeout 1;<br>

    }<br>    on node1 {<br>        device /dev/drbd1;<br>        address <a href="http://192.168.2.1:7801">192.168.2.1:7801</a>;<br>        disk /dev/sdb;<br>        meta-disk internal;<br>    }<br>    on node2 {<br>        device /dev/drbd1;<br>

        address <a href="http://192.168.2.2:7801">192.168.2.2:7801</a>;<br>        disk /dev/sdb;<br>        meta-disk internal;<br>    }<br>}<br><br><br>Here is my cluster config:<br><br>node node1 \<br>        attributes standby=&quot;off&quot;<br>

node node2 \<br>        attributes standby=&quot;off&quot;<br>primitive Cluster-VIP ocf:heartbeat:IPaddr2 \<br>        params ip=&quot;10.1.0.130&quot; broadcast=&quot;10.1.7.255&quot; nic=&quot;eth0&quot; cidr_netmask=&quot;21&quot; iflabel=&quot;VIP1&quot; \<br>

        op monitor interval=&quot;10s&quot; timeout=&quot;20s&quot; \<br>        meta is-managed=&quot;true&quot;<br>primitive cluster_status_page ocf:heartbeat:ClusterMon \<br>        params pidfile=&quot;/var/run/crm_mon.pid&quot; htmlfile=&quot;/var/www/html/cluster_status.html&quot; \<br>

        op monitor interval=&quot;4s&quot; timeout=&quot;20s&quot;<br>primitive datavg ocf:heartbeat:LVM \<br>        params volgrpname=&quot;datavg&quot; exclusive=&quot;true&quot; \<br>        op start interval=&quot;0&quot; timeout=&quot;30&quot; \<br>

        op stop interval=&quot;0&quot; timeout=&quot;30&quot;<br>primitive drbd_mysql ocf:linbit:drbd \<br>        params drbd_resource=&quot;drbd-mysql&quot; \<br>        op monitor interval=&quot;15s&quot;<br>primitive fs_mysql ocf:heartbeat:Filesystem \<br>

        params device=&quot;/dev/datavg/data&quot; directory=&quot;/data&quot; fstype=&quot;ext4&quot;<br>primitive mail_alert ocf:heartbeat:MailTo \<br>        params email=&quot;<a href="mailto:myemail@test.com">myemail@test.com</a>&quot; \<br>

        op monitor interval=&quot;10&quot; timeout=&quot;10&quot; depth=&quot;0&quot;<br>primitive mysqld ocf:heartbeat:mysql \<br>        params binary=&quot;/usr/bin/mysqld_safe&quot; config=&quot;/etc/my.cnf&quot; datadir=&quot;/data/mysql/databases&quot; user=&quot;mysql&quot; pid=&quot;/var/run/mysqld/mysqld.pid&quot; socket=&quot;/var/lib/mysql/mysql.sock&quot; test_passwd=&quot;cluster_test&quot; test_table=&quot;Cluster_Test.dbcheck&quot; test_user=&quot;cluster_test&quot; \<br>

        op start interval=&quot;0&quot; timeout=&quot;120&quot; \<br>        op stop interval=&quot;0&quot; timeout=&quot;120&quot; \<br>        op monitor interval=&quot;30s&quot; timeout=&quot;30s&quot; OCF_CHECK_LEVEL=&quot;1&quot; target-role=&quot;Started&quot;<br>

group mysql datavg fs_mysql Cluster-VIP mysqld cluster_status_page mail_alert<br>ms ms_drbd_mysql drbd_mysql \<br>        meta master-max=&quot;1&quot; master-node-max=&quot;1&quot; clone-max=&quot;2&quot; clone-node-max=&quot;1&quot; notify=&quot;true&quot;<br>

location mysql-preferred-node mysql inf: node1<br>colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master<br>order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start<br>property $id=&quot;cib-bootstrap-options&quot; \<br>

        dc-version=&quot;1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558&quot; \<br>        cluster-infrastructure=&quot;openais&quot; \<br>        expected-quorum-votes=&quot;2&quot; \<br>        stonith-enabled=&quot;false&quot; \<br>

        no-quorum-policy=&quot;ignore&quot; \<br>        last-lrm-refresh=&quot;1340701656&quot;<br>rsc_defaults $id=&quot;rsc-options&quot; \<br>        resource-stickiness=&quot;100&quot; \<br>        migration-threshold=&quot;2&quot; \<br>

        failure-timeout=&quot;30s&quot;<br>
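For completeness, this is roughly how I recover from a split-brain today, following the manual recovery procedure from the DRBD user's guide (the resource name is mine from the config above; treat this as a sketch of my procedure, not a recommendation):

```
# On the node whose data I am discarding (the split-brain "victim"):
drbdadm secondary drbd-mysql
drbdadm connect --discard-my-data drbd-mysql

# On the surviving node, only if it is in StandAlone state:
drbdadm connect drbd-mysql
```

After that, the victim resynchronises from the survivor and the cluster goes back to normal, until the next split-brain.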