[Pacemaker] DRBD-MYSQL Pacemaker-Corosync won't failover when heartbeat cable pulled.

Fri Oct 28 14:28:19 UTC 2011

Hello,

On 10/28/2011 04:06 PM, Joe wrote:
> Hello everyone,
> 
> My goal is to build an HA DRBD and MySQL setup on two nodes (active/passive). I
> followed the "Clusters from Scratch" article to build this environment.
> If I put the active host into standby, failover works fine, but when I pull the
> heartbeat cable from the active node, the resources do not fail over to
> the secondary. Please advise. Thank you very much. Joe

* "the" heartbeat cable == one heartbeat cable? That is not a supported
setup -- use at least two heartbeat channels; at a minimum, add the DRBD
replication network as the second one.

* You produced a split-brain situation. I'd expect you to see errors in
your cluster status and DRBD log entries with wording similar to
"not allowed to become primary because allow-two-primaries is not set"
... if you had also pulled the DRBD link, you would have all resources
running twice.

* Use STONITH to recover from a split-brain situation (or to have the
cluster block if fencing is unsuccessful).

* read more manuals ;-)
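
For the first point, a redundant-ring totem section could look roughly
like the sketch below. It assumes the 10.0.0.0 DRBD replication network
is reused as the second ring. The ring 0 values and rrp_mode are taken
from your posted corosync.conf; the second multicast address and port
(226.94.1.2, 5407) are placeholders I picked for illustration, not values
from your setup.

```
totem {
        version: 2
        secauth: on
        rrp_mode: active
        # (remaining totem options as in the posted config)

        # ring 0: existing cluster network, values from the posted config
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }

        # ring 1: ASSUMPTION - reuses the DRBD replication network;
        # mcastaddr and mcastport below are placeholders
        interface {
                ringnumber: 1
                bindnetaddr: 10.0.0.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}
```

After restarting Corosync on both nodes, "corosync-cfgtool -s" should
report the status of each ring, which is a quick way to check that both
are actually in use before you pull a cable again.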

> 
> CentOS 5.6 / DRBD 8.3 / Corosync / Pacemaker
> 
> node-0 IP: 192.168.1.101 (heartbeat) 10.0.0.10 (drbd)
> node-1 IP: 192.168.1.102 (heartbeat) 10.0.0.20 (drbd)
> Cluster Virtual IP: 192.168.1.160
> 
> * _crm configure show_
> node $id="2b68511d-b96f-4b56-9f66-70262e3e2c46" mysqldrbd01 \
>     attributes standby="off"
> node $id="d86dc58b-2309-43d9-af96-6519127e83d7" mysqldrbd02 \
>     attributes standby="off"

These ids are from Heartbeat CCM (and your CIB also says
cluster-infrastructure="Heartbeat") ... but you attached a Corosync
configuration? Decide on one: either Heartbeat or Corosync.

Regards,
Andreas

-- 
Need help with Pacemaker or DRBD?
http://www.hastexo.com/now

> primitive res_Filesystem_QD_FS_DRBD ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/replication/" fstype="ext3" \
>     operations $id="res_Filesystem_QD_FS_DRBD-operations" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60" \
>     op monitor interval="20" timeout="40" start-delay="0" \
>     op notify interval="0" timeout="60" \
>     meta target-role="started"
> primitive res_IPaddr2_QD_IP_CLUSTER ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.160" \
>     operations $id="res_IPaddr2_QD_IP_CLUSTER-operations" \
>     op start interval="0" timeout="20" \
>     op stop interval="0" timeout="20" \
>     op monitor interval="10" timeout="20" start-delay="0" \
>     meta target-role="started"
> primitive res_drbd_1 ocf:linbit:drbd \
>     params drbd_resource="repdata" \
>     operations $id="res_drbd_1-operations" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="10" timeout="20" start-delay="1min" \
>     op notify interval="0" timeout="90" \
>     meta target-role="started"
> primitive res_mysqld_QD_SQL_SERVICE lsb:mysqld \
>     operations $id="res_mysqld_QD_SQL_SERVICE-operations" \
>     op start interval="0" timeout="15" \
>     op stop interval="0" timeout="15" \
>     op monitor interval="15" timeout="15" start-delay="15" \
>     meta target-role="started"
> group QD_GROUP res_Filesystem_QD_FS_DRBD res_IPaddr2_QD_IP_CLUSTER res_mysqld_QD_SQL_SERVICE \
>     meta target-role="started"
> ms ms_drbd_1 res_drbd_1 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation QD_MYSQL_DRBD inf: QD_GROUP ms_drbd_1:Master
> order QD_MYSQL_AFTER_DRBD inf: ms_drbd_1:promote QD_GROUP:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
>     cluster-infrastructure="Heartbeat" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     last-lrm-refresh="980666220" \
>     expected-quorum-votes="2"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
> 
> *_/etc/corosync/corosync.conf_*
> ## generated by drbd-gui 0.9.9
> 
> aisexec {
>         user: root
>         group: root
> }
> 
> corosync {
>         user: root
>         group: root
> }
> 
> amf {
>         mode: disabled
> }
> 
> logging {
>         to_stderr: yes
>         debug: off
>         timestamp: on
>         to_file: no
>         to_syslog: yes
>         syslog_facility: daemon
> }
> 
> totem {
>         version: 2
>         token: 3000
>         token_retransmits_before_loss_const: 10
>         join: 60
>         consensus: 4000
>         vsftype: none
>         max_messages: 20
>         clear_node_high_bit: yes
>         secauth: on
>         threads: 0
>         # nodeid: 1234
>         rrp_mode: active
> 
> #       interface {
> #               ringnumber: 0
> #               bindnetaddr: 10.0.0.0
> #               mcastaddr: 226.94.1.1
> #               mcastport: 5405
> #       }
> 
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 192.168.1.0
>                 mcastaddr: 226.94.1.1
>                 mcastport: 5405
>         }
> }
> 
> service {
>         ver: 0
>         name: pacemaker
>         use_mgmtd: no
> }
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
