[Pacemaker] Pacemaker not failing over correctly (DRBD/Heartbeat/Pacemaker/MySQL) on CentOS 5.5
Brian Cavanagh
brian at designedtoscale.com
Fri Jan 28 19:22:22 UTC 2011
Hi,
I am having an issue where everything appears to be working correctly, but when I
simulate a failure the failover does not complete. The migrate command works fine and
I can transfer the service; the error I get when a node is put into standby or when a
server goes down is shown below.
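For reference, this is roughly the sequence I run (the same crm commands also appear
inline with the status output below; the one-shot crm_mon call is just how the status
snapshots were captured):

  # move the mysql group off its current node, then clear the migration constraint
  crm resource migrate mysql
  crm resource unmove mysql
  # simulate a failure by putting the active node into standby
  crm node standby mdb3
  # one-shot status check (roughly how the status output below was captured)
  crm_mon -1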
Any help would be greatly appreciated.
Brian Cavanagh
PS: Please disregard if this was double-posted.
Working fine:
============
Last updated: Fri Jan 28 12:17:24 2011
Stack: Heartbeat
Current DC: mdb4 (050fc65c-29ad-4333-93c4-34d98405b952) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 1 expected votes
2 Resources configured.
============
Online: [ mdb4 mdb3 ]
Master/Slave Set: ms_drbd_mysql
    Masters: [ mdb4 ]
    Slaves: [ mdb3 ]
Resource Group: mysql
    ip1 (ocf::heartbeat:IPaddr2): Started mdb4
    ip1arp (ocf::heartbeat:SendArp): Started mdb4
    ip2 (ocf::heartbeat:IPaddr2): Started mdb4
    ip2arp (ocf::heartbeat:SendArp): Started mdb4
    fs_mysql (ocf::heartbeat:Filesystem): Started mdb4
    mysqld (ocf::heartbeat:mysql): Started mdb4
crm resource migrate mysql
============
Last updated: Fri Jan 28 12:18:58 2011
Stack: Heartbeat
Current DC: mdb4 (050fc65c-29ad-4333-93c4-34d98405b952) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 1 expected votes
2 Resources configured.
============
Online: [ mdb4 mdb3 ]
Master/Slave Set: ms_drbd_mysql
    Masters: [ mdb3 ]
    Slaves: [ mdb4 ]
Resource Group: mysql
    ip1 (ocf::heartbeat:IPaddr2): Started mdb3
    ip1arp (ocf::heartbeat:SendArp): Started mdb3
    ip2 (ocf::heartbeat:IPaddr2): Started mdb3
    ip2arp (ocf::heartbeat:SendArp): Started mdb3
    fs_mysql (ocf::heartbeat:Filesystem): Started mdb3
    mysqld (ocf::heartbeat:mysql): Started mdb3
crm resource unmove mysql
crm node standby mdb3
============
Last updated: Fri Jan 28 12:20:40 2011
Stack: Heartbeat
Current DC: mdb4 (050fc65c-29ad-4333-93c4-34d98405b952) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 1 expected votes
2 Resources configured.
============
Node mdb3 (5f4014cd-472e-4ab3-95e3-759152f16f52): standby
Online: [ mdb4 ]
Master/Slave Set: ms_drbd_mysql
    drbd_mysql:0 (ocf::linbit:drbd): Slave mdb4 (unmanaged) FAILED
    drbd_mysql:1 (ocf::linbit:drbd): Slave mdb3 (unmanaged) FAILED
Failed actions:
    drbd_mysql:0_stop_0 (node=mdb4, call=67, rc=6, status=complete): not configured
    drbd_mysql:1_stop_0 (node=mdb3, call=65, rc=6, status=complete): not configured
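As far as I can tell, rc=6 is OCF_ERR_CONFIGURED coming back from the drbd agent's stop
action. If it would help, I can also run the agent outside the cluster; a minimal check,
assuming the stock agent path /usr/lib/ocf/resource.d/linbit/drbd, would be something like:

  # exercise the linbit drbd agent directly with the same drbd_resource parameter
  # (path assumed; adjust if the RA is installed elsewhere)
  ocf-tester -n drbd_mysql -o drbd_resource=r0 /usr/lib/ocf/resource.d/linbit/drbd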
The error logs don't say much:
tail -n 30 /var/log/messages
Jan 28 12:20:31 mdb3 IPaddr2[9506]: INFO: ip -f inet addr delete
192.168.162.12/17 dev eth0
Jan 28 12:20:31 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation
ip1_stop_0 (call=61, rc=0, cib-update=69, confirmed=true) ok
Jan 28 12:20:32 mdb3 crmd: [2781]: info: do_lrm_rsc_op: Performing
key=13:8:0:dc2c6518-0d45-4ecc-ac70-c7044d59c1c8 op=drbd_mysql:1_demote_0 )
Jan 28 12:20:32 mdb3 lrmd: [2778]: info: rsc:drbd_mysql:1:62: demote
Jan 28 12:20:32 mdb3 kernel: block drbd0: role( Primary -> Secondary )
Jan 28 12:20:32 mdb3 lrmd: [2778]: info: RA output:
(drbd_mysql:1:demote:stdout)
Jan 28 12:20:32 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation
drbd_mysql:1_demote_0 (call=62, rc=0, cib-update=70, confirmed=true) ok
Jan 28 12:20:34 mdb3 crmd: [2781]: info: do_lrm_rsc_op: Performing
key=69:8:0:dc2c6518-0d45-4ecc-ac70-c7044d59c1c8 op=drbd_mysql:1_notify_0 )
Jan 28 12:20:34 mdb3 lrmd: [2778]: info: rsc:drbd_mysql:1:63: notify
Jan 28 12:20:34 mdb3 lrmd: [2778]: info: RA output:
(drbd_mysql:1:notify:stdout)
Jan 28 12:20:34 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation
drbd_mysql:1_notify_0 (call=63, rc=0, cib-update=71, confirmed=true) ok
Jan 28 12:20:36 mdb3 crmd: [2781]: info: do_lrm_rsc_op: Performing
key=63:8:0:dc2c6518-0d45-4ecc-ac70-c7044d59c1c8 op=drbd_mysql:1_notify_0 )
Jan 28 12:20:36 mdb3 lrmd: [2778]: info: rsc:drbd_mysql:1:64: notify
Jan 28 12:20:36 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation
drbd_mysql:1_notify_0 (call=64, rc=0, cib-update=72, confirmed=true) ok
Jan 28 12:20:37 mdb3 crmd: [2781]: info: do_lrm_rsc_op: Performing
key=14:8:0:dc2c6518-0d45-4ecc-ac70-c7044d59c1c8 op=drbd_mysql:1_stop_0 )
Jan 28 12:20:37 mdb3 lrmd: [2778]: info: rsc:drbd_mysql:1:65: stop
Jan 28 12:20:37 mdb3 drbd[9631]: ERROR: you really should enable notify when
using this RA
Jan 28 12:20:37 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation
drbd_mysql:1_stop_0 (call=65, rc=6, cib-update=73, confirmed=true) not
configured
Jan 28 12:20:39 mdb3 attrd: [2780]: info: attrd_ha_callback: Update relayed
from mdb4
Jan 28 12:20:39 mdb3 attrd: [2780]: info: find_hash_entry: Creating hash
entry for fail-count-drbd_mysql:1
Jan 28 12:20:39 mdb3 attrd: [2780]: info: attrd_trigger_update: Sending
flush op to all hosts for: fail-count-drbd_mysql:1 (INFINITY)
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_perform_update: Sent update
21: fail-count-drbd_mysql:1=INFINITY
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_ha_callback: Update relayed
from mdb4
Jan 28 12:20:40 mdb3 attrd: [2780]: info: find_hash_entry: Creating hash
entry for last-failure-drbd_mysql:1
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_trigger_update: Sending
flush op to all hosts for: last-failure-drbd_mysql:1 (1296235239)
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_perform_update: Sent update
24: last-failure-drbd_mysql:1=1296235239
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_ha_callback: flush message
from mdb4
Jan 28 12:20:40 mdb3 attrd: [2780]: info: find_hash_entry: Creating hash
entry for fail-count-drbd_mysql:0
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_ha_callback: flush message
from mdb4
Jan 28 12:20:40 mdb3 attrd: [2780]: info: find_hash_entry: Creating hash
entry for last-failure-drbd_mysql:0
/* configurations */
crm configure show
node $id="050fc65c-29ad-4333-93c4-34d98405b952" mdb4 \
attributes standby="off"
node $id="5f4014cd-472e-4ab3-95e3-759152f16f52" mdb3 \
attributes standby="on"
primitive drbd_mysql ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql"
fstype="ext3" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="120"
primitive ip1 ocf:heartbeat:IPaddr2 \
params ip="192.168.162.12" nic="eth0:0" cidr_netmask="17" \
op monitor interval="5s"
primitive ip1arp ocf:heartbeat:SendArp \
params ip="192.168.162.12" nic="eth0:0"
primitive ip2 ocf:heartbeat:IPaddr2 \
params ip="97.107.136.62" nic="eth0:2" cidr_netmask="24" \
op monitor interval="5s"
primitive ip2arp ocf:heartbeat:SendArp \
params ip="97.107.136.62" nic="eth0:2"
primitive mysqld ocf:heartbeat:mysql \
params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf"
user="mysql" group="mysql" log="/var/log/mysql_safe.log"
pid="/var/lib/mysql/mysqld.pid" datadir="/var/lib/mysql" \
op monitor interval="30s" timeout="30s" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120"
group mysql ip1 ip1arp ip2 ip2arp fs_mysql mysqld \
meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true" target-role="Started"
location cli-standby-mysql mysql \
rule $id="cli-standby-rule-mysql" -inf: #uname eq mdb4
colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
property $id="cib-bootstrap-options" \
dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
cluster-infrastructure="Heartbeat" \
expected-quorum-votes="1" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
/etc/drbd.conf
global {
    usage-count yes;
    # minor-count dialog-refresh disable-ip-verification
}
common {
    protocol C;
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }
    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }
    disk {
        # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
        # no-disk-drain no-md-flushes max-bio-bvecs
    }
    net {
        # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
        # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
        # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
    }
    syncer {
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
    }
}
resource r0 {
    protocol C;
    syncer {
        rate 4M;
    }
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "[snip]";
    }
    on mdb3 {
        device /dev/drbd0;
        disk /dev/xvdc;
        address 192.168.156.171:7788;
        meta-disk internal;
    }
    on mdb4 {
        device /dev/drbd0;
        disk /dev/xvdc;
        address 192.168.140.133:7788;
        meta-disk internal;
    }
}
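If the DRBD side is relevant, I can post its status as well; it can be checked with:

  # show connection state, roles, and disk states for all DRBD resources
  cat /proc/drbd
  # or query the r0 resource directly
  drbdadm role r0
  drbdadm dstate r0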
/etc/ha.d/ha.cf mdb3
logfile /var/log/heartbeat.log
logfacility local0
keepalive 2
deadtime 15
warntime 5
initdead 120
udpport 694
ucast eth0 173.255.238.128
auto_failback on
node mdb3
node mdb4
use_logd no
crm respawn
/etc/ha.d/ha.cf mdb4
logfile /var/log/heartbeat.log
logfacility local0
keepalive 2
deadtime 15
warntime 5
initdead 120
udpport 694
ucast eth0 173.255.238.191
auto_failback on
node mdb3
node mdb4
use_logd no
crm respawn