[Pacemaker] DRBD monitor time out in high I/O situations
Sebastian Kaps
sebastian.kaps at imail.de
Tue Jul 12 08:37:47 UTC 2011
Hi!
We have set up a 2-node Pacemaker cluster using SLES 11 SP1 +
HA-Extension.
Each machine has two DRBD resources; one is called 'mysql' and the
other 'wwwdata'.
The mysql resource has an XFS filesystem; wwwdata uses an OCFS2 1.4
filesystem.
Our goal is to create an Active/Standby MySQL cluster with the
databases on the XFS filesystem. The OCFS2 filesystem is supposed to
store data created by scripts that access the MySQL database.
The primitive resources are set up as follows:
----- snip -----
primitive p_controld ocf:pacemaker:controld \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s"
primitive p_drbd_mysql ocf:linbit:drbd \
params drbd_resource="mysql" \
op monitor interval="20" role="Master" timeout="20" \
op monitor interval="30" role="Slave" timeout="20" \
op notify interval="0" timeout="90" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="100s"
primitive p_drbd_wwwdata ocf:linbit:drbd \
params drbd_resource="wwwdata" \
op monitor interval="20" role="Master" timeout="20" \
op monitor interval="30" role="Slave" timeout="20" \
op notify interval="0" timeout="90" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="360s"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/mysql" directory="/data/mysql" fstype="xfs" options="rw,noatime" \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s" \
meta is-managed="true"
primitive p_fs_wwwdata ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/wwwdata" directory="/data/www" fstype="ocfs2" options="rw,noatime,noacl,nouser_xattr,commit=30,data=writeback" \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="300s"
primitive p_ip_float_cluster ocf:heartbeat:IPaddr2 \
params ip="1.2.3.4" nic="bond0" cidr_netmask="24" flush_routes="true" \
meta target-role="Started"
primitive p_o2cb ocf:ocfs2:o2cb \
op monitor interval="120s" \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s" \
meta target-role="Started"
----- snip -----
The problem with this setup is that the DRBD monitor operations seem to
time out under high I/O load, triggering a failover attempt followed by
one node getting STONITH'd, since the file system is still busy running
the operation that caused the problem in the first place. For example,
this is what happened yesterday when I ran "chmod -R" on a directory
tree containing about 4.5 million rather small files on the OCFS2
filesystem:
----- snip -----
Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_mysql:0:39:
monitor
Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_wwwdata:0:38:
monitor
Jul 11 11:06:29 node01 mysql[6665]: INFO: MySQL monitor succeeded
Jul 11 11:07:37 node01 lrmd: [25011]: WARN: p_drbd_wwwdata:0:monitor
process (PID 6776) timed out (try 1). Killing with signal SIGTERM (15).
Jul 11 11:07:37 node01 lrmd: [25011]: WARN: operation monitor[38] on
ocf::drbd::p_drbd_wwwdata:0 for client 25014, its parameters:
CRM_meta_clone=[0] CRM_meta_role=[Master]
CRM_meta_notify_slave_resource=[ ] CRM_meta_notify_active_resource=[ ]
CRM_meta_notify_demote_uname=[ ] drbd_resource=[wwwdata]
CRM_meta_notify_inactive_resource=[p_drbd_wwwdata:0 p_drbd_wwwdata:1 ]
CRM_meta_master_node_max=[1] CRM_meta_notify_stop_resource=[ ]
CRM_meta_notify_master_resource=[ ] CRM_meta_clone_node_max=[1]
CRM_meta_notify=[true] CRM_meta_notify_demote_resource=[: pid [6776]
timed out
Jul 11 11:07:37 node01 crmd: [25014]: ERROR: process_lrm_event: LRM
operation p_drbd_wwwdata:0_monitor_20000 (38) Timed Out
(timeout=20000ms)
Jul 11 11:07:37 node01 crmd: [25014]: info: process_graph_event:
Detected action p_drbd_wwwdata:0_monitor_20000 from a different
transition: 11 vs. 135
Jul 11 11:07:37 node01 crmd: [25014]: info: abort_transition_graph:
process_graph_event:477 - Triggered transition abort (complete=1,
tag=lrm_rsc_op, id=p_drbd_wwwdata:0_monitor_20000,
magic=2:-2;15:11:8:6f0304c9-522b-4582-a26b-cffe24afe9e2, cib=0.349.10) :
Old event
Jul 11 11:07:37 node01 crmd: [25014]: WARN: update_failcount: Updating
failcount for p_drbd_wwwdata:0 on node01 after failed monitor: rc=-2
(update=value++, time=1310375257)
----- snip -----
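To put a number on the delay in the excerpt above, here is a minimal
Python sketch that compares the two timestamps with the configured
timeout. It assumes that the 11:06:14 line marks the start of the same
monitor invocation (operation 38) that is reported as timed out at
11:07:37, and that the year is 2011, since syslog lines omit it. The
function names are only illustrative.

```python
from datetime import datetime

# Timestamps copied from the log excerpt; pairing them as start and end of
# one monitor run is an assumption, not something the log states outright.
STARTED = "Jul 11 11:06:14"
TIMED_OUT = "Jul 11 11:07:37"
CONFIGURED_TIMEOUT_S = 20  # timeout="20" on the Master monitor operation


def parse_syslog_time(stamp, year=2011):
    """Parse a 'Mon DD HH:MM:SS' syslog timestamp (year assumed)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(start, end):
    """Return the number of seconds between two syslog timestamps."""
    return (parse_syslog_time(end) - parse_syslog_time(start)).total_seconds()


elapsed = elapsed_seconds(STARTED, TIMED_OUT)
overrun = elapsed - CONFIGURED_TIMEOUT_S
print(f"elapsed: {elapsed:.0f}s, timeout: {CONFIGURED_TIMEOUT_S}s, "
      f"overrun: {overrun:.0f}s")
```

If that pairing is right, the timeout was reported well after the
configured 20 seconds had passed, which would suggest that the reporting
itself was delayed under the load.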
The operation would have taken a few minutes to complete, but it
shouldn't have had any larger impact on the rest of the system.
Increasing the monitor timeout indefinitely doesn't look like the way
to go here.
Is there a way to ensure that the monitor operations return within a
reasonable time frame even under high load?
Or is there something fundamentally flawed in our setup?
Thanks in advance!
--
Sebastian Kaps