[Pacemaker] Upgraded mysql from 5.0 to 5.1 - And changed to OCF RA
Jake Bogie
jbogie at SureSource.com
Thu Jul 8 12:53:40 UTC 2010
Dan,
THANK YOU!
It's working!!
- Jake
From: Dan Frincu [mailto:dfrincu at streamwide.ro]
Sent: Thursday, July 08, 2010 4:20 AM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Upgraded mysql from 5.0 to 5.1 - And changed to OCF RA
I think I didn't explain enough of the config, hence the confusion.
There are two ways the script /usr/lib/ocf/resource.d/heartbeat/mysql is
called: first by the cluster resource manager, Pacemaker, and second
manually by you.
When Pacemaker calls the mysql script, it goes through this code:
#######################################################################
# Initialization:
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
#######################################################################
This (as it says) initializes some environment variables so the script
can work as the mysql RA (Resource Agent).
When you run the script manually, you don't have to add the export
commands to the script itself; that could mess it up when Pacemaker
calls it. Instead, run the export commands from the shell, creating (in
the current shell) the environment you need to test the mysql script.
So remove the export lines from the script; from your output, it seems
that the script works OK with the mysql server.
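For example, a manual test session could look like this (a sketch; the
exported values mirror your primitive's params, adjust them to your
setup, and run it on the node that currently holds the DRBD master and
filesystem):

```shell
# Run these in an interactive shell, NOT inside the RA script.
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_binary=/usr/sbin/mysqld
export OCF_RESKEY_config=/etc/my.cnf
export OCF_RESKEY_datadir=/drbd/mysql/data
export OCF_RESKEY_user=mysql
export OCF_RESKEY_group=mysql
export OCF_RESKEY_log=/var/log/mysqld.log
export OCF_RESKEY_pid=/drbd/mysql/data/qadb.pid
export OCF_RESKEY_socket=/var/lib/mysql/mysql.sock
export OCF_RESKEY_test_user=qaclus
export OCF_RESKEY_test_passwd=isitup
export OCF_RESKEY_test_table=cluster_check.connectioncheck

cd /usr/lib/ocf/resource.d/heartbeat
./mysql validate-all; echo "validate-all rc=$?"
./mysql start;        echo "start rc=$?"
./mysql monitor;      echo "monitor rc=$?"
./mysql stop;         echo "stop rc=$?"
```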
But starting a resource manually, whether via the mysql RA script or via
an LSB init script, doesn't make Pacemaker aware of it, so the resource
doesn't show up in "crm status". You test the script manually to see
whether there are any issues running it, then you use the cluster
resource manager to start the resource and check it from "crm status".
I've also previously said "Then take step by step each action and check
its exit code, see if it matches the OCF RA specification, and also
check to see if it actually starts the resource or not". The
specification draft for the RA exit codes can be found at
http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD
Testing them is simple: follow the
http://www.linux-ha.org/LSBResourceAgent guideline, just referencing the
exit codes from the first link.
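As a quick reference while testing, the numeric codes from that draft
map to symbolic names. A small helper like this (not part of the RA,
just for convenience at the shell) saves looking them up each time:

```shell
# Map an OCF RA exit code to its symbolic name (per the draft spec).
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Usage after each action, e.g.:
#   ./mysql start; ocf_rc_name $?
ocf_rc_name 6   # prints OCF_ERR_CONFIGURED, the rc=6 from your logs
```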
From the logs, I see that the mysql-server primitive returned rc=6 (exit
6), which means "program is not configured".
Jul 7 11:47:58 qad01 pengine: [4359]: ERROR: unpack_rsc_op: Hard error
- mysql-server_start_0 failed with rc=6: Preventing mysql-server from
re-starting anywhere in the cluster
Because the mysql-server primitive and/or the mysql RA were not properly
configured at the time the error message was written, it led to:
Jul 7 11:47:58 qad01 pengine: [4359]: WARN: common_apply_stickiness:
Forcing mysql-server away from qad01 after 1000000 failures
(max=1000000)
Therefore: export the variables manually and check the exit codes of
each operation of the script; they should match normal operation as
described in the RA specification draft. If everything is OK, move on to
the crm shell, clean up the mysql-server resource, and clear the max
failures (the cluster resource manager keeps track of these; I've had to
manually remove everything from /var/lib/heartbeat/crm/*, as I haven't
found a way to clear this option yet, and then load the saved config
with "crm load config.crm" to have the cluster functional again).
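In the crm shell that cleanup could look roughly like this (a sketch;
the failcount subcommand exists in recent crm shell versions, older
setups may need the manual removal of /var/lib/heartbeat/crm/* described
above):

```shell
# After the RA passes manual testing, clear Pacemaker's memory of the
# failures and let it start the resource itself.
crm resource cleanup mysql-server
crm resource failcount mysql-server delete qad01   # reset the fail-count
crm resource start mysql-server
crm status                       # mysql-server should now show as Started
```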
Also keep in mind that the mysql RA script (as well as /etc/my.cnf)
needs to be the same on all cluster nodes. Pacemaker propagates CIB
changes to all nodes (the /var/lib/heartbeat/crm/* files are placed on
all cluster nodes as well), but not the RA script or my.cnf, so the
mysql RA script needs to work the same way on all nodes.
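One way to keep the nodes in sync (a minimal sketch using the hostnames
from this thread; run it on qad01):

```shell
# Copy the RA script and the MySQL config from qad01 to qad02,
# then compare checksums on both nodes to confirm they match.
scp /usr/lib/ocf/resource.d/heartbeat/mysql qad02:/usr/lib/ocf/resource.d/heartbeat/mysql
scp /etc/my.cnf qad02:/etc/my.cnf
md5sum /usr/lib/ocf/resource.d/heartbeat/mysql /etc/my.cnf
ssh qad02 md5sum /usr/lib/ocf/resource.d/heartbeat/mysql /etc/my.cnf
```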
And I'll stop, I've said enough already.
Cheers.
Jake Bogie wrote:
So I took Dan's advice this time and cleaned up my resource
configuration, updated the script, and verified...however I'm still not
getting the resource online...
[root at qad01 heartbeat]# crm resource start mysql-server
[root at qad01 heartbeat]# crm status
============
Last updated: Wed Jul 7 11:49:20 2010
Stack: openais
Current DC: qad01 - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, 2 expected votes
3 Resources configured.
============
Online: [ qad02 qad01 ]
Resource Group: mysql
fs_mysql (ocf::heartbeat:Filesystem): Started qad01
ip_mysql (ocf::heartbeat:IPaddr2): Started qad01
Master/Slave Set: ms_drbd_mysql
Masters: [ qad01 ]
Slaves: [ qad02 ]
Failed actions:
mysql-server_start_0 (node=qad01, call=6, rc=6, status=complete):
not configured
[root at qad01 heartbeat]# ./mysql start
mysql[5750]: DEBUG: MySQL is not running
mysql[5750]: DEBUG: MySQL is not running
100707 11:49:55 [Warning] option 'group_concat_max_len': unsigned value
0 adjusted to 4
100707 11:49:55 [Note] Plugin 'FEDERATED' is disabled.
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Compressed tables use zlib 1.2.3
100707 11:49:55 InnoDB: highest supported file format is Barracuda.
100707 11:49:55 InnoDB: Warning: allocated tablespace 1, old maximum
was 0
100707 11:49:55 InnoDB Plugin 1.0.9 started; log sequence number
28732335894
100707 11:49:55 [Note] Event Scheduler: Loaded 0 events
100707 11:49:55 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.48-community' socket: '/var/lib/mysql/mysql.sock' port:
3306 MySQL Community Server (GPL)
mysql[5750]: INFO: MySQL started
[root at qad01 heartbeat]# ./mysql status
[root at qad01 heartbeat]# ./mysql monitor
[root at qad01 heartbeat]# ./mysql validate-all
[root at qad01 heartbeat]# crm status
============
Last updated: Wed Jul 7 11:50:23 2010
Stack: openais
Current DC: qad01 - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, 2 expected votes
3 Resources configured.
============
Online: [ qad02 qad01 ]
Resource Group: mysql
fs_mysql (ocf::heartbeat:Filesystem): Started qad01
ip_mysql (ocf::heartbeat:IPaddr2): Started qad01
Master/Slave Set: ms_drbd_mysql
Masters: [ qad01 ]
Slaves: [ qad02 ]
Failed actions:
mysql-server_start_0 (node=qad01, call=6, rc=6, status=complete):
not configured
[root at qad01 heartbeat]# ./mysql stop
100707 11:50:31 [Note] /usr/sbin/mysqld: Normal shutdown
./mysql: line 426: (/1000)-5: syntax error: operand expected (error
token is "/1000)-5")
100707 11:50:31 [Note] Event Scheduler: Purging the queue. 0 events
100707 11:50:31 InnoDB: Starting shutdown...
[root at qad01 heartbeat]# 100707 11:50:36 InnoDB: Shutdown completed; log
sequence number 28732335904
100707 11:50:36 [Note] /usr/sbin/mysqld: Shutdown complete
[root at qad01 heartbeat]#
[root at qad01 heartbeat]# crm configure show mysql-server
primitive mysql-server ocf:heartbeat:mysql \
op monitor interval="30s" timeout="30s" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120" \
params binary="/usr/sbin/mysqld" config="/etc/my.cnf"
datadir="/drbd/mysql/data/" user="mysql" group="mysql"
log="/var/log/mysqld.log" pid="/drbd/mysql/data/qadb.pid"
socket="/var/lib/mysql/mysql.sock" test_user="qaclus"
test_passwd="isitup" test_table="cluster_check.connectioncheck" \
meta target-role="Started"
[root at qad01 heartbeat]# cat mysql
#!/bin/sh
#
#
# MySQL
#
# Description: Manages a MySQL database as Linux-HA resource
#
# Author: Alan Robertson : DB2 Script
# Author: Jakub Janczak : Rewrite as MySQL
# Author: Andrew Beekhof : Cleanup and import
# Author: Sebastian Reitenbach : add OpenBSD defaults, more cleanup
# Author: Narayan Newton : Add Gentoo/Debian defaults
#
# Support: linux-ha at lists.linux-ha.org
# License: GNU General Public License (GPL)
# Copyright: (C) 2002 - 2005 International Business Machines, Inc.
#
# An example usage in /etc/ha.d/haresources:
# node1 10.0.0.170 mysql
#
# See usage() function below for more details...
#
# OCF instance parameters:
# OCF_RESKEY_binary
# OCF_RESKEY_config
# OCF_RESKEY_datadir
# OCF_RESKEY_user
# OCF_RESKEY_group
# OCF_RESKEY_test_table
# OCF_RESKEY_test_user
# OCF_RESKEY_test_passwd
# OCF_RESKEY_enable_creation
# OCF_RESKEY_additional_parameters
# OCF_RESKEY_log
# OCF_RESKEY_pid
# OCF_RESKEY_socket
#######################################################################
# Initialization:
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
#######################################################################
# Added exports
export OCF_ROOT=/usr/lib/ocf/
export OCF_RESKEY_binary_default="/usr/sbin/mysqld"
export OCF_RESKEY_config_default="/etc/my.cnf"
export OCF_RESKEY_datadir_default="/drbd/mysql/data"
export OCF_RESKEY_user_default="mysql"
export OCF_RESKEY_group_default="mysql"
export OCF_RESKEY_log_default="/var/log/mysqld.log"
export OCF_RESKEY_pid_default="/drbd/mysql/data/qadb.pid"
export OCF_RESKEY_socket_default="/var/lib/mysql/mysql.sock"
export OCF_RESKEY_test_user_default="qaclus"
export OCF_RESKEY_test_table_default="cluster_check.connectioncheck"
export OCF_RESKEY_test_passwd_default="isitup"
# Fill in some defaults if no values are specified
HOSTOS=`uname`
if [ "X${HOSTOS}" = "XOpenBSD" ];then
OCF_RESKEY_binary_default="/usr/local/bin/mysqld_safe"
OCF_RESKEY_config_default="/etc/my.cnf"
OCF_RESKEY_datadir_default="/var/mysql"
OCF_RESKEY_user_default="_mysql"
OCF_RESKEY_group_default="_mysql"
OCF_RESKEY_log_default="/var/log/mysqld.log"
OCF_RESKEY_pid_default="/var/mysql/mysqld.pid"
OCF_RESKEY_socket_default="/var/run/mysql/mysql.sock"
OCF_RESKEY_test_user_default="root"
OCF_RESKEY_test_table_default="mysql.user"
OCF_RESKEY_test_passwd_default=""
OCF_RESKEY_enable_creation_default=0
OCF_RESKEY_additional_parameters_default=""
else
OCF_RESKEY_binary_default="/usr/sbin/mysqld"
OCF_RESKEY_config_default="/etc/my.cnf"
OCF_RESKEY_datadir_default="/drbd/mysql/data"
OCF_RESKEY_user_default="mysql"
OCF_RESKEY_group_default="mysql"
OCF_RESKEY_log_default="/var/log/mysqld.log"
OCF_RESKEY_pid_default="/drbd/mysql/data/qadb.pid"
OCF_RESKEY_socket_default="/var/lib/mysql/mysql.sock"
OCF_RESKEY_test_user_default="qaclus"
OCF_RESKEY_test_table_default="cluster_check.connectioncheck"
OCF_RESKEY_test_passwd_default="isitup"
OCF_RESKEY_enable_creation_default=0
OCF_RESKEY_additional_parameters_default=""
fi
[root at qad01 heartbeat]# cat /var/log/messages | grep mysql-server
Jul 7 11:43:38 qad01 pengine: [4359]: ERROR: unpack_rsc_op: Hard error
- mysql-server_start_0 failed with rc=6: Preventing mysql-server from
re-starting anywhere in the cluster
Jul 7 11:43:38 qad01 pengine: [4359]: WARN: unpack_rsc_op: Processing
failed op mysql-server_start_0 on qad01: not configured (6)
Jul 7 11:43:38 qad01 pengine: [4359]: notice: native_print:
mysql-server (ocf::heartbeat:mysql): Stopped
Jul 7 11:43:38 qad01 pengine: [4359]: info: get_failcount: mysql-server
has failed INFINITY times on qad01
Jul 7 11:43:38 qad01 pengine: [4359]: WARN: common_apply_stickiness:
Forcing mysql-server away from qad01 after 1000000 failures
(max=1000000)
Jul 7 11:43:38 qad01 pengine: [4359]: info: native_color: Resource
mysql-server cannot run anywhere
Jul 7 11:43:38 qad01 pengine: [4359]: notice: LogActions: Leave
resource mysql-server (Stopped)
Jul 7 11:47:58 qad01 crmd: [4360]: info: abort_transition_graph:
te_update_diff:267 - Triggered transition abort (complete=1,
tag=lrm_rsc_op, id=mysql-server_monitor_0,
magic=0:7;7:0:7:e87a73c4-97b8-4f63-9e69-89ec59fce708, cib=0.287.3) :
Resource op removal
Jul 7 11:47:58 qad01 pengine: [4359]: ERROR: unpack_rsc_op: Hard error
- mysql-server_start_0 failed with rc=6: Preventing mysql-server from
re-starting anywhere in the cluster
Jul 7 11:47:58 qad01 pengine: [4359]: WARN: unpack_rsc_op: Processing
failed op mysql-server_start_0 on qad01: not configured (6)
Jul 7 11:47:58 qad01 pengine: [4359]: notice: native_print:
mysql-server (ocf::heartbeat:mysql): Stopped
Jul 7 11:47:58 qad01 pengine: [4359]: info: get_failcount: mysql-server
has failed INFINITY times on qad01
Jul 7 11:47:58 qad01 pengine: [4359]: WARN: common_apply_stickiness:
Forcing mysql-server away from qad01 after 1000000 failures
(max=1000000)
Jul 7 11:47:58 qad01 pengine: [4359]: info: native_color: Resource
mysql-server cannot run anywhere
Jul 7 11:47:58 qad01 attrd: [4358]: info: attrd_trigger_update: Sending
flush op to all hosts for: fail-count-mysql-server (INFINITY)
Jul 7 11:47:58 qad01 attrd: [4358]: info: attrd_trigger_update: Sending
flush op to all hosts for: last-failure-mysql-server (1278516515)
Jul 7 11:47:58 qad01 pengine: [4359]: notice: LogActions: Leave
resource mysql-server (Stopped)
Jul 7 11:47:58 qad01 crmd: [4360]: info: te_rsc_command: Initiating
action 7: monitor mysql-server_monitor_0 on qad02
Jul 7 11:47:58 qad01 crmd: [4360]: info: match_graph_event: Action
mysql-server_monitor_0 (7) confirmed on qad02 (rc=0)
Jul 7 11:47:58 qad01 pengine: [4359]: ERROR: unpack_rsc_op: Hard error
- mysql-server_start_0 failed with rc=6: Preventing mysql-server from
re-starting anywhere in the cluster
Jul 7 11:47:58 qad01 pengine: [4359]: WARN: unpack_rsc_op: Processing
failed op mysql-server_start_0 on qad01: not configured (6)
Jul 7 11:47:58 qad01 pengine: [4359]: notice: native_print:
mysql-server (ocf::heartbeat:mysql): Stopped
Jul 7 11:47:58 qad01 pengine: [4359]: info: get_failcount: mysql-server
has failed INFINITY times on qad01
Jul 7 11:47:58 qad01 pengine: [4359]: WARN: common_apply_stickiness:
Forcing mysql-server away from qad01 after 1000000 failures
(max=1000000)
Jul 7 11:47:58 qad01 pengine: [4359]: info: native_color: Resource
mysql-server cannot run anywhere
Jul 7 11:47:58 qad01 pengine: [4359]: notice: LogActions: Leave
resource mysql-server (Stopped)
Jul 7 11:48:01 qad01 pengine: [4359]: ERROR: unpack_rsc_op: Hard error
- mysql-server_start_0 failed with rc=6: Preventing mysql-server from
re-starting anywhere in the cluster
Jul 7 11:48:01 qad01 pengine: [4359]: WARN: unpack_rsc_op: Processing
failed op mysql-server_start_0 on qad01: not configured (6)
Jul 7 11:48:01 qad01 pengine: [4359]: notice: native_print:
mysql-server (ocf::heartbeat:mysql): Stopped
Jul 7 11:48:01 qad01 pengine: [4359]: info: get_failcount: mysql-server
has failed INFINITY times on qad01
Jul 7 11:48:01 qad01 pengine: [4359]: WARN: common_apply_stickiness:
Forcing mysql-server away from qad01 after 1000000 failures
(max=1000000)
Jul 7 11:48:01 qad01 pengine: [4359]: info: native_color: Resource
mysql-server cannot run anywhere
Jul 7 11:48:01 qad01 pengine: [4359]: notice: LogActions: Leave
resource mysql-server (Stopped)
Jul 7 11:48:10 qad01 pengine: [4359]: ERROR: unpack_rsc_op: Hard error
- mysql-server_start_0 failed with rc=6: Preventing mysql-server from
re-starting anywhere in the cluster
Jul 7 11:48:10 qad01 pengine: [4359]: WARN: unpack_rsc_op: Processing
failed op mysql-server_start_0 on qad01: not configured (6)
Jul 7 11:48:10 qad01 pengine: [4359]: notice: native_print:
mysql-server (ocf::heartbeat:mysql): Stopped
Jul 7 11:48:10 qad01 pengine: [4359]: info: get_failcount: mysql-server
has failed INFINITY times on qad01
Jul 7 11:48:10 qad01 pengine: [4359]: WARN: common_apply_stickiness:
Forcing mysql-server away from qad01 after 1000000 failures
(max=1000000)
Jul 7 11:48:10 qad01 pengine: [4359]: info: native_color: Resource
mysql-server cannot run anywhere
Jul 7 11:48:10 qad01 pengine: [4359]: notice: LogActions: Leave
resource mysql-server (Stopped)
Jul 7 11:48:11 qad01 pengine: [4359]: ERROR: unpack_rsc_op: Hard error
- mysql-server_start_0 failed with rc=6: Preventing mysql-server from
re-starting anywhere in the cluster
Jul 7 11:48:11 qad01 pengine: [4359]: WARN: unpack_rsc_op: Processing
failed op mysql-server_start_0 on qad01: not configured (6)
Jul 7 11:48:11 qad01 pengine: [4359]: notice: native_print:
mysql-server (ocf::heartbeat:mysql): Stopped
Jul 7 11:48:11 qad01 pengine: [4359]: info: get_failcount: mysql-server
has failed INFINITY times on qad01
Jul 7 11:48:11 qad01 pengine: [4359]: WARN: common_apply_stickiness:
Forcing mysql-server away from qad01 after 1000000 failures
(max=1000000)
Jul 7 11:48:11 qad01 pengine: [4359]: info: native_color: Resource
mysql-server cannot run anywhere
Jul 7 11:48:11 qad01 pengine: [4359]: notice: LogActions: Leave
resource mysql-server (Stopped)
Jul 7 11:48:26 qad01 pengine: [4359]: ERROR: unpack_rsc_op: Hard error
- mysql-server_start_0 failed with rc=6: Preventing mysql-server from
re-starting anywhere in the cluster
Jul 7 11:48:26 qad01 pengine: [4359]: WARN: unpack_rsc_op: Processing
failed op mysql-server_start_0 on qad01: not configured (6)
Jul 7 11:48:26 qad01 pengine: [4359]: notice: native_print:
mysql-server (ocf::heartbeat:mysql): Stopped
Jul 7 11:48:26 qad01 pengine: [4359]: info: get_failcount: mysql-server
has failed INFINITY times on qad01
Jul 7 11:48:26 qad01 pengine: [4359]: WARN: common_apply_stickiness:
Forcing mysql-server away from qad01 after 1000000 failures
(max=1000000)
Jul 7 11:48:26 qad01 pengine: [4359]: info: native_color: Resource
mysql-server cannot run anywhere
Jul 7 11:48:26 qad01 pengine: [4359]: notice: LogActions: Leave
resource mysql-server (Stopped)
___
Message: 7
Date: Wed, 07 Jul 2010 12:55:51 +0300
From: Dan Frincu <dfrincu at streamwide.ro>
To: The Pacemaker cluster resource manager
<pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] Upgraded mysql from 5.0 to 5.1
Message-ID: <4C344F27.1060707 at streamwide.ro>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Have you copied this line twice?
socket="/var/lib/mysql/mysql.sock" binary="/usr/sbin/mysqld"
socket="/var/lib/mysql/mysql.sock" binary="/usr/sbin/mysqld"
I think so. Regardless, testing a resource agent manually requires that
you define some variables and then call the script by hand. Also check
all the actions (start, stop, restart, promote, etc.) and their exit
codes, to see if they match the OCF RA specification. Most of the
problems that you will have with a resource agent and its resource can
be found by manually testing the RA script.
Go to /usr/lib/ocf/resource.d/heartbeat/
Open the mysql RA script. Go to line 63 and, starting from that line,
update the values in the script to match the contents of /etc/my.cnf.
Then update the crm configuration for the mysql-server primitive to
match as well.
From what I remember, the values in
OCF_RESKEY_{binary_default,pid_default,socket_default} in the RA script
are wrong versus what's actually installed.
Then "export OCF_ROOT=/usr/lib/ocf/" and export all OCF_RESKEY_* with
their defined values, then call the script with no parameters; it should
print the script's usage. Then take each action step by step and check
its exit code, see if it matches the OCF RA specification, and also
check whether it actually starts the resource or not. The thing is, once
the script works as it should and all the issues have been resolved, the
cluster will work with the mysql-server resource.
Regards,
Dan
Jake Bogie wrote:
So I took Raoul's advice and ditched the lsb:mysql check and
went for
the ocf:heartbeat version however...
I'm getting this now...
What am I missing? I'm having a hard time finding a document on
how to
setup this resource agent.
============
Last updated: Tue Jul 6 12:44:07 2010
Stack: openais
Current DC: qad02 - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, 2 expected votes
3 Resources configured.
============
Online: [ qad02 qad01 ]
Resource Group: mysql
fs_mysql (ocf::heartbeat:Filesystem): Started qad02
ip_mysql (ocf::heartbeat:IPaddr2): Started qad02
Master/Slave Set: ms_drbd_mysql
Masters: [ qad02 ]
Slaves: [ qad01 ]
Failed actions:
mysql-server_start_0 (node=qad01, call=6, rc=6,
status=complete):
not configured
mysql-server_start_0 (node=qad02, call=33, rc=5,
status=complete):
not installed
###
primitive mysql-server ocf:heartbeat:mysql \
op monitor interval="30s" timeout="30s" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120" \
params config="/etc/my.cnf" datadir="/drbd/mysql/data/"
socket="/var/lib/mysql/mysql.sock" binary="/usr/sbin/mysqld"
socket="/var/lib/mysql/mysql.sock" binary="/usr/sbin/mysqld"
pid="/drbd/mysql/data/mysql.pid" test_passwd="isitup"
test_table="cluster_check.connectioncheck" test_user="qaclus" \
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started:
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania
E-mail: dfrincu at streamwide.ro
Phone: +40 (0) 21 320 41 24