[Pacemaker] pacemaker won't start mysql in the second node
Liang.Ma at asc-csa.gc.ca
Wed Feb 2 15:05:30 UTC 2011
Hi,
I did what Michael suggested (included below). With only ms_drbd_mysql and fs_mysql configured, failover to node 2 works without problems. After I added ip1, everything still failed over to arsvr2 fine when I put node 1 (arsvr1) in standby. But once I added mysql to the group MySQLDB, it behaved exactly as before: fs_mysql started and mounted on arsvr2, and even ip1 started without problems, but mysql failed to start. crm_mon shows the error
Failed actions: mysql_start_0 (node=arsvr2, call=32, rc=4, status=complete): insufficient privileges
The cluster log, however, showed nothing about the mysql start.
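For reference, the rc=4 in the failed action above is one of the standard OCF resource agent exit codes, which crm_mon prints as "insufficient privileges". A minimal Python sketch of that mapping follows. The table lists the codes as I know them from the OCF resource agent API; the helper name ocf_name is made up for this illustration.

```python
# OCF resource agent exit codes (generic subset, without the master/slave codes).
OCF_EXIT_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}


def ocf_name(rc):
    """Return the symbolic name for an OCF exit code (hypothetical helper)."""
    return OCF_EXIT_CODES.get(rc, "unknown (%d)" % rc)


# The failed start in this thread reported rc=4.
print(ocf_name(4))  # prints OCF_ERR_PERM
```

So the agent itself (or something it called) returned a permission error. The code alone does not say which file or directory was involved.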
I also tried running the mysql resource script under /usr/lib/ocf/resource.d/heartbeat by hand, and it started the MySQL server without problems.
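A manual run like this can be made a little closer to what the cluster does by giving the agent a small, explicit environment instead of the interactive root shell's. The Python sketch below is only an illustration. It assumes OCF_ROOT is /usr/lib/ocf (consistent with the agent path above) and passes the primitive's parameters as OCF_RESKEY_<name> variables. The names agent_env and run_agent are hypothetical, and the environment the LRM really uses may differ.

```python
import os
import subprocess

# Parameters copied from the mysql primitive in this thread.
MYSQL_PARAMS = {
    "binary": "/usr/bin/mysqld_safe",
    "config": "/etc/mysql/my.cnf",
    "user": "mysql",
    "group": "mysql",
    "log": "/var/log/mysql.log",
    "pid": "/var/run/mysqld/mysqld.pid",
    "datadir": "/var/lib/mysql",
    "socket": "/var/run/mysqld/mysqld.sock",
}

AGENT = "/usr/lib/ocf/resource.d/heartbeat/mysql"


def agent_env(params, ocf_root="/usr/lib/ocf"):
    """Build a minimal environment: PATH, OCF_ROOT and OCF_RESKEY_<name>."""
    env = {
        "PATH": os.environ.get("PATH", "/usr/sbin:/usr/bin:/sbin:/bin"),
        "OCF_ROOT": ocf_root,  # assumption: default install location
    }
    for name, value in params.items():
        env["OCF_RESKEY_" + name] = value
    return env


def run_agent(action, params=MYSQL_PARAMS, agent=AGENT):
    """Run the agent with the given action and return its exit code."""
    return subprocess.call([agent, action], env=agent_env(params))
```

Run as root on arsvr2 while the DRBD device is mounted there, run_agent("start") returns the agent's exit code, which can then be compared with the rc=4 that the cluster reported.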
My guess is that the problem lies somewhere just before Pacemaker calls the mysql resource. It may be related to a permission or authentication problem with mysql, as Dejan commented, but I checked the permissions on /var/run/mysqld and /var/lib/mysql, and they are the same on both nodes. Everything within /var/lib/mysql is on the shared DRBD partition, so it should be identical, right?
Does anyone have an idea which part of the MySQL server settings may cause the problem? The my.cnf files under /etc/ on both nodes are identical. I found that the debian.cnf files differed in the password field after the upgrade, so I copied the one from arsvr1 to arsvr2.
Thank you for any help.
Here is the simplified crm configuration.
node $id="bc6bf61d-6b5f-4307-85f3-bf7bb11531bb" arsvr2 \
attributes standby="off"
node $id="bf0e7394-9684-42b9-893b-5a9a6ecddd7e" arsvr1 \
attributes standby="off"
primitive drbd_mysql ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql" fstype="ext4" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="120" \
meta target-role="Started"
primitive ip1 ocf:heartbeat:IPaddr2 \
params ip="10.10.10.193" nic="eth0" \
op monitor interval="5s" \
meta target-role="Started"
primitive ip1arp ocf:heartbeat:SendArp \
params ip="10.10.10.193" nic="eth0" \
meta target-role="Stopped"
primitive mysql ocf:heartbeat:mysql \
params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" \
user="mysql" group="mysql" log="/var/log/mysql.log" \
pid="/var/run/mysqld/mysqld.pid" datadir="/var/lib/mysql" \
socket="/var/run/mysqld/mysqld.sock" \
op monitor interval="30s" timeout="30s" \
op start interval="15" timeout="120" \
op stop interval="0" timeout="120" \
meta target-role="Started"
group MySQLDB fs_mysql ip1 mysql \
meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true"
colocation fs_on_drbd inf: fs_mysql ms_drbd_mysql:Master
colocation mysql_on_drbd inf: MySQLDB fs_mysql
order fs-mysql-after-drbd inf: ms_drbd_mysql:promote fs_mysql:start
order ip1-after-fs-mysql inf: fs_mysql:start ip1:start
order mysql-after-fs-mysql inf: fs_mysql:start mysql:start
property $id="cib-bootstrap-options" \
dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
cluster-infrastructure="Heartbeat" \
expected-quorum-votes="1" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
Liang Ma
Contractuel | Consultant | SED Systems Inc.
Ground Systems Analyst
Agence spatiale canadienne | Canadian Space Agency
6767, Route de l'Aéroport, Longueuil (St-Hubert), QC, Canada, J3Y 8Y9
Tél/Tel : (450) 926-5099 | Téléc/Fax: (450) 926-5083
Courriel/E-mail : [liang.ma at space.gc.ca]
Site web/Web site : [www.space.gc.ca ]
Hi,
first of all, your configuration is, well, unconventional. I'd put all the primitives together in one group and make the group colocated and ordered with respect to the DRBD resources. Perhaps it would be wise to make two groups.
Googling through the archives of the list, I'd bet this error is caused by the CRM trying to mount a secondary DRBD device. This can happen when some constraints somehow end up forming a loop.
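One rough way to look for such a loop is to treat each order constraint as a directed edge and search the resulting graph for a cycle. The Python sketch below does only that. It reads two-resource "order" lines as printed by "crm configure show", one constraint per line, and ignores colocations and the ordering implied by groups, so a clean result is not proof that the configuration is sound. The function names are made up for this example.

```python
import re

# Matches: order <id> <score>: <first>[:action] <then>[:action]
ORDER_RE = re.compile(r"^order\s+\S+\s+\S+:\s+(\S+)\s+(\S+)\s*$")


def parse_orders(config_text):
    """Return (first, then) resource pairs from two-resource order lines."""
    edges = []
    for line in config_text.splitlines():
        m = ORDER_RE.match(line.strip())
        if m:
            # Drop the ":start" / ":promote" action suffix.
            edges.append((m.group(1).split(":")[0], m.group(2).split(":")[0]))
    return edges


def find_cycle(edges):
    """Return one cycle as a list of resources, or None if there is none."""
    graph = {}
    for first, then in edges:
        graph.setdefault(first, []).append(then)
        graph.setdefault(then, [])
    state = {}  # resource -> "visiting" (on current path) or "done"

    def visit(node, path):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = visit(nxt, path + [nxt])
                if found:
                    return found
        state[node] = "done"
        return None

    for start in graph:
        if start not in state:
            found = visit(start, [start])
            if found:
                return found
    return None
```

Feeding it the text of the order lines from a configuration gives either None or a list such as ['a', 'b', 'a'] naming the resources in the loop.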
Could you please start with a very simple setup like:
primitive resDRBD ocf:linbit:drbd params drbd_resource="r0"
primitive resFS ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/mnt" fstype="ext4"
ms msDRBD resDRBD meta notify="true"
colocation col_FS_DRBD inf: resFS:Started msDRBD:Master
order ord_DRBD_FS inf: msDRBD:promote resFS:start
If this works, try adding an IP address as a resource and make a group of both primitives:
primitive resIP ocf:heartbeat:IPaddr2 \
params ip="10.10.10.193" nic="eth0" cidr_netmask="24"
group groupMySQL resFS resIP
Failover still working? What are the constraints now?
Now add the MySQL database to the group:
primitive mysql ocf:heartbeat:mysql \
params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" \
user="mysql" group="mysql" log="/var/log/mysql.log" \
pid="/var/run/mysqld/mysqld.pid" datadir="/var/lib/mysql" \
socket="/var/run/mysqld/mysqld.sock"
Then edit the group groupMySQL to add the MySQL server, and so on.
Hope you are successful taking one step after the other.
Greetings,
--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München
Tel: (0163) 172 50 98
-----Original Message-----
From: Ma, Liang
Sent: January 31, 2011 9:41 AM
To: The Pacemaker cluster resource manager
Subject: RE: [Pacemaker] pacemaker won't start mysql in the second node
Hi,
Thanks for your hints. I went through the cluster logs more carefully. Comparing the logs from the two nodes, the real difference comes after the line
info: process_lrm_event: LRM operation fs_mysql_start_0
On node arsvr1, that line is followed by a confirmation of action fs_mysql_start_0:
info: match_graph_event: Action fs_mysql_start_0 (8) confirmed on arsvr1
and the cluster then goes on to "Initiating action 9: start mysql_start_0 on arsvr1 (local)".
On node arsvr2, however, we never see the confirmation of action fs_mysql_start_0, so mysql_start_0 is never called. The strange thing is that the DRBD partition for fs_mysql is properly mounted on arsvr2. Does anyone know what might keep arsvr2 from reaching the "Action fs_mysql_start_0 (8) confirmed" step?
Thanks in advance.
Here are the logs from the two nodes.
Logs on Node 2:
Jan 28 14:24:23 arsvr2 lrmd: [919]: info: rsc:fs_mysql:229: start
Jan 28 14:24:23 arsvr2 Filesystem[1568]: [1596]: INFO: Running start for /dev/drbd/by-res/r0 on /var/lib/mysql
Jan 28 14:24:23 arsvr2 lrmd: [919]: info: RA output: (fs_mysql:start:stderr) FATAL: Module scsi_hostadapter not found.
Jan 28 14:24:23 arsvr2 Filesystem[1568]: [1606]: INFO: Starting filesystem check on /dev/drbd/by-res/r0
Jan 28 14:24:23 arsvr2 lrmd: [919]: info: RA output: (fs_mysql:start:stdout) fsck from util-linux-ng 2.17.2
Jan 28 14:24:23 arsvr2 lrmd: [919]: info: RA output: (fs_mysql:start:stdout) /dev/drbd0: clean, 178/3276800 files, 257999/13106791 blocks
Jan 28 14:24:23 arsvr2 crmd: [922]: info: process_lrm_event: LRM operation fs_mysql_start_0 (call=229, rc=0, cib-update=251, confirmed=true) ok
Jan 28 14:24:46 arsvr2 cib: [918]: info: cib_stats: Processed 149 operations (0.00us average, 0% utilization) in the last 10min
Logs on Node 1:
Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: rsc:fs_mysql:867: start
Jan 28 14:28:58 arsvr1 crmd: [1068]: info: te_rsc_command: Initiating action 31: monitor drbd_mysql:1_monitor_15000 on arsvr2
Jan 28 14:28:58 arsvr1 Filesystem[516]: [544]: INFO: Running start for /dev/drbd/by-res/r0 on /var/lib/mysql
Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: RA output: (fs_mysql:start:stderr) FATAL: Module scsi_hostadapter not found.
Jan 28 14:28:58 arsvr1 Filesystem[516]: [554]: INFO: Starting filesystem check on /dev/drbd/by-res/r0
Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: RA output: (fs_mysql:start:stdout) fsck from util-linux-ng 2.17.2
Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: RA output: (fs_mysql:start:stdout) /dev/drbd0: clean, 178/3276800 files, 257999/13106791 blocks
Jan 28 14:28:58 arsvr1 crmd: [1068]: info: process_lrm_event: LRM operation fs_mysql_start_0 (call=867, rc=0, cib-update=1650, confirmed=true) ok
Jan 28 14:28:58 arsvr1 crmd: [1068]: info: match_graph_event: Action fs_mysql_start_0 (8) confirmed on arsvr1 (rc=0)
Jan 28 14:28:58 arsvr1 crmd: [1068]: info: te_rsc_command: Initiating action 9: start mysql_start_0 on arsvr1 (local)
Jan 28 14:28:58 arsvr1 crmd: [1068]: info: do_lrm_rsc_op: Performing key=9:551:0:9c402121-906c-42de-a18a-68deb24208cb op=mysql_start_0 )
Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: rsc:mysql:868: start
Jan 28 14:28:58 arsvr1 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jan 28 14:28:59 arsvr1 crmd: [1068]: info: match_graph_event: Action drbd_mysql:1_monitor_15000 (31) confirmed on arsvr2 (rc=0)
Jan 28 14:29:02 arsvr1 mysql[576]: [728]: INFO: MySQL started
Jan 28 14:29:02 arsvr1 crmd: [1068]: info: process_lrm_event: LRM operation mysql_start_0 (call=868, rc=0, cib-update=1651, confirmed=true) ok
Jan 28 14:29:02 arsvr1 crmd: [1068]: info: match_graph_event: Action mysql_start_0 (9) confirmed on arsvr1 (rc=0)
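The comparison described above can be sketched in Python: collect every operation the local LRM reported as finished (process_lrm_event) and every operation the transition engine confirmed (match_graph_event), then list what finished without a confirmation in the same log. The regular expressions are written against the log lines quoted in this message, and the function names are invented for the example. One caveat, as far as I understand it: match_graph_event is logged only on the node currently acting as DC, so a missing confirmation in the other node's log may not indicate a failure by itself.

```python
import re

# "process_lrm_event: LRM operation fs_mysql_start_0 (call=229, rc=0, ..."
LRM_RE = re.compile(
    r"process_lrm_event: LRM operation (\S+) \(call=(\d+), rc=(\d+)")
# "match_graph_event: Action fs_mysql_start_0 (8) confirmed on arsvr1"
CONFIRM_RE = re.compile(
    r"match_graph_event: Action (\S+) \((\d+)\) confirmed on (\S+)")


def summarize(lines):
    """Return (finished, confirmed) for an iterable of log lines.

    finished maps operation name -> rc reported by the local LRM;
    confirmed is the set of operation names seen in match_graph_event.
    """
    finished = {}
    confirmed = set()
    for line in lines:
        m = LRM_RE.search(line)
        if m:
            finished[m.group(1)] = int(m.group(3))
        m = CONFIRM_RE.search(line)
        if m:
            confirmed.add(m.group(1))
    return finished, confirmed


def unconfirmed(lines):
    """Operations that finished locally but were not confirmed in this log."""
    finished, confirmed = summarize(lines)
    return sorted(op for op in finished if op not in confirmed)
```

Calling unconfirmed(open(path)) on each node's log file gives a short list to compare, where path is wherever the cluster log is written on that system.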
Liang Ma
-----Original Message-----
From: Dejan Muhamedagic [mailto:dejanmm at fastmail.fm]
Sent: January 28, 2011 11:09 AM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pacemaker won't start mysql in the second node
On Fri, Jan 28, 2011 at 08:50:45AM -0500, Liang.Ma at asc-csa.gc.ca wrote:
> Hi Dejan, thanks for your reply.
>
> That's one of the problems. I don't see any logs in the log file /var/log/mysql/error.log.
I meant the cluster logs.
> I checked the permissions of the directories /var/run/mysqld and /var/log/mysql. On both nodes they are the same:
>
> drwxr-xr-x 2 mysql root 40 2011-01-27 13:50 /var/run/mysqld/
> drwxr-s--- 2 mysql adm 4096 2011-01-27 11:34 /var/log/mysql
>
> By the way, under which user pacemaker runs, root or someone else?
Pacemaker is a collection of programs. At any rate, the RAs run
as root, but may su to another user (mysql) depending on the
resource configuration.
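Since the agent may switch to the mysql user, it can help to print owner, group and mode for every path named in the resource parameters and compare the output from both nodes line by line. The Python sketch below is one way to do that, under the assumption that it runs as root on each node. The helper names describe and report are invented here, and the path list is simply taken from this thread.

```python
import grp
import os
import pwd
import stat

# Paths mentioned in this thread; adjust to the resource's parameters.
PATHS = [
    "/var/run/mysqld",
    "/var/log/mysql",
    "/var/log/mysql.log",
    "/var/lib/mysql",
]


def describe(path):
    """Return 'mode owner group path' for one path, similar to ls -ld."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid without a passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid without a group entry
    return "%s %s %s %s" % (stat.filemode(st.st_mode), owner, group, path)


def report(paths=PATHS):
    """Describe each path, noting the ones that cannot be examined."""
    lines = []
    for path in paths:
        try:
            lines.append(describe(path))
        except OSError as exc:
            lines.append("%s: %s" % (path, exc.strerror))
    return lines
```

Printing "\n".join(report()) on arsvr1 and arsvr2 and diffing the two outputs shows any path that is missing or owned differently on one node.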
Thanks,
Dejan
> -----Original Message-----
> From: Dejan Muhamedagic [mailto:dejanmm at fastmail.fm]
> Sent: January 28, 2011 8:26 AM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] pacemaker won't start mysql in the second node
>
> Hi,
>
> On Thu, Jan 27, 2011 at 11:51:31AM -0500, Liang.Ma at asc-csa.gc.ca wrote:
> >
> >
> > Hi There,
> >
> > I have set up a pair of HA LAMP servers using heartbeat, pacemaker and
> > drbd on Ubuntu 10.04 LTS. Everything worked fine until I upgraded
> > mysql-server from 5.1.41-3ubuntu12.6 to 5.1.41-3ubuntu12.9. Now node 1
> > (arsvr1) still works fine, but mysql on node 2 (arsvr2) won't start
> > when I switch arsvr1 to standby. The error message shown by "crm
> > status" is
> >
> > Failed actions:
> > mysql_start_0 (node=arsvr2, call=32, rc=4, status=complete):
> > insufficient privileges
> >
> > No errors logged in /var/log/mysql/error.log at all.
>
> I think that you should check directory permissions. The log
> file should give you a hint.
>
> Thanks,
>
> Dejan
>
>
> > The drbd mysql partition is mounted properly. If I go to
> > /usr/lib/ocf/resource.d/heartbeat and set the OCF_RESKEY parameters, I
> > have no problem starting the mysql server with "./mysql start". But the
> > resource mysql won't show up in crm status.
> >
> > So it looks as if pacemaker somehow fails to start the resource mysql
> > even before running the resource script.
> >
> > Here is the configuration
> >
> > node $id="bc6bf61d-6b5f-4307-85f3-bf7bb11531bb" arsvr2 \
> > attributes standby="off"
> > node $id="bf0e7394-9684-42b9-893b-5a9a6ecddd7e" arsvr1 \
> > attributes standby="off"
> > primitive apache2 lsb:apache2 \
> > op start interval="0" timeout="60" \
> > op stop interval="0" timeout="120" start-delay="15" \
> > meta target-role="Started"
> > primitive drbd_mysql ocf:linbit:drbd \
> > params drbd_resource="r0" \
> > op monitor interval="15s"
> > primitive drbd_webfs ocf:linbit:drbd \
> > params drbd_resource="r1" \
> > op monitor interval="15s" \
> > op start interval="0" timeout="240" \
> > op stop interval="0" timeout="100"
> > primitive fs_mysql ocf:heartbeat:Filesystem \
> > params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql" fstype="ext4" \
> > op start interval="0" timeout="60" \
> > op stop interval="0" timeout="120" \
> > meta target-role="Started"
> > primitive fs_webfs ocf:heartbeat:Filesystem \
> > params device="/dev/drbd/by-res/r1" directory="/srv" fstype="ext4" \
> > op start interval="0" timeout="60" \
> > op stop interval="0" timeout="120" \
> > meta target-role="Started"
> > primitive ip1 ocf:heartbeat:IPaddr2 \
> > params ip="10.10.10.193" nic="eth0" \
> > op monitor interval="5s"
> > primitive ip1arp ocf:heartbeat:SendArp \
> > params ip="10.10.10.193" nic="eth0"
> > primitive mysql ocf:heartbeat:mysql \
> > params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" \
> > user="mysql" group="mysql" log="/var/log/mysql.log" \
> > pid="/var/run/mysqld/mysqld.pid" datadir="/var/lib/mysql" \
> > socket="/var/run/mysqld/mysqld.sock" \
> > op monitor interval="30s" timeout="30s" \
> > op start interval="0" timeout="120" \
> > op stop interval="0" timeout="120" \
> > meta target-role="Started"
> > group MySQLDB fs_mysql mysql \
> > meta target-role="Started"
> > group WebServices ip1 ip1arp fs_webfs apache2 \
> > meta target-role="Started"
> > ms ms_drbd_mysql drbd_mysql \
> > meta master-max="1" master-node-max="1" clone-max="2" \
> > clone-node-max="1" notify="true"
> > ms ms_drbd_webfs drbd_webfs \
> > meta master-max="1" master-node-max="1" clone-max="2" \
> > clone-node-max="1" notify="true" target-role="Started"
> > colocation apache2_with_ip inf: apache2 ip1
> > colocation apache2_with_mysql inf: apache2 ms_drbd_mysql:Master
> > colocation apache2_with_webfs inf: apache2 ms_drbd_webfs:Master
> > colocation fs_on_drbd inf: fs_mysql ms_drbd_mysql:Master
> > colocation ip_with_ip_arp inf: ip1 ip1arp
> > colocation mysql_on_drbd inf: MySQLDB ms_drbd_mysql:Master
> > colocation web_with_mysql inf: MySQLDB WebServices
> > colocation webfs_on_drbd inf: fs_webfs ms_drbd_webfs:Master
> > colocation webfs_with_fs inf: fs_webfs fs_mysql
> > order apache2-after-arp inf: ip1arp:start apache2:start
> > order arp-after-ip inf: ip1:start ip1arp:start
> > order fs-mysql-after-drbd inf: ms_drbd_mysql:promote fs_mysql:start
> > order fs-webfs-after-drbd inf: ms_drbd_webfs:promote fs_webfs:start
> > order ip-after-mysql inf: mysql:start ip1:start
> > order mysql-after-fs-mysql inf: fs_mysql:start mysql:start
> > property $id="cib-bootstrap-options" \
> > dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> > cluster-infrastructure="Heartbeat" \
> > expected-quorum-votes="1" \
> > stonith-enabled="false" \
> > no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> > resource-stickiness="100"
> >
> > Any help please?
> >
> > Thanks,
> >
> > Liang Ma
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>