[Pacemaker] problem: mysql service duplicated due to existing pid file
Oscar Remírez de Ganuza Satrústegui
oscarrdg at unav.es
Tue Feb 9 10:29:10 UTC 2010
Hello everybody,
We have a pacemaker cluster, active-pasive, with a mysql service as a
cluster resource. We are using pacemaker 1.0.6 + corosync 1.1.2 on
RedHat 5.3.
Everything is working fine, except when we reset (power down and start)
a node (through stonith) and want to run the mysql service again on the
node.
Let me explain the problem:
Pacemaker manage the mysql service through the /etc/init.d service
(lsb:mysql), and usually it can start and stop it ok. But when the
active node is "stonithed", there is left on disk the pid file
(node1.pid), as the service was not stopped properly.
So here comes the problem: when pacemaker tries to start the service
again, it start it twice.
From /var/log/messages:
* The order to start the service is done:
Feb 9 10:45:06 herculespre crmd: [2181]: info: do_lrm_rsc_op:
Performing key=30:119:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_start_0 )
Feb 9 10:45:06 herculespre lrmd: [2176]: info:
rsc:mysql-unav-service:13: start
Feb 9 10:45:06 herculespre lrmd: [7602]: WARN: For LSB init script, no
additional parameters are needed.
Feb 9 10:45:06 herculespre lrmd: [2176]: info: RA output:
(mysql-unav-service:start:stdout) Starting MySQL
eb 9 10:45:06 herculespre lrmd: [2176]: info: RA output:
(mysql-unav-service:start:stdout) [ OK ]
Feb 9 10:45:06 herculespre crmd: [2181]: info: process_lrm_event: LRM
operation mysql-unav-service_start_0 (call=13, rc=0, cib-update=19,
confirmed=true) ok
* But then a monitor operation is performed at the same time and it
detects that it is not running:
Feb 9 10:45:06 herculespre crmd: [2181]: info: do_lrm_rsc_op:
Performing key=31:119:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_monitor_20000 )
Feb 9 10:45:06 herculespre crmd: [2181]: info: process_lrm_event: LRM
operation mysql-unav-service_monitor_20000 (call=14, rc=7,
cib-update=20, confirmed=false) not running
Feb 9 10:45:06 herculespre attrd: [2179]: info: attrd_ais_dispatch:
Update relayed from jasonpre
Feb 9 10:45:06 herculespre attrd: [2179]: info: attrd_local_callback:
Expanded fail-count-mysql-unav-service=value++ to 1
Feb 9 10:45:06 herculespre attrd: [2179]: info: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-mysql-unav-service (1)
Feb 9 10:45:06 herculespre attrd: [2179]: info: attrd_perform_update:
Sent update 26: fail-count-mysql-unav-service=1
Feb 9 10:45:06 herculespre attrd: [2179]: info: attrd_ais_dispatch:
Update relayed from jasonpre
Feb 9 10:45:06 herculespre attrd: [2179]: info: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-mysql-unav-service
(1265708729)
Feb 9 10:45:06 herculespre attrd: [2179]: info: attrd_perform_update:
Sent update 29: last-failure-mysql-unav-service=1265708729
* So it is decided to stop the resource and start it again!!!:
Feb 9 10:45:07 herculespre crmd: [2181]: info: do_lrm_rsc_op:
Performing key=3:122:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_stop_0 )
Feb 9 10:45:07 herculespre lrmd: [2176]: info:
rsc:mysql-unav-service:15: stop
Feb 9 10:45:07 herculespre lrmd: [7743]: WARN: For LSB init script, no
additional parameters are needed.
Feb 9 10:45:07 herculespre crmd: [2181]: info: process_lrm_event: LRM
operation mysql-unav-service_monitor_20000 (call=14, status=1,
cib-update=0, confirmed=true) Cancelled
Feb 9 10:45:07 herculespre lrmd: [2176]: info: RA output:
(mysql-unav-service:stop:stdout) MySQL manager or server PID file could
not be found![FAILED]
* But the stop operation failed..
Feb 9 10:45:07 herculespre crmd: [2181]: info: process_lrm_event: LRM
operation mysql-unav-service_stop_0 (call=15, rc=0, cib-update=21,
confirmed=true) ok
Feb 9 10:45:07 herculespre crmd: [2181]: info: do_lrm_rsc_op:
Performing key=30:122:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_start_0 )
Feb 9 10:45:07 herculespre lrmd: [2176]: info:
rsc:mysql-unav-service:16: start
Feb 9 10:45:07 herculespre lrmd: [7762]: WARN: For LSB init script, no
additional parameters are needed.
Feb 9 10:45:07 herculespre lrmd: [2176]: info: RA output:
(mysql-unav-service:start:stdout) Starting MySQL
Feb 9 10:45:07 herculespre lrmd: [2176]: info: RA output:
(mysql-unav-service:start:stdout) .
Feb 9 10:45:08 herculespre lrmd: [2176]: info: RA output:
(mysql-unav-service:start:stdout) [ OK ]
Feb 9 10:45:08 herculespre lrmd: [2176]: info: RA output:
(mysql-unav-service:start:stdout)
Feb 9 10:45:08 herculespre crmd: [2181]: info: process_lrm_event: LRM
operation mysql-unav-service_start_0 (call=16, rc=0, cib-update=22,
confirmed=true) ok
Feb 9 10:45:08 herculespre crmd: [2181]: info: do_lrm_rsc_op:
Performing key=2:122:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_monitor_20000 )
Feb 9 10:45:08 herculespre crmd: [2181]: info: process_lrm_event: LRM
operation mysql-unav-service_monitor_20000 (call=17, rc=0,
cib-update=23, confirmed=false) ok
* And the start operation is performed ok
So we have now twice the same service running, and logically this is not
working properly, as the two services try to access the same files and
system ports.
From /var/log/mysql
100209 10:45:07 mysqld_safe Starting mysqld daemon with databases from
/usr/local/etc2/mysql-unav/var
100209 10:45:07 mysqld_safe Starting mysqld daemon with databases from
/usr/local/etc2/mysql-unav/var
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
100209 10:45:07 InnoDB: Retrying to lock the first data file
...
If we start the service manually (/etc/init.d/mysql start) whith the pid
file existing, there is no problem.
So, what it is doing pacemaker wrong?
I reckon that maybe we have some timeouts not properly configured.
Right now we have:
Operations timeout: default-action-timeout="120s"
Mysql Resource Monitor: op monitor interval="20s" timeout="20s"
Is there someone with a similar setup (with mysql or with another
service) and experiencing any problems?
Can you tell me some safe values for the timeouts?
Thank you very much for your help!
Regards,
PS: I did not want to spam the list with more logs or configuration
files, but if they are needed i can attach them, of course.
---
Oscar Remírez de Ganuza
Servicios Informáticos
Universidad de Navarra
Ed. de Derecho, Campus Universitario
31080 Pamplona (Navarra), Spain
tfno: +34 948 425600 Ext. 3130
http://www.unav.es/SI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 4422 bytes
Desc: S/MIME Cryptographic Signature
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100209/0e959452/attachment-0003.bin>
More information about the Pacemaker
mailing list