[Pacemaker] problem: mysql service duplicated due to existing pid file

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Feb 9 06:00:01 EST 2010


Hi,

On Tue, Feb 09, 2010 at 11:29:10AM +0100, Oscar Remí­rez de Ganuza Satrústegui wrote:
> Hello everybody,
> 
> We have a pacemaker cluster, active-pasive, with a mysql service as
> a cluster resource. We are using pacemaker 1.0.6 + corosync 1.1.2 on
> RedHat 5.3.
> Everything is working fine, except when we reset (power down and
> start) a node (through stonith) and want to run the mysql service
> again on the node.
> 
> Let me explain the problem:
> Pacemaker manage the mysql service through the /etc/init.d service
> (lsb:mysql), and usually it can start and stop it ok. But when the
> active node is "stonithed", there is left on disk the pid file
> (node1.pid), as the service was not stopped properly.
> So here comes the problem: when pacemaker tries to start the service
> again, it start it twice.

A resource agent must handle that properly.

> From /var/log/messages:
> 
> * The order to start the service is done:
> Feb  9 10:45:06 herculespre crmd: [2181]: info: do_lrm_rsc_op:
> Performing key=30:119:0:6ce08921-3e51-462c-949a-aa540c206e3c
> op=mysql-unav-service_start_0 )
> Feb  9 10:45:06 herculespre lrmd: [2176]: info:
> rsc:mysql-unav-service:13: start
> Feb  9 10:45:06 herculespre lrmd: [7602]: WARN: For LSB init script,
> no additional parameters are needed.
> Feb  9 10:45:06 herculespre lrmd: [2176]: info: RA output:
> (mysql-unav-service:start:stdout) Starting MySQL
> eb  9 10:45:06 herculespre lrmd: [2176]: info: RA output:
> (mysql-unav-service:start:stdout) [  OK  ]
> Feb  9 10:45:06 herculespre crmd: [2181]: info: process_lrm_event:
> LRM operation mysql-unav-service_start_0 (call=13, rc=0,
> cib-update=19, confirmed=true) ok
> 
> * But then a monitor operation is performed at the same time and it
> detects that it is not running:

The resource agent didn't really start mysql.

Please try with the OCF mysql resource agent.

Thanks,

Dejan

> Feb  9 10:45:06 herculespre crmd: [2181]: info: do_lrm_rsc_op:
> Performing key=31:119:0:6ce08921-3e51-462c-949a-aa540c206e3c
> op=mysql-unav-service_monitor_20000 )
> Feb  9 10:45:06 herculespre crmd: [2181]: info: process_lrm_event:
> LRM operation mysql-unav-service_monitor_20000 (call=14, rc=7,
> cib-update=20, confirmed=false) not running
> Feb  9 10:45:06 herculespre attrd: [2179]: info: attrd_ais_dispatch:
> Update relayed from jasonpre
> Feb  9 10:45:06 herculespre attrd: [2179]: info:
> attrd_local_callback: Expanded fail-count-mysql-unav-service=value++
> to 1
> Feb  9 10:45:06 herculespre attrd: [2179]: info:
> attrd_trigger_update: Sending flush op to all hosts for:
> fail-count-mysql-unav-service (1)
> Feb  9 10:45:06 herculespre attrd: [2179]: info:
> attrd_perform_update: Sent update 26:
> fail-count-mysql-unav-service=1
> Feb  9 10:45:06 herculespre attrd: [2179]: info: attrd_ais_dispatch:
> Update relayed from jasonpre
> Feb  9 10:45:06 herculespre attrd: [2179]: info:
> attrd_trigger_update: Sending flush op to all hosts for:
> last-failure-mysql-unav-service (1265708729)
> Feb  9 10:45:06 herculespre attrd: [2179]: info:
> attrd_perform_update: Sent update 29:
> last-failure-mysql-unav-service=1265708729
> 
> * So it is decided to stop the resource and start it again!!!:
> Feb  9 10:45:07 herculespre crmd: [2181]: info: do_lrm_rsc_op:
> Performing key=3:122:0:6ce08921-3e51-462c-949a-aa540c206e3c
> op=mysql-unav-service_stop_0 )
> Feb  9 10:45:07 herculespre lrmd: [2176]: info:
> rsc:mysql-unav-service:15: stop
> Feb  9 10:45:07 herculespre lrmd: [7743]: WARN: For LSB init script,
> no additional parameters are needed.
> Feb  9 10:45:07 herculespre crmd: [2181]: info: process_lrm_event:
> LRM operation mysql-unav-service_monitor_20000 (call=14, status=1,
> cib-update=0, confirmed=true) Cancelled
> Feb  9 10:45:07 herculespre lrmd: [2176]: info: RA output:
> (mysql-unav-service:stop:stdout) MySQL manager or server PID file
> could not be found![FAILED]
> 
> * But the stop operation failed..
> Feb  9 10:45:07 herculespre crmd: [2181]: info: process_lrm_event:
> LRM operation mysql-unav-service_stop_0 (call=15, rc=0,
> cib-update=21, confirmed=true) ok
> Feb  9 10:45:07 herculespre crmd: [2181]: info: do_lrm_rsc_op:
> Performing key=30:122:0:6ce08921-3e51-462c-949a-aa540c206e3c
> op=mysql-unav-service_start_0 )
> Feb  9 10:45:07 herculespre lrmd: [2176]: info:
> rsc:mysql-unav-service:16: start
> Feb  9 10:45:07 herculespre lrmd: [7762]: WARN: For LSB init script,
> no additional parameters are needed.
> Feb  9 10:45:07 herculespre lrmd: [2176]: info: RA output:
> (mysql-unav-service:start:stdout) Starting MySQL
> Feb  9 10:45:07 herculespre lrmd: [2176]: info: RA output:
> (mysql-unav-service:start:stdout) .
> Feb  9 10:45:08 herculespre lrmd: [2176]: info: RA output:
> (mysql-unav-service:start:stdout) [  OK  ]
> Feb  9 10:45:08 herculespre lrmd: [2176]: info: RA output:
> (mysql-unav-service:start:stdout)
> Feb  9 10:45:08 herculespre crmd: [2181]: info: process_lrm_event:
> LRM operation mysql-unav-service_start_0 (call=16, rc=0,
> cib-update=22, confirmed=true) ok
> Feb  9 10:45:08 herculespre crmd: [2181]: info: do_lrm_rsc_op:
> Performing key=2:122:0:6ce08921-3e51-462c-949a-aa540c206e3c
> op=mysql-unav-service_monitor_20000 )
> Feb  9 10:45:08 herculespre crmd: [2181]: info: process_lrm_event:
> LRM operation mysql-unav-service_monitor_20000 (call=17, rc=0,
> cib-update=23, confirmed=false) ok
> * And the start operation is performed ok
> 
> So we have now twice the same service running, and logically this is
> not working properly, as the two services try to access the same
> files and system ports.
> 
> From /var/log/mysql
> 100209 10:45:07 mysqld_safe Starting mysqld daemon with databases
> from /usr/local/etc2/mysql-unav/var
> 100209 10:45:07 mysqld_safe Starting mysqld daemon with databases
> from /usr/local/etc2/mysql-unav/var
> InnoDB: Unable to lock ./ibdata1, error: 11
> InnoDB: Check that you do not already have another mysqld process
> InnoDB: using the same InnoDB data or log files.
> 100209 10:45:07  InnoDB: Retrying to lock the first data file
> ...
> 
> If we start the service manually (/etc/init.d/mysql start) whith the
> pid file existing, there is no problem.
> So, what it is doing pacemaker wrong?
> 
> I reckon that maybe we have some timeouts not properly configured.
> Right now we have:
> Operations timeout: default-action-timeout="120s"
> Mysql Resource Monitor: op monitor interval="20s" timeout="20s"
> 
> Is there someone with a similar setup (with mysql or with another
> service) and experiencing any problems?
> 
> Can you tell me some safe values for the timeouts?
> 
> Thank you very much for your help!
> 
> Regards,
> 
> PS: I did not want to spam the list with more logs or configuration
> files, but if they are needed i can attach them, of course.
> 
> ---
> Oscar Remírez de Ganuza
> Servicios Informáticos
> Universidad de Navarra
> Ed. de Derecho, Campus Universitario
> 31080 Pamplona (Navarra), Spain
> tfno: +34 948 425600 Ext. 3130
> http://www.unav.es/SI
> 



> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker





More information about the Pacemaker mailing list