[Pacemaker] 1st monitor is too fast after the start

Pavlos Parissis pavlos.parissis at gmail.com
Tue Oct 12 12:58:33 UTC 2010


Hi,

I noticed a race condition while I was integrating an application with
Pacemaker and thought I would share it with you.

The init script of the application is LSB-compliant and passes the
tests mentioned in the Pacemaker documentation. Moreover, the init
script uses the functions supplied by the system[1] for starting,
stopping and checking the application.

A few times I observed the monitor action failing after the startup of
the cluster or after the resource group was moved.
Because it did not happen every time, and a manual start/status always
worked, it was tricky to find the root cause of the failure.
After a few hours of troubleshooting, I found that the 1st monitor
action after the start action was executed too quickly for the
application to have created its pid file. As a result, the monitor
action received an error.

I know it sounds a bit strange, but it happened on my systems. The fact
that my systems are basically vmware images on a laptop may be related
to the issue.

Nevertheless, I would like to ask whether you are considering
implementing an "init_wait" for the 1st monitor action. It could be
useful.

To solve my issue, I put a sleep after the start of the application in
the init script. This gives the application enough time to create its
pid file, so the 1st monitor doesn't fail.
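
For illustration, a minimal sketch of a less timing-dependent variant of
this workaround: instead of a fixed sleep, the start action polls for
the pid file and gives up after a timeout. This is not the code from the
actual init script; the function name wait_for_pidfile, its arguments
and the 10 second default are assumptions made up for the example.

```shell
# Hypothetical helper (not from the original init script):
# wait until a pid file exists and is non-empty, or give up after a
# timeout given in whole seconds.
# Usage: wait_for_pidfile /path/to/pidfile [timeout_seconds]
wait_for_pidfile() {
    pidfile=$1
    timeout=${2:-10}    # assumed default of 10 seconds
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # -s is true only if the file exists and has a size greater
        # than zero, so a half-created empty pid file does not count
        if [ -s "$pidfile" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Final check, so a file that appeared during the last sleep counts
    [ -s "$pidfile" ]
}
```

In the start branch of an init script this could be called right after
launching the daemon, for example
wait_for_pidfile "$PIDFILE" 10 || exit 1
(with PIDFILE being whatever variable the script uses for the pid file
path), so that start only reports success once the pid file is in place
and the 1st monitor has something to check.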


Cheers,
Pavlos


[1] CentOS 5.4


