[Pacemaker] long time to start

Schaefer, Diane E diane.schaefer at unisys.com
Mon Apr 19 08:29:19 EDT 2010


>> Hi,
>>
>> I have a resource that sometimes can take 10 minutes to start after a
>> failure due to log records that need to be sync'd (my own OCF). I noticed
>> while the start action was being performed, if other resources in my cluster
>> report a "not running", no restart will be attempted until my long-running
>> started resource returns. Meanwhile, crm_mon reports the resources as
>> "started" even though they are not running, and may not be for many minutes.



>Does your RA return from the start action immediately or after the
>sync is complete and the service is truly started?
>It _must_ only do the latter.
>Doing the former would explain what you're seeing.


Actually, this RA waits for the sync to complete. If it takes longer than the allotted timeout, Pacemaker sends it SIGTERM/SIGKILL. The issue is that if it can never complete in the allotted time frame, my cluster is basically not servicing any other resources that may have failed until this original resource resolves itself or a failover occurs.
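
For illustration only, here is a minimal sketch of the "return only after the sync is complete" start behavior discussed above, written in Python rather than taken from the actual RA. The names start_and_wait, is_synced, and deadline_s are hypothetical. The return values 0 and 1 are the standard OCF_SUCCESS and OCF_ERR_GENERIC exit codes. The sketch assumes the agent enforces its own deadline, set shorter than the start timeout configured in the cluster, so that it can report a failure itself instead of being killed.

```python
import time

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def start_and_wait(is_synced, deadline_s, poll_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Block until is_synced() returns True, or give up after deadline_s seconds.

    is_synced is a hypothetical callable that checks whether the log sync
    has finished and the service is truly running.
    """
    give_up_at = clock() + deadline_s
    while True:
        if is_synced():
            # Report success only once the service is really up.
            return OCF_SUCCESS
        if clock() >= give_up_at:
            # Fail on our own terms before the cluster's timeout fires.
            return OCF_ERR_GENERIC
        sleep(poll_s)
```

A real agent would call something like this from its start action and exit with the returned code. This only shapes how the start action reports its result; it does not change how the cluster schedules other resources while the start is in progress.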

Diane Schaefer