[Pacemaker] long time to start
Lars Ellenberg
lars.ellenberg at linbit.com
Mon Apr 19 19:39:00 UTC 2010
On Fri, Apr 16, 2010 at 02:28:26PM -0500, Schaefer, Diane E wrote:
> Hi,
> I have a resource that sometimes can take 10 minutes to start after
> a failure due to log records that need to be sync'd. (my own OCF)
>
> I noticed while the start action was being performed, if other
> resources in my cluster report a "not running", no restart will be
> attempted until my long-running resource's start action returns.
>
> Meanwhile, the crm_mon reports the resources as "started"
> even though they are not running, and may not be for many minutes.
> Is the lrm process single threaded?
You are saying that while your RA starts (with a long start timeout),
and the start action is not yet complete,
other _independent_ resources are not yet started,
but crm_mon thinks they are running already,
even though "something" (what?) reports "not running" for those?
I think you lost me ;)
Please show the output of "crm configure show".
Can you reproduce this easily?
Can you reproduce this with just a few "Dummy" resources?
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.