[Pacemaker] long time to start

Andrew Beekhof andrew at beekhof.net
Fri Apr 23 04:02:11 EDT 2010


On Wed, Apr 21, 2010 at 5:07 PM, Schaefer, Diane E
<diane.schaefer at unisys.com> wrote:
>>> Hi,
> Yes, I am saying that if a resource (R1) is taking a long time to start and
> another resource's (R2) monitor action returns “not running”, R2 will not be
> restarted until the first stuck resource returns or, in my case, times out.
> Since the stop action has not been run on R2, crm_mon still says “Started”.

Ah! Now I understand.
Yes, this is unfortunately the case.

When you're calculating the next transition (i.e. in response to a
failure) you really don't want the cluster to be in flux.
So we wait for pending operations to complete before doing the calculation.
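
To illustrate the rule, here is a minimal Python sketch. It is not
Pacemaker's actual code: PendingOp and can_compute_transition are
hypothetical names that only model the idea that the next transition is not
calculated while any operation is still in flight.

```python
from dataclasses import dataclass


@dataclass
class PendingOp:
    """Hypothetical model of an operation the cluster has started."""
    resource: str
    action: str
    done: bool = False


def can_compute_transition(pending_ops):
    """Return True only when no operation is still in flight."""
    return all(op.done for op in pending_ops)


# R1's start is still running, so R2's failed monitor cannot be acted on yet.
ops = [PendingOp("R1", "start"), PendingOp("R2", "monitor", done=True)]
blocked = not can_compute_transition(ops)

# Once R1's start returns (or times out), the calculation can proceed.
ops[0].done = True
unblocked = can_compute_transition(ops)
```

This is why R2 keeps showing as started in the meantime: its recovery is
part of a transition that has not been calculated yet.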

I can see, though, that this is a problem in your case.
Perhaps if the operation's timeout is longer than some threshold _and_ the
transition has been cancelled (i.e. because of a failure), then we
don't wait for that operation to complete.
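
A rough Python sketch of that idea, purely as an illustration: nothing here
exists in Pacemaker, and InFlightOp, must_wait_for and the 120 second
threshold are assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold; no concrete value has been proposed.
LONG_TIMEOUT_S = 120.0


@dataclass
class InFlightOp:
    """Hypothetical model of an operation that has not returned yet."""
    resource: str
    action: str
    timeout_s: float


def must_wait_for(op, transition_cancelled, threshold_s=LONG_TIMEOUT_S):
    """Skip waiting only if the timeout is long AND the transition was cancelled."""
    skip_wait = transition_cancelled and op.timeout_s > threshold_s
    return not skip_wait


slow_start = InFlightOp("R1", "start", timeout_s=600.0)
quick_start = InFlightOp("R3", "start", timeout_s=20.0)

# Long timeout plus a cancelled transition: recalculate without waiting.
wait_slow_after_failure = must_wait_for(slow_start, transition_cancelled=True)
# Short timeout: waiting is cheap, so keep the existing behaviour.
wait_quick_after_failure = must_wait_for(quick_start, transition_cancelled=True)
# No failure, transition not cancelled: always wait.
wait_slow_no_failure = must_wait_for(slow_start, transition_cancelled=False)
```

Both conditions have to hold before the wait is skipped, so a cluster with no
failures would behave exactly as it does today.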

Could you file an enhancement bug for this please?



