[Pacemaker] long time to start
Andrew Beekhof
andrew at beekhof.net
Fri Apr 23 08:02:11 UTC 2010
On Wed, Apr 21, 2010 at 5:07 PM, Schaefer, Diane E
<diane.schaefer at unisys.com> wrote:
>>> Hi,
> Yes, I am saying that if a resource (R1) is taking a long time to start and
> another resource (R2) monitor action returns a not running, it will not be
> restarted until the first stuck resource returns or in my case times out.
> Since the stop action has not been run on R2, crm_mon still says “Started”
Ah! Now I understand.
Yes, this is unfortunately the case.
When you're calculating the next transition (i.e. in response to a
failure), you really don't want the cluster to be in flux.
So we wait for pending operations to complete before doing the calculation.
I can see, though, that this is a problem in your case.
Perhaps if the timeout is longer than some threshold _and_ the
transition has been cancelled (i.e. because of a failure), then we
don't wait for it to complete.
Could you file an enhancement bug for this please?
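
To make the idea concrete, here is a rough sketch in Python of the rule
I have in mind. It is purely illustrative and is not Pacemaker code: the
names (PendingOp, must_wait_for, ops_blocking_recalculation) and the
60-second threshold are all made up for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold; no actual value has been proposed.
LONG_TIMEOUT_THRESHOLD_S = 60.0


@dataclass
class PendingOp:
    """An operation that has been started but has not yet returned."""
    resource: str
    action: str
    timeout_s: float


def must_wait_for(op: PendingOp,
                  transition_cancelled: bool,
                  threshold_s: float = LONG_TIMEOUT_THRESHOLD_S) -> bool:
    """Return True if the next calculation should wait for this op.

    Current behaviour: always wait for every pending operation.
    Proposed exception: skip the wait only when the transition has been
    cancelled AND the op's timeout is longer than the threshold.
    """
    if transition_cancelled and op.timeout_s > threshold_s:
        return False
    return True


def ops_blocking_recalculation(pending: List[PendingOp],
                               transition_cancelled: bool,
                               threshold_s: float = LONG_TIMEOUT_THRESHOLD_S
                               ) -> List[PendingOp]:
    """List the pending ops that still hold up the next calculation."""
    return [op for op in pending
            if must_wait_for(op, transition_cancelled, threshold_s)]
```

In your scenario, R1's long-running start would no longer hold up the
recovery of R2 once R2's failure had cancelled the transition, while
short operations would still be waited for as they are today.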