[Pacemaker] Monitor ops do not get cancelled

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Sep 28 06:10:16 EDT 2010


Hi,

On Tue, Sep 28, 2010 at 11:37:02AM +0200, Andrew Beekhof wrote:
> On Thu, Sep 23, 2010 at 8:49 PM, Phil Armstrong <pma at sgi.com> wrote:
> > I posted earlier asking for help because I had a primitive whose monitor
> > operation was not getting canceled at the time that a manual relocation was
> > performed. I updated pacemaker (as was suggested) to pacemaker-1.1.2-0.6.1
> > which is the latest I could find for an IA64 platform without having to
> > build from source. If anyone knows of a later IA64 binary version I would
> > appreciate that information.
> 
> 1.1.3 came out the other day.
> Which distro are you using?
> 
> >
> > The monitor problem persisted after the upgrade, though the error messages I
> > was seeing earlier were no longer present. They were apparently unrelated.
> > Painful trial and error led me to the conclusion that the cause was the
> > primitive's start-op timeout and monitor-op start-delay values. When I had
> > these values set at 480s, the monitor-op did not get canceled for a manual
> > relocation and so would get rescheduled after the relocation only to find
> > the resource not operational (it had been relocated) and thus set the
> > fail-count to non-zero, fencing the resource. If I set the values to 240s,
> > everything went smoothly and the monitor-op was canceled.
> >
> > As a test, I changed a different primitive's values to 480s and that
> > primitive then displayed the failing behavior.
> >
> > If anyone knows why this might be the case (perhaps there are rules I am
> > unaware of that prohibit larger values) I would appreciate the information.
> > If not, I guess I should file a bug.
> >
> > Thanks for any help in advance.
> 
> Hmmmm, which version of cluster-glue do you have?
> This sounds like it might be related to
> 
> dejan ()	High: LRM: lrmd: don't allow cancelled operations to get back
> to the repeating op list (lf#2417) CS: fc141b7e1e19 On: 2010-06-10
> 
> which first appeared in cluster-glue 1.0.6 IIRC

Yes, it's in 1.0.6. That looks like the most plausible
explanation.
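
For anyone who wants to check a node against that version, here is a
minimal sketch in Python. It only compares version strings; the
function names are hypothetical, and it assumes you have already
obtained the installed cluster-glue version string some other way
(for example from your package manager), possibly with a package
release suffix after a dash.

```python
import re

# First cluster-glue release containing the lrmd fix (lf#2417),
# as stated in this thread.
FIX_VERSION = (1, 0, 6)


def parse_version(version):
    """Turn a string such as '1.0.5-0.5.1' into a tuple like (1, 0, 5).

    Assumption: anything after the first '-' is a package release
    suffix and is ignored.
    """
    upstream = version.split("-", 1)[0]
    parts = re.findall(r"\d+", upstream)
    if not parts:
        raise ValueError("no version digits found in %r" % version)
    return tuple(int(p) for p in parts)


def has_lrmd_cancel_fix(version):
    """Return True if the given cluster-glue version is 1.0.6 or later."""
    return parse_version(version) >= FIX_VERSION


if __name__ == "__main__":
    for candidate in ("1.0.5-0.5.1", "1.0.6", "1.0.10"):
        print(candidate, has_lrmd_cancel_fix(candidate))
```

The comparison is done on integer tuples rather than on the raw
strings, so that 1.0.10 sorts after 1.0.6.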

Thanks,

Dejan

