[Pacemaker] Critical: Monitor operation of IPaddr2 timing out, taking more than 60s. Fails to recover.
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Aug 13 16:03:22 UTC 2012
Hi,
On Fri, Aug 10, 2012 at 05:44:47AM +0000, Parshvi wrote:
> Mario Penners <mario.penners at ...> writes:
>
> >
> > Hi Parshvi,
> >
> > just a quick-shot and without analyzing your mail in detail: find
> > attached an edited version of the IPaddr2 RA.
> >
> > I was trying to use the original script a while ago, and basically
> > nothing worked: It did not recognize the link failures (because of the
> > way the test was implemented, it would only work if you have no more
> > than 1 IP per interface), there was no proper support for bonding, the
> > IP addresses would not be shifted ....
> >
> > I did some (very minor) changes to get the script working for us. Just
> > have a shot at it if you want, maybe replacing the RA will already solve
> > your problem.
> >
> > Cheers,
> > Mario
> >
> > On Thu, 2012-08-09 at 05:44 +0000, Parshvi wrote:
> > > Parshvi <parshvi.17 at ...> writes:
> > >
> > > >
> > > > Hi,
> > > >
> > > > The monitor operation of IPaddr2 rsc agent is timing out.
> > > > Interval: 5s
> > > > Timeout: 60s
> > > > The timeout was increased from an earlier 20s to now 60s. Even then,
> > > > there are multiple logs of monitor op. timing out.
> > > >
> > > > 1) What can cause the monitor to take so long ?
> > > > 2) Looking at the pe-input, what contributes to the operation time ? Is
> > > > it just the exec-time or exec-time + queue-time ?
> > > > 3) Any solution proposed ?
> > > >
> >
>
> Thanks Mario for your input.
>
> There are some more findings:
> 1) The monitor is not timing out in all environments. I have been through some
> of the forum mails, and came across people talking about "heavy load on the
> system" wrt the timeout issue.
> 2) Could somebody explain, what exactly are we referring to when we say "heavy
> load" ? Also, how does it affect an operation's execution ?
Heavy load, as in many processes contending for system
resources such as CPU or disk.
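
One rough way to see whether a node is under that kind of load is to compare
the load average with the number of CPUs. The following is only a minimal
sketch: the helper name load_per_cpu and the 1.0 threshold are assumptions
made for illustration, not something Pacemaker uses. On Linux the load
average also counts tasks blocked on disk I/O, so a high value can point at
either CPU or disk contention.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Return a load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one_min, five_min, fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1
    ratio = load_per_cpu(one_min, cpus)
    print(f"1-min load {one_min:.2f} on {cpus} CPUs -> {ratio:.2f} per CPU")
    # Assumed rule of thumb: above 1.0 per CPU, tasks are waiting their turn.
    if ratio > 1.0:
        print("more runnable or I/O-blocked tasks than CPUs")
```

Running this at the times the monitors time out, rather than once, gives a
better picture, since load spikes can be short.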
> 3) THE OPERATION MONITOR IS TIMING OUT ON OTHER RESOURCES TOO( ALONG WITH
> IPADDR2).
That seems to indicate that indeed there's a load which your
computer cannot sustain. BTW, why uppercase?
> 4) None of these operations were timing out in a local environment.
>
> I added some logging in IPaddr2 resource agent script.
> In func. ip_monitor(), I have printed the date at enter monitor and at exit
> monitor func.
> This is what I observed for :
> interval=5s
> timeout=60s
>
> enter monitor Thu Aug 9 06:26:28 AST 2012
> exit monitor Thu Aug 9 06:26:28 AST 2012
>
> enter monitor Thu Aug 9 06:26:36 AST 2012
> exit monitor Thu Aug 9 06:26:36 AST 2012
>
> [The next monitor was invoked after 71 seconds]
>
> enter monitor Thu Aug 9 06:27:47 AST 2012
> exit monitor Thu Aug 9 06:27:47 AST 2012
>
> enter monitor Thu Aug 9 06:27:52 AST 2012
> exit monitor Thu Aug 9 06:27:52 AST 2012
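
The gaps between successive monitor invocations in the log above can be
computed with a short script. This is a minimal sketch that assumes the exact
"enter monitor <date>" line format shown in the quoted output; the function
names are made up for illustration, and the time zone field is skipped
because all entries share it.

```python
from datetime import datetime

LOG = """\
enter monitor Thu Aug 9 06:26:28 AST 2012
exit monitor Thu Aug 9 06:26:28 AST 2012
enter monitor Thu Aug 9 06:26:36 AST 2012
exit monitor Thu Aug 9 06:26:36 AST 2012
enter monitor Thu Aug 9 06:27:47 AST 2012
exit monitor Thu Aug 9 06:27:47 AST 2012
enter monitor Thu Aug 9 06:27:52 AST 2012
exit monitor Thu Aug 9 06:27:52 AST 2012
"""


def enter_times(log_text):
    """Collect the timestamps of all 'enter monitor' lines."""
    times = []
    for line in log_text.splitlines():
        parts = line.split()
        # Expected fields: enter monitor Thu Aug 9 06:26:28 AST 2012
        if len(parts) < 8 or parts[:2] != ["enter", "monitor"]:
            continue
        month, day, clock, year = parts[3], parts[4], parts[5], parts[7]
        stamp = f"{month} {day} {clock} {year}"
        times.append(datetime.strptime(stamp, "%b %d %H:%M:%S %Y"))
    return times


def gaps_seconds(times):
    """Return the seconds between each pair of consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_seconds(enter_times(LOG)))
```

Each enter/exit pair falls within the same second, which suggests the time is
not being spent inside ip_monitor() itself but before the agent is invoked.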
There's also code preventing more than n (by default 4)
operations running in parallel on a single node. That could be
one explanation of larger intervals between monitors.
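
As a rough illustration of that effect, here is a minimal simulation of a
fixed number of execution slots shared by all operations on a node. It is a
sketch under stated assumptions, not the actual implementation: the function
simulate, the first-come-first-served ordering, and the operation names and
durations are all hypothetical.

```python
import heapq


def simulate(ops, max_parallel=4):
    """Simulate operations competing for max_parallel execution slots.

    ops is a list of (name, ready_time, duration) tuples, times in seconds.
    Returns {name: (start_time, wait_time)}, serving operations in the
    order in which they become ready.
    """
    # Each heap entry is the time at which one slot becomes free.
    free_at = [0.0] * max_parallel
    heapq.heapify(free_at)
    result = {}
    for name, ready, duration in sorted(ops, key=lambda op: op[1]):
        slot_free = heapq.heappop(free_at)   # earliest available slot
        start = max(ready, slot_free)        # wait if no slot is free yet
        heapq.heappush(free_at, start + duration)
        result[name] = (start, start - ready)
    return result


# Hypothetical workload: four slow operations occupy every slot, then a
# quick IP monitor becomes due five seconds later.
OPS = [(f"slow_op_{i}", 0.0, 60.0) for i in range(4)]
OPS.append(("ip_monitor", 5.0, 0.1))

if __name__ == "__main__":
    start, wait = simulate(OPS, max_parallel=4)["ip_monitor"]
    print(f"ip_monitor starts at {start}s after waiting {wait}s")
```

With four slots the monitor waits 55 seconds for a slot even though it needs
only a fraction of a second to run; with a fifth slot it starts immediately.
The sketch says nothing about whether such waiting counts toward the
operation timeout.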
Thanks,
Dejan