[Pacemaker] lrmd WARN on high IO load

Wed Aug 11 23:03:24 UTC 2010

Hi,

On Wed, Aug 11, 2010 at 05:17:03PM -0300, Diego Woitasen wrote:
> Hi
> 
> 2010/8/2 Dejan Muhamedagic <dejanmm at fastmail.fm>:
> > Hi,
> >
> > On Mon, Jul 19, 2010 at 07:09:11PM -0300, Diego Woitasen wrote:
> >> 2010/7/16 Diego Woitasen <diego at woitasen.com.ar>:
> >> > Hi,
> >> >  I've installed Heartbeat+Pacemaker (3.0.3 and 1.0.9). I have a
> >> > resource which executes a script to check the service:
> >> >
> >> > primitive kolab_imapd ocf:heartbeat:kolab-service \
> >> >        params service="all" monitor_script="/usr/local/bin/check-imap.py" \
> >> >        meta migration-threshold="3" failure-timeout="300s" is-managed="true" \
> >> >        operations $id="operations-imap" \
> >> >        op monitor interval="20s" timeout="30s" on-fail="restart" \
> >> >        op start interval="0" timeout="120" \
> >> >        op stop interval="0" timeout="120"
> >> >
> >> > I ran an I/O stress test using bonnie++ and started to see this
> >> > message:
> >> >
> >> > Jul 16 18:24:38 imapserver lrmd: [4719]: WARN: perform_ra_op: the
> >> > operation operation monitor[21] on ocf::kolab-service::kolab_imapd for
> >> > client 4722, its parameters: CRM_meta_interval=[20000]
> >> > monitor_script=[/usr/local/bin/check-imap.py]
> >> > CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000]
> >> > crm_feature_set=[3.0.1] CRM_meta_name=[monitor] service=[all]  stayed
> >> > in operation list for 32740 ms (longer than 10000 ms)
> >> >
> >> > The problem is that I also get these messages under high I/O without
> >> > the stress testing, for example when running backups. If I understand
> >> > the message correctly, the monitor operation didn't start; it was
> >> > waiting in some work queue to start.
> >
> > It was most probably waiting for the previous monitor operation
> > to finish, though that one should have timed out according to
> > your configuration. Or there were at least 4 operations on
> > different resources running on the node. If you expect high load
> > on the server, you should tune timeouts accordingly.
> 
> And what are the correct values for timeout and interval?

Depends on your resources and on the load you expect. Perhaps bonnie
is not the right tool to stress the hosts, i.e. I doubt that
you'll run into such a high disk load for such a long period of
time. I don't know what kolab_imapd is or how heavy/deep its
monitor operation is.

> timeout < interval?

The two are independent. The interval countdown starts when the
previous monitor operation finishes.
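
To illustrate that timing, here is a rough Python sketch. It is only a
model of the rule above, not lrmd code: the function name, its arguments
and the sample durations are made up for illustration, and it assumes a
monitor that exceeds its timeout is cut off exactly at the timeout.

```python
def monitor_schedule(durations_s, interval_s, timeout_s):
    """Model when successive monitor operations start and end.

    durations_s: how long each monitor run would take if never cut off
                 (hypothetical sample data).
    interval_s:  monitor interval; counted from the END of the previous run.
    timeout_s:   monitor timeout; a longer run is assumed to be stopped here.

    Returns a list of (start, end, timed_out) tuples, in seconds.
    """
    schedule = []
    start = 0.0
    for duration in durations_s:
        timed_out = duration > timeout_s
        end = start + min(duration, timeout_s)
        schedule.append((start, end, timed_out))
        # The next run is due one full interval after this one finished,
        # so a slow monitor pushes every later run back.
        start = end + interval_s
    return schedule


if __name__ == "__main__":
    # interval=20s and timeout=30s as in the configuration quoted below;
    # the three durations are invented.
    for start, end, timed_out in monitor_schedule([5, 40, 2], 20, 30):
        print(f"start={start:5.1f}s end={end:5.1f}s timed_out={timed_out}")
```

So with interval=20s a monitor does not necessarily start every 20
seconds; the gap between two starts is the run time of the first one
plus the interval.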

Thanks,

Dejan

> >
> > Thanks,
> >
> > Dejan
> >
> >> > If I try to execute a command while I'm running the stress test, it's
> >> > slow (approx. 3 seconds) but it works. For example, I can run "crm
> >> > configure show" and the output appears in 3 or 4 seconds.
> >> >
> >> > The server has 2 quad-core processors and 6 GB of RAM, and runs RHEL 5.
> >> >
> >> > Regards,
> >> >  Diego
> >> >
> >> > --
> >> > Diego Woitasen
> >> >
> >>
> >>
> >> I've raised the priority of the process to 10 and it works now.
> >>
> >> The documentation says that the default rtprio is 5. That's wrong; it's 1,
> >> at least in my packages...
> >>
> >> Regards,
> >>  Diego
> >>
> >> --
> >> Diego Woitasen
> >>
> >> _______________________________________________
> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>
> >> Project Home: http://www.clusterlabs.org
> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >
> 
> 
> 
> 
> -- 
> Diego Woitasen