[Pacemaker] Speeding up startup after migration

Mon Apr 1 13:09:14 EDT 2013

----- Original Message -----
> From: "Vladislav Bogdanov" <bubble at hoster-ok.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Monday, April 1, 2013 10:35:39 AM
> Subject: Re: [Pacemaker] Speeding up startup after migration
> 
> 01.04.2013 17:28, David Vossel wrote:
> > 
> > 
> > 
> > 
> > ----- Original Message -----
> >> From: "Vladislav Bogdanov" <bubble at hoster-ok.com>
> >> To: pacemaker at oss.clusterlabs.org
> >> Sent: Friday, March 29, 2013 2:03:27 AM
> >> Subject: Re: [Pacemaker] Speeding up startup after migration
> >>
> >> 29.03.2013 03:31, Andrew Beekhof wrote:
> >>> On Fri, Mar 29, 2013 at 4:12 AM, Benjamin Kiessling
> >>> <mittagessen at l.unchti.me> wrote:
> >>>> Hi,
> >>>>
> >>>> we've got a small pacemaker cluster running which controls an
> >>>> active/passive router. On this cluster we've got a semi-large (~30)
> >>>> number of primitives which are grouped together. On migration it takes
> >>>> quite a long time until each resource is brought up again because they
> >>>> are started sequentially. Is there a way to speed up the process,
> >>>> ideally to execute these resource agents in parallel? They are fully
> >>>> independent so the order in which they finish is of no concern.
> >>>
> >>> I'm guessing you have them in a group?  "Don't do that" and they will
> >>> fail over in parallel.
> >>
> >> Does the current lrmd implementation have a batch-limit like
> >> cluster-glue's lrmd had? I can't find where it is.
> > 
> > The batch-limit option is still around, but has nothing to do with
> > the lrmd. It does limit how many resources can execute in parallel, but at
> > the transition engine level rather than the lrmd.
> 
> Yep, I know that option; it has been there for a very long time.
> 
> So, if I understand correctly, the new lrmd runs as many simultaneous jobs
> as possible. Unfortunately, in some circumstances this would result in
> high node load and timeouts. Is there a way to somehow limit that load?

Isn't that what the batch-limit option does? Or are you saying you want a batch-limit type of option that is node-specific? Why are you concerned about this behavior living in the LRMD instead of at the transition-processing level?

I believe that if we do any batch limiting at the LRMD level, we're going to run into problems with the transition timers in the crmd. The LRMD needs to always perform the actions it is given as soon as possible.
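
To make the timer concern concrete, here is a minimal toy model in Python. It is not Pacemaker code. It assumes (hypothetically) that an action's timer starts the moment the action is handed to the executor, that every action takes the same time, and that throttled actions run in waves of `limit`. The function name and parameters are invented for illustration only.

```python
def timed_out_actions(n_actions, duration, timeout, limit, throttle_in_executor):
    """Return the indices of actions whose timer expires before they finish.

    Toy assumptions (not taken from Pacemaker's source):
      * every action takes `duration` seconds to run;
      * at most `limit` actions run at once, in consecutive waves;
      * an action's timer starts when it is handed to the executor.
    """
    expired = []
    for i in range(n_actions):
        wave = i // limit            # which wave this action runs in
        start = wave * duration      # when it actually begins running
        finish = start + duration    # when it completes
        if throttle_in_executor:
            # Everything is handed over at t=0 and queues inside the executor,
            # so time spent waiting in the queue counts against the timer.
            timer_start = 0
        else:
            # The dispatcher holds the action back until a slot is free,
            # so the timer only covers the time the action is really running.
            timer_start = start
        if finish - timer_start > timeout:
            expired.append(i)
    return expired


if __name__ == "__main__":
    # 30 actions of 10s each, 20s timeout, at most 5 running at once.
    print(timed_out_actions(30, 10, 20, 5, throttle_in_executor=False))
    print(timed_out_actions(30, 10, 20, 5, throttle_in_executor=True))
```

Under these assumptions, throttling at the dispatching level never charges queue time to an action, while throttling inside the executor makes later waves exceed their timers even though each action itself runs quickly.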

-- Vossel

> > 
> > 
> > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_available_cluster_options
> > 
> > -- Vossel
> > 
> >>
> >> _______________________________________________
> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>
> >> Project Home: http://www.clusterlabs.org
> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://bugs.clusterlabs.org
> >>
> > 
> > 
> 
> 
>