[Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

Andrew Beekhof andrew at beekhof.net
Thu Sep 12 06:56:35 UTC 2013


On 12/09/2013, at 4:44 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:

> On 2013-09-12T14:34:02, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
>>> Well, they're all doing something completely different.
>> No, they're all crude approximations designed to stop the cluster as a whole from using up so much CPU/network/etc. that recovery introduces more failures than it resolves.
> 
> OK. Though they do effect the limit on very different levels - which
> sort of makes some sense, because there are limitations on different
> levels, and at best we want to use them all.
> 
>>> The max_children prevent a given node from being overloaded by
>>> concurrent operations.
>> At the expense of introducing other failures... such as "I fired off
>> an action N seconds ago with a timeout < N and still haven't heard
>> back" which was possible if batch-limit and max children were too out
>> of balance.
> 
> Yes. That was very rare, but could happen.
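
To make the failure mode above concrete, here is a rough sketch (not Pacemaker code) of what happens when the DC fires off more actions at one node than the node will run at once. The function name and parameters are made up for illustration, and it assumes the timeout clock starts when the action is dispatched, not when the node actually starts running it:

```python
import heapq

def timed_out_actions(batch_limit, max_children, duration, timeout):
    """Return indices of actions whose result arrives after `timeout`.

    Hypothetical model: `batch_limit` actions are dispatched at t=0 to one
    node that runs at most `max_children` at a time, each taking `duration`
    seconds. Assumption: the timeout is measured from dispatch.
    """
    if max_children < 1:
        raise ValueError("max_children must be at least 1")
    # Min-heap holding the time at which each execution slot becomes free.
    slots = [0.0] * max_children
    heapq.heapify(slots)
    late = []
    for action in range(batch_limit):
        start = heapq.heappop(slots)   # earliest free slot
        finish = start + duration
        heapq.heappush(slots, finish)
        if finish > timeout:           # result arrives after the deadline
            late.append(action)
    return late
```

With 30 actions, 4 children, 10s per action and a 20s timeout, only the first two waves of 4 come back in time; everything queued behind them is reported as timed out even though nothing actually failed.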
> 
>> Which is why any limiting needs to happen centrally on the DC.
> 
> On the other hand, the DC cannot possibly limit concurrent monitor
> operations (since it isn't involved). Arguably, for nodes hosting 100+
> resources, there is some value in limiting parallelism on those. But I'd
> be happy if they were smartly staggered.

Yep, the more we can do without pestering the admin the better.
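
One naive reading of "smartly staggered", purely as an illustration: spread the start of each recurring monitor evenly across the interval so they don't all fire together. The function below is hypothetical and assumes every resource shares the same monitor interval, which real configurations often don't:

```python
def staggered_offsets(resources, interval):
    """Map each resource name to a start offset within one interval.

    Hypothetical helper: assumes all resources use the same monitor
    interval (in seconds) and simply spaces them evenly.
    """
    if not resources:
        return {}
    step = interval / len(resources)
    return {name: i * step for i, name in enumerate(resources)}
```

For four resources with a 60s interval this yields offsets of 0, 15, 30 and 45 seconds.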

> 
>> As above, the rate limiting needs to happen on the DC which lends
>> itself to being a property of the cib and/or transition graph rather
>> than defined in sysconfig.
> 
> I'd be quite happy with that.
> 
> The most directly equivalent solution would be to number the per-node
> in-flight operations similar to what migration-threshold does. (I think
> we can safely continue to treat all resources as equal to start with.)

Agreed.  Perhaps even repurpose/rename migration-threshold for the task?
Or is this typically set much lower than max children?
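
For the sake of discussion, a minimal sketch of what a per-node in-flight limit enforced on the DC could look like. This is not how the transition engine is written; the class and method names are invented here, and it treats all resources and actions as equal, as suggested above:

```python
from collections import defaultdict, deque

class NodeThrottle:
    """Hypothetical DC-side limiter: at most N in-flight actions per node."""

    def __init__(self, per_node_limit):
        self.per_node_limit = per_node_limit
        self.in_flight = defaultdict(int)   # node -> actions sent, not yet done
        self.pending = deque()              # (node, action) waiting to be sent

    def submit(self, node, action):
        self.pending.append((node, action))

    def dispatch(self):
        """Return the actions that may be sent now; the rest stay queued."""
        ready, still_pending = [], deque()
        for node, action in self.pending:
            if self.in_flight[node] < self.per_node_limit:
                self.in_flight[node] += 1
                ready.append((node, action))
            else:
                still_pending.append((node, action))
        self.pending = still_pending
        return ready

    def complete(self, node):
        """Record a result from `node`, freeing one of its slots."""
        self.in_flight[node] -= 1
```

Because the action is held back on the DC rather than queued on the node, its timeout would not start running until it is actually sent, which avoids the "timeout < N" problem mentioned earlier.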

> 
> Though the transition from an environment variable to a CIB node
> attribute (inherited from a cluster-property, I assume) is going to suck
> for the upgrade path :-/
> 
> 
> Regards,
>    Lars
> 
> -- 
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
