[Pacemaker] Load Balancing, Node Scores and Stickiness

Mon Oct 26 06:57:37 EDT 2009

On Fri, Oct 23, 2009 at 2:23 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Fri, Oct 23, 2009 at 10:13 AM, Colin <colin.hch at gmail.com> wrote:
>> On Thu, Oct 22, 2009 at 3:51 PM, Johan Verrept <Johan.Verrept at able.be> wrote:
>>> On Thu, 2009-10-22 at 15:10 +0200, Florian Haas wrote:
>>>> On 10/22/2009 02:37 PM, Andrew Beekhof wrote:
>>>> >> I wondered, does it happen dynamically? If one resource starts using a
>>>> >> lot of resources, are the other migrated to other nodes?
>>>> >
>>>> > Not yet.
>>>> > Such a feature is planned though.
>>>> >
>>>> > At the moment pacemaker purely goes on the number of services it has
>>>> > allocated to the node.
>>>> > Total/Available RAM, CPU, HDD, none of these things are yet taken into account.
>>>>
>>>> Are there any plans for what this feature would look like in more detail?
>>>> A daemon monitoring various performance indicators and updating node
>>>> attributes accordingly? Couldn't that be done today, as a cloneable
>>>> resource agent?
>>>
>>> I can see a few problems with such a feature if you wish to implement it
>>> today.
>>> First of all, you cannot really move services to less loaded nodes if
>>> you cannot determine which resource causes which load. If you pick a
>>> resource at random, you might move a "too heavy" resource to another
>>> less loaded node and cause even more load on that node resulting in
>>> something (else?) being moved back. It will create a pretty unstable
>>> cluster under load.
>>> I am also unsure if it would be wise to mix this directly into the
>>> current node scoring. Load numbers will vary wildly and unless the
>>> resulting attribute values are in some way stabilised over longer
>>> periods, it will also cause instability. (RRDTool?)
>>> It might be possible, but it will be one hell of a complex RA :). A
>>> daemon might be better, but both will require a LOT of configuration
>>> just to differentiate the load of the different resources.
>>>
>>>> Or are you referring to missing features actually evaluating such
>>>> information, as in, rather than saying "run this resource on a node with
>>>> a load average of X or less", being able to say "run this resource on
>>>> the node with the currently lowest load average"?
>>>
>>> How will that translate into repeatable node states? At this moment, if
>>> you use a timed evaluation of the cluster state, resources should always
>>> be assigned to the same nodes (at least, I've never seen it change
>>> unless it was under direction of a time constraint).
>>>
>>> "run this resource on the node with the currently lowest load average"
>>> is something that is very unlikely to ever return the same answer twice.
>>>
>>> Complex indeed! Someone is going to have a considerable amount of fun
>>> with this :D
>>
>> Perhaps static load balancing could be implemented first, before
>> trying to go dynamic.
>>
>> Suppose you could configure an arbitrary set of "measures" (I'd call
>> them resources, but that word is already taken in this context), like
>> an arbitrary set of keywords. For every such "measure", you can then
>> configure (a) how much of it each node has, and (b) how much of it
>> each cluster resource/service requires. The cluster can then use some
>> heuristics to find a good distribution of resources (perfect could be
>> too hard, this is squarely in NP-complete land; PostgreSQL uses a
>> genetic algorithm for query optimisation...).
>
> That's basically what we're going for this time around.
> Maybe with enough experience we'll attempt the dynamic version below.
>
>> This is not as good as dynamic balancing, but still better than
>> nothing. For example, it could make sure that a resource with a
>> tendency to do I/O runs on the same node as a resource that
>> generally uses much CPU...

The algorithms should be the same; the difference is whether you run
them once on statically configured resource usage, or continuously on
dynamically gathered resource consumption (or a weighted average of
the two). Probably a good idea to test with static input first...
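
To make the static case concrete, here is a minimal sketch in Python of
the kind of heuristic I have in mind. It is not how Pacemaker does or
will do it. The names (place, fits, the "cpu" and "io" measures) are
made up for illustration, and the greedy rule is just one simple choice:
place the most demanding service first, on the fitting node with the
most free capacity. Summing different measures into one number is
deliberately crude.

```python
from typing import Dict, Optional

# A "measure" is an arbitrary keyword (hypothetical examples: "cpu", "io").
Measures = Dict[str, float]


def fits(free: Measures, need: Measures) -> bool:
    """True if the node has enough of every measure the service requires."""
    return all(free.get(m, 0.0) >= amount for m, amount in need.items())


def place(nodes: Dict[str, Measures],
          services: Dict[str, Measures]) -> Dict[str, Optional[str]]:
    """Greedy static placement; returns service -> node (None if nothing fits)."""
    # Work on a copy so the configured capacities are left untouched.
    free = {name: dict(capacity) for name, capacity in nodes.items()}
    placement: Dict[str, Optional[str]] = {}

    # Most demanding services first (crude: sums across all measures).
    order = sorted(services, key=lambda s: sum(services[s].values()),
                   reverse=True)
    for svc in order:
        need = services[svc]
        candidates = [n for n in free if fits(free[n], need)]
        if not candidates:
            placement[svc] = None
            continue
        # Pick the node with the most free capacity left. Ties go to the
        # node configured first, so the same input gives the same answer.
        best = max(candidates, key=lambda n: sum(free[n].values()))
        for m, amount in need.items():
            free[best][m] = free[best].get(m, 0.0) - amount
        placement[svc] = best
    return placement
```

With two identical nodes offering cpu=4 and io=2, and three services
"db" (cpu=1, io=2), "web" (cpu=3, io=0) and "batch" (cpu=3, io=1), this
puts "batch" alone on the first node and pairs the I/O-bound "db" with
the CPU-bound "web" on the second. The result depends only on the
configured numbers, so it is repeatable. The dynamic version would feed
the same function measured values instead of configured ones.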

Colin