[Pacemaker] Resource capacity limit

Lars Marowsky-Bree lmb at suse.de
Fri Oct 30 08:20:09 EDT 2009


On 2009-10-30T19:41:35, Yan Gao <ygao at novell.com> wrote:

Hi Yan Gao,

excellent!

Before reviewing the code, let's review the interface/configuration
though.

> Use case:
> Xen guests have memory requirements; nodes cannot host more guests than
> the node has physical memory installed.
> 
> 
> Configuration example:
> 
> node yingying \
> 	attributes capacity="100"
> primitive dummy0 ocf:heartbeat:Dummy \
> 	meta weight="90" priority="2"
> primitive dummy1 ocf:heartbeat:Dummy \
> 	meta weight="60" priority="1"
> ..
> property $id="cib-bootstrap-options" \
> 	limit-capacity="true"

First, I would prefer not to contaminate the regular node attribute
namespace; the word "capacity" might already be in use. Second, a single
"weight" covers just one dimension, which is rather limiting: memory and
CPU, for example, could not be accounted for separately.

I'd propose introducing a new XML element, "resource_utilization" (name
to be decided ;-), containing an "nvset", which can be used in a node
element or a resource primitive.

This creates a new namespace, avoiding clashes, and distinguishes the
utilization parameters from the other various attributes.

Further, it trivially allows for several user-defined metrics.

node hex-0 \
	utilization memory="4096" cpu="8"
...
primitive dummy0 ocf:heartbeat:Dummy \
	meta priority="2" \
	utilization memory="2048" cpu="2"
primitive dummy1 ocf:heartbeat:Dummy \
	utilization memory="3012"
primitive dummy2 ocf:heartbeat:Dummy \
	utilization cpu="6"

dummy0 + dummy2 could both be placed on hex-0, as could dummy1 + dummy2,
but not dummy0 + dummy1 (their combined memory, 2048 + 3012 = 5060,
exceeds hex-0's 4096).

"Placement is allowed where none of the utilization parameters would
become negative." (i.e., iterate over the utilization attributes specified
for the resource.)
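A minimal sketch of this check in Python, assuming utilization attributes are represented as plain name-to-integer maps. The helpers `can_place`, `place` and `fits_together` are hypothetical, not the PE's actual code, and treating an attribute the node does not define as zero capacity is likewise an assumption. Run against the hex-0 example, it reproduces the pairings described above:

```python
from typing import Dict

# Utilization attributes as name -> amount, mirroring
# `utilization memory="..." cpu="..."` in the example configuration.
Utilization = Dict[str, int]


def can_place(remaining: Utilization, required: Utilization) -> bool:
    """True if no attribute the resource specifies would become negative.

    Assumption: an attribute the node does not define has zero capacity.
    """
    return all(remaining.get(name, 0) - amount >= 0
               for name, amount in required.items())


def place(remaining: Utilization, required: Utilization) -> Utilization:
    """Return the node's remaining capacity after placing the resource."""
    left = dict(remaining)
    for name, amount in required.items():
        left[name] = left.get(name, 0) - amount
    return left


def fits_together(capacity: Utilization, *resources: Utilization) -> bool:
    """Check whether all given resources can share one node."""
    remaining = dict(capacity)
    for rsc in resources:
        if not can_place(remaining, rsc):
            return False
        remaining = place(remaining, rsc)
    return True


hex0 = {"memory": 4096, "cpu": 8}
dummy0 = {"memory": 2048, "cpu": 2}
dummy1 = {"memory": 3012}
dummy2 = {"cpu": 6}

print(fits_together(hex0, dummy0, dummy2))  # True
print(fits_together(hex0, dummy1, dummy2))  # True
print(fits_together(hex0, dummy0, dummy1))  # False: 2048 + 3012 > 4096
```

Only the attributes the resource specifies are checked, so dummy1 (memory only) never consumes CPU.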

> If we don't want to enable the capacity limit, we could set the property
> "limit-capacity" to "false", or leave it at the default.

Right, a cluster property to globally disable/enable this is a very good
idea.

> I also noticed a likely similar planned feature described in
> http://clusterlabs.org/wiki/Planned_Features
> 
> "Implement adaptive service placement (based on the RAM, CPU etc.
> required by the service and made available by the nodes) "
> 
> Indeed, this attempt only supports a single kind of capacity, and it's not
> adaptive... Have you already given this feature thorough consideration?

I think this is a two-phase feature for the PE: the first phase is what
you propose, making sure we do not overload any given node, which
basically implements hard limits.

The second phase would be for the PE to actually try to "optimize"
placement, solving the constraints imposed by utilization versus
capacity in order to a) successfully place as many resources as
possible, and b) either spread them thinly (load distribution) or
condense them (load concentration; think power savings from being able
to put some nodes to sleep).
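To illustrate the two goals in b), here is a greedy Python sketch, assuming the same name-to-integer utilization maps. It is a simple heuristic swapped in for illustration, not the optimizing solver described here: for each resource it picks the fitting node with the most free capacity ("spread") or the least ("concentrate"), and it does not guarantee that the maximum number of resources gets placed. `greedy_place` and its averaged `free_fraction` score are hypothetical, and node capacities are assumed to be positive.

```python
from typing import Dict, List, Optional, Tuple

Utilization = Dict[str, int]


def fits(remaining: Utilization, required: Utilization) -> bool:
    # Attributes the node does not define count as zero capacity (assumption).
    return all(remaining.get(k, 0) >= v for k, v in required.items())


def free_fraction(remaining: Utilization, capacity: Utilization) -> float:
    # Average share of each capacity attribute that is still free.
    return sum(remaining[k] / capacity[k] for k in capacity) / len(capacity)


def greedy_place(nodes: Dict[str, Utilization],
                 resources: List[Tuple[str, Utilization]],
                 strategy: str = "spread") -> Dict[str, Optional[str]]:
    """Place resources one by one; None means the resource stays stopped."""
    remaining = {n: dict(cap) for n, cap in nodes.items()}
    # "spread" prefers the emptiest node, "concentrate" the fullest one.
    pick = max if strategy == "spread" else min
    placement: Dict[str, Optional[str]] = {}
    for name, required in resources:
        candidates = [n for n in nodes if fits(remaining[n], required)]
        if not candidates:
            placement[name] = None
            continue
        chosen = pick(candidates,
                      key=lambda n: free_fraction(remaining[n], nodes[n]))
        for k, v in required.items():
            remaining[chosen][k] = remaining[chosen].get(k, 0) - v
        placement[name] = chosen
    return placement


nodes = {"node1": {"memory": 4096, "cpu": 8},
         "node2": {"memory": 4096, "cpu": 8}}
rscs = [("a", {"memory": 1024, "cpu": 2}), ("b", {"memory": 1024, "cpu": 2})]
print(greedy_place(nodes, rscs, "spread"))       # a -> node1, b -> node2
print(greedy_place(nodes, rscs, "concentrate"))  # both on node1
```

The "concentrate" result leaves node2 empty, which is what would let it be put to sleep.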

The first phase should, IMHO, be quite easy to implement. The second one
is significantly more difficult, and we'd need to pull in an
optimization library to solve it for us. It's conceivable that for
this to happen, we'd need to disable the normal "rsc_location" rules
altogether because they'd interfere badly. (It's also interesting to note
that rsc_colocation constraints can be mapped into this scheme and
handled entirely by this solver.)
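As a toy stand-in for such an optimization library, the following Python sketch exhaustively searches all assignments for one that runs the most resources, and encodes a mandatory colocation (a, b) as "a may only run on the node where b runs". The function names and that encoding are assumptions for illustration, not Pacemaker's API. The search is exponential ((nodes + 1)^resources combinations), so a real implementation would hand the problem to an integer-programming or constraint solver instead.

```python
from itertools import product
from typing import Dict, Iterable, List, Optional, Tuple

Utilization = Dict[str, int]
Resource = Tuple[str, Utilization]
Assignment = Tuple[Optional[str], ...]


def feasible(nodes: Dict[str, Utilization], resources: List[Resource],
             assignment: Assignment,
             colocations: List[Tuple[str, str]]) -> bool:
    """Check capacity limits and colocations; None means 'stopped'."""
    used: Dict[str, Utilization] = {n: {} for n in nodes}
    for (_, required), node in zip(resources, assignment):
        if node is not None:
            for k, v in required.items():
                used[node][k] = used[node].get(k, 0) + v
    for n, cap in nodes.items():
        # Missing node attributes count as zero capacity (assumption).
        if any(total > cap.get(k, 0) for k, total in used[n].items()):
            return False
    where = {name: node for (name, _), node in zip(resources, assignment)}
    # Mandatory colocation (a, b): a may only run on the node where b runs.
    return all(where[a] is None or where[a] == where[b]
               for a, b in colocations)


def best_placement(nodes: Dict[str, Utilization], resources: List[Resource],
                   colocations: Iterable[Tuple[str, str]] = ()
                   ) -> Dict[str, Optional[str]]:
    """Exhaustive search for an assignment that runs the most resources."""
    coloc = list(colocations)
    choices: List[Optional[str]] = list(nodes) + [None]
    best: Assignment = (None,) * len(resources)  # all stopped is always valid
    best_count = 0
    for assignment in product(choices, repeat=len(resources)):
        count = sum(node is not None for node in assignment)
        if count > best_count and feasible(nodes, resources, assignment,
                                           coloc):
            best, best_count = assignment, count
    return {name: node for (name, _), node in zip(resources, best)}


nodes = {"node1": {"memory": 4096}, "node2": {"memory": 4096}}
rscs = [("a", {"memory": 3000}), ("b", {"memory": 1000}),
        ("c", {"memory": 3000})]
print(best_placement(nodes, rscs))                # all three placed
print(best_placement(nodes, rscs, [("c", "a")]))  # c cannot join a: stopped
```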

There is the "adaptive" bit, of course, where the utilization of the
resources and the nodes is automatically determined and adjusted based
on utilization monitoring. This is even more challenging and frequently
considered a research problem.


In summary, I think phase one is urgently needed. Thankfully, it is also
straightforward to solve, and the admin can influence placement with
priorities and scores well enough to keep resources from being stopped
too frequently because of capacity collisions.
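A minimal Python sketch of how priorities can steer phase-one placement, assuming resources are handled in descending priority order and each goes to the first node with enough remaining capacity. The record layout and `first_fit_by_priority` are hypothetical; the numbers reuse Yan's example, with his single "weight" modeled as a utilization attribute of the same name.

```python
from typing import Dict, List, Optional, Tuple

Utilization = Dict[str, int]
# Hypothetical resource record: (name, priority, utilization).
Resource = Tuple[str, int, Utilization]


def first_fit_by_priority(nodes: Dict[str, Utilization],
                          resources: List[Resource]) -> Dict[str, Optional[str]]:
    """Place higher-priority resources first; None means 'stays stopped'."""
    remaining = {n: dict(cap) for n, cap in nodes.items()}
    placement: Dict[str, Optional[str]] = {}
    # sorted() is stable, so equal priorities keep their input order.
    for name, _priority, required in sorted(resources, key=lambda r: -r[1]):
        placement[name] = None
        for node, left in remaining.items():
            # Missing node attributes count as zero capacity (assumption).
            if all(left.get(k, 0) >= v for k, v in required.items()):
                for k, v in required.items():
                    left[k] = left.get(k, 0) - v
                placement[name] = node
                break
    return placement


nodes = {"yingying": {"weight": 100}}
rscs = [("dummy1", 1, {"weight": 60}), ("dummy0", 2, {"weight": 90})]
print(first_fit_by_priority(nodes, rscs))  # dummy0 placed, dummy1 stopped
```

With room for only one of the two, the priority-2 resource wins and dummy1 stays stopped, regardless of the order in which they are listed.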

Phase two is a "solved problem" from an algorithmic point of view, but
implementing it is probably not quite as trivial. I'd welcome seeing
this happen too.

Adaptive placement ... anyone want to write a master's or PhD thesis
around it? ;-)


Best,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde