[Pacemaker] Announce: Making Resource Utilization Dynamic

Lars Marowsky-Bree lmb at suse.com
Wed Jun 12 11:10:44 EDT 2013


On 2013-06-12T11:31:43, Michael Schwartzkopff <misch at clusterbau.com> wrote:

> > Pacemaker is not necessarily the best tool to implement quick reaction
> > to changing load, though. The utilization feature is concerned with
> > *correctness* first - namely, don't overcommit resources severely: e.g.,
> > in the case of Xen/VMs in general, don't overcommit physical memory
> > (which could even prevent resources from starting at all), and make
> > sure there's at least 0.5 CPU cores available per VM, etc.
> > 
> > Without the admin having to figure out the node scores manually.
> > Ease of configuration and all that.
> 
> Well, at the moment I have not found a better solution.

Well, hard requirements could possibly be populated in monitor_0, or on
the first start - if the resource couldn't start otherwise anyway,
rescheduling it at this point makes sense, and it'd only happen once.

(If added in monitor_0, we'd populate them as part of the probe when the
resource is added for the first time, most likely, which is also OK.)
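
Roughly what I have in mind - just a sketch, where get_configured_memory_mb
and my_vm_status are made-up placeholders, "hv_memory" is whatever attribute
name you use on the nodes, and I'm assuming crm_resource's -z/--utilization
switch:

    my_vm_monitor() {
        if ocf_is_probe; then
            # configured memory of the VM; a real agent would parse its
            # own config file here instead of this made-up helper
            mem_mb=$(get_configured_memory_mb "$OCF_RESKEY_config") || return "$OCF_ERR_GENERIC"
            # record it as a hard utilization requirement of this
            # resource; this only happens once, during the probe
            crm_resource --resource "$OCF_RESOURCE_INSTANCE" --utilization \
                         --set-parameter hv_memory --parameter-value "$mem_mb"
        fi
        my_vm_status    # made-up: the agent's normal status check
    }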

> > I think what you want instead are thresholds; only if the resource
> > utilization stays above XXX for YYY times, update the CIB, so that the
> > service can be moved to a more powerful server. Lower requirements again
> > if the system is below a fall-back threshold for a given period. You
> > want to minimize movement. And to add a scale factor so you can allow
> > for some overcommit if desired. [*]
> 
> Thresholds are definitely a better approach. Basically I just wanted to
> start a discussion and provide a proof of concept. There is plenty of
> space for improvement. The most important point was that I did not
> want to keep a history record on disk but wanted to calculate the new
> value from the existing value.

Sure. But you may need to keep track of the min/max/avg resource
consumption over 5-60 minutes for the table too, and I don't think you
can store that in the CIB.

If you don't want to store that on disk, you need to store it in a
daemon.
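
Just to illustrate the shape of it - a loop that keeps the last N samples
purely in memory and derives min/max/avg from them; get_vm_cpu_load is a
made-up sampling helper and the window/interval values are arbitrary:

    window=12        # e.g. 12 samples x 5 minutes = 1 hour
    interval=300
    samples=""
    while sleep "$interval"; do
        cur=$(get_vm_cpu_load) || continue
        # keep only the last $window samples, in this process only
        samples=$(printf '%s\n%s\n' "$samples" "$cur" | grep -v '^$' | tail -n "$window")
        printf '%s\n' "$samples" | awk '
            NR == 1  { min = max = $1 }
            $1 < min { min = $1 }
            $1 > max { max = $1 }
                     { sum += $1; n++ }
            END      { printf "min=%s max=%s avg=%.2f\n", min, max, sum / n }'
    done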


> > You also want that because you want to avoid needless PE runs. In your
> > current example, you're going to cause a PE run for *every* *single*
> > monitor operation on *any* VM.

> I see. This is a good point. Perhaps it is better to use a coarser
> parameter, i.e. full CPUs, and to write to the CIB only if the value
> really changed.

That's one of the effects that thresholds would provide too. Only if the
deviation from the current assignment is large enough would the value be
adjusted either up or down.
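
Something along these lines, say - purely a sketch with arbitrary numbers,
where get_current_utilization/set_utilization/measure_cpu_usage are made-up
wrappers (the first two around crm_resource -z -g / -z -p):

    up_threshold=25      # % above the current assignment before raising it
    down_threshold=40    # % below the current assignment before lowering it
    current=$(get_current_utilization cpu)
    measured=$(measure_cpu_usage)
    if [ "$measured" -gt $(( current + current * up_threshold / 100 )) ] ||
       [ "$measured" -lt $(( current - current * down_threshold / 100 )) ]; then
        # only now do we touch the CIB and cause a PE run
        set_utilization cpu "$measured"
    fi

The "stays above/below for YYY intervals" part would sit on top of that, so
a single spike doesn't move anything.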

> > While the data gathering (* as outlined above) could happen in the RA, I
> > think you need to involve at least something like attrd in dampening
> > them. You don't want each RA to implement a threshold/stepping logic
> > independently.
> The problem is that attrd_updater, as far as I know, is not able to
> write utilization values into the CIB.

I'm pretty sure that can be fixed - attrd_updater could either be
enhanced, or a new tool written to handle this. You wanted to get back
into coding, right? ;-)
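
Until then, even a trivial wrapper would at least centralize the "only
write when it actually changed" part - again just a sketch, assuming
crm_resource's -z/--utilization switch works with both -g and -p, and
vm-web01 is only an example resource name:

    update_utilization() {
        rsc=$1 attr=$2 value=$3
        old=$(crm_resource --resource "$rsc" --utilization \
                           --get-parameter "$attr" 2>/dev/null)
        # only touch the CIB (and thus trigger a PE run) when the
        # value actually changed
        if [ "$old" != "$value" ]; then
            crm_resource --resource "$rsc" --utilization \
                         --set-parameter "$attr" --parameter-value "$value"
        fi
    }

    # e.g.: update_utilization vm-web01 cpu 2

The dampening/threshold logic really belongs in one shared place like
attrd, though, not copied into every agent.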


> Yes. I heard an OpenStack talk the other day. Nice project, but the
> high availability is missing. Perhaps you could run all the components
> of OpenStack in a Pacemaker cluster, or at least the virtual machines.
> But up to now I have too little knowledge of OpenStack to think about
> an integration.

OpenStack is trying to gain HA features as well, though they've got a
way to go. There have been many discussions about how to enhance that,
but they only see "Oh, limited to 32 nodes" and no longer want to talk
;-)



Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




