[Pacemaker] Announce: Making Resource Utilization Dynamic

Michael Schwartzkopff misch at clusterbau.com
Wed Jun 12 05:31:43 EDT 2013


On Wednesday, 12 June 2013, 11:01:18, Lars Marowsky-Bree wrote:
> On 2013-06-05T20:44:56, Michael Schwartzkopff <misch at clusterbau.com> wrote:
> 
> Hi Michael,
> 
> yes, the idea to make utilization more dynamic was something Andrew and
> I looked into ages ago.
> 
> In particular, there's still the open issue that it somewhat sucks that one
> has to configure them at all. It'd be nice if monitor_0 would "discover"
> the memory/CPU values from the VM (for example) and populate the CIB
> accordingly. And to keep those in-sync.
> 
> Pacemaker is not necessarily the best tool to implement quick reaction
> to changing load, though. The utilization feature is concerned with
> *correctness* first - namely, don't overcommit resources severely, e.g.,
> in the case of Xen/VMs in general, don't overcommit physical memory
> (which could even prevent resources from starting at all), or making
> sure there's at least 0.5 CPU cores available per VM, etc.
> 
> Without the admin having to figure out the node scores
> manually. Ease of configuration and all that.

Well, at the moment I have not found a better solution.

> Some constructive feedback:
> 
> The dampening in your approach isn't sufficient. This could potentially
> cause a reshuffling of resources with every update; even taking into
> account that this is possible using live migration, it's going to be a
> major performance impact.
>
> I think what you want instead are thresholds; only if the resource
> utilization stays above XXX for YYY times, update the CIB, so that the
> service can be moved to a more powerful server. Lower requirements again
> if the system is below a fall-back threshold for a given period. You
> want to minimize movement. And to add a scale factor so you can allow
> for some overcommit if desired. [*]

Thresholds are definitely a better approach. Basically I just wanted to start a 
discussion and provide a proof of concept; there is plenty of room for 
improvement. The most important point was that I did not want to keep a history 
record on disk, but to calculate the new value from the existing value.
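
A rough sketch of what I mean, combining your threshold idea with the
"new value from the existing value" calculation (the thresholds, the
smoothing factor and the attribute handling are only examples, not what
my script currently does):

# Sketch for an RA monitor action: smooth the measurement into the
# previously stored value instead of keeping a history file, and only
# report a new value when it crosses an upper or lower threshold.
OLD=$1          # utilization currently stored in the CIB
MEASURED=$2     # value measured in this monitor run
UPPER=80        # example upper threshold
LOWER=40        # example fall-back threshold

# exponential smoothing: new = 3/4 * old + 1/4 * measured
NEW=$(( (3 * OLD + MEASURED) / 4 ))

if [ "$NEW" -gt "$UPPER" ] || [ "$NEW" -lt "$LOWER" ]; then
    echo "$NEW"    # caller writes this into the CIB
else
    echo "$OLD"    # unchanged, so no CIB update and no PE run
fi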

> You also want that because you want to avoid needless PE runs. In your
> current example, you're going to cause a PE run for *every* *single*
> monitor operation on *any* VM.

I see; this is a good point. Perhaps it is better to use a coarser 
parameter, i.e. full CPUs, and to write to the CIB only if the value really 
changed. attrd_updater would be the tool to use.
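
For example (the attribute name vm1_cpu, the values and the dampening
delay are only placeholders):

# Sketch: LOAD_PCT is the measured CPU usage in percent of one core,
# STORED is the value the RA last wrote (e.g. read back from the CIB).
LOAD_PCT=270
STORED=2

NEW=$(( (LOAD_PCT + 99) / 100 ))   # round up to full CPUs -> 3

if [ "$NEW" -ne "$STORED" ]; then
    # -d lets attrd do additional dampening before writing out;
    # note: this writes a plain node attribute, not the utilization
    # section (see below)
    attrd_updater -n vm1_cpu -U "$NEW" -d 30s
fi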

> And, of course, this should be optional and protected via a
> configuration parameter.

Fully agreed.

> But, the real issue: CPU utilization raising is only a problem if the
> service performance suffers in turn. Basically, you don't want to move
> resources because their CPU utilization rises, but when the performance
> of the services hosted on a node degrade.
> 
> Hence, I'd agree that the dynamic load adjustment best should live
> outside Pacemaker. At the very least, you'd want to synchronize updating
> the load factors of all the VMs at once, so that the PE can shuffle them
> once, not repeatedly.
> 
> While the data gathering (* as outlined above) could happen in the RA, I
> think you need to involve at least something like attrd in dampening
> them. You don't want each RA to implement a threshold/stepping logic
> independently.

The problem is that attrd_updater, as far as I know, is not able to write 
utilization values into the CIB.
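
The only workaround I see at the moment is to write the utilization
section directly with cibadmin, for example (the IDs are placeholders
and the utilization element must already exist in the resource
definition):

# Sketch: set the cpu utilization of the resource "vm1" to 3 full CPUs.
cibadmin --modify -o resources --xml-text \
  '<primitive id="vm1">
     <utilization id="vm1-utilization">
       <nvpair id="vm1-utilization-cpu" name="cpu" value="3"/>
     </utilization>
   </primitive>'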

> Note that all our normal probes - including the nagios ones - are
> concerned with a "healthy"/"failed" dichotomy only too. They don't
> really offer SLA/response time data, short of 'well duh I timed out'.
> This could be something worth adding to a consolidated framework
> ("yellow" - move me somewhere else, I'm out of resources here). I have
> this impression you'd quickly end up implementing something close to
> heat/openstack then. Not that I'm opposed to that ;-)

Yes. I heard an OpenStack talk the other day. Nice project, but high 
availability is missing. Perhaps you could run all the components of OpenStack 
in a Pacemaker cluster, or at least the virtual machines. But up to now I have 
too little knowledge of OpenStack to think about an integration.

Greetings,

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98