[Pacemaker] Enable remote monitoring

Thu Dec 6 13:28:06 EST 2012

Hi,

On 12/06/12 19:42, Lars Marowsky-Bree wrote:
> On 2012-12-06T22:25:40, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
>> But any failures of the nagios agents would count against the VM's
>> migration-threshold.
>> So if moving were the right thing to do, it would have done it already.
> 
> OK. I think this was due to me still being stuck on the workings of an
> order constraint, but of course if the failures are instead attributed
> to the container, this would happen automatically already. True.
> 
> (Incidentally, I like "attribute", "ascribe" better than "delegate"
> because to me, they better fit what's going on, if we stuck with
> "delegate-failures". Just saying. ;-)
> 
>>> We already have on-fail settings. How would these play together?
>> Good question. My initial thought was that it would be up to on-fail
>> settings in the VM.
> 
> I'd prefer to keep that separate (as proposed below). Because if an
> action of the *VM* really fails, I may want an admin to look into it
> (why could the bloody hypervisor not start/stop it?), which is different
> from restarting the VM if one of the resources within it needs that.
> 
>>> Would it even make sense to have on-fail="restart-container"? (Or a
>>> nicer wording.)
>>>
>>> Hmmm. That might work. We allow a "container" to be specified as a meta
>>> attribute.
>>>
>>> If set, on-fail would default to restart container for most actions. But
>>> admins could actually modify it - say, they might want to set
>>> monitor on-fail="ignore" to just get notified. And when we move forward
>>> to whiteboxes, we could have start/monitor/promote/demote
>>> on-fail="restart" (like now) and stop on-fail="restart-container".
>>>
>>> That appears reasonably neat?
>> It does actually.
>> I wasn't originally thinking it was necessary but it makes sense now
>> that you point it out.
> 
> Yes, I think I like this too now.
I like it too. Here is the draft code:
https://github.com/gao-yan/pacemaker/commit/4f7b80baa42f3801c1fb8186aef076877f34dfea

It works in my simple test. Although failures of the contained resources
aren't yet counted against the container's migration-threshold, it shows
the basic idea. I'd appreciate it if you could take a look first. It's
very likely I'm really on the right track this time. ;-)
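
To make the on-fail defaulting discussed above concrete, here is a rough
sketch in Python. It is not the code from the commit. The function names,
the "whitebox" flag and the plain dict standing in for meta attributes are
all made up for illustration, and "most actions" is simplified to "all
actions" in the non-whitebox case. Only the "container" meta attribute and
the on-fail values come from the discussion.

```python
from typing import Optional

# Actions that, in the whitebox case discussed above, would keep
# on-fail="restart" instead of restarting the whole container.
WHITEBOX_LOCAL_ACTIONS = {"start", "monitor", "promote", "demote"}


def default_on_fail(action: str, meta: dict, whitebox: bool = False) -> Optional[str]:
    """Return the proposed default on-fail for a resource action.

    Returns None when no "container" meta attribute is set, meaning the
    existing defaults apply (they are not modelled in this sketch).
    """
    if not meta.get("container"):
        return None
    if whitebox and action in WHITEBOX_LOCAL_ACTIONS:
        # Whitebox: recover the resource itself, as is done now.
        return "restart"
    # Simplification: every other action restarts the container.
    return "restart-container"


def effective_on_fail(action: str, meta: dict,
                      op_on_fail: Optional[str] = None,
                      whitebox: bool = False) -> Optional[str]:
    """An on-fail set explicitly by the admin always wins over the default."""
    if op_on_fail is not None:
        return op_on_fail
    return default_on_fail(action, meta, whitebox)


if __name__ == "__main__":
    meta = {"container": "vm1"}  # hypothetical VM resource id
    print(effective_on_fail("monitor", meta))
    print(effective_on_fail("monitor", meta, op_on_fail="ignore"))
    print(effective_on_fail("stop", meta, whitebox=True))
```

The explicit setting is checked first so that an admin can still set
monitor on-fail="ignore" to just get notified, as Lars suggested.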

> 
> Uhm. Would "container" imply ordering + colocation, or would we still
> need them grouped (resource_set'ed, whatever)?
> 
> My, design is hard. ;-)
:-)

Regards,
  Gao,Yan
-- 
Gao,Yan <ygao at suse.com>
Software Engineer
China Server Team, SUSE.