[Pacemaker] Enable remote monitoring

Thu Dec 6 18:19:27 EST 2012

----- Original Message -----
> From: "Yan Gao" <ygao at suse.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Thursday, December 6, 2012 12:28:06 PM
> Subject: Re: [Pacemaker] Enable remote monitoring
> 
> Hi,
> 
> On 12/06/12 19:42, Lars Marowsky-Bree wrote:
> > On 2012-12-06T22:25:40, Andrew Beekhof <andrew at beekhof.net> wrote:
> > 
> >> But any failures of the nagios agents would count against the VM's
> >> migration-threshold.
> >> So if moving were the right thing to do, it would have done it
> >> already.
> > 
> > OK. I think this was due to me still being stuck on the workings of an
> > order constraint, but of course if the failures are instead attributed
> > to the container, this would happen automatically already. True.
> > 
> > (Incidentally, I like "attribute" or "ascribe" better than "delegate"
> > because to me they better fit what's going on, if we stuck with
> > "delegate-failures". Just saying. ;-)
> > 
> >>> We already have on-fail settings. How would these play together?
> >> Good question. My initial thought was that it would be up to on-fail
> >> settings in the VM.
> > 
> > I'd prefer to keep that separate (as proposed below), because if an
> > action of the *VM* really fails, I may want an admin to look into it
> > (why could the bloody hypervisor not start/stop it?), which is different
> > from restarting the VM if one of the resources within it needs that.
> > 
> >>> Would it even make sense to have on-fail="restart-container"? (Or a
> >>> nicer wording.)
> >>>
> >>> Hmmm. That might work. We allow a "container" to be specified as a
> >>> meta attribute.
> >>>
> >>> If set, on-fail would default to "restart-container" for most
> >>> actions. But admins could actually modify it; say, they might want to
> >>> set monitor on-fail="ignore" to just get notified. And when we move
> >>> forward to whiteboxes, we could have start/monitor/promote/demote
> >>> on-fail="restart" (like now) and stop on-fail="restart-container".
> >>>
> >>> That appears reasonably neat?
> >> It does, actually. I wasn't originally thinking it was necessary, but
> >> it makes sense now that you point it out.
> > 
> > Yes, I think I like this too now.
> I like it too. Here comes the drafted code:
> https://github.com/gao-yan/pacemaker/commit/4f7b80baa42f3801c1fb8186aef076877f34dfea
> 
> It works in my simple test. Although failures of the resources aren't
> yet counted against the container's migration-threshold, it shows the
> basic idea. I'd appreciate it if you could take a look first. It's very
> likely I'm really on the right track this time. ;-)
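
To make the on-fail defaulting and failure attribution discussed above concrete, here is a minimal Python sketch. It is a hypothetical model, not Pacemaker's code and not the code in the linked commit. `Resource`, `effective_on_fail()` and `record_failure()` are illustrative names, the failcount dictionary stands in for the cluster's own failcount tracking, and resources without a container simply get "restart" instead of Pacemaker's real per-action defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical model of the proposal in this thread, not Pacemaker's code.


@dataclass
class Resource:
    id: str
    # Meta attributes, e.g. {"container": "vm"} (the proposed attribute)
    meta: Dict[str, str] = field(default_factory=dict)
    # Explicit per-operation on-fail values set by the admin,
    # e.g. {"monitor": "ignore"}
    op_on_fail: Dict[str, str] = field(default_factory=dict)


def effective_on_fail(rsc: Resource, action: str) -> str:
    """Resolve the on-fail policy for one action of rsc."""
    explicit = rsc.op_on_fail.get(action)
    if explicit is not None:
        return explicit  # an admin-set value always wins
    if "container" in rsc.meta:
        return "restart-container"  # proposed default when a container is set
    return "restart"  # simplified stand-in for the normal per-action defaults


def record_failure(
    rsc: Resource,
    action: str,
    failcounts: Dict[str, int],
    migration_threshold: int,
) -> Optional[Tuple[str, bool]]:
    """Charge one failure of rsc's action to the appropriate resource.

    Returns None if the failure is ignored, otherwise a tuple of
    (charged resource id, whether its failcount reached migration-threshold).
    """
    policy = effective_on_fail(rsc, action)
    if policy == "ignore":
        return None  # only logged/notified, nothing is charged
    # With restart-container the failure is attributed to the container,
    # so it counts against the container's migration-threshold.
    if policy == "restart-container":
        target = rsc.meta.get("container", rsc.id)
    else:
        target = rsc.id
    failcounts[target] = failcounts.get(target, 0) + 1
    return target, failcounts[target] >= migration_threshold


check = Resource("nagios-http", meta={"container": "vm"})
counts: Dict[str, int] = {}
print(record_failure(check, "monitor", counts, 2))  # ('vm', False)
print(record_failure(check, "monitor", counts, 2))  # ('vm', True)
```

In this model, the whitebox case from the thread is just a set of explicit overrides on a resource that has a container: `op_on_fail={"start": "restart", "monitor": "restart", "promote": "restart", "demote": "restart"}`, with stop falling through to the "restart-container" default.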

I've thought about your implementation some more.  Have we discussed the possibility of implicitly setting the order constraint internally when the container attribute is set?  Also, it seems like now that we are mapping a resource to a container resource in the meta-attributes, we could find a shortcut to build the colocation relationship there as well.

What about something like this for the meta-attributes?

container="vm"  --- Internally this means 'on-fail=restart-container' and 'order start vm then start rsc'
with-container="true"  --- this means if container is set, go ahead and colocate this rsc with the container.

With something like the above, we can fully express the container/child relationship without any resource sets or explicit order and colocation constraints.
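
As a rough illustration of what these two meta-attributes would expand to, here is a Python sketch. It is hypothetical: `implied_constraints()` and the tuple formats are made up for the example, treating with-container="true" as a mandatory (INFINITY) colocation is an assumption, and the set of accepted true values only approximates how Pacemaker parses booleans.

```python
from typing import Dict, List, Tuple

# Hypothetical shapes for the sketch; Pacemaker's internal structures differ.
Order = Tuple[str, str, str, str]   # (first, first-action, then, then-action)
Colocation = Tuple[str, str, str]   # (rsc, with-rsc, score)

TRUE_VALUES = {"true", "yes", "y", "on", "1"}  # assumed boolean spellings


def implied_constraints(
    rsc_id: str, meta: Dict[str, str]
) -> Tuple[Dict[str, str], List[Order], List[Colocation]]:
    """Expand the proposed container/with-container meta-attributes.

    Returns (default op settings, implicit orders, implicit colocations).
    """
    container = meta.get("container")
    if container is None:
        return {}, [], []  # nothing is implied without a container
    # container="vm" -> on-fail=restart-container and "start vm then start rsc"
    defaults = {"on-fail": "restart-container"}
    orders: List[Order] = [(container, "start", rsc_id, "start")]
    colocations: List[Colocation] = []
    if meta.get("with-container", "false").lower() in TRUE_VALUES:
        # Assumed mandatory: rsc may only run where the container runs.
        colocations.append((rsc_id, container, "INFINITY"))
    return defaults, orders, colocations


print(implied_constraints("web-check", {"container": "vm", "with-container": "true"}))
# ({'on-fail': 'restart-container'}, [('vm', 'start', 'web-check', 'start')], [('web-check', 'vm', 'INFINITY')])
```

Keeping with-container separate means a remote check can be ordered after, and charged to, its container without having to run on the same node. Since Pacemaker order constraints are symmetrical by default, a real implementation would presumably also imply stopping the resource before the container.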

Anyway, just an idea... I now much prefer this container meta-attribute idea and the failure-delegate idea over the restart-origin one. The restart-origin idea seemed good at first, but it doesn't completely express what we are doing; these other ideas seem to represent the relationship between the resources better. Great discussion, everyone :)

-- Vossel

> > 
> > Uhm. Would "container" imply ordering + colocation, or would we still
> > need them grouped (resource_set'ed, whatever)?
> > 
> > My, design is hard. ;-)
> :-)
> 
> 
> Regards,
>   Gao,Yan
> --
> Gao,Yan <ygao at suse.com>
> Software Engineer
> China Server Team, SUSE.
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>