[Pacemaker] Enable remote monitoring

Thu Dec 6 13:42:12 EST 2012


----- Original Message -----
> From: "Yan Gao" <ygao at suse.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Thursday, December 6, 2012 12:28:06 PM
> Subject: Re: [Pacemaker] Enable remote monitoring
> 
> Hi,
> 
> On 12/06/12 19:42, Lars Marowsky-Bree wrote:
> > On 2012-12-06T22:25:40, Andrew Beekhof <andrew at beekhof.net> wrote:
> > 
> >> But any failures of the nagios agents would count against the VM's
> >> migration-threshold.
> >> So if moving were the right thing to do, it would have done it
> >> already.
> > 
> > OK. I think this was due to me still being stuck on the workings of an
> > order constraint, but of course if the failures are instead attributed
> > to the container, this would happen automatically already. True.
> > 
> > (Incidentally, I like "attribute" or "ascribe" better than "delegate"
> > because to me they better fit what's going on, if we stuck with
> > "delegate-failures". Just saying. ;-)
> > 
> >>> We already have on-fail settings. How would these play together?
> >> Good question. My initial thought was that it would be up to on-fail
> >> settings in the VM.
> > 
> > I'd prefer to keep that separate (as proposed below). Because if an
> > action of the *VM* really fails, I may want an admin to look into it
> > (why could the bloody hypervisor not start/stop it?), which is different
> > from restarting the VM if one of the resources within it needs that.
> > 
> >>> Would it even make sense to have on-fail="restart-container"? (Or a
> >>> nicer wording.)
> >>>
> >>> Hmmm. That might work. We allow a "container" to be specified as a
> >>> meta attribute.
> >>>
> >>> If set, on-fail would default to "restart-container" for most actions.
> >>> But admins could actually modify it - say, they might want to set
> >>> monitor on-fail="ignore" to just get notified. And when we move
> >>> forward to whiteboxes, we could have start/monitor/promote/demote
> >>> on-fail="restart" (like now) and stop on-fail="restart-container".
> >>>
> >>> That appears reasonably neat?
> >> It does actually.
> >> I wasn't originally thinking it was necessary but it makes sense now
> >> that you point it out.
> > 
> > Yes, I think I like this too now.
> I like it too. Here comes the drafted code:
> https://github.com/gao-yan/pacemaker/commit/4f7b80baa42f3801c1fb8186aef076877f34dfea
> 
> It works in my simple test. Although failures of resources aren't
> counted against the container's migration-threshold yet, it shows you the
> basic idea. I'd appreciate it if you could take a look first. It's very
> likely I'm really on the right track this time. ;-)

+1, I like where this is going  :)
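
To make sure the proposal quoted above reads the same way to everyone, here is a
rough Python sketch of how the on-fail defaults and the failure attribution
might resolve. This is only an illustration of the discussion, not Pacemaker
code and not what the linked commit does: the function names, their parameters
and the "whitebox" flag are all hypothetical, and treating "most actions" as
"every action" is an assumption.

```python
from typing import Optional

# Actions the thread names explicitly for the future "whitebox" case.
WHITEBOX_RESTART_ACTIONS = {"start", "monitor", "promote", "demote"}


def effective_on_fail(action: str,
                      container: Optional[str] = None,
                      explicit: Optional[str] = None,
                      whitebox: bool = False) -> Optional[str]:
    """Hypothetical helper: pick the on-fail policy for one action.

    Returns None when the proposal does not change anything, i.e. the
    cluster's existing default for that action still applies.
    """
    # An on-fail value set by the admin always wins, e.g. "ignore".
    if explicit is not None:
        return explicit
    # No "container" meta attribute: behaviour is unchanged.
    if container is None:
        return None
    if whitebox:
        # Whitebox idea from the thread: restart the resource itself,
        # and only a failed stop escalates to the container.
        if action in WHITEBOX_RESTART_ACTIONS:
            return "restart"
        if action == "stop":
            return "restart-container"
        return None  # other actions were not discussed
    # Assumption: "most actions" is simplified to "all actions" here.
    return "restart-container"


def failure_target(resource: str, container: Optional[str] = None) -> str:
    """Hypothetical helper: which resource a failure is attributed to."""
    # With a container set, failures count against the container, so its
    # migration-threshold decides whether it moves.
    return container if container is not None else resource
```

With these assumptions, a nagios check inside "vm1" would default to
"restart-container" and its failures would be attributed to "vm1", while an
admin who sets monitor on-fail="ignore" would still get "ignore".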

-- Vossel


> > 
> > Uhm. Would "container" imply ordering + colocation, or would we still
> > need them grouped (resource_set'ed, whatever)?
> > 
> > My, design is hard. ;-)
> :-)
> 
> 
> Regards,
>   Gao,Yan
> --
> Gao,Yan <ygao at suse.com>
> Software Engineer
> China Server Team, SUSE.