[Pacemaker] Enable remote monitoring
Lars Marowsky-Bree
lmb at suse.com
Thu Dec 6 10:24:05 UTC 2012
On 2012-12-06T20:04:20, Andrew Beekhof <andrew at beekhof.net> wrote:
> >> Does that make sense though?
> >> You've not achieved anything a restart wouldn't have done.
> >> The choice to move the VM should be up to the VM.
> > If the fail-count of a nagios resource reaches its own
> > migration-threshold, the colocated VM should migrate with it anyway,
> > shouldn't it?
>
> But moving a nagios resource makes no sense.
Exactly; we would want to move the container/parent.
> Because it's running inside the guest, which would have already moved
> if it was the right thing to do.
No, that's not a given. The VM might be "healthy" (as in, the kernel is
running), but a service being monitored within it may lack sufficient
resources (CPU, I/O, network) or even have connectivity problems on a
given host, to the point where trying to restart it on another
hypervisor makes sense.
But migration-threshold on the nagios primitive, combined with a
mandatory colocation constraint, will already take care of that if an
admin wants to configure it that way.
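To illustrate the interaction I mean, here is a toy model in Python. It is
not Pacemaker's placement algorithm; the function names and the node names
hv1/hv2 are made up for the example, and it assumes the only constraint in
play is the mandatory colocation between the VM and its nagios resource.

```python
# Toy model only; NOT how Pacemaker actually computes placement.
from typing import Dict, List


def allowed_nodes(nodes: List[str],
                  fail_count: Dict[str, int],
                  migration_threshold: int) -> List[str]:
    """Nodes where the nagios resource may still run."""
    # A node is excluded once the fail-count there reaches the threshold.
    return [n for n in nodes if fail_count.get(n, 0) < migration_threshold]


def place_vm(current: str,
             nodes: List[str],
             fail_count: Dict[str, int],
             migration_threshold: int) -> str:
    """Pick a node for the VM, given a mandatory colocation with nagios."""
    candidates = allowed_nodes(nodes, fail_count, migration_threshold)
    if current in candidates:
        return current          # nagios can stay, so the VM stays too
    if candidates:
        return candidates[0]    # both move to a node nagios may still use
    return current              # assumption of this toy model: nowhere to go


if __name__ == "__main__":
    nodes = ["hv1", "hv2"]
    print(place_vm("hv1", nodes, {"hv1": 2}, 3))
    print(place_vm("hv1", nodes, {"hv1": 3}, 3))
```

With a threshold of 3, two failures on hv1 leave the VM where it is; the
third failure moves it to hv2 along with the nagios resource.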
I agree that, for the most part, people will not do that but keep
restarting VMs.
> > I like the concept of "failure-delegate". If we introduce it, it sounds
> > more like a resource's meta/op attribute to me, rather than something
> > that goes into an order constraint or group. What do you think?
> Yes. It would be a resource meta attribute.
Hmmm. OK, I think I see where this is going.
We already have on-fail settings. How would these play together?
Would it even make sense to have on-fail="restart-container"? (Or a
nicer wording.)
Hmmm. That might work. We allow a "container" to be specified as a meta
attribute.
If set, on-fail would default to "restart-container" for most actions.
But admins could still override it; say, they might want to set
monitor on-fail="ignore" to just get notified. And when we move forward
to whiteboxes, we could have start/monitor/promote/demote
on-fail="restart" (like now) and stop on-fail="restart-container".
That appears reasonably neat?
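A rough sketch of how those defaults might resolve, in Python. This only
models the idea above and is not Pacemaker code: the "container" meta
attribute and the "restart-container" value are just the proposal from this
mail, effective_on_fail and vm1 are hypothetical names, and the fallback is
simplified (real defaults differ per action).

```python
# Hypothetical model of the proposed on-fail defaulting; not Pacemaker code.
from typing import Dict


def effective_on_fail(action: str,
                      meta: Dict[str, str],
                      ops: Dict[str, str]) -> str:
    """Return the on-fail policy for one action of a resource.

    meta: the resource's meta attributes (may include "container").
    ops:  explicit per-action on-fail settings chosen by the admin.
    """
    explicit = ops.get(action)
    if explicit is not None:
        return explicit                 # an admin override always wins
    if "container" in meta:
        return "restart-container"      # proposed default when a container is set
    return "restart"                    # simplified fallback without a container


if __name__ == "__main__":
    nagios_meta = {"container": "vm1"}
    admin_ops = {"monitor": "ignore"}
    for action in ("start", "monitor", "stop"):
        print(action, effective_on_fail(action, nagios_meta, admin_ops))
```

The whitebox case would then just be a different set of explicit settings:
"restart" for start/monitor/promote/demote, with stop left to fall through
to "restart-container".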
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde