[Pacemaker] Enable remote monitoring

Thu Dec 6 11:25:40 UTC 2012

On Thu, Dec 6, 2012 at 9:24 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:
> On 2012-12-06T20:04:20, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> >> Does that make sense though?
>> >> You've not achieved anything a restart wouldn't have done.
>> >> The choice to move the VM should be up to the VM.
>> > If the fail-count of a nagios resource reaches its own
>> > migration-threshold, the colocated VM should migrate with it anyway,
>> > shouldn't it?
>>
>> But moving a nagios resource makes no sense.
>
> Exactly; we would want to move the container/parent.
>
>> Because it's running inside the guest, which would have already moved
>> if it was the right thing to do.
>
> No, that's not a given. The VM might be "healthy" (as in, the kernel is
> running), but a service being monitored within it may lack sufficient
> CPU/IO/network resources, or even have connectivity problems, on a
> given host, to the point where trying to restart it on another
> hypervisor makes sense.

But any failures of the nagios agents would count against the VM's
migration-threshold.
So if moving were the right thing to do, it would have done it already.
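
As a rough illustration of that accounting (a sketch only, not Pacemaker's
code): the names `Container`, `record_failure` and `should_move` are
hypothetical, and the sketch assumes that a failure of any agent monitored
inside the guest is simply added to the VM's own fail-count.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical model of a VM resource with a migration-threshold."""
    name: str
    migration_threshold: int
    fail_count: int = 0

    def record_failure(self, source: str) -> None:
        # Assumption: a failure of the VM itself, or of any agent
        # monitored inside it, counts against the same fail-count.
        self.fail_count += 1

    def should_move(self) -> bool:
        # The VM leaves the node once its fail-count reaches the threshold.
        return self.fail_count >= self.migration_threshold


vm = Container(name="vm1", migration_threshold=3)

# Two failures reported by (hypothetical) nagios agents inside the guest.
for agent in ("nagios-http", "nagios-smtp"):
    vm.record_failure(agent)
moved_after_two = vm.should_move()

# A third failure reaches the threshold, so the VM would be moved.
vm.record_failure("nagios-http")
moved_after_three = vm.should_move()
```

Under that assumption the agents need no migration-threshold of their own;
the decision to move stays with the VM.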

>
> But migration-threshold on the nagios primitive combined with a
> mandatory colocation constraint will take care of that already, if an
> admin wants to configure such.
>
> I agree that, for the most part, people will not do that but keep
> restarting VMs.
>
>> > I like the concept of "failure-delegate". If we introduce it, it sounds
>> > more like a resource's meta/op attribute to me, rather than part of an
>> > order constraint or group. What do you think?
>> Yes. It would be a resource meta attribute.
>
> Hmmm. OK, I think I see where this is going.
>
> We already have on-fail settings. How would these play together?

Good question. My initial thought was that it would be up to on-fail
settings in the VM.

> Would it even make sense to have on-fail="restart-container"? (Or a
> nicer wording.)
>
> Hmmm. That might work. We allow a "container" to be specified as a meta
> attribute.
>
> If set, on-fail would default to "restart-container" for most actions. But
> admins could actually modify it - say, they might want to set
> monitor on-fail="ignore" to just get notified. And when we move forward
> to whiteboxes, we could have start/monitor/promote/demote
> on-fail="restart" (like now) and stop on-fail="restart-container".
>
> That appears reasonably neat?

It does actually.
I wasn't originally thinking it was necessary but it makes sense now
that you point it out.
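
To make the proposed defaults concrete, here is a minimal sketch of how they
could be resolved. It is not an implementation: `effective_on_fail` and its
parameters are hypothetical names, "most actions" is simplified to "all
actions", and the no-container case returns "restart" for everything even
though the real default for a failed stop differs.

```python
from typing import Dict, Optional

# Actions that would keep on-fail="restart" in the whitebox case.
WHITEBOX_RESTART_ACTIONS = {"start", "monitor", "promote", "demote"}


def effective_on_fail(
    action: str,
    container: Optional[str],
    overrides: Optional[Dict[str, str]] = None,
    whitebox: bool = False,
) -> str:
    """Return the on-fail value for one action of a resource.

    container: value of the proposed "container" meta attribute, or None.
    overrides: on-fail values the admin set explicitly, keyed by action.
    whitebox:  True if the cluster can manage the resource inside the guest.
    """
    overrides = overrides or {}

    # An explicit admin setting always wins over any default.
    if action in overrides:
        return overrides[action]

    # No container set: simplified to "restart" for every action.
    if container is None:
        return "restart"

    # Whitebox: recover inside the guest, escalate only a failed stop.
    if whitebox and action in WHITEBOX_RESTART_ACTIONS:
        return "restart"

    # Container set: the default is to restart the container.
    return "restart-container"


default_monitor = effective_on_fail("monitor", "vm1")
notify_only = effective_on_fail("monitor", "vm1", {"monitor": "ignore"})
whitebox_start = effective_on_fail("start", "vm1", whitebox=True)
whitebox_stop = effective_on_fail("stop", "vm1", whitebox=True)
```

The explicit override is checked first so that an admin's
monitor on-fail="ignore" still applies when a container is set.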

>
>
>
> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde