[Pacemaker] Enable remote monitoring

Gao,Yan ygao at suse.com
Thu Dec 6 23:19:55 EST 2012


On 12/07/12 12:09, Andrew Beekhof wrote:
> On Fri, Dec 7, 2012 at 3:00 PM, Gao,Yan <ygao at suse.com> wrote:
>> On 12/07/12 07:38, Andrew Beekhof wrote:
>>>
>>> On 06/12/2012, at 10:42 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:
>>>
>>>> On 2012-12-06T22:25:40, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>
>>>>> But any failures of the nagios agents would count against the VM's
>>>>> migration-threshold.
>>>>> So if moving were the right thing to do, it would have done it already.
>>>>
>>>> OK. I think this was due to me still being stuck on the workings of an
>>>> order constraint, but of course if the failures are instead attributed
>>>> to the container, this would happen automatically already. True.
>>>>
>>>> (Incidentally, I like "attribute", "ascribe" better than "delegate"
>>>> because to me, they better fit what's going on, if we stuck with
>>>> "delegate-failures". Just saying. ;-)
>>>
>>> My use of "delegate" comes from my time with ObjectiveC where it's common practice to use them for "I'm not going to handle X but here is something that does" style functionality.
>>> Which fits nicely with what we're doing here.
>>>
>>> container="vm"  also works though.
>>>
>>>>
>>>>>> We already have on-fail settings. How would these play together?
>>>>> Good question. My initial thought was that it would be up to on-fail
>>>>> settings in the VM.
>>>>
>>>> I'd prefer to keep that separate (as proposed below). Because if an
>>>> action of the *VM* really fails, I may want an admin to look into it
>>>> (why could the bloody hypervisor not start/stop it?), which is different
>>>> from restarting the VM if one of the resources within it needs that.
>>>>
>>>>>> Would it even make sense to have on-fail="restart-container"? (Or a
>>>>>> nicer wording.)
>>>>>>
>>>>>> Hmmm. That might work. We allow a "container" to be specified as a meta
>>>>>> attribute.
>>>>>>
>>>>>> If set, on-fail would default to restart container for most actions. But
>>>>>> admins could actually modify it - say, they might want to set
>>>>>> monitor on-fail="ignore" to just get notified. And when we move forward
>>>>>> to whiteboxes, we could have start/monitor/promote/demote
>>>>>> on-fail="restart" (like now) and stop on-fail="restart-container".
>>>>>>
>>>>>> That appears reasonably neat?
>>>>> It does actually.
>>>>> I wasn't originally thinking it was necessary but it makes sense now
>>>>> that you point it out.
>>>>
>>>> Yes, I think I like this too now.
>>>>
>>>> Uhm. Would "container" imply ordering + colocation, or would we still
>>>> need them grouped (resource_set'ed, whatever)?
>>>
>>> Ordering: absolutely
>> Would any user dislike the implied ordering, and instead want an
>> asymmetrical one or some other unusual variant?
> 
> Conceptually it doesn't make any sense IMHO.
> By definition things can't be in/on the container if the container
> doesn't exist yet.
Right.

> 
> The one thing we've not addressed yet is probing, that's going to be fun :)
I guess there should be some way for the nagios RAs to return
NOT_RUNNING if there's nothing yet, no?
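
As a rough sketch of that idea (this is not how the nagios agents actually
behave): a probe could translate the plugin's exit status into an OCF return
code. The OCF codes and the Nagios plugin statuses below are the standard
ones. The function name, the `is_probe` flag and the mapping policy itself
(including treating WARNING as healthy) are assumptions made up for
illustration.

```python
# OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7

# Standard Nagios plugin exit statuses.
NAGIOS_OK = 0
NAGIOS_WARNING = 1
NAGIOS_CRITICAL = 2
NAGIOS_UNKNOWN = 3


def monitor_result(plugin_status: int, is_probe: bool) -> int:
    """Map a Nagios plugin exit status to an OCF return code.

    Hypothetical policy: during a probe, any unhealthy status is reported
    as "not running", so the cluster does not record a failure for a
    service that has simply not been started yet.
    """
    # Assumption: WARNING still counts as a working service.
    if plugin_status in (NAGIOS_OK, NAGIOS_WARNING):
        return OCF_SUCCESS
    if is_probe:
        # Nothing is there yet (e.g. the VM is down): not a failure.
        return OCF_NOT_RUNNING
    # A recurring monitor saw CRITICAL/UNKNOWN: report a real failure.
    return OCF_ERR_GENERIC
```

With this mapping, a CRITICAL result during a probe would be seen as "not
running", while the same result from a recurring monitor would count as a
failure.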

Regards,
  Gao,Yan
-- 
Gao,Yan <ygao at suse.com>
Software Engineer
China Server Team, SUSE.



