[Pacemaker] Enable remote monitoring

Gao,Yan ygao at suse.com
Wed Dec 5 19:48:33 EST 2012


On 12/06/12 04:52, David Vossel wrote:
>>>>
>>>> Hi,
>>>> This is the first step - the support of "restart-origin" for order
>>>> constraint along with the test cases:
>>>>
>>>> https://github.com/gao-yan/pacemaker/commits/restart-origin
>>>>
>>>> It looks straightforward to me. Hope I didn't miss anything ;-)
>>>
>>> I made some inline comments for you on GitHub.  It looks like you
>>> are on the right track.
>> Thanks!
>>
>>> I'm just not sure about the symmetrical=false use case for order
>>> constraints.
>> A "symmetrical=false" implies we don't care about the inverse order.
>>
>> AFAICS, we still shouldn't restart the origin in this case.
> 
> Yeah, I suppose you are right.  I wouldn't have thought of these two options as being related, but we need that inverse constraint to force the restart of A.  Utilizing the inverse order constraint internally makes the implementation of this option much cleaner than it would be otherwise.
> 
> I have no idea why someone would want to do this... but what would happen with the following.
> 
> start A then promote B restart-origin=true
> 
> would A get restarted when B is demoted... or when B fails/stops?
Hmm, you are right. I missed that somehow. We should rethink how to
implement this properly.
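
For reference, the case in question would look roughly like this in the
CIB (a sketch only; "restart-origin" is just the attribute name proposed
in the branch above, not part of any released schema):

  <rsc_order id="order-a-b" first="A" first-action="start"
             then="B" then-action="promote"
             restart-origin="true"/>

With a plain start/start ordering the intended behaviour is clear; with
then-action="promote" as above, it is ambiguous whether A should be
restarted on a demote of B, or only when B fails or stops.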

> 
>>>
>>>>
>>>> If restart-origin="true" is combined with kind="Optional", it just
>>>> means "Optional", so a failed nagios resource would not affect the
>>>> vm.
>>>
>>> I agree, restart-origin is a no-op for advisory ordering.
>>>
>>>>
>>>> I'm not sure if we should relate the restart count to the
>>>> migration-threshold of the basic resource.
>>>
>>> I don't know what the "basic resource" refers to here.
>> The "origin".
>>
>>>  If we are talking about counting the restarts of the vm towards
>>>  the migration-threshold,
>> Yep
>>
>>> I'd expect the vm to have the same behavior as whatever happens to
>>> 'B' right now for the use-case below.
>>>
>>> Start A then Start B. When A fails restart B.
>>>
>>> Start vm then Start nagios. When nagios fails restart vm.
>> Sure, that is the behavior we get with the code. I think we are
>> talking about whether the failure count of the VM should only be
>> affected by its own monitor, or also by the resources within it.
> 
> I see.  Mapping the failcount of one resource to another resource seems like it would be difficult for us to represent in the configuration without using some sort of container-group-like object where the parent resource inherits failures from the children.
Indeed.
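
For the record, the vm/nagios ordering discussed above would look
roughly like this (again only a sketch; the resource ids are made up
and restart-origin is still just the proposed attribute):

  <rsc_order id="nagios-after-vm" first="vm" first-action="start"
             then="nagios" then-action="start"
             kind="Mandatory" restart-origin="true"/>

With kind="Mandatory", a failure of nagios restarts the vm; with
kind="Optional" the ordering is only advisory, so restart-origin is a
no-op and the vm is left alone.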

> 
>>
>>>
>>>
>>>> Even without this, users can specify how many failures of a
>>>> particular nagios resource they can tolerate on a node; the vm will
>>>> migrate along with it anyway. And probably we could have one nagios
>>>> resource for which, no matter how many times it fails, we just
>>>> don't want the vm to migrate because of it.
>>>
>>> I don't understand this last sentence.
>> If we don't set a migration-threshold for a nagios resource, that
>> means we always allow it to recover on the node if possible.
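
As an illustration (a sketch; the "nagios" resource class and the agent
name are assumptions here), that per-resource tolerance would just be
the usual migration-threshold meta attribute:

  <primitive id="nagios-check" class="nagios" type="check_apache">
    <meta_attributes id="nagios-check-meta">
      <nvpair id="nagios-check-threshold"
              name="migration-threshold" value="3"/>
    </meta_attributes>
  </primitive>

Leaving migration-threshold unset means the resource is never forced
away from a node because of its own failures.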
>>
>> BTW,
>>> I believe we usually put new options into the 1.2.rng to settle for
>>> a bit before promoting them into the 1.1 schema.
>> Has the rule changed? We used to put them into 1.1 first and promote
>> them into 1.2 later when I did the other features. AFAIK,
>> validate-with is initially set to "pacemaker-1.2", which means users
>> would get the feature immediately, no?
> 
> Ah, I got the whole thing backwards. You are correct.
> Sorry :)
No problem :-)
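
(For anyone following the schema question: which schema a given CIB is
validated against is selected by the validate-with attribute on the cib
element, roughly:

  <cib validate-with="pacemaker-1.2" admin_epoch="0" epoch="1"
       num_updates="0">
    ...
  </cib>

so a new option only becomes usable once it is present in the schema
named there.)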

Regards,
  Gao,Yan
-- 
Gao,Yan <ygao at suse.com>
Software Engineer
China Server Team, SUSE.



