[Pacemaker] Enable remote monitoring

Wed Dec 5 20:46:23 EST 2012

On 06/12/2012, at 5:00 AM, "Gao,Yan" <ygao at suse.com> wrote:

> On 12/06/12 00:36, David Vossel wrote:
>> 
>> 
>> ----- Original Message -----
>>> From: "Yan Gao" <ygao at suse.com>
>>> To: pacemaker at oss.clusterlabs.org
>>> Sent: Wednesday, December 5, 2012 6:27:05 AM
>>> Subject: Re: [Pacemaker] Enable remote monitoring
>>> 
>>> Hi,
>>> This is the first step - support for "restart-origin" on order
>>> constraints, along with the test cases:
>>> 
>>> https://github.com/gao-yan/pacemaker/commits/restart-origin
>>> 
>>> It looks straightforward to me. Hope I didn't miss anything ;-)
>> 
>> I made some inline comments for you on GitHub.  It looks like you are on the right track.
> Thanks!
> 
>> I'm just not sure about the symmetrical=false use case for order constraints.
> A "symmetrical=false" implies we don't care about the inverse order.
> AFAICS, we should no longer restart the origin in this case.

symmetrical=false makes no sense here.
If we stay with the restart-origin approach, then it should override symmetrical=false.

> 
>> 
>>> 
>>> If restart-origin="true" is combined with kind="Optional", it just means
>>> "Optional", so that a failed nagios resource would not affect the vm.
>> 
>> I agree, restart-origin is a no-op for advisory ordering. 
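
To pin down the two rules agreed so far (restart-origin is a no-op for kind="Optional", and it overrides symmetrical=false), here is a rough Python sketch. It is only a model of the semantics discussed in this thread, not the policy engine code from the branch; the OrderConstraint class and the restarts_origin() helper are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class OrderConstraint:
    """Hypothetical model of an order constraint ("start first, then start then")."""
    first: str                    # the "origin", e.g. the vm
    then: str                     # the dependent resource, e.g. a nagios check
    kind: str = "Mandatory"       # "Optional" is advisory ordering
    symmetrical: bool = True
    restart_origin: bool = False  # the proposed option from this thread


def restarts_origin(c: OrderConstraint) -> bool:
    """Return True if a failure of c.then should restart c.first."""
    if not c.restart_origin:
        return False
    if c.kind == "Optional":
        # Advisory ordering: restart-origin is a no-op.
        return False
    # c.symmetrical is deliberately not consulted:
    # restart-origin overrides symmetrical=false.
    return True
```

With this model, OrderConstraint("vm", "nagios", symmetrical=False, restart_origin=True) still restarts the vm when nagios fails, while the same constraint with kind="Optional" never does.
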
>> 
>>> 
>>> I'm not sure if we should relate the restarts count with the
>>> migration-threshold of the basic resource.
>> 
>> I don't know what the "basic resource" refers to here. 
> The "origin".
> 
>> If we are talking about counting the restarts of the vm towards the migration-threshold, 
> Yep
> 
>> I'd expect the vm to have the same behavior as whatever happens to 'B' right now for the use-case below.
>> 
>> Start A then Start B. When A fails restart B.  
>> 
>> Start vm then Start nagios. When nagios fails restart vm.
> Sure, we have that behavior with the code. I think we are talking about
> whether the failure count of the VM should only be affected by its own
> monitor, or also by the resources within it.
> 
>> 
>> 
>>> Even without this, users can specify how many failures of a
>>> particular nagios resource they can tolerate on a node, the vm will
>>> migrate with it anyway. And probably we could have one of the nagios
>>> resources, no matter how many times it fails, we just don't want the
>>> vm to migrate because of it.
>> 
>> I don't understand this last sentence.
> If we didn't set a migration-threshold for a nagios resource, that means
> we could always allow it to recover on a node if possible.
> 
> BTW,
>> I believe we usually put new options into the 1.2.rng to settle for a
>> bit before promoting them into the 1.1 schema.
> We changed the rule?

No, David has it backwards :)

> We used to put them in 1.1 first and promote into
> 1.2 later when I did the other features. AFAIK, validate-with is
> initially set to "pacemaker-1.2", which means users would get the
> feature immediately, no?
> 
> Regards,
>  Gao,Yan
> -- 
> Gao,Yan <ygao at suse.com>
> Software Engineer
> China Server Team, SUSE.