[Pacemaker] Enable remote monitoring

Wed Dec 5 13:00:57 EST 2012

On 12/06/12 00:36, David Vossel wrote:
> 
> 
> ----- Original Message -----
>> From: "Yan Gao" <ygao at suse.com>
>> To: pacemaker at oss.clusterlabs.org
>> Sent: Wednesday, December 5, 2012 6:27:05 AM
>> Subject: Re: [Pacemaker] Enable remote monitoring
>>
>> Hi,
>> This is the first step - the support of "restart-origin" for order
>> constraint along with the test cases:
>>
>> https://github.com/gao-yan/pacemaker/commits/restart-origin
>>
>> It looks straight-forward to me. Hope I didn't miss anything ;-)
> 
> I made some inline comments for you on GitHub. It looks like you are on the right track.
Thanks!

> I'm just not sure about the symmetrical=false use case for order constraints.
symmetrical="false" implies we don't care about the inverse order.
AFAICS, we shouldn't restart the origin in this case.

> 
>>
>> If restart-origin="true" is combined with kind="Optional", it just
>> means "Optional", so a failed nagios resource would not affect the vm.
> 
> I agree, restart-origin is a no-op for advisory ordering. 
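
To make the combinations discussed so far concrete, here is a minimal
Python sketch of when restart-origin would take effect. It only
illustrates the semantics proposed in this thread and is not the code
from the branch. The function name and its parameters are hypothetical,
and treating every kind other than "Mandatory" as a no-op is an
assumption (only "Optional" has been discussed here).

```python
def restart_origin_applies(restart_origin, kind="Mandatory", symmetrical=True):
    """Return True if a failure of the dependent resource should restart
    the origin (the "first" resource of the order constraint).

    Hypothetical helper modelling the proposal in this thread only.
    """
    if not restart_origin:
        # Option not set: ordinary ordering behaviour.
        return False
    if kind != "Mandatory":
        # Advisory ordering: restart-origin is a no-op.
        # Assumption: any non-Mandatory kind is treated the same way.
        return False
    if not symmetrical:
        # symmetrical="false": we don't care about the inverse order,
        # so (per the reading above) the origin is not restarted.
        return False
    return True


# "Start vm then start nagios", with nagios failing:
print(restart_origin_applies(True))                      # Mandatory, symmetrical
print(restart_origin_applies(True, kind="Optional"))     # advisory ordering
print(restart_origin_applies(True, symmetrical=False))   # asymmetrical ordering
```

Only the first call asks for a restart of the vm; the other two leave
it alone.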
> 
>>
>> I'm not sure whether we should count these restarts towards the
>> migration-threshold of the basic resource.
> 
> I don't know what the "basic resource" refers to here. 
The "origin".

>  If we are talking about counting the restarts of the vm towards the migration-threshold, 
Yep

> I'd expect the vm to have the same behavior as whatever happens to 'B' right now for the use-case below.
> 
> Start A then Start B. When A fails restart B.  
> 
> Start vm then Start nagios. When nagios fails restart vm.
Sure, the code already behaves that way. I think the question is whether
the failure count of the VM should be affected only by its own monitor,
or also by the resources within it.

> 
> 
>> Even without this, users can specify how many failures of a
>> particular nagios resource they can tolerate on a node; the vm will
>> migrate with it anyway. And we could probably have one nagios
>> resource for which, no matter how many times it fails, we just don't
>> want the vm to migrate because of it.
> 
> I don't understand this last sentence.
If we don't set a migration-threshold for a nagios resource, that means
it is always allowed to recover on the same node if possible.
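
A rough Python sketch of what I mean. This is a simplified model, not
Pacemaker code: the function name is hypothetical, and it ignores
things like failure-timeout and fatal start failures. It assumes that
an unset (or zero) migration-threshold means no limit, and that a
resource is forced off a node once its fail count reaches the threshold.

```python
def may_recover_on_node(fail_count, migration_threshold=None):
    """Return True if the resource may still be recovered on this node.

    Hypothetical helper. migration_threshold=None (unset) or 0 is
    assumed to mean "no limit".
    """
    if not migration_threshold:
        # No threshold configured: failures never force it off the node.
        return True
    # With a threshold, the node stays eligible only while the fail
    # count is below it.
    return fail_count < migration_threshold


# nagios resource without a threshold: always recovered in place.
print(may_recover_on_node(fail_count=10))
# nagios resource that tolerates fewer than 3 failures on a node.
print(may_recover_on_node(fail_count=2, migration_threshold=3))
print(may_recover_on_node(fail_count=3, migration_threshold=3))
```

So a nagios resource without a threshold would never be the reason the
vm has to leave the node, whatever its fail count is.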

BTW,
> I believe we usually put new options into the 1.2.rng to settle for a
> bit before promoting them into the 1.1 schema.
Did we change the rule? We used to put them in 1.1 first and promote
them into 1.2 later, when I did the other features. AFAIK, validate-with
is initially set to "pacemaker-1.2", which means users would get the
feature immediately, no?

Regards,
  Gao,Yan
-- 
Gao,Yan <ygao at suse.com>
Software Engineer
China Server Team, SUSE.