[Pacemaker] ping resource polling skew

Wed Mar 20 09:07:39 UTC 2013

On 2013-03-20 04:11, Quentin Smith wrote:
> On Wed, 20 Mar 2013, Andreas Kurz wrote:
> 
>> On 2013-03-19 17:02, Quentin Smith wrote:
>>> Hi-
>>>
>>> I have my cluster configured to use a cloned ping resource, such that I
>>> can write a constraint that I prefer resources to run on a node that has
>>> network connectivity. That works fine if a machine loses its network
>>> connection (the ping attribute goes to 0, resources migrate to another
>>> machine, etc.).
>>>
>>> However, if instead what happens is the ping /target/ goes offline, it
>>> seems that Pacemaker will bounce resources around the cluster, as each
>>> node notices that the ping target is unreachable at a slightly different
>>> time.
>>
>> Using more than one target is always a good idea, and choose targets
>> that are also highly available.
> 
> Sure, and we have. No choice of ping targets is going to help us if the
> servers are partitioned from the rest of the network, though.

In that case typically no one can access the services anyway, so
keeping resources from bouncing around is more of a cosmetic issue ...
assuming it is not extremely expensive to start/stop them, e.g. because
of cold caches.

> 
>>> Is there any way to get Pacemaker to delay resource transitions until at
>>> least one full polling cycle has happened, so that in the event of an
>>> outage of the ping target, resources stay put where they are running?
>>
>> There is the "dampen" parameter: use a high value, like 3 or more
>> times the monitor interval, to give all nodes the chance to detect the
>> dead target(s). That should help.
> 
> Does that actually help in this case? My understanding is that the
> dampen parameter will delay the attribute change for each host, but
> those delays will still tick down separately for each node, resulting in
> exactly the same behavior, just delayed by dampen seconds.

If the dampen timeout is reached and there was a permanent change of
that attribute on one node, all nodes flush their current value, so
yes, that should actually help.
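
To picture the timing, here is a minimal Python sketch of a simplified
model. It is not how attrd is implemented. It assumes each node polls on
a fixed interval with its own phase offset, ignores ping timeouts and
retries, and assumes that when the first dampen timer expires every node
reports whatever value it currently holds. The node names, offsets and
helper functions are hypothetical and only serve the illustration.

```python
import math


def detection_times(offsets, interval, t_down):
    """Return, per node, the first monitor run at or after t_down.

    Assumption: a node with phase `offset` polls at offset,
    offset + interval, offset + 2 * interval, and so on.
    """
    times = {}
    for node, offset in offsets.items():
        # Whole intervals needed to reach t_down from this node's phase.
        k = max(0, math.ceil((t_down - offset) / interval))
        times[node] = offset + k * interval
    return times


def values_at_flush(detected, dampen):
    """Return the flush time and the value each node holds at that time.

    Assumption: the first node to see the change starts the dampen
    timer. 0 means "target seen as down", 1 means "still believed up".
    """
    flush_at = min(detected.values()) + dampen
    values = {n: 0 if t <= flush_at else 1 for n, t in detected.items()}
    return flush_at, values


if __name__ == "__main__":
    offsets = {"node1": 0, "node2": 3, "node3": 7}  # hypothetical phases (s)
    detected = detection_times(offsets, interval=10, t_down=12)
    print(detected)
    print(values_at_flush(detected, dampen=5))   # shorter than the skew
    print(values_at_flush(detected, dampen=30))  # 3x the monitor interval
```

In this model the nodes notice the outage at 13, 17 and 20 seconds. With
dampen=5 the flush happens at 18 seconds, while node1 still holds the old
value, so the scores differ and resources could move. With dampen=30 all
three nodes hold 0 at flush time and nothing has a reason to move. The
skew here is always less than one interval; the larger multiple leaves
margin for the ping timeouts and retries that the sketch ignores.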

Regards,
Andreas

> 
> --Quentin
> 
>>
>> Regards,
>> Andreas
>>
>> -- 
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>>
>>> --Quentin
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
