[Pacemaker] resource moving unnecessarily due to ping race condition

Andrew Beekhof andrew at beekhof.net
Mon Sep 19 23:51:21 EDT 2011


On Sun, Sep 11, 2011 at 2:30 AM, Vadym Chepkov <vchepkov at gmail.com> wrote:
>
> On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:
>
>>>> On 09/08/11 20:59, Brad Johnson wrote:
>>>>> We have a 2 node cluster with a single resource. The resource must run
>>>>> on only a single node at one time. Using the ocf:pacemaker:ping RA we
>>>>> are pinging a WAN gateway and a LAN host on each node so the resource
>>>>> runs on the node with the greatest connectivity. The problem is when a
>>>>> ping host goes down (so both nodes lose connectivity to it), the
>>>>> resource moves to the other node due to timing differences in how fast
>>>>> they update the score attribute. The dampening value has no effect,
>>>>> since it delays both nodes by the same amount. These unnecessary
>>>>> fail-overs aren't acceptable since they are disruptive to the network
>>>>> for no reason.
>>>>> Is there a way to dampen the ping update by different amounts on the
>>>>> active and passive nodes? Or some other way to configure the cluster to
>>>>> try to keep the resource where it is during these tie score scenarios?
>>
>> location pingd-constraint group_1 \
>>  rule $id="pingd-constraint-rule" pingd: defined pingd
>>
>> May I suggest that you simply change this constraint to
>>
>> location pingd-constraint group_1 \
>>  rule $id="pingd-constraint-rule" \
>>    -inf: not_defined pingd or pingd lte 0
>>
>> That way, only a host that definitely has _no_ connectivity carries a
>> -INF score for that resource group. And I believe that is what you
>> really want, rather than taking the actual ping score as a placement
>> weight (your "best connectivity" approach).
>>
>> Just my 2 cents, though.
>>
>
> Even though this approach has been recommended many times, there is a problem with it.
> What if all nodes are, for some reason, unable to ping?
> This rule would cause the resource to be brought down completely, whereas with the "best connectivity" approach it stays up where it was before the network failed.

If the outside[1] world can't reach the cluster, is there much benefit
in having it running?

[1] Replace "outside" with wherever your users are; hopefully you
picked a ping node from the same area.
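
For readers comparing the two constraints quoted above, here is a rough
toy model in Python. It is only a sketch and not how Pacemaker's policy
engine actually computes placement: the function names, the node names
and the assumption that a tie keeps the resource on its current node are
all made up for illustration. The pingd values assume two ping hosts and
a multiplier of 100.

```python
NEG_INF = float("-inf")


def weighted_rule(pingd):
    """Models 'pingd: defined pingd': the attribute value is the score."""
    return pingd if pingd is not None else 0


def no_connectivity_rule(pingd):
    """Models '-inf: not_defined pingd or pingd lte 0'."""
    return NEG_INF if pingd is None or pingd <= 0 else 0


def place(rule, attrs, current):
    """Pick a node for the resource in this toy model.

    attrs maps node name -> current pingd value (None if undefined).
    Assumption: on a tie the resource stays on its current node.
    Returns None when every node is banned with -inf.
    """
    scores = {node: rule(value) for node, value in attrs.items()}
    best = max(scores.values())
    if best == NEG_INF:
        return None  # no eligible node, the resource is stopped
    if scores.get(current) == best:
        return current  # current node is among the best, do not move
    return max(sorted(scores), key=scores.get)


# One ping host died; node1 (active) has updated pingd, node2 has not yet.
race = {"node1": 100, "node2": 200}
print(place(weighted_rule, race, "node1"))         # node2
print(place(no_connectivity_rule, race, "node1"))  # node1

# Every ping host is unreachable from both nodes.
all_down = {"node1": 0, "node2": 0}
print(place(weighted_rule, all_down, "node1"))         # node1
print(place(no_connectivity_rule, all_down, "node1"))  # None
```

In this model the weighted rule moves the resource during the race
window, while the -inf rule leaves it alone. With every ping host down
the roles swap: the weighted rule keeps the resource where it was, and
the -inf rule stops it everywhere, which is the trade-off discussed
above.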

>
> Vadym



