[Pacemaker] resource moving unnecessarily due to ping race condition

Tue Sep 27 02:57:50 UTC 2011

On Mon, Sep 26, 2011 at 10:57 PM, Brad Johnson <bjohnson at ecessa.com> wrote:
> I agree that the patch assumes the use of "pingd" for the attribute name,
> and there may be a better way of coding that. However, I don't see how
> setting dampen=0 fixes our problem.  The problem occurs when a ping node
> becomes inaccessible to all nodes in the cluster (it is rebooted for
> example). Without giving any timing advantage to the currently active node,

The patch doesn't do this though.

> it is essentially just a race between the nodes to see who notices the
> outage first and can update the attribute fastest.

Now I'm confused.

You said "we do not want the other node to be able to challenge us to
an immediate score comparison".
dampen=0 does the same thing as the patch... it tells attrd to update
the CIB immediately, without waiting to give everyone a chance to
notice the change in connectivity too.
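
To make the timing concrete, here is a minimal sketch in C of the flush behaviour as it is described in this thread. It is a simplified model and not code from attrd: `struct node_view`, `flush_time_ms` and `written_score` are hypothetical names invented for illustration. It assumes that the first dampen timer to expire triggers a flush, and that on a flush every node immediately writes whatever value it currently holds.

```c
#include <assert.h>

/* Hypothetical model of one node's view; not attrd's real data structures. */
struct node_view {
    int detect_ms;  /* when this node's ping monitor notices the outage */
    int old_score;  /* score held before the outage was noticed */
    int new_score;  /* score held after the outage was noticed */
};

/* Assumption: the earliest dampen timer to expire sends the flush,
 * and the flush makes every node write at that same moment. */
int flush_time_ms(const struct node_view *a, const struct node_view *b,
                  int dampen_ms)
{
    assert(dampen_ms >= 0);
    int first = a->detect_ms < b->detect_ms ? a->detect_ms : b->detect_ms;
    return first + dampen_ms;
}

/* A node writes its new score only if it noticed the outage by flush time;
 * otherwise it writes the stale, higher score. */
int written_score(const struct node_view *n, int flush_ms)
{
    return n->detect_ms <= flush_ms ? n->new_score : n->old_score;
}
```

In this model, with dampen=0 a node that notices the outage 300 ms later than its peer still writes its old score at the flush, so it briefly looks better connected. With a dampen window longer than the detection gap, both nodes write the same new score.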

> The result is we see
> fail-over when the ping node goes down, and fail-back when it comes back up.
> The fact is that dampening alone does not solve this. Which is why we use a
> resource agent that uses selective dampening based on where the resource is
> running.
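
For readers following along: the fping resource agent itself is not shown in this thread, so the following is only a sketch in C of the selective dampening rule as described (delay rising scores on the idle node and falling scores on the active node). The function name `selective_dampen_ms` and its parameters are hypothetical, and the choice of zero delay for all other changes is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: pick the dampen delay for one score change.
 * Assumption: changes that are not explicitly delayed get no dampening. */
int selective_dampen_ms(bool resource_active_here, int old_score,
                        int new_score, int delay_ms)
{
    assert(delay_ms >= 0);
    bool rising = new_score > old_score;
    bool falling = new_score < old_score;

    /* Active node: hold back bad news so it is not the first to drop. */
    if (resource_active_here && falling) {
        return delay_ms;
    }
    /* Idle node: hold back good news so it does not overtake the active node. */
    if (!resource_active_here && rising) {
        return delay_ms;
    }
    return 0;
}
```

Under this rule, when a shared ping node goes down the idle node reports its lower score first, and when the ping node returns the active node reports its higher score first, so the active node never has the lower published score.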
>
> On 09/25/2011 08:58 PM, Andrew Beekhof wrote:
>>
>> On Fri, Sep 23, 2011 at 9:53 PM, Brad Johnson<bjohnson at ecessa.com>  wrote:
>>>
>>> Yes, but the patch only affects the pingd attribute.
>>
>> Use of the name 'pingd' isn't mandatory though.
>>
>>> And we do not want the other node to be able to challenge us to an
>>> immediate score comparison. That is the whole idea behind the fping OCF
>>> resource agent we are using: to give the timing advantage to the node
>>> currently running the resource by delaying rising scores on the idle
>>> node, and falling scores on the active node.
>>
>> Why not just set dampen=0?
>>
>>> On 09/22/2011 09:10 PM, Andrew Beekhof wrote:
>>>>
>>>> On Tue, Sep 20, 2011 at 10:34 PM, Brad Johnson <bjohnson at ecessa.com> wrote:
>>>>>
>>>>> It is not necessarily the case that the outside world can't reach the
>>>>> cluster. Ours is a multi-homed device connecting to multiple WANs and
>>>>> LANs. We want the device with the best connectivity to be the active
>>>>> device. To get around the problem of failovers occurring when a ping
>>>>> node reboots, for example, I have written an fping OCF RA that uses
>>>>> different dampening delays depending on whether it is running on the
>>>>> active or idle device. I have also patched pacemaker's attrd.c so that
>>>>> it doesn't send an immediate update when it receives a flush message
>>>>> from the other node. This was causing it to ignore any running delay
>>>>> timer.
>>>>
>>>> That's the point of the flush message though.  So that all nodes write
>>>> their current value at the same time.
>>>>
>>>>> Here is that patch:
>>>>>
>>>>> --- tools/attrd.orig.c    2011-09-13 08:29:46.946820348 -0500
>>>>> +++ tools/attrd.c    2011-09-14 13:33:59.606894754 -0500
>>>>> @@ -348,10 +348,14 @@
>>>>>         attrd_local_callback(xml);
>>>>>
>>>>>     } else if(ignore == NULL || safe_str_neq(from, attrd_uname)) {
>>>>> +        const char *attr  = crm_element_value(xml, F_ATTRD_ATTRIBUTE);
>>>>> +        /* Don't send update for score if msg is from other node */
>>>>> +        if(safe_str_eq(from, attrd_uname) || safe_str_neq(attr, "pingd")) {
>>>>>         crm_info("%s message from %s", op, from);
>>>>>         hash_entry = find_hash_entry(xml);
>>>>>         stop_attrd_timer(hash_entry);
>>>>>         attrd_perform_update(hash_entry);
>>>>> +        }
>>>>>     }
>>>>>     free_xml(xml);
>>>>>  }
>>>>>
>>>>>
>>>>> On 09/19/2011 10:51 PM, Andrew Beekhof wrote:
>>>>>>
>>>>>> On Sun, Sep 11, 2011 at 2:30 AM, Vadym Chepkov <vchepkov at gmail.com> wrote:
>>>>>>>
>>>>>>> On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:
>>>>>>>
>>>>>>>>>> On 09/08/11 20:59, Brad Johnson wrote:
>>>>>>>>>>>
>>>>>>>>>>> We have a 2 node cluster with a single resource. The resource must
>>>>>>>>>>> run on only a single node at one time. Using the ocf:pacemaker:ping
>>>>>>>>>>> RA we are pinging a WAN gateway and a LAN host on each node so the
>>>>>>>>>>> resource runs on the node with the greatest connectivity. The
>>>>>>>>>>> problem is when a ping host goes down (so both nodes lose
>>>>>>>>>>> connectivity to it), the resource moves to the other node due to
>>>>>>>>>>> timing differences in how fast they update the score attribute.
>>>>>>>>>>> The dampening value has no effect, since it delays both nodes by
>>>>>>>>>>> the same amount. These unnecessary fail-overs aren't acceptable
>>>>>>>>>>> since they are disruptive to the network for no reason.
>>>>>>>>>>> Is there a way to dampen the ping update by different amounts on
>>>>>>>>>>> the active and passive nodes? Or some other way to configure the
>>>>>>>>>>> cluster to try to keep the resource where it is during these tie
>>>>>>>>>>> score scenarios?
>>>>>>>>
>>>>>>>> location pingd-constraint group_1 \
>>>>>>>>  rule $id="pingd-constraint-rule" pingd: defined pingd
>>>>>>>>
>>>>>>>> May I suggest that you simply change this constraint to
>>>>>>>>
>>>>>>>> location pingd-constraint group_1 \
>>>>>>>>  rule $id="pingd-constraint-rule" \
>>>>>>>>    -inf: not_defined pingd or pingd lte 0
>>>>>>>>
>>>>>>>> That way, only a host that definitely has _no_ connectivity carries a
>>>>>>>> -INF score for that resource group. And I believe that is what you
>>>>>>>> really want, rather than take the actual ping score as a placement
>>>>>>>> weight (your "best connectivity" approach).
>>>>>>>>
>>>>>>>> Just my 2 cents, though.
>>>>>>>>
>>>>>>> Even though this approach was recommended many times, there is a
>>>>>>> problem with it.
>>>>>>> What if all nodes for some reason are not able to ping?
>>>>>>> This rule would cause a resource to be brought down completely,
>>>>>>> whereas if you use the "best connectivity" approach it will stay up
>>>>>>> where it was before the network failed.
>>>>>>
>>>>>> If the outside[1] world can't reach the cluster, is there much benefit
>>>>>> in having it running?
>>>>>>
>>>>>> [1] Substitute "outside" for wherever your users are, hopefully you
>>>>>> picked a ping node from the same area.
>>>>>>
>>>>>>> Vadym
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started:
>>>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs:
>>>>>>>
>>>>>>>
>>>>>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>>>>