[Pacemaker] resource moving unnecessarily due to ping race condition

Thu Sep 29 05:43:12 UTC 2011

On Tue, Sep 27, 2011 at 10:40 PM, Brad Johnson <bjohnson at ecessa.com> wrote:
> The patch alone does not give an advantage to the active node. But remember
> I said we are using an fping resource agent we wrote that varies the
> dampening based on which node it is running on and whether the score is
> rising or falling. But the dampening it sets was being overridden by the
> attrd flush message, since it stops the timer and sends the score
> immediately.

Have the RA wait for the custom time before calling attrd_updater with -d 0?
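
A rough, untested sketch of that idea, written in Python only for brevity (the real agent is presumably a shell script). The function name push_score and the injectable run/sleep hooks are made up for illustration. The attrd_updater flags (-n name, -v value, -d delay) are the ones the stock ping agent used at the time; check them against your version. The point, as I read it, is that attrd holds no pending value during the wait, so a flush from the peer has nothing new to write.

```python
import subprocess
import time


def push_score(name, score, delay_seconds, run=subprocess.run, sleep=time.sleep):
    """Wait inside the agent, then hand the value to attrd with no dampening.

    `run` and `sleep` are injectable only so the sketch can be exercised
    without a cluster; they are not part of any Pacemaker API.
    """
    # The custom (per-node, per-direction) delay happens here, in the agent,
    # instead of in attrd's own dampening timer.
    sleep(delay_seconds)

    # -d 0 asks attrd to write the value immediately once it arrives.
    cmd = ["attrd_updater", "-n", name, "-v", str(score), "-d", "0"]
    run(cmd, check=True)
    return cmd
```

A real agent would also have to keep the wait shorter than its monitor timeout, otherwise the monitor operation itself would be reported as failed.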

> That RA, along with the patch, solves our problem.

I appreciate that, but there's no way I can include the patch.

> I can now
> reboot ping nodes all day long without our resource failing back and forth,
> while still allowing legitimate fail-overs when a node truly has better
> network connectivity than the other.
>
>
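
For anyone following along, a toy model of the behaviour described above may help. It is only a sketch under stated assumptions: the scoring (one point per reachable ping host), the node labels "active" and "idle", the 5-second hold and all function names are invented for illustration. It models neither attrd's real timers nor the fping agent itself; it only checks whether the idle node's published score ever exceeds the active node's.

```python
from itertools import groupby

START_SCORE = 2  # assumption: two ping hosts, one point each


def uniform_delay(is_active, old, new, hold=5.0):
    """Plain dampening: every change waits the same time on every node."""
    return hold


def selective_delay(is_active, old, new, hold=5.0):
    """Delay falling scores on the active node and rising scores on the idle node."""
    rising = new > old
    if is_active and not rising:
        return hold
    if not is_active and rising:
        return hold
    return 0.0


def resource_would_move(observations, delay_fn):
    """observations: time-ordered (time, "active" or "idle", new_score) tuples.

    Returns True if the idle node's published score ever exceeds the
    active node's published score.
    """
    last_seen = {"active": START_SCORE, "idle": START_SCORE}
    publishes = []
    for t, node, score in observations:
        delay = delay_fn(node == "active", last_seen[node], score)
        publishes.append((t + delay, node, score))
        last_seen[node] = score

    published = {"active": START_SCORE, "idle": START_SCORE}
    publishes.sort(key=lambda p: p[0])
    # Apply all writes that land at the same instant before comparing scores.
    for _, batch in groupby(publishes, key=lambda p: p[0]):
        for _, node, score in batch:
            published[node] = score
        if published["idle"] > published["active"]:
            return True
    return False


# A ping host reboots: both nodes lose it, then both regain it, slightly out of step.
reboot = [(10.0, "active", 1), (10.4, "idle", 1), (60.0, "idle", 2), (60.3, "active", 2)]
# Only the active node loses a ping host.
one_sided = [(10.0, "active", 1)]
```

With the reboot timeline, resource_would_move(reboot, uniform_delay) reports a move, while resource_would_move(reboot, selective_delay) does not. The one-sided loss still moves the resource under selective_delay, which matches the "legitimate fail-over" case.
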
> On 09/26/2011 09:57 PM, Andrew Beekhof wrote:
>>
>> On Mon, Sep 26, 2011 at 10:57 PM, Brad Johnson <bjohnson at ecessa.com> wrote:
>>>
>>> I agree that the patch assumes the use of "pingd" for the attribute name,
>>> and there may be a better way of coding that. However, I don't see how
>>> setting dampen=0 fixes our problem. The problem occurs when a ping node
>>> becomes inaccessible to all nodes in the cluster (it is rebooted, for
>>> example). Without giving any timing advantage to the currently active node,
>>
>> The patch doesn't do this though.
>>
>>> it is essentially just a race between the nodes to see who notices the
>>> outage first and can update the attribute fastest.
>>
>> Now I'm confused.
>>
>> You said "we do not want the other node to be able to challenge us to
>> an immediate score comparison".
>> dampen=0 does the same thing as the patch... it tells attrd to update
>> the CIB immediately, without waiting to give everyone a chance to
>> notice the change in connectivity too.
>>
>>> The result is we see fail-over when the ping node goes down, and fail-back
>>> when it comes back up.
>>> The fact is that dampening alone does not solve this, which is why we use a
>>> resource agent that applies selective dampening based on where the resource
>>> is running.
>>>
>>> On 09/25/2011 08:58 PM, Andrew Beekhof wrote:
>>>>
>>>> On Fri, Sep 23, 2011 at 9:53 PM, Brad Johnson <bjohnson at ecessa.com> wrote:
>>>>>
>>>>> Yes, but the patch only affects the pingd attribute.
>>>>
>>>> Use of the name 'pingd' isn't mandatory though.
>>>>
>>>>> And we do not want the other node to be able to challenge us to an
>>>>> immediate score comparison. That is the whole idea behind the fping OCF
>>>>> resource agent we are using: to give the timing advantage to the node
>>>>> currently running the resource by delaying rising scores on the idle node
>>>>> and falling scores on the active node.
>>>>
>>>> Why not just set dampen=0?
>>>>
>>>>> On 09/22/2011 09:10 PM, Andrew Beekhof wrote:
>>>>>>
>>>>>> On Tue, Sep 20, 2011 at 10:34 PM, Brad Johnson <bjohnson at ecessa.com> wrote:
>>>>>>>
>>>>>>> It is not necessarily the case that the outside world can't reach the
>>>>>>> cluster. Ours is a multi-homed device connecting to multiple WANs and
>>>>>>> LANs. We want the device with the best connectivity to be the active
>>>>>>> device. To get around the problem of failovers occurring when a ping
>>>>>>> node reboots, for example, I have written an fping OCF RA that uses
>>>>>>> different dampening delays depending on whether it is running on the
>>>>>>> active or idle device. I have also patched pacemaker attrd.c so it
>>>>>>> doesn't send an immediate update when it receives a flush message from
>>>>>>> the other node. This was causing it to ignore any running delay timer.
>>>>>>
>>>>>> That's the point of the flush message though: so that all nodes write
>>>>>> their current value at the same time.
>>>>>>
>>>>>>> Here is that patch:
>>>>>>>
>>>>>>> --- tools/attrd.orig.c    2011-09-13 08:29:46.946820348 -0500
>>>>>>> +++ tools/attrd.c    2011-09-14 13:33:59.606894754 -0500
>>>>>>> @@ -348,10 +348,14 @@
>>>>>>>         attrd_local_callback(xml);
>>>>>>>
>>>>>>>     } else if(ignore == NULL || safe_str_neq(from, attrd_uname)) {
>>>>>>> +        const char *attr  = crm_element_value(xml, F_ATTRD_ATTRIBUTE);
>>>>>>> +        /* Don't send update for score if msg is from other node */
>>>>>>> +        if(safe_str_eq(from, attrd_uname) || safe_str_neq(attr, "pingd")) {
>>>>>>>         crm_info("%s message from %s", op, from);
>>>>>>>         hash_entry = find_hash_entry(xml);
>>>>>>>         stop_attrd_timer(hash_entry);
>>>>>>>         attrd_perform_update(hash_entry);
>>>>>>> +        }
>>>>>>>     }
>>>>>>>     free_xml(xml);
>>>>>>>  }
>>>>>>>
>>>>>>>
>>>>>>> On 09/19/2011 10:51 PM, Andrew Beekhof wrote:
>>>>>>>>
>>>>>>>> On Sun, Sep 11, 2011 at 2:30 AM, Vadym Chepkov <vchepkov at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:
>>>>>>>>>
>>>>>>>>>>>> On 09/08/11 20:59, Brad Johnson wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have a 2 node cluster with a single resource. The resource must
>>>>>>>>>>>>> run on only a single node at one time. Using the ocf:pacemaker:ping
>>>>>>>>>>>>> RA we are pinging a WAN gateway and a LAN host on each node so the
>>>>>>>>>>>>> resource runs on the node with the greatest connectivity. The
>>>>>>>>>>>>> problem is that when a ping host goes down (so both nodes lose
>>>>>>>>>>>>> connectivity to it), the resource moves to the other node due to
>>>>>>>>>>>>> timing differences in how fast they update the score attribute. The
>>>>>>>>>>>>> dampening value has no effect, since it delays both nodes by the
>>>>>>>>>>>>> same amount. These unnecessary fail-overs aren't acceptable since
>>>>>>>>>>>>> they are disruptive to the network for no reason.
>>>>>>>>>>>>> Is there a way to dampen the ping update by different amounts on the
>>>>>>>>>>>>> active and passive nodes? Or some other way to configure the cluster
>>>>>>>>>>>>> to try to keep the resource where it is during these tie score
>>>>>>>>>>>>> scenarios?
>>>>>>>>>>
>>>>>>>>>> location pingd-constraint group_1 \
>>>>>>>>>>  rule $id="pingd-constraint-rule" pingd: defined pingd
>>>>>>>>>>
>>>>>>>>>> May I suggest that you simply change this constraint to
>>>>>>>>>>
>>>>>>>>>> location pingd-constraint group_1 \
>>>>>>>>>>  rule $id="pingd-constraint-rule" \
>>>>>>>>>>    -inf: not_defined pingd or pingd lte 0
>>>>>>>>>>
>>>>>>>>>> That way, only a host that definitely has _no_ connectivity carries a
>>>>>>>>>> -INF score for that resource group. And I believe that is what you
>>>>>>>>>> really want, rather than take the actual ping score as a placement
>>>>>>>>>> weight (your "best connectivity" approach).
>>>>>>>>>>
>>>>>>>>>> Just my 2 cents, though.
>>>>>>>>>>
>>>>>>>>> Even though this approach was recommended many times, there is a
>>>>>>>>> problem with it.
>>>>>>>>> What if all nodes for some reason are not able to ping?
>>>>>>>>> This rule would cause a resource to be brought down completely,
>>>>>>>>> whereas if you use the "best connectivity" approach it will stay up
>>>>>>>>> where it was before the network failed.
>>>>>>>>
>>>>>>>> If the outside[1] world can't reach the cluster, is there much benefit
>>>>>>>> in having it running?
>>>>>>>>
>>>>>>>> [1] Substitute "outside" with wherever your users are; hopefully you
>>>>>>>> picked a ping node from the same area.
>>>>>>>>
>>>>>>>>> Vadym
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>>
>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>>>>>>