[Pacemaker] pingd

Thu Sep 2 19:33:59 UTC 2010

On Thu, Sep 2, 2010 at 4:05 PM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:
> On Thu, Sep 02, 2010 at 11:00:12AM +0200, Bernd Schubert wrote:
>> On Thursday, September 02, 2010, Andrew Beekhof wrote:
>> > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert
>> > > My proposal is to rip all network code out of pingd and to add
>> > > slightly modified files from 'iputils'.
>> >
>> > Close, but that's not portable.
>> > Instead use ocf:pacemaker:ping which goes a step further and ditches
>> > the daemon piece altogether.
>>
>> Hmm, we are already using that as a temporary measure. But I don't think the
>> ping RA is suitable for larger clusters. The ping script RA runs everything
>> serially, and only at the intervals at which lrmd calls it. Now let's assume
>> we have a 20 node cluster.
>>
>> nodes = 20
>> timeout = 2
>> attempts = 2
>>
>> That makes up to 80s (20 * 2 * 2) for a single run, even with the default
>> and already rather small timeouts, which is IMHO a bit long. And with a shell
>> script I don't see a way to improve that. While we could send the pings in
>> parallel, I have no idea how to lock the variable counting active nodes
>> (active=`expr $active + 1`). I don't think that plain sh or even bash has a
>> semaphore or mutex lock. So IMHO we need a language that supports that:
>> rewriting the pingd RA is one choice, rewriting the ping RA in Python is
>> another.
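
For what it's worth, plain sh can run the pings in parallel without a lock:
each background job reports through its exit status, and only the parent
shell touches the counter. A minimal sketch, not the ping RA's actual code;
the names probe, count_active, PING_ATTEMPTS and PING_TIMEOUT are made up for
illustration, and the ping flags -c and -W assume Linux iputils:

```sh
#!/bin/sh
# Hypothetical helper: probe one host. The -c/-W flags assume iputils ping.
probe() {
    ping -c "${PING_ATTEMPTS:-2}" -W "${PING_TIMEOUT:-2}" "$1" >/dev/null 2>&1
}

# Start one background probe per host, then collect the exit statuses.
# Only the parent shell increments the counter, so no mutex is needed.
count_active() {
    pids=""
    for host in "$@"; do
        probe "$host" &
        pids="$pids $!"
    done
    active=0
    for pid in $pids; do
        # wait returns the exit status of that particular background job
        if wait "$pid"; then
            active=$((active + 1))
        fi
    done
    echo "$active"
}
```

Called as active=$(count_active $host_list), the run time should be bounded
by roughly one host's worth of attempts and timeouts rather than the sum over
all hosts.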
>
> how about an fping RA ?
> active=$(fping -a -i 5 -t 250 -B1 -r1 $host_list 2>/dev/null | wc -l)
>
> terminates in about 3 seconds for a hostlist of 100 (on the LAN, 29 of
> which are alive).

Happy to add if someone writes it :-)
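
As a starting point, the one-liner above could be wrapped in a small
function. This is only a sketch of the counting step, not a complete OCF
resource agent; the function name count_alive is hypothetical, and the fping
options are copied unchanged from Lars's example:

```sh
#!/bin/sh
# Hypothetical wrapper: print how many of the given hosts answered.
# fping options as in the example above: -a prints only hosts that are
# alive, -i 5 and -t 250 set the packet interval and the per-host timeout
# (both in ms), -B1 disables the timeout backoff, -r1 allows one retry.
count_alive() {
    # With no hosts there is nothing to ping
    [ "$#" -gt 0 ] || { echo 0; return; }
    # tr strips the padding that some wc implementations add
    fping -a -i 5 -t 250 -B1 -r1 "$@" 2>/dev/null | wc -l | tr -d ' '
}
```

Usage would be something like active=$(count_alive $host_list); the result
would still have to be turned into a score and pushed to attrd by the agent.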

>
>> So in fact my first proposal was only the first step: first add better
>> network code, and then make it multi-threaded, so that each ping host gets
>> its own thread.
>
> A working pingd daemon has the additional advantage that it can ask its
> peers for their ping node count, before actually updating the attribute,
> which should help with the "dampen race".

That happens at the attrd level in both cases.  pingd adds nothing here.

>
>> Another reason why I don't like the shell RA too much is that shell takes a
>> considerable amount of CPU time. For a subset of systems where we need ping as
>> a replacement for the quorum policy (*), CPU time is precious.
>>
>> Thanks,
>> Bernd
>>
>> PS: (*) As you insist ;) on quorum with n/2 + 1 nodes, we use ping as a
>> replacement. We simply cannot fulfill n/2 + 1, as a controller failure takes
>> down 50% of the systems (virtual machines), and the systems (VMs) of the 2nd
>> controller are then supposed to take over the failed services. I see that
>> n/2 + 1 is optimal and also required for a few nodes. But if you have a larger
>> set of systems (e.g. minimum 6 with the VM systems I have in mind) n/2 + 1 is
>> sufficient, IMHO.
>
> You meant to say you consider == n/2 sufficient, instead of > n/2 ?
>
>> Therefore I asked before to make the quorum policy
>> configurable. Now with Lustre's multiple-mount-protection and the additional
>> stop of resources due to ping, I'm willing to set the quorum policy to ignore.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.