[Pacemaker] pingd
Bernd Schubert
bs_lists at aakef.fastmail.fm
Thu Sep 2 09:00:12 UTC 2010
On Thursday, September 02, 2010, Andrew Beekhof wrote:
> On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert
> > My proposal is to rip out all network code out of pingd and to add
> > slightly modified files from 'iputils'.
>
> Close, but thats not portable.
> Instead use ocf:pacemaker:ping which goes a step further and ditches
> the daemon piece altogether.
Hmm, we are already using that temporarily. But I don't think the ping RA is
suitable for larger clusters. The ping RA script runs everything serially, and
only at intervals when called by lrmd. Now let's assume we have a
20 node cluster.
nodes = 20
timeout = 2
attempts = 2
That makes 80s (20 x 2 x 2) for a single run in the worst case, even with the
default timeouts, which are already rather small. That is IMHO a bit long. And
with a shell script I don't see a way to improve it. While we could send the
pings in parallel, I have no idea how to lock the counter of active nodes
(active=`expr $active + 1`). I don't think that plain sh or even bash has a
semaphore or mutex lock. So IMHO we need a language that supports that:
rewriting the pingd RA is one choice, rewriting the ping RA in Python is
another.
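A minimal sketch of what the Python variant could look like, under stated
assumptions: the function names (worst_case_serial, ping_host, count_active)
are hypothetical and not part of any existing RA, and the ping flags (-c, -W)
are those of the Linux iputils ping. Each host is probed in its own worker
thread, and the number of active hosts is obtained by summing the returned
results, so no lock around a shared counter is needed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def worst_case_serial(nodes, timeout, attempts):
    """Worst-case duration in seconds when every ping is sent serially and times out."""
    return nodes * timeout * attempts


def ping_host(host, timeout=2, attempts=2):
    """Return True if the host answers at least one echo request.

    Assumes the Linux iputils 'ping' command (-c count, -W timeout in seconds).
    """
    for _ in range(attempts):
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return True
    return False


def count_active(hosts, probe=ping_host):
    """Probe all hosts in parallel, one worker thread per host.

    Each worker only returns its own result; the results are summed in the
    calling thread, so there is no shared counter to protect.
    """
    if not hosts:
        return 0
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return sum(1 for ok in pool.map(probe, hosts) if ok)
```

With one thread per host, the worst case for a run would be roughly
timeout x attempts (4s with the values above) instead of
nodes x timeout x attempts, assuming the threads really do run concurrently.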
So in fact my first proposal was only the first step: first add better
network code, and then make it multi-threaded, so that each ping host gets its
own thread.
Another reason why I don't like the shell RA too much is that shell takes a
considerable amount of CPU time. For a subset of systems where we need ping as
a replacement for quorum policy (*), CPU time is precious.
Thanks,
Bernd
PS: (*) As you insist ;) on quorum with n/2 + 1 nodes, we use ping as a
replacement. We simply cannot fulfill n/2 + 1, as a controller failure takes
down 50% of the systems (virtual machines), and the systems (VMs) of the 2nd
controller are then supposed to take over the failed services. I see that n/2 + 1
is optimal and also required for a few nodes. But if you have a larger set of
systems (e.g. a minimum of 6 with the VM systems I have in mind), n/2 is
sufficient, IMHO. Therefore I asked before to make the quorum policy
configurable. Now, with Lustre's multiple-mount protection and the additional
stop of resources due to ping, I'm willing to set the quorum policy to ignore.