[Pacemaker] ifstatus OCF RA

Tue Feb 22 11:22:34 CET 2011

On Tue, Feb 22, 2011 at 09:30:00AM +0100, Dejan Muhamedagic wrote:
> Hi,
> 
> On Tue, Feb 22, 2011 at 08:54:14AM +0100, Florian Haas wrote:
> > On 2011-02-22 00:19, Frederik Schüler wrote:
> > > Hi,
> > > 
> > > On Monday 21 February 2011 21:29:19 Florian Haas wrote:
> > >>> as the various ocf:*:ping[d] incarnations don't meet my specific needs,
> > >>
> > >> May I ask why and how?
> > > 
> > > ocf:pacemaker:ping works, but takes approx. 25-30s to react at all, and 
> > > approx. 40s to complete the failover. But I need an immediate failover, 
> > > exactly as it worked ages ago with heartbeat-2 and ipfail.

Then how about reviving the "old ipfail" idea?

The current "ping" resource agent only pings on its monitor intervals,
and synchronises different views of connectivity of the cluster nodes
via attrd and the dampen interval, which all adds to detection latency
and still can lead to funny effects with bad timing.

The ipfail thing made use of heartbeat "ping nodes", which are checked by
the heartbeat core communication processes every keepalive interval,
triggered on "membership" changes of "ping nodes" (which can be groups
of real ping addresses), immediately communicating with the other side
once more than deadtime pings are lost, and using a default delay of
2 * keepalive to trigger a takeover.
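
A minimal sketch of that detection logic, as a toy model in Python rather
than anything taken from the heartbeat sources. The class name
PingNodeWatcher and its method names are made up for illustration, and
"more than deadtime pings are lost" is read here as "no reply for longer
than deadtime seconds", which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PingNodeWatcher:
    """Toy model of ipfail-style detection (hypothetical, not heartbeat code)."""

    keepalive: float                     # seconds between pings
    deadtime: float                      # seconds without a reply before the ping node is "dead"
    last_reply: float = 0.0              # time of the last successful ping
    dead_since: Optional[float] = None   # when the ping node was declared dead

    @property
    def takeover_delay(self) -> float:
        # Default delay before acting, as described above: 2 * keepalive.
        return 2 * self.keepalive

    def tick(self, now: float, got_reply: bool) -> str:
        """Call once per keepalive interval; returns what to do at this tick."""
        if got_reply:
            self.last_reply = now
            self.dead_since = None
            return "ok"
        if self.dead_since is None:
            if now - self.last_reply > self.deadtime:
                # Membership change of the ping node: tell the peer right away.
                self.dead_since = now
                return "notify-peer"
            return "waiting"
        if now - self.dead_since >= self.takeover_delay:
            return "takeover"
        return "delay"
```

The point of the model is the ordering: the loss is communicated as soon
as it is detected, and only the takeover itself is delayed.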

As implemented, that trigger works only with haresources mode of
heartbeat, as it talks directly to heartbeat core,
but that could be changed to instead talk to the cib, natively.

That would be heartbeat specific.

We could also revive "pingd" some way or other, which is the same
concept, agnostic to the cluster communication infrastructure:
running a daemon that continuously checks for connectivity.

I'm a big fan of smokeping, btw, and it should be easy enough to come up
with some resource agent that integrates with it.
It could start a smokeping instance with either a hand crafted config,
or generate a suitable config from resource agent parameters.
Which could then even do "alert"s, instead of waiting for the next
pacemaker monitoring interval.

And would even keep a history of connectivity data for easy review.

http://oss.oetiker.ch/smokeping/
http://oss.oetiker.ch/smokeping/doc/smokeping_examples.en.html
http://oss.oetiker.ch/smokeping/doc/smokeping_config.en.html#I____Alerts____

Then configure it with e.g.
to=|/my/alert-script, which takes
  name-of-alert, target, loss-pattern, rtt-pattern, hostname [, raise]
  raise depending on edgetrigger,
and can thus set/clear/do whatever fancy things with attributes and
constraints in the cib...
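
A rough sketch of what such an alert script could look like, in Python.
Everything here is an assumption for illustration: the attribute naming
scheme ("smokeping-" plus the alert name), the choice to set the attribute
to 1 while the alert is raised and delete it when cleared, and the reading
of the optional sixth argument as "0" meaning cleared. The attrd_updater
options used (-n for the name, -U to update, -D to delete) should be
checked against the installed Pacemaker version; older releases spell the
value option differently.

```python
import subprocess


def build_attrd_command(argv):
    """Map smokeping alert arguments to an attrd_updater command line.

    Expected arguments, in the order listed above:
    name-of-alert, target, loss-pattern, rtt-pattern, hostname [, raise]
    """
    if len(argv) < 5:
        raise ValueError(
            "expected: name-of-alert target loss-pattern rtt-pattern hostname [raise]"
        )
    alert, _target, _loss_pattern, _rtt_pattern, _hostname = argv[:5]
    # Assumption: without edgetrigger there is no sixth argument, and every
    # invocation means the alert is raised.
    raised = argv[5] != "0" if len(argv) > 5 else True
    attr_name = f"smokeping-{alert}"   # hypothetical naming scheme
    if raised:
        return ["attrd_updater", "-n", attr_name, "-U", "1"]
    return ["attrd_updater", "-n", attr_name, "-D"]


def main(argv):
    """Entry point for the alert script: run the command built above."""
    return subprocess.run(build_attrd_command(argv), check=True)
```

Installed as /my/alert-script, the script would call main(sys.argv[1:]).
A location constraint with a rule on the attribute could then move
resources away from the affected node.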

[I've been advocating this for a while now, and no-one has picked it up yet ;-(]

The main "problem" I see is that ping_group and ipfail used to "publish"
their connectivity information right away, but then act upon it with a
delay to give the other side the chance to detect the same and abort
any action because of balanced connectivity information.

The attrd dampen thingy delays publishing of the connectivity
information, but the cib will act immediately once the information
is published. So that will again trigger useless resource relocations
if one side notices later than the other.

I'm not sure what can be done about that, though.

(correct me if I'm wrong, anyone.)
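
To make the difference between the two schemes concrete, here is a toy
timeline model in Python. It only encodes the description above and has
not been checked against actual attrd or ipfail behaviour. The function
names are made up, times are in seconds, and node A is assumed to notice
the loss no later than node B.

```python
def ipfail_style(detect_a, detect_b, act_delay):
    """Publish on detection, act only after a delay.

    Node A aborts its action if node B has published the same loss
    before A's delay expires (balanced connectivity).
    """
    act_time_a = detect_a + act_delay
    peer_published_in_time = detect_b <= act_time_a
    return "no relocation" if peer_published_in_time else "relocation"


def dampen_style(detect_a, detect_b, dampen):
    """Publish only after the dampen interval, act as soon as published.

    Both sides are delayed by the same amount, so any gap between the two
    detection times survives as a gap between the two publications, and
    the cluster sees unbalanced connectivity during that window.
    """
    publish_a = detect_a + dampen
    publish_b = detect_b + dampen
    return "no relocation" if publish_a == publish_b else "relocation"
```

With A noticing at t=0 and B at t=1, ipfail_style(0, 1, 2) gives "no
relocation", while dampen_style(0, 1, 5) gives "relocation": the dampen
interval shifts both publications but does not close the gap between them.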

> > Well how about pushing down your monitor interval to something like 5
> > seconds, set attempts to 1, set timeout to 1, and dampen to 0? Then
> > basically as soon as you lose one ping, you can fail over.

That very much increases the chance of false triggers.

> > >> Well, what if the link is up but there's an upstream problem?
> > > 
> > > Good point, but this requirement is customer-driven. I have the cluster to 
> > > initiate a failover as quickly as possible within the test cases.
> > 
> > This sounds like a good time to educate the customer that too short
> > failover times are more of a curse rather than a blessing. :)
> 
> Right. But sometimes people need to learn the hard way.

Yep.

> > > I have a working setup with ocf:pacemaker:ping, but this was rejected as being 
> > > "too slow".

See above.

> > >> I've always liked how ocf:pacemaker:ping actually monitors
> > >> connectivity to an upstream IP, which covers both immediate link
> > >> failure and upstream problems. Similar to how in active/backup
> > >> bonding, you can fail over based on the status of an ARP request,
> > >> rather than MII link status.
> > > 
> > > I just checked the redhat cluster suite: the ip.sh RA there has a monitor_link 
> > > option, which does exactly what my ifstatus RA does. 
> > > Maybe this functionality could be added to the IPaddr2 script, but I guess 
> > > that wouldn't have more chances of being added than this one, correct?
> > 
> > Improving an existing resource agent pretty much always stands better
> > chances of being merged than adding a new one. That's my opinion, surely
> > others will correct me if theirs differ.
> 
> I also think that this would fit better into the existing IPaddr2
> RA. After all, it is about the network interface.
> 
> Somebody recently posted a set of monitor improvements for
> IPaddr2. Lars (Ellenberg) was on that. The link check should be

No braces necessary ;-)

> coordinated with that set of patches, so that we end up with a
> consolidated user interface.

There have been several suggestions of additions to the IPaddr2
monitoring.

One of them is to check "ip -o -f link link show" for LOWER_UP in
addition to UP, or alternatively for "NO-CARRIER" or similar, which should
be the better equivalent to ethtool "Link detected", though I'm not sure
which platforms/ip utils versions actually support that.
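
A small sketch of that flag check in Python, for illustration only; the
actual IPaddr2 change would be shell. The function names and the returned
state strings are made up, and the sketch assumes the flags appear between
angle brackets in the one-line output, as current iproute2 prints them.

```python
import re
import subprocess


def parse_link_flags(ip_output):
    """Extract the flag set from 'ip -o link show' output, e.g. <UP,LOWER_UP>."""
    match = re.search(r"<([^>]*)>", ip_output)
    return set(match.group(1).split(",")) if match else set()


def link_state(flags):
    """Classify the link from its flags."""
    if "UP" not in flags:
        return "admin-down"
    if "NO-CARRIER" in flags:
        return "no-carrier"
    if "LOWER_UP" in flags:
        return "up"
    # Assumption: an ip version that reports neither flag cannot tell us more.
    return "unknown"


def check_interface(dev):
    """Run ip for one device and classify its link state."""
    result = subprocess.run(
        ["ip", "-o", "link", "show", "dev", dev],
        capture_output=True, text=True, check=True,
    )
    return link_state(parse_link_flags(result.stdout))
```

Only "no-carrier" is a definite failure here; "unknown" should probably be
treated as "cannot tell" rather than as a reason to fail over.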

Another alternative check was to look at the incoming packet counters,
after requesting packets with arp or ping.
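
A sketch of the counter-based check, again in Python and again only an
illustration. It assumes a Linux sysfs layout with
/sys/class/net/<dev>/statistics/rx_packets, and uses a single ping to
provoke a reply; the function names and the ping options chosen are
assumptions, not what any of the proposed patches do.

```python
import subprocess
from pathlib import Path


def rx_packets(dev, sysfs_root="/sys/class/net"):
    """Read the received-packet counter of an interface from sysfs."""
    return int(Path(sysfs_root, dev, "statistics", "rx_packets").read_text())


def provoke_with_ping(dev, target):
    """Send one ping out of the given interface; the result is ignored."""
    subprocess.run(
        ["ping", "-c", "1", "-W", "1", "-I", dev, target],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False,
    )


def saw_incoming_traffic(dev, target, read_counter=rx_packets, provoke=provoke_with_ping):
    """Return True if the receive counter grew after requesting a packet."""
    before = read_counter(dev)
    provoke(dev, target)
    return read_counter(dev) > before
```

Note that any unrelated incoming traffic also increases the counter, so
this shows that the interface receives something, not that the specific
target answered.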

Still that's only checked on "monitor interval", and has the overhead of
running the shell scripts in a more or less "tight loop", and has to
wait for timeouts on every invocation, even if nothing happens.

Summary:
 "link definitely down" monitoring will be integrated in IPaddr2 "soonish",
 but that is not sufficient for proper connectivity monitoring,
 as "link up" does not necessarily imply good connectivity.

 To complement the "link down" monitoring,
 some variant of the ping resource agent is necessary.

 For timely action on connectivity changes, we should revive pingd,
 or some other sort of daemon based connectivity monitoring.

Bonus points for anyone picking up the idea of smokeping integration.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.