[Pacemaker] WARN: Rexmit of seq ..........

Lars Ellenberg lars.ellenberg at linbit.com
Wed Mar 24 17:55:15 EDT 2010


On Wed, Mar 24, 2010 at 11:40:54AM +0000, Joseph, Lester wrote:
> Yes they can. That's the confusing bit....
> 
> As I mentioned in previous email, we rebooted the switch and the nodes
> would have lost connectivity briefly. But the switch is back online
> now and the nodes have connectivity as they did before. Yet the
> message is constantly being generated despite my actions.
> 
> I will continue to troubleshoot.

I have this theory that
include/heartbeat.h:195:#define MAXMSGHIST      500
may be too low: the message history may wrap (several times?) on a busy
Pacemaker cluster, or with a very low keepalive setting, once you have
had "flaky" communication problems for an extended period of time.

If someone has a way to reproduce this behaviour,
we could check if upping that define would "fix" it
(extend the period where heartbeat can cope with "flaky").
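
To make the theory a bit more concrete, here is a minimal sketch of a
fixed-size message history that wraps. It is NOT the actual heartbeat
code; the struct and function names (msg_hist, hist_record,
hist_can_rexmit, hist_window_seconds) are made up for illustration, and
only the MAXMSGHIST value is taken from the header quoted above. The
point is only that once more than MAXMSGHIST messages have been sent,
an older sequence number can no longer be retransmitted from the
history.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAXMSGHIST 500  /* value quoted from include/heartbeat.h */

/* Hypothetical, simplified history ring (not heartbeat's real struct):
 * each slot remembers which sequence number it currently holds. */
struct msg_hist {
    unsigned long seqnos[MAXMSGHIST];
    unsigned long last_seq;  /* highest sequence number recorded; 0 = none */
};

static void hist_init(struct msg_hist *h)
{
    memset(h, 0, sizeof(*h));
}

/* Record a sent message; this overwrites whatever older message
 * previously occupied the same slot. */
static void hist_record(struct msg_hist *h, unsigned long seq)
{
    assert(seq > 0);  /* 0 is reserved for "empty slot" in this sketch */
    h->seqnos[seq % MAXMSGHIST] = seq;
    h->last_seq = seq;
}

/* A retransmit request can only be served if the slot still holds
 * exactly the requested sequence number. */
static bool hist_can_rexmit(const struct msg_hist *h, unsigned long seq)
{
    return seq > 0 && h->seqnos[seq % MAXMSGHIST] == seq;
}

/* How many seconds of outgoing traffic the history covers
 * at a given (assumed) message rate. */
static double hist_window_seconds(double msgs_per_second)
{
    return MAXMSGHIST / msgs_per_second;
}
```

In this model, after recording sequence numbers 1 through 1200 only 701
through 1200 are still available, and at an assumed rate of 50 messages
per second the history covers just 10 seconds. If the real history
behaves anything like this, a peer asking for a retransmit of something
older than that window can never be satisfied, which would fit with
messages that keep repeating.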

How do you get out of that state?
Well...

First, get your comms in order.

Then, maybe restarting the "loudest" node helps?
Or both nodes? Or the node that does _not_ log those messages?
Dunno... never seen this myself.
There are sporadic reports of similar messages in the archives, though,
along with mystical "workarounds" involving the deletion of some files,
which I very much doubt have anything to do with this particular
"Rexmit" message...


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



