[Pacemaker] streamed writes fail with migration for NFS v3 over TCP

Wed May 20 12:39:58 EDT 2009

On Tue, May 19, 2009 at 03:15:17PM -0700, Bob Haxo wrote:
> Greetings,
> 
> I find that streamed writes fail with migration for NFS v3 over TCP.
> Not every time, but almost every time.
> 
> Streamed writes continue nicely across many migrations for NFS v3 over
> UDP.
> 
> With TCP, writes continue with migration back to the initial server.
> 
> Does anyone have HA NFS migrations working for NFS over TCP?
> 
> Suggestions?

capture the nfs traffic during a switchover with tcpdump or tshark,
and analyse it with wireshark.
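a minimal sketch of the capture step in Python. it only builds the tcpdump command line. assumptions: the capture runs on the client, so one file sees traffic to both the old and the new server; the interface name, output file and client address are hypothetical placeholders; only the standard NFS port 2049 is captured, so the NFS v3 side services (portmapper, mountd, lockd) are not included.

```python
import shlex
from typing import List, Optional

NFS_PORT = 2049  # standard NFS port; NFS v3 side services (portmapper, mountd, lockd) are not covered


def capture_cmd(iface: str, outfile: str, client: Optional[str] = None) -> List[str]:
    """Build a tcpdump argv that writes complete NFS packets to a pcap file."""
    expr = f"port {NFS_PORT}"
    if client:
        # restrict the capture to one (hypothetical) client address
        expr = f"host {client} and {expr}"
    # -s 0: capture whole packets, not just headers; -w: write raw packets to a file
    return ["tcpdump", "-i", iface, "-s", "0", "-w", outfile, expr]


if __name__ == "__main__":
    # interface, file name and address are placeholders
    print(shlex.join(capture_cmd("eth0", "switchover.pcap", "192.168.1.50")))
```

run the printed command as root, start the streamed write, trigger the migration, then stop tcpdump and open the file in wireshark.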

suspicions:
 the timeo= mount option sets how long the client waits before it retries
 a request, in tenths of a second.
 maybe it just needs a long time to recognize the failover?
 do you find "NFS server not responding" in the client logs?
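to get a feeling for how long "a long time" can be, here is a rough Python model of how long the client waits before it logs "server not responding". it assumes the linear backoff that nfs(5) describes for TCP (each wait grows by timeo, capped at 600 seconds) and that the message appears once a request has been retried retrans times. the defaults timeo=600 and retrans=2 are what nfs(5) lists for TCP on many systems; real kernels differ between versions, and UDP uses a different, adaptive scheme, so treat the numbers as an estimate only.

```python
def tcp_major_timeout_seconds(timeo: int = 600, retrans: int = 2) -> float:
    """Rough seconds until a major timeout ("server not responding") for NFS over TCP.

    timeo is in deciseconds, like the mount option. Assumes linear backoff:
    the n-th wait is n * timeo, capped at 600 seconds.
    """
    cap = 6000  # 600 seconds, expressed in deciseconds
    total = 0
    for attempt in range(1, retrans + 2):  # the first try plus `retrans` retries
        total += min(attempt * timeo, cap)
    return total / 10


if __name__ == "__main__":
    print(tcp_major_timeout_seconds())  # prints 360.0
    print(tcp_major_timeout_seconds(timeo=100, retrans=3))  # prints 100.0
```

under this model the defaults give 60 + 120 + 180 = 360 seconds before the first "not responding" message, which could look like a failed write if the test gives up earlier.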

 a connection tracking firewall on the "new" server may drop tcp packets
 that do not belong to a connection it knows about,
 so on retry you may run into much longer timeouts.
 if you have a firewall that only ACCEPTs "new" or "established"
 connections and DROPs everything else, consider instead to REJECT,
 with tcp-reset, NFS traffic from internal clients that connection
 tracking does not know about.
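a sketch of that advice as iptables rules, generated from Python so they can be reviewed before being applied. assumptions: the internal client network 192.168.1.0/24 is hypothetical, only TCP port 2049 is covered (the NFS v3 side services would need their own rules), and the rules use the iptables "state" match; merge them into your existing ruleset rather than appending them blindly. the NEW rule is limited to SYN packets, so a client that keeps sending on a stream the new server never saw falls through to the REJECT rule and gets a reset instead of being dropped silently.

```python
from typing import List

NFS_PORT = "2049"


def nfs_firewall_rules(internal_net: str) -> List[List[str]]:
    """Return iptables argv lists, in order, for the REJECT-instead-of-DROP setup."""
    return [
        # packets of connections conntrack already knows about
        ["iptables", "-A", "INPUT", "-m", "state", "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        # genuinely new NFS connections (SYN packets only)
        ["iptables", "-A", "INPUT", "-p", "tcp", "--syn", "--dport", NFS_PORT,
         "-m", "state", "--state", "NEW", "-j", "ACCEPT"],
        # NFS traffic from internal clients that conntrack does not know about,
        # e.g. a stream that belonged to the old server: answer with a TCP reset
        ["iptables", "-A", "INPUT", "-p", "tcp", "-s", internal_net, "--dport", NFS_PORT,
         "-j", "REJECT", "--reject-with", "tcp-reset"],
        # everything else is still dropped silently
        ["iptables", "-A", "INPUT", "-j", "DROP"],
    ]


if __name__ == "__main__":
    for rule in nfs_firewall_rules("192.168.1.0/24"):  # hypothetical client network
        print(" ".join(rule))
```

the point of the reset is that the client learns at once that its old connection is gone, rather than waiting through its retransmission timeouts.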

analysing a network dump taken during a switchover/failover should be
enough to troubleshoot your issue.
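as a starting point for that analysis, a small Python sketch, using only the standard library, that reads a classic pcap file (as written by tcpdump -w) and lists TCP resets to or from the NFS port. it assumes Ethernet framing, IPv4 and no VLAN tags, and it does not read pcapng, the default output of recent tshark/dumpcap (tshark -F pcap writes classic pcap). resets right after the switchover suggest the client is being told to reconnect; their absence, together with repeated retransmissions in wireshark, would fit the dropping-firewall suspicion.

```python
import struct
import sys
from typing import BinaryIO, Iterator, Tuple

NFS_PORT = 2049
TCP_RST = 0x04
LINKTYPE_ETHERNET = 1


def iter_packets(f: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) pairs from a classic microsecond pcap file."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    (magic,) = struct.unpack("<I", header[:4])
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    while True:
        record = f.read(16)
        if len(record) < 16:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        frame = f.read(incl_len)
        if len(frame) < incl_len:
            return  # truncated last record
        yield ts_sec + ts_usec / 1e6, frame


def nfs_tcp_resets(f: BinaryIO) -> Iterator[Tuple[float, int, int]]:
    """Yield (timestamp, src_port, dst_port) for TCP RST segments to or from port 2049."""
    for ts, frame in iter_packets(f):
        if len(frame) < 14 + 20:
            continue
        (ethertype,) = struct.unpack("!H", frame[12:14])
        if ethertype != 0x0800:  # IPv4 only
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if ip[9] != 6 or len(ip) < ihl + 14:  # protocol 6 = TCP; need the flags byte
            continue
        tcp = ip[ihl:]
        sport, dport = struct.unpack("!HH", tcp[:4])
        if NFS_PORT in (sport, dport) and tcp[13] & TCP_RST:
            yield ts, sport, dport


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:  # e.g. a hypothetical switchover.pcap
        for ts, sport, dport in nfs_tcp_resets(f):
            print(f"{ts:.6f} RST {sport} -> {dport}")
```

wireshark will show the same resets with the display filter tcp.flags.reset == 1; the script is only handy for scanning several dumps quickly.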

btw, what kernel are you on?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.