[Pacemaker] streamed writes fail with migration for NFS v3 over TCP

Wed May 20 21:07:36 EDT 2009

Bob Haxo wrote:
> Anyone have any ideas why NFSv3 over TCP reads should be successful
> across 100s of migrations and failovers, but writes bomb?

You might be suffering from a variant of this:

  http://marc.info/?l=linux-nfs&m=123175640421702&w=2

In particular, note the behaviour it describes for an NFS client
streaming 32K writes over TCP when the server briefly disappears:

> - unfortunately, because the server was AWOL for all of 8-9 ms, and
>   the client is sending piddly little 32K WRITEs, the client has in
>   the meantime sent quite a few rpcs, and has hit the clientside
>   limit of 16 outstanding rpcs.
> - so the client cannot send more rpcs until the server replies to
>   at least one of the last 16 rpcs...but the server has forgotten
>   all of those rpcs and no replies are forthcoming
> - finally, after about 95 sec the client's rpc timeout triggers
>   and the client retries the 1st of those 16 rpcs
> - the server replies immediately and traffic flows again normally.
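
To make the sequence above concrete, here is a rough toy model of it in
Python. It is only a sketch: the 16-slot limit and the ~95 sec timeout
come from the quoted post, but the send interval, server latency, and
outage window are numbers I made up, and simulate() is a hypothetical
helper, not anything from the kernel's sunrpc code.

```python
# Toy model of the stall described in the quoted post. Times are in
# microseconds. Only SLOT_LIMIT and RPC_TIMEOUT_US come from the post;
# every other number is an assumption chosen for illustration.
SLOT_LIMIT = 16                 # client-side limit on outstanding rpcs
RPC_TIMEOUT_US = 95_000_000     # client rpc timeout, about 95 sec
SEND_INTERVAL_US = 500          # assumed gap between 32K WRITEs
SERVER_LATENCY_US = 1_000       # assumed reply time of a healthy server
OUTAGE_US = (100_000, 109_000)  # assumed 9 ms window of server amnesia


def simulate(total_rpcs=1000):
    """Return (stall_start, stall_end) in microseconds for the toy client."""
    now = 0
    replies = []  # arrival times of replies the server will send
    lost = []     # send times of rpcs the server has forgotten
    sent = completed = 0
    stall_start = stall_end = None

    while completed < total_rpcs:
        # Retire rpcs whose replies have arrived, freeing their slots.
        completed += sum(1 for t in replies if t <= now)
        replies = [t for t in replies if t > now]

        # Retransmit forgotten rpcs whose timeout has expired. The server
        # is assumed healthy again by then, so it answers the retry.
        expired = [t for t in lost if t + RPC_TIMEOUT_US <= now]
        if expired:
            if stall_start is not None and stall_end is None:
                stall_end = now
            lost = [t for t in lost if t + RPC_TIMEOUT_US > now]
            replies.extend(now + SERVER_LATENCY_US for _ in expired)

        if len(replies) + len(lost) < SLOT_LIMIT and sent < total_rpcs:
            # A slot is free: send the next WRITE. Anything sent during
            # the outage is forgotten and never answered.
            if OUTAGE_US[0] <= now < OUTAGE_US[1]:
                lost.append(now)
            else:
                replies.append(now + SERVER_LATENCY_US)
            sent += 1
            now += SEND_INTERVAL_US
        else:
            # Blocked, or nothing left to send: jump to the next event.
            if len(lost) == SLOT_LIMIT and stall_start is None:
                stall_start = now  # every slot holds a forgotten rpc
            events = replies + [t + RPC_TIMEOUT_US for t in lost]
            if not events:
                break
            now = min(events)

    return stall_start, stall_end


if __name__ == "__main__":
    start, end = simulate()
    print(f"client stalled for {(end - start) / 1e6:.3f} s")
```

With these assumed numbers, all 16 slots fill with forgotten rpcs
before the 9 ms outage ends, and the toy client then sits idle for
roughly 95 seconds until the oldest rpc times out and is retried.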

The rest of that post, about "exportfs -i" and sunrpc caching, may or
may not apply, depending on exactly what you're doing.

Regards,

Tim