[Pacemaker] DRBD < LVM < EXT4 < NFS performance
Christoph Bartoschek
ponto at pontohonk.de
Sun May 20 10:05:15 UTC 2012
Hi,
we have a two-node setup with DRBD below LVM and an ext4 filesystem that is
shared via NFS. The system shows low performance and frequent timeouts,
resulting in unnecessary failovers triggered by Pacemaker.
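On a stack like this it is worth confirming how the export and the filesystem
are actually configured, since a sync export plus ext4 write barriers on top of
DRBD makes every NFS COMMIT expensive. A minimal read-only check (the grep
pattern is illustrative; adjust it to the actual mount point):

```shell
# Show per-export NFS options; sync vs. async changes COMMIT cost
exportfs -v
# Show ext4 mount options (barrier, data=ordered) on the shared volume
grep ext4 /proc/mounts
```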
The connection between both nodes is capable of 1 GByte/s as shown by iperf.
The network between the clients and the nodes is capable of 110 MByte/s, and
the RAID can sustain writes at 450 MByte/s.
Thus I would expect to have a write performance of about 100 MByte/s. But dd
gives me only 20 MByte/s.
dd if=/dev/zero of=bigfile.10G bs=8192 count=1310720
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
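Note that dd without a sync flag mostly measures the page cache; a variant that
forces a flush is closer to what nfsd pays on each COMMIT. A hedged re-test
sketch (file name and sizes are arbitrary examples, not from this setup):

```shell
# conv=fdatasync includes the final flush in the reported rate;
# oflag=direct bypasses the page cache entirely.
dd if=/dev/zero of=ddtest.tmp bs=1M count=256 conv=fdatasync
dd if=/dev/zero of=ddtest.tmp bs=1M count=256 oflag=direct
rm -f ddtest.tmp
```

Comparing the two rates against the buffered run above should show how much of
the 21.5 MB/s is flush latency rather than raw throughput.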
While the slow dd runs there are timeouts on the server resulting in a
restart of some resources. In the logfile I also see:
[329014.592452] INFO: task nfsd:2252 blocked for more than 120 seconds.
[329014.592820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[329014.593273] nfsd            D 0000000000000007     0  2252      2 0x00000000
[329014.593278] ffff88060a847c40 0000000000000046 ffff88060a847bf8 0000000300000001
[329014.593284] ffff88060a847fd8 ffff88060a847fd8 ffff88060a847fd8 0000000000013780
[329014.593290] ffff8806091416f0 ffff8806085bc4d0 ffff88060a847c50 ffff88061870c800
[329014.593295] Call Trace:
[329014.593303] [<ffffffff8165a55f>] schedule+0x3f/0x60
[329014.593309] [<ffffffff81265085>] jbd2_log_wait_commit+0xb5/0x130
[329014.593315] [<ffffffff8108aec0>] ? add_wait_queue+0x60/0x60
[329014.593321] [<ffffffff812111b8>] ext4_sync_file+0x208/0x2d0
[329014.593328] [<ffffffff811a62dd>] vfs_fsync_range+0x1d/0x40
[329014.593339] [<ffffffffa0227e51>] nfsd_commit+0xb1/0xd0 [nfsd]
[329014.593349] [<ffffffffa022f28d>] nfsd3_proc_commit+0x9d/0x100 [nfsd]
[329014.593356] [<ffffffffa0222a4b>] nfsd_dispatch+0xeb/0x230 [nfsd]
[329014.593373] [<ffffffffa00e9d95>] svc_process_common+0x345/0x690 [sunrpc]
[329014.593379] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
[329014.593391] [<ffffffffa00ea1e2>] svc_process+0x102/0x150 [sunrpc]
[329014.593397] [<ffffffffa02221ad>] nfsd+0xbd/0x160 [nfsd]
[329014.593403] [<ffffffffa02220f0>] ? nfsd_startup+0xf0/0xf0 [nfsd]
[329014.593407] [<ffffffff8108a42c>] kthread+0x8c/0xa0
[329014.593412] [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10
[329014.593416] [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0
[329014.593420] [<ffffffff81666bf0>] ? gs_change+0x13/0x13
Does anyone have an idea what could cause such problems? I am out of ideas for
further analysis.
Is ext4 unsuitable for such a setup? Or is the Linux NFSv3 implementation
broken? Are the buffers so large that one has to wait too long for a flush?
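Regarding the last question: with the default vm.dirty_* thresholds and a lot
of RAM, several gigabytes of dirty pages can accumulate before writeback kicks
in, so a single COMMIT (the jbd2_log_wait_commit in the trace) can stall behind
one huge flush. A quick read-only way to inspect this (the values shown are
system-specific):

```shell
# Current writeback thresholds (percent of RAM by default)
sysctl vm.dirty_ratio vm.dirty_background_ratio
# How much dirty data is pending right now
grep -i dirty /proc/meminfo
```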
Thanks
Christoph Bartoschek