[Pacemaker] DRBD < LVM < EXT4 < NFS performance
Christoph Bartoschek
ponto at pontohonk.de
Sun May 20 10:05:15 UTC 2012
Hi,
we have a two-node setup with DRBD below LVM and an ext4 filesystem that is
shared via NFS. The system shows low performance and frequent timeouts,
resulting in unnecessary failovers triggered by Pacemaker.
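On a stack like this it is worth confirming how the export and the filesystem
are actually configured, since a sync export plus ext4 write barriers on top of
DRBD makes every NFS COMMIT expensive. A minimal read-only check (the grep
pattern is illustrative; adjust it to the actual mount point):

```shell
# Show per-export NFS options; sync vs. async changes COMMIT cost
exportfs -v
# Show ext4 mount options (barrier, data=ordered) on the shared volume
grep ext4 /proc/mounts
```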
The connection between both nodes is capable of 1 GByte/s as shown by iperf.
The network between the clients and the nodes is capable of 110 MByte/s, and
the RAID can sustain writes at 450 MByte/s.
Thus I would expect to have a write performance of about 100 MByte/s. But dd
gives me only 20 MByte/s.
dd if=/dev/zero of=bigfile.10G bs=8192 count=1310720
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
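Note that dd without a sync flag mostly measures the page cache; a variant that
forces a flush is closer to what nfsd pays on each COMMIT. A hedged re-test
sketch (file name and sizes are arbitrary examples, not from this setup):

```shell
# conv=fdatasync includes the final flush in the reported rate;
# oflag=direct bypasses the page cache entirely.
dd if=/dev/zero of=ddtest.tmp bs=1M count=256 conv=fdatasync
dd if=/dev/zero of=ddtest.tmp bs=1M count=256 oflag=direct
rm -f ddtest.tmp
```

Comparing the two rates against the buffered run above should show how much of
the 21.5 MB/s is flush latency rather than raw throughput.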
While the slow dd runs there are timeouts on the server resulting in a
restart of some resources. In the logfile I also see:
[329014.592452] INFO: task nfsd:2252 blocked for more than 120 seconds.
[329014.592820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[329014.593273] nfsd            D 0000000000000007     0  2252      2 0x00000000
[329014.593278] ffff88060a847c40 0000000000000046 ffff88060a847bf8 0000000300000001
[329014.593284] ffff88060a847fd8 ffff88060a847fd8 ffff88060a847fd8 0000000000013780
[329014.593290] ffff8806091416f0 ffff8806085bc4d0 ffff88060a847c50 ffff88061870c800
[329014.593295] Call Trace:
[329014.593303] [<ffffffff8165a55f>] schedule+0x3f/0x60
[329014.593309] [<ffffffff81265085>] jbd2_log_wait_commit+0xb5/0x130
[329014.593315] [<ffffffff8108aec0>] ? add_wait_queue+0x60/0x60
[329014.593321] [<ffffffff812111b8>] ext4_sync_file+0x208/0x2d0
[329014.593328] [<ffffffff811a62dd>] vfs_fsync_range+0x1d/0x40
[329014.593339] [<ffffffffa0227e51>] nfsd_commit+0xb1/0xd0 [nfsd]
[329014.593349] [<ffffffffa022f28d>] nfsd3_proc_commit+0x9d/0x100 [nfsd]
[329014.593356] [<ffffffffa0222a4b>] nfsd_dispatch+0xeb/0x230 [nfsd]
[329014.593373] [<ffffffffa00e9d95>] svc_process_common+0x345/0x690 [sunrpc]
[329014.593379] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
[329014.593391] [<ffffffffa00ea1e2>] svc_process+0x102/0x150 [sunrpc]
[329014.593397] [<ffffffffa02221ad>] nfsd+0xbd/0x160 [nfsd]
[329014.593403] [<ffffffffa02220f0>] ? nfsd_startup+0xf0/0xf0 [nfsd]
[329014.593407] [<ffffffff8108a42c>] kthread+0x8c/0xa0
[329014.593412] [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10
[329014.593416] [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0
[329014.593420] [<ffffffff81666bf0>] ? gs_change+0x13/0x13
Does anyone have an idea what could cause such problems? I am out of ideas for
further analysis.
Is ext4 unsuitable for such a setup? Or is the Linux NFSv3 implementation
broken? Are the buffers so large that one has to wait too long for a flush?
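Regarding the last question: with the default vm.dirty_* thresholds and a lot
of RAM, several gigabytes of dirty pages can accumulate before writeback kicks
in, so a single COMMIT (the jbd2_log_wait_commit in the trace) can stall behind
one huge flush. A quick read-only way to inspect this (the values shown are
system-specific):

```shell
# Current writeback thresholds (percent of RAM by default)
sysctl vm.dirty_ratio vm.dirty_background_ratio
# How much dirty data is pending right now
grep -i dirty /proc/meminfo
```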
Thanks
Christoph Bartoschek