[Pacemaker] DRBD < LVM < EXT4 < NFS performance

Mon May 21 13:24:19 UTC 2012

Florian Haas wrote:

>> Thus I would expect to have a write performance of about 100 MByte/s. But
>> dd gives me only 20 MByte/s.
>>
>> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
>> 1310720+0 records in
>> 1310720+0 records out
>> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
> 
> If you used that same dd invocation for your local test that allegedly
> produced 450 MB/s, you've probably been testing only your page cache.
> Add oflag=dsync or oflag=direct (the latter will only work locally, as
> NFS doesn't support O_DIRECT).
> 
> If your RAID is one of reasonably contemporary SAS or SATA drives,
> then a sustained to-disk throughput of 450 MB/s would require about
> 7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
> got? Or are you writing to SSDs?
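
The 7-9 stripe estimate can be reproduced with a bit of integer arithmetic. This is only a rough sketch: the per-stripe rates of 50 and 65 MB/s are assumed values implied by that estimate, not measurements, and stripes_needed is a made-up helper name.

```shell
# stripes_needed TARGET_MBS PER_STRIPE_MBS
# Prints how many stripes are needed to reach TARGET_MBS, rounded up.
stripes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

stripes_needed 450 50   # prints 9 (assumed slow drive, 50 MB/s sustained)
stripes_needed 450 65   # prints 7 (assumed faster drive, 65 MB/s sustained)
```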

I used the same invocation with different filenames each time. Which page
cache do you refer to? The one on the client or the one on the server side?

We are using RAID-1 with 6 x 2 disks. I have repeated the local test 10 
times with different files in a row:

for i in `seq 10`; do time dd if=/dev/zero of=bigfile.10G.$i bs=8192 count=1310720; done

The values reported by dd, on a system that is also used by other programs,
are:

515 MB/s, 480 MB/s, 340 MB/s, 338 MB/s, 360 MB/s, 284 MB/s, 311 MB/s,
320 MB/s, 242 MB/s, 289 MB/s

So I think that the system is capable of more than 200 MB/s, which is way
more than what can arrive over the network.
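
Whether such figures reflect the disks or mostly the page cache can be checked by including the flush to disk in the timing. A minimal sketch, assuming GNU dd; the function name write_test and the temporary file name ddtest.$$ are made up for illustration.

```shell
# write_test DIR COUNT [DD_OPTION]
# Writes COUNT blocks of 8192 bytes into DIR and prints dd's summary line.
# DD_OPTION defaults to conv=fdatasync, which makes dd flush the file data
# to storage before it reports the elapsed time and rate.
write_test() {
    dir=$1
    count=$2
    opt=${3:-conv=fdatasync}
    file="$dir/ddtest.$$"
    # dd prints its statistics on stderr; keep only the final summary line.
    LC_ALL=C dd if=/dev/zero of="$file" bs=8192 count="$count" "$opt" 2>&1 | tail -n 1
    rm -f "$file"
}
```

Calling "write_test . 1310720" repeats the 10 GB run with the flush included. Passing oflag=dsync as the third argument syncs after every 8 KiB block instead, which is much slower, and oflag=direct bypasses the page cache on a local filesystem.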

I've done the measurements on the filesystem that sits on top of LVM and 
DRBD. Thus I think that DRBD is not a problem.

However, the strange thing is that I get 108 MB/s on the clients as soon as
I disable the secondary node for DRBD. Maybe there is a strange interaction
between DRBD and NFS.

After re-enabling the secondary node, the DRBD synchronization is quite slow.

>>
>> Has anyone an idea what could cause such problems? I have no idea for
>> further analysis.
> 
> As a knee-jerk response, that might be the classic issue of NFS
> filling up the page cache until it hits the vm.dirty_ratio and then
> having a ton of stuff to write to disk, which the local I/O subsystem
> can't cope with.

Sounds reasonable, but shouldn't the I/O subsystem be capable of writing out
anything that arrives?
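
The dirty-page thresholds and the amount of data currently waiting for writeback can be inspected on the NFS server while a client is writing. A small sketch for Linux; dirty_snapshot is a hypothetical helper name, and the paths are the standard /proc entries.

```shell
# dirty_snapshot
# Prints the writeback thresholds (percent of memory) and the current
# amount of dirty and under-writeback page cache from /proc/meminfo.
dirty_snapshot() {
    printf 'vm.dirty_ratio=%s\n' "$(cat /proc/sys/vm/dirty_ratio)"
    printf 'vm.dirty_background_ratio=%s\n' "$(cat /proc/sys/vm/dirty_background_ratio)"
    grep -E '^(Dirty|Writeback):' /proc/meminfo
}
```

Running it once per second during the dd run, for example with "while sleep 1; do dirty_snapshot; done", should show whether Dirty grows towards the limit before the throughput drops.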

Christoph