[Pacemaker] DRBD < LVM < EXT4 < NFS performance

Thu May 24 12:34:51 UTC 2012

Hi,

On Mon, May 21, 2012 at 4:24 PM, Christoph Bartoschek <bartoschek at gmx.de> wrote:
> Florian Haas wrote:
>
>>> Thus I would expect to have a write performance of about 100 MByte/s. But
>>> dd gives me only 20 MByte/s.
>>>
>>> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
>>> 1310720+0 records in
>>> 1310720+0 records out
>>> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
>>
>> If you used that same dd invocation for your local test that allegedly
>> produced 450 MB/s, you've probably been testing only your page cache.
>> Add oflag=dsync or oflag=direct (the latter will only work locally, as
>> NFS doesn't support O_DIRECT).
>>
>> If your RAID is one of reasonably contemporary SAS or SATA drives,
>> then a sustained to-disk throughput of 450 MB/s would require about
>> 7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
>> got? Or are you writing to SSDs?
>
> I used the same invocation with different filenames each time. To which page
> cache do you refer? To the one on the client or on the server side?
>
> We are using RAID-1 with 6 x 2 disks. I have repeated the local test 10
> times with different files in a row:
>
> for i in `seq 10`; do time dd if=/dev/zero of=bigfile.10G.$i bs=8192
> count=1310720; done
>
> The resulting values on a system that is also used by other programs as
> reported by dd are:
>
> 515 MB/s, 480 MB/s, 340 MB/s, 338 MB/s, 360 MB/s, 284 MB/s, 311 MB/s, 320
> MB/s, 242 MB/s,  289 MB/s
>
> So I think that the system is capable of more than 200 MB/s which is way
> more than what can arrive over the network.

A bit off-topic maybe.

Whenever you run this kind of local disk test and want to measure the
actual disk speed rather than caching, then, as Florian said, you
should pass oflag=direct to dd. You should also run sync and then
echo 3 > /proc/sys/vm/drop_caches beforehand (sync goes first, because
drop_caches only discards clean pages).

I usually use something like sync && echo 3 > /proc/sys/vm/drop_caches
&& date && time dd if=/dev/zero of=whatever bs=1G count=x oflag=direct
&& sync && date

You can tell whether data was still being flushed after dd returned by
comparing the throughput dd reports with the figure you get from
dividing the amount of data written by the time between the two date
calls. If the two differ noticeably, the final sync still had work to
do. It also helps to write more data than the controller cache can
hold.
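
The comparison between dd's own figure and the wall-clock figure can be
scripted. The following is only a rough sketch: the helper names
bench_write and wall_clock_mbps and the example path are made up for
illustration, and dropping caches is skipped unless the script can write
to /proc/sys/vm/drop_caches (normally root only).

```shell
#!/bin/sh
# Sketch only: bench_write and wall_clock_mbps are made-up helper names.

# Convert BYTES written in SECONDS to decimal MB/s, the unit dd prints.
wall_clock_mbps() {
    LC_ALL=C awk -v b="$1" -v s="$2" \
        'BEGIN { if (s < 1) s = 1; printf "%.1f\n", b / s / 1000000 }'
}

# bench_write FILE BS COUNT: timed direct-I/O write, including the final sync.
bench_write() {
    target=$1 bs=$2 count=$3
    sync                                    # flush dirty pages first
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches   # skipped if not writable
    fi
    start=$(date +%s)
    dd if=/dev/zero of="$target" bs="$bs" count="$count" oflag=direct || return 1
    sync                                    # include anything still in flight
    end=$(date +%s)
    bytes=$(wc -c < "$target")
    echo "wall clock: $(wall_clock_mbps "$bytes" "$((end - start))") MB/s"
}

# Example (hypothetical path, writes 4 GiB on the filesystem under test):
# bench_write /mnt/test/bigfile 1M 4096
```

If the wall-clock number printed at the end is clearly lower than the
one dd prints, part of the data was still sitting in a cache when dd
finished.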

Regards,
Dan

>
> I've done the measurements on the filesystem that sits on top of LVM and
> DRBD. Thus I think that DRBD is not a problem.
>
> However the strange thing is that I get 108 MB/s on the clients as soon as I
> disable the secondary node for DRBD. Maybe there is some strange interaction
> between DRBD and NFS.
>
> After reenabling the secondary node the DRBD synchronization is quite slow.
>
>
>>>
>>> Has anyone an idea what could cause such problems? I have no idea for
>>> further analysis.
>>
>> As a knee-jerk response, that might be the classic issue of NFS
>> filling up the page cache until it hits the vm.dirty_ratio and then
>> having a ton of stuff to write to disk, which the local I/O subsystem
>> can't cope with.
>
> Sounds reasonable, but shouldn't the I/O subsystem be capable of writing
> away anything that arrives?
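
One way to check whether the dirty_ratio explanation applies is to watch
the Dirty and Writeback counters on the NFS server while the clients are
writing. A minimal sketch, assuming a Linux server; show_dirty is a
made-up helper name, not an existing tool:

```shell
#!/bin/sh
# Sketch only: show_dirty is a made-up helper name.

# Print the Dirty and Writeback lines from a meminfo-style file
# (defaults to /proc/meminfo).
show_dirty() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' \
        "${1:-/proc/meminfo}"
}

# Current thresholds, as percentages of available memory:
# cat /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio
# Sample once per second while the NFS clients write:
# while sleep 1; do show_dirty; done
```

If Dirty keeps growing towards the threshold and client throughput
collapses once it gets there, that would be consistent with Florian's
suspicion. It would not prove it.
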
>
> Christoph
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Dan Frincu
CCNA, RHCE