[Pacemaker] KVM live migration and multipath

Vladislav Bogdanov bubble at hoster-ok.com
Fri Jun 21 10:47:28 EDT 2013


21.06.2013 17:23, Sven Arnold wrote:
> Thank you for replying, Vladislav!
> 
> 
>> I think the problem should be unrelated to iSCSI; you have a correct setup
>> (of course I did not look thoroughly through all the info, but the idea is
>> perfectly correct).
> 
> Thank you for confirming.
> 
>> Did you turn caching off for your VMs disks?
> 
> That's a point. Indeed caching was not explicitly turned off and I just
> noticed that the default setting of the cache attribute of the device
> tag in libvirt has changed. [1]
> I would expect that libvirt flushes all caches before finalizing the
> migration process. But it is probably best to turn off caches anyway.
> 
> I have now configured:
> 
> <disk type='block' device='disk'>
>       <driver name='qemu' type='raw' cache='none'/>

I would also switch to native IO (aio) if your kernel/qemu support
it. Otherwise qemu allocates several dedicated IO threads, which is
much slower than aio. There were some problems with aio in the past, but
it should work OK on recent enough distros.
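
For example, a sketch based on your disk definition, assuming your libvirt
version accepts the io attribute on the driver element (it takes 'threads'
or 'native'). As far as I know, native aio is meant to be combined with
cache='none', which you already have:

```xml
<disk type='block' device='disk'>
      <!-- io='native' requests Linux native AIO instead of qemu's IO thread pool -->
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/mapper/1p-lun1-vm_disk'/>
      <target dev='vda' bus='virtio'/>
</disk>
```

The VM has to be restarted (not just migrated) for a changed driver line to
take effect.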


>       <source dev='/dev/mapper/1p-lun1-vm_disk'/>
>       <target dev='vda' bus='virtio'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x04'
> function='0x0'/>
> </disk>
> 
> Unfortunately the problem persists :-(
> (To check, I run a loop of dd commands while migrating the vm, restarting
> iscsi initiators and iscsitargets. I still get errors in the fs.)

Maybe that depends on the combination of libvirt/qemu versions and the
migration mode used?

And do you always get fs corruption, independent of IO load?

> 
> 
>>> I have configured iSCSI timeouts rather short (noop_out_timeout 5
>>> seconds) and "no_path_retry queue" on the multipath device.
>>>
>>> My question(s):
>>>
>>> 1) Is it conceptually wrong what I am trying to accomplish?
>>
>> No, I use almost the same setup in production. Except I use IET and I
>> have cLVM on top of luns.
> 
> Good to hear. Could the use of cLVM make a difference (since this layer
> is cluster aware)?

It shouldn't, at least not its clustering part: it only coordinates metadata
operations (and activation) across the cluster and has nothing to do with IO.
The LVM/DM part may actually influence that somehow, but I'm not sure.

> 
> 
>>> 5) Is TGT multipath capable?
>>
>> Multipathing is much more an initiator concept, so I cannot see how
>> target side may affect that (unless it has some serious flaws with
>> reordering).
> 
> I think so too, but I was a bit confused that the web page of another
> implementation (istgt, [2]) explicitly mentions that it is
> multipath capable.

Frankly speaking, I do not know whether there are any Linux target
implementations that do not support MPIO (your case). However, I have
looked at only two of them: IET and LIO.

MC/S is much more exotic.

Did you try to stop all but one iSCSI connection to eliminate multipathing?

> 
> Thanks for your help so far,
> 
> Sven
> 
> [1] http://libvirt.org/formatdomain.html#elementsDevices
> [2] http://www.peach.ne.jp/archives/istgt/
> 