[Pacemaker] KVM live migration and multipath

Sat Jun 22 15:37:09 CEST 2013

Hi,

I am getting closer... Some updates for those who are interested.

>>> Did you turn caching off for your VMs disks?
>>
>> That's a point. Indeed caching was not explicitly turned off and I just
>> noticed that the default setting of the cache attribute of the device
>> tag in libvirt has changed. [1]
>> I would expect that libvirt flushes all caches before finalizing the
>> migration process. But it is probably best to turn off caches anyway.
>>
>> I have now configured:
>>
>> <disk type='block' device='disk'>
>>        <driver name='qemu' type='raw' cache='none'/>
>
> I would also switch to a native IO (aio) if your kernel/qemu support
> that. Otherwise qemu allocates several dedicated IO threads, and it is
> much slower than aio. There were some problems with aio in the past, but
> it should work ok for recent enough distros.
>

This is interesting. After switching to native io out of curiosity:

<driver name='qemu' type='raw' cache='none' io='native'/>

the situation looked much better - to my surprise I did not experience 
further corruptions with this virtual machine.

Then I added a second and a third VM to the setup, only to get errors again
on those machines. I noticed that those additional VMs had older qemu
machine types (pc-0.11 and pc-0.12) set. After upgrading the domains to
machine type pc-1.0:

<os>
     <type arch='x86_64' machine='pc-1.0'>hvm</type>
     <boot dev='hd'/>
</os>

I did not trigger file system corruptions again. So, at this moment it 
looks like it is important to:
- turn caching off
- use native aio
- *and* use an up-to-date machine type

In my tests, failing to meet any one of these criteria resulted in fs
corruption.
Does this make sense at all?
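
For anyone who wants to check their own domains against these three
points, here is a minimal sketch. It only inspects the domain XML text
(for example the output of "virsh dumpxml"). The function names and the
pc-1.0 threshold are my own assumptions for illustration, not anything
libvirt defines, and machine names that do not look like "pc-X.Y" are
simply flagged for manual review.

```python
import xml.etree.ElementTree as ET

# Assumption: "up to date" means pc-1.0 or newer, the value that worked here.
MIN_MACHINE = (1, 0)


def machine_version(machine):
    """Parse 'pc-0.12' into (0, 12); return None for any other name."""
    if not machine or not machine.startswith("pc-"):
        return None
    try:
        return tuple(int(part) for part in machine[3:].split("."))
    except ValueError:
        return None


def check_domain(xml_text):
    """Return a list of human-readable problems found in a domain XML."""
    root = ET.fromstring(xml_text)
    problems = []
    for disk in root.iter("disk"):
        if disk.get("device") != "disk":
            continue  # skip cdrom, floppy, ...
        driver = disk.find("driver")
        cache = driver.get("cache") if driver is not None else None
        io = driver.get("io") if driver is not None else None
        if cache != "none":
            problems.append("disk cache is %r, expected 'none'" % cache)
        if io != "native":
            problems.append("disk io is %r, expected 'native'" % io)
    os_type = root.find("os/type")
    machine = os_type.get("machine") if os_type is not None else None
    version = machine_version(machine)
    if version is None or version < MIN_MACHINE:
        problems.append("machine type is %r, expected pc-1.0 or newer" % machine)
    return problems
```

An empty list means the domain matches all three settings. This says
nothing about whether the settings are sufficient, only whether they
are present.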

>
> Maybe that depends on the combination of libvirt/qemu versions and
> the migration mode used?

qemu is at 1.0 (1.0+noroms-0ubuntu14.8)
libvirt is at 0.9.8 (0.9.8-2ubuntu17.10)

> And, do you always have fs corruption, independently of IO load?
>

It seems that I have to create some I/O to trigger the corruption.
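
To make this reproducible, a small pattern writer and verifier along
the following lines could be run inside the guest: write before the
migration, verify afterwards. This is only a sketch and not the tool I
actually used. The block size, the seed scheme and all function names
are assumptions made up for the example.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumption: arbitrary block size for the test file


def block_data(seed, index):
    """Derive the deterministic content of one block from seed and index."""
    out = b""
    counter = 0
    while len(out) < BLOCK_SIZE:
        key = "%s:%d:%d" % (seed, index, counter)
        out += hashlib.sha256(key.encode()).digest()
        counter += 1
    return out[:BLOCK_SIZE]


def write_pattern(path, seed, blocks):
    """Write `blocks` blocks and ask the OS to flush them to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(block_data(seed, i))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, seed, blocks):
    """Return the indices of blocks whose content does not match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK_SIZE) != block_data(seed, i):
                bad.append(i)
    return bad
```

Note that a verify run directly after the write may be answered from
the guest's page cache, so it is more meaningful after the migration
and with a file larger than the guest's RAM.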

>
> Did you try to stop all but one iSCSI connection to eliminate multipathing?
>

Not exactly. That is what I would do next if I still have problems.
What I did was to use one iSCSI path directly (by using
/dev/disk/by-path/... as the source of the block device). This seemed to
work, but it is hard to tell whether I just did not trigger the bug in
my setup.

That everything worked with a single path (or at least seemed to) is not
consistent with the observations above. Therefore I still do not trust
the setup and will run some more long-running tests.

May I ask a few more questions?

Do you manage the multipath daemon with pacemaker? In my setup multipath 
is started at boot time and not managed by pacemaker.

Where do you loosen the dependencies between targets and initiators?
I use two advisory orders:

order o-iscsitarget_before_iscsiinitiator 0: rg-iscsitarget clone-iscsiinitiator

order o-iscsiinitiator_before_libvirt 0: clone-iscsiinitiator clone-libvirtd

to have the possibility to restart targets (needed for failover) and to 
restart iscsi initiators (to scan for new targets easily). Is this good 
practice?

Thanks a lot and best regards,

Sven