[Pacemaker] KVM live migration and multipath
Vladislav Bogdanov
bubble at hoster-ok.com
Thu Jun 20 05:13:45 UTC 2013
20.06.2013 02:33, Sven Arnold wrote:
> Hi All,
>
> I asked this on linux-ha but had no luck. Is there anybody here who has
> some hints for me or could tell me if it is possible (and sensible) to
> live migrate a virtual machine if the disk image is provided by a
> multipath device?
>
> I am not sure if my approach is flawed or if I am using the wrong or
> misconfigured tools.
>
> Thanks a lot and sorry for crossposting,
>
> Sven
>
>
>
> Dear All,
>
> I have set up a three node cluster with shared storage (DRBD
> active/passive) which exports iSCSI Volumes (TGT) containing KVM/QEMU
> disk images.
>
> The iSCSI Target is configured as one resource and accessible on two
> floating ip addresses to allow multipath I/O for speed and redundancy.
>
> The VM hosts are accessing the volumes via open-iscsi using dm-multipath
> (path_grouping_policy multibus).
>
> While migrating the iSCSI Target from A to B everything works fine.
> But if I try to live migrate a virtual machine I experience file system
> corruption inside the virtual machine. So, somehow the switching of the
> iSCSI/multipath sessions is not handled properly by the VM hosts.
I think the problem is unrelated to iSCSI; your setup looks correct
(I did not go through all of the information thoroughly, but the idea
is perfectly sound).
Did you turn caching off for your VMs' disks?
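To make the caching question concrete: in libvirt the cache mode is the
`cache` attribute of a disk's `<driver>` element, and `cache='none'`
makes QEMU bypass the host page cache. Below is a minimal Python sketch,
not something taken from your setup, that lists the cache mode per disk
from a domain XML dump. The sample XML, the device paths in it, and the
function names are made up for illustration, and treating every mode
other than `none` as suspect is my own assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML, shaped like the output of `virsh dumpxml`.
EXAMPLE_XML = """
<domain type='kvm'>
  <name>vm</name>
  <devices>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/mapper/example-lun1'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/mapper/example-lun2'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def disk_cache_modes(domain_xml):
    """Map each disk's target device to its configured cache mode.

    Disks without an explicit cache attribute are reported as "default",
    meaning the hypervisor picks the mode.
    """
    root = ET.fromstring(domain_xml)
    modes = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else "?"
        if driver is not None:
            modes[dev] = driver.get("cache", "default")
        else:
            modes[dev] = "default"
    return modes


def suspect_disks(domain_xml):
    """Return target devices whose cache mode is anything other than 'none'."""
    modes = disk_cache_modes(domain_xml)
    return [dev for dev, mode in modes.items() if mode != "none"]


if __name__ == "__main__":
    print(disk_cache_modes(EXAMPLE_XML))
    print(suspect_disks(EXAMPLE_XML))
```

Feeding it the output of `virsh dumpxml <domain>` on each host would show
whether a disk still goes through the host page cache.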
>
> I have configured iSCSI timeouts rather short (noop_out_timeout 5
> seconds) and "no_path_retry queue" on the multipath device.
>
> My question(s):
>
> 1) Is it conceptually wrong what I am trying to accomplish?
No, I use almost the same setup in production, except that I use IET
and have cLVM on top of the LUNs.
>
> 3) Is it valid to use "no_path_retry queue" in such a setup?
Yes, absolutely.
>
> 4) Did I miss some important configuration options (timings, etc.)?
Since you use 'queue', the timings are mostly unimportant.
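To illustrate why: `no_path_retry` takes `queue`, `fail`, or a retry
count, and only the numeric form gives a bounded window (roughly the
count times `polling_interval`) before queued I/O is failed up to the
filesystem. A small Python sketch of that rule follows; the function
name is hypothetical and the arithmetic is an approximation of
multipathd's behaviour, not taken from its code.

```python
def queue_window_seconds(no_path_retry, polling_interval):
    """Approximate how long I/O is queued once all paths are down.

    Returns None when queueing never stops ("queue"), 0 when I/O fails
    immediately ("fail"), otherwise retries * polling_interval seconds.
    """
    value = str(no_path_retry).strip().lower()
    if value == "queue":
        return None  # I/O is held until some path comes back
    if value == "fail":
        return 0  # no queueing at all
    return int(value) * polling_interval


if __name__ == "__main__":
    # polling_interval 10 is the value from the multipath.conf quoted below.
    for setting in ("queue", "fail", 12):
        print(setting, queue_window_seconds(setting, 10))
```

With 'queue' there is no deadline that the iSCSI timeouts could race
against, so they only influence how quickly a dead path is noticed.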
>
> 5) Is TGT multipath capable?
Multipathing is much more of an initiator-side concept, so I cannot see
how the target side could affect it (unless the target has some serious
flaws with reordering).
>
>
> Thank you all for any hints,
>
> Sven
>
> ===== Additional Information below: =====
>
> - Cluster Layout
> - Environment
> - multipath configuration
> - iSCSI Timings
> - cib configuration (simplified and sorted)
>
> Cluster Layout:
> ---------------
> A B C
> (active) <--- DRBD ---> (passive)
>
> iSCSI Target
> ip0 ip1 -- failover-->
> (floating IPs)
>
> ---------------- iSCSI Initiator -----------------------
> (two paths)
>
> ---------------- Multipath I/O -----------------------
>
> ---------------- libvirt/KVM -----------------------
>
> <------------------------------ failover ----- VM1
>
>
> Environment:
> ------------
>
> Ubuntu 12.04.2 LTS
> kernel 3.5.0.34
> corosync 1.4.2-2
> cman 3.1.7-0ubuntu2.1
> pacemaker 1.1.6-2ubuntu3
> resource-agents 1:3.9.2-5ubuntu4.1
> tgt 1:1.0.17-1ubuntu2
> open-iscsi 2.0.871-0ubuntu9.12.04.2
> multipath-tools 0.4.9-3ubuntu5
>
> multipath.conf:
> ---------------
>
> defaults {
>     udev_dir              /dev
>     polling_interval      10
>     path_selector         "round-robin 0"
>     path_grouping_policy  multibus
>     path_checker          readsector0
>     rr_min_io             100
>     max_fds               8192
>     rr_weight             priorities
>     failback              immediate
>     no_path_retry         queue
> }
>
>
> iscsi timeouts (from /etc/iscsi/iscsid.conf):
> ---------------------------------------------
>
> node.conn[0].timeo.logout_timeout = 15
> node.conn[0].timeo.login_timeout = 15
> node.conn[0].timeo.auth_timeout = 45
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 5
>
> cib configuration (excerpt, slightly modified):
> -----------------------------------------------
>
> primitive p-drbd-r0 ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="15"
> ms ms-drbd-r0 p-drbd-r0 \
>     meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true"
> primitive p-vg_drbd ocf:heartbeat:LVM \
>     params volgrpname="vg_drbd" \
>     op monitor interval="30s" timeout="30s" \
>     op start interval="0" timeout="30s" \
>     op stop interval="0" timeout="30s"
> primitive p-iscsi-target ocf:heartbeat:iSCSITarget \
>     params iqn="<iscsi iqn>" tid="1" implementation="tgt" \
>         allowed_initiators="<..initiator ips..>" \
>     op monitor interval="15s"
> primitive p-lun1-vm_disk ocf:heartbeat:iSCSILogicalUnit \
>     params target_iqn="iqn.2013-03.de.localite:storage" lun="1" \
>         path="/dev/vg_drbd/vm_disk" implementation="tgt" vendor_id="STGT"
> primitive p-iscsiip0 ocf:heartbeat:IPaddr2 \
>     params ip="10.223.101.131" nic="eth2" cidr_netmask="26" \
>     op monitor interval="20s"
> primitive p-iscsiip1 ocf:heartbeat:IPaddr2 \
>     params ip="10.223.101.195" nic="eth3" cidr_netmask="26" \
>     op monitor interval="20s"
> group rg-iscsitarget p-iscsi-target p-lun1-vm_disk p-iscsiip0 p-iscsiip1
> primitive p-iscsi-initiator lsb:open-iscsi \
>     op monitor interval="30s"
> clone clone-iscsiinitiator p-iscsi-initiator \
>     meta interleave="true"
> primitive p-libvirtd lsb:libvirt-bin \
>     op monitor interval="30s"
> clone clone-libvirtd p-libvirtd \
>     meta interleave="true"
> primitive p-vm ocf:heartbeat:VirtualDomain \
>     params config="/etc/libvirt/qemu/vm.xml" \
>         migration_transport="tls" \
>     meta allow-migrate="true" \
>     op start interval="0" timeout="250s" \
>     op stop interval="0" timeout="300s" \
>     op monitor interval="60s" timeout="30s" \
>     op migrate_from interval="0" timeout="300s" \
>     op migrate_to interval="0" timeout="300s"
> colocation col-iscsitarget_on_drbd inf: rg-iscsitarget ms-drbd-r0:Master
> order o-drbd-r0_before_vg inf: ms-drbd-r0:promote p-vg_drbd:start
> order o-vg-drbd-r0_before_iscsitarget inf: p-vg_drbd rg-iscsitarget
> order o-iscsitarget_before_iscsiinitiator 0: rg-iscsitarget \
>     clone-iscsiinitiator
> order o-iscsiinitiator_before_libvirt 0: clone-iscsiinitiator \
>     clone-libvirtd
> order o-libvirt_before_vm inf: clone-libvirtd p-vm
>
>