[Pacemaker] VM live migration with pacemaker-1.1.6
Luca Lesinigo
luca at lm-net.it
Wed Jun 20 07:38:46 EDT 2012
Hello list.
I'm a happy user of pacemaker-1.0 + corosync + DRBD with the usual active/passive dual-node setup.
It doesn't do live migration (we also run DRBD master/slave, not dual-master), and it gets flaky under heavy I/O load (the VM check script becomes unresponsive, the cluster thinks a node has problems, starts migrating things around, etc. That could be entirely my config's fault, but that's another story). Still, it has served us well for several years. We were 'early adopters' of pacemaker on Gentoo Linux back when it wasn't even included in the distro (we collaborated with a Gentoo dev who was putting together what would become the current ebuilds for the cluster stack).
Now we're thinking of upgrading to true external shared storage, and our target would be something like the Dell MD3220 array. It's basically a shared SAS unit that presents LUNs to all (up to 4) attached servers[*]. That also means all LUNs are always available for concurrent read and write access from all nodes.
We are also targeting Xen as the hypervisor (because that's what we're already using) and Ubuntu 12.04 LTS as the servers' operating system / Domain-0 (because we're already familiar with it and because of its 5-year support). Ideally we won't have any other physical server to manage the cluster "from the outside", so a general-purpose operating system on the nodes is a must (as opposed to things like vSphere or maybe XenCluster, though I don't know the latter well).
I would implement node fencing using the lights-out IPMI management. The IPMI boards are on a separate network, and every node in the cluster can reach the IPMI boards of all the other nodes.
We'd like a rock-solid system, and live migration of virtual machines is a must.
I know pacemaker wasn't able to live migrate resources in the past, but some research suggests that it is now possible.
So I'm reaching out to this list to ask:
- whether I can get pacemaker-1.1.6 to live migrate Xen VMs
- if anyone already has experience in doing that over shared SAS infrastructure and how it works in production
- I assume user-commanded live migration (with all nodes up and running) shouldn't pose any problem
- also a failed-node migration (not live, of course) should work ok
- what could happen if all SAS links between a single node and the storage stop working?
(i.e., storage array working, storage management IP responding, node working, but the node can't access the actual LUNs)
Thank you for any help and any experience you can share; it will be really appreciated.
[*] We'd like to use shared SAS instead of iSCSI because it's simpler and should give better performance, given that we know we will never grow beyond 4 nodes for a single storage array. I'm doing some research both on the SAS vs iSCSI side and on the software stack side (this email, actually) before making the final choice between the two.
--
Luca Lesinigo