[Pacemaker] VirtualDomain/DRBD live migration with pacemaker...

Vadym Chepkov vchepkov at gmail.com
Mon Jun 14 17:01:24 EDT 2010


On Mon, Jun 14, 2010 at 4:37 PM, Erich Weiler <weiler at soe.ucsc.edu> wrote:
> Hi All,
>
> We have this interesting problem I was hoping someone could shed some light
> on.  Basically, we have 2 servers acting as a pacemaker cluster for DRBD and
> VirtualDomain (KVM) resources under CentOS 5.5.
>
> As it is set up, if one node dies, the other node promotes the DRBD devices
> to "Master", then starts up the VMs there (there is one DRBD device for each
> VM).  This works great.  I set resource-stickiness="100", and the VM's
> location constraint score is 50, such that if a VM migrates to the other
> server, it will stay there until I specifically move it back manually.
>
> Now...  In the event of a failure of one server, all the VMs go to the other
> server.  When I fix the broken server and bring it back online, the VMs do
> not migrate back automatically because of the scoring I mentioned above.  I
> wanted this because when the VM goes back, it essentially has to shut down,
> then reboot on the other node.  I'm trying to avoid the 'shut down' part of
> it and do a live migration back to the first server.  But, I cannot figure
> out the exact sequence of events to do this such that pacemaker will not
> reboot the VM somewhere in the process.  This is my configuration, with one
> VM called 'caweb':
>
> node vmserver1
> node vmserver2
> primitive caweb-vd ocf:heartbeat:VirtualDomain \
>        params config="/etc/libvirt/qemu/caweb.xml" hypervisor="qemu:///system" \
>        meta allow-migrate="false" target-role="Started" \
>        op start interval="0" timeout="120s" \
>        op stop interval="0" timeout="120s" \
>        op monitor interval="10" timeout="30" depth="0"
> primitive drbd-caweb ocf:linbit:drbd \
>        params drbd_resource="caweb" \
>        op monitor interval="15s" \
>        op start interval="0" timeout="240s" \
>        op stop interval="0" timeout="100s"
> ms ms-drbd-caweb drbd-caweb \
>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> location caweb-prefers-vmserver1 caweb-vd 50: vmserver1
> colocation caweb-vd-on-drbd inf: caweb-vd ms-drbd-caweb:Master
> order caweb-after-drbd inf: ms-drbd-caweb:promote caweb-vd:start
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore" \
>        last-lrm-refresh="1276538859"
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="100"
>
> One thing I tried, in an effort to do a live migration from vmserver2 to
> vmserver1 and afterward tell pacemaker to 're-acquire' the current state of
> things without a VM reboot, was:
>
> vmserver1# crm resource unmanage caweb-vd
> vmserver1# crm resource unmanage ms-drbd-caweb
> vmserver1# drbdadm primary caweb   <--make dual primary
>
> (then back on vmserver2...)
>
> vmserver2# virsh migrate --live caweb qemu+ssh://hgvmserver1.local/system
> vmserver2# drbdadm secondary caweb  <--disable dual primary
> vmserver2# crm resource manage ms-drbd-caweb
> vmserver2# crm resource manage caweb-vd
> vmserver2# crm resource cleanup ms-drbd-caweb
> vmserver2# crm resource cleanup caweb-vd
> vmserver2# crm resource refresh
> vmserver2# crm resource reprobe
> vmserver2# crm resource start caweb-vd
>
> at this point the VM has live migrated and is still online.
>
> [wait 120 seconds for caweb-vd start timeouts to expire]
>
> For a moment I thought it had worked, but then pacemaker put the device in
> an error mode and it was shut down...  After bringing a resource(s) back
> into 'managed' mode, is there any way to tell pacemaker to 'figure things
> out' without restarting the resources?  Or is this impossible because the VM
> resource is dependent on the DRBD resource, and it has trouble figuring out
> stacked resources without restarting them?
>
> Or - does anyone know another way to manually live migrate a
> pacemaker/VirtualDomain managed VM (with DRBD) without having to reboot the
> VM after the live migrate?
>
> Thanks in advance for any clues!!  BTW, I am using pacemaker 1.0.8 and DRBD
> 8.3.


I know what the problem is; how to solve it is another issue :)
To do a live migration you need to be able to access the same storage
from both nodes at the time of the migration.
So you have to add allow-two-primaries to the net section of your DRBD
resource definition, and also put
options drbd disable_sendpage=1
into /etc/modprobe.conf.
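
Something like this in the resource definition (untested sketch -- the
resource name "caweb" comes from your config, the rest is standard
DRBD 8.3 syntax; put it wherever the resource is actually defined,
e.g. /etc/drbd.conf):

resource caweb {
  net {
    allow-two-primaries;   # both nodes may hold the Primary role
  }
  ...                      # rest of the resource stays as it is
}

"drbdadm adjust caweb" on both nodes picks up the net option; the
disable_sendpage setting only takes effect the next time the drbd
module is loaded.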

You don't have much of a choice here (at least I don't know of one) but
to run DRBD as primary/primary (master-max="2" master-node-max="1")
all the time, and to hope the cluster will prevent running the same KVM
guest on both nodes at the same time.
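
With dual-primary in place you can also let pacemaker do the live
migration itself instead of the manual unmanage/virsh sequence.
Roughly (untested sketch -- allow-migrate="true" and the
migrate_to/migrate_from operations are the additions, and they assume
your VirtualDomain agent supports live migration; everything else is
taken from your existing config):

ms ms-drbd-caweb drbd-caweb \
        meta master-max="2" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true" target-role="Started"
primitive caweb-vd ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/caweb.xml" hypervisor="qemu:///system" \
        meta allow-migrate="true" target-role="Started" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op migrate_to interval="0" timeout="120s" \
        op migrate_from interval="0" timeout="120s" \
        op monitor interval="10" timeout="30" depth="0"

Moving the VM back then becomes

# crm resource migrate caweb-vd vmserver1
# crm resource unmigrate caweb-vd

and pacemaker should call migrate_to/migrate_from on the agent instead
of stop/start.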



