[Pacemaker] VirtualDomain/DRBD live migration with pacemaker...
Erich Weiler
weiler at soe.ucsc.edu
Mon Jun 14 20:37:04 UTC 2010
Hi All,
We have this interesting problem I was hoping someone could shed some
light on. Basically, we have 2 servers acting as a pacemaker cluster
for DRBD and VirtualDomain (KVM) resources under CentOS 5.5.
As it is set up, if one node dies, the other node promotes the DRBD
devices to "Master", then starts up the VMs there (there is one DRBD
device for each VM). This works great. I set resource-stickiness="100"
and the VM's location preference score to 50, so that if a VM migrates
to the other server, it stays there until I specifically move it back
manually.
Now... In the event of a failure of one server, all the VMs go to the
other server. When I fix the broken server and bring it back online,
the VMs do not migrate back automatically because of the scoring I
mentioned above. I wanted this because when the VM goes back, it
essentially has to shut down, then reboot on the other node. I'm trying
to avoid the 'shut down' part of it and do a live migration back to the
first server. But I cannot figure out a sequence of events that
accomplishes this without pacemaker restarting the VM somewhere in the
process. This is my configuration, with one VM called 'caweb':
node vmserver1
node vmserver2
primitive caweb-vd ocf:heartbeat:VirtualDomain \
params config="/etc/libvirt/qemu/caweb.xml" hypervisor="qemu:///system" \
meta allow-migrate="false" target-role="Started" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
op monitor interval="10" timeout="30" depth="0"
primitive drbd-caweb ocf:linbit:drbd \
params drbd_resource="caweb" \
op monitor interval="15s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="100s"
ms ms-drbd-caweb drbd-caweb \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true" target-role="Started"
location caweb-prefers-vmserver1 caweb-vd 50: vmserver1
colocation caweb-vd-on-drbd inf: caweb-vd ms-drbd-caweb:Master
order caweb-after-drbd inf: ms-drbd-caweb:promote caweb-vd:start
property $id="cib-bootstrap-options" \
dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1276538859"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
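For reference, the allocation scores that keep the VM pinned after a
failover can be inspected from the live CIB; I believe ptest is the tool
that ships with pacemaker 1.0 for this (crm_simulate in later releases),
so roughly:

```
# Show allocation scores from the live CIB; the stickiness of 100 on
# the current node should outrank the location preference of 50 for
# vmserver1, which is why the VM stays put.
ptest -sL | grep caweb-vd
```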
One thing I tried, in an effort to do a live migration from vmserver2 to
vmserver1 and afterward tell pacemaker to 're-acquire' the current state
of things without a VM reboot, was:
vmserver1# crm resource unmanage caweb-vd
vmserver1# crm resource unmanage ms-drbd-caweb
vmserver1# drbdadm primary caweb <--make dual primary
(then back on vmserver2...)
vmserver2# virsh migrate --live caweb qemu+ssh://vmserver1.local/system
vmserver2# drbdadm secondary caweb <--disable dual primary
vmserver2# crm resource manage ms-drbd-caweb
vmserver2# crm resource manage caweb-vd
vmserver2# crm resource cleanup ms-drbd-caweb
vmserver2# crm resource cleanup caweb-vd
vmserver2# crm resource refresh
vmserver2# crm resource reprobe
vmserver2# crm resource start caweb-vd
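(For the temporary dual-primary step above to be accepted by DRBD at
all, my understanding is the resource has to permit two primaries,
roughly like this in the caweb resource definition -- a sketch of the
DRBD 8.3 syntax, not my exact config:)

```
resource caweb {
  net {
    # needed before 'drbdadm primary' succeeds on the second node
    # while the first node is still primary
    allow-two-primaries;
  }
  ...
}
```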
at this point the VM has live migrated and is still online.
[wait 120 seconds for caweb-vd start timeouts to expire]
For a moment I thought it had worked, but then pacemaker flagged the
resource as failed and shut it down... After bringing resources back
into 'managed' mode, is there any way to tell pacemaker to 'figure
things out' without restarting them? Or is this impossible because the
VM resource depends on the DRBD resource, and pacemaker has trouble
figuring out stacked resources without restarting them?
Or - does anyone know another way to manually live migrate a
pacemaker/VirtualDomain managed VM (with DRBD) without having to reboot
the VM after the live migrate?
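One direction I have not fully explored: if I understand the
VirtualDomain agent correctly, setting allow-migrate="true" (together
with a dual-primary-capable DRBD setup, i.e. master-max="2") is
supposed to let pacemaker drive the live migration itself when the
resource moves, so no shutdown would be involved. Something like this
sketch -- untested on my end, and the migration_transport parameter is
my assumption about the agent:

```
primitive caweb-vd ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/caweb.xml" hypervisor="qemu:///system" \
        migration_transport="ssh" \
    meta allow-migrate="true" target-role="Started" \
    ...
ms ms-drbd-caweb drbd-caweb \
    meta master-max="2" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true"
```

If that is the supported route, I would still like to understand why
the unmanage/manage dance above fails.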
Thanks in advance for any clues!! BTW, I am using pacemaker 1.0.8 and
DRBD 8.3.
Cheers,
-erich