[Pacemaker] VirtualDomain/DRBD live migration with pacemaker...

Dennis J. dennisml at conversis.de
Tue Jun 15 14:25:09 EDT 2010


On 06/14/2010 11:01 PM, Vadym Chepkov wrote:
> On Mon, Jun 14, 2010 at 4:37 PM, Erich Weiler<weiler at soe.ucsc.edu>  wrote:
>> Hi All,
>>
>> We have this interesting problem I was hoping someone could shed some light
>> on.  Basically, we have 2 servers acting as a pacemaker cluster for DRBD and
>> VirtualDomain (KVM) resources under CentOS 5.5.
>>
>> As it is set up, if one node dies, the other node promotes the DRBD devices
>> to "Master", then starts up the VMs there (there is one DRBD device for each
>> VM).  This works great.  I set the 'resource-stickiness="100"', and the vm
>> resource score is 50, such that if a VM migrates to the other server, it
>> will stay there until I specifically move it back manually.
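A quick aside on the scoring here: stickiness 100 on the node the VM is
currently running on outweighs the location preference of 50 for vmserver1,
so once a VM has failed over it stays put even after vmserver1 returns. To
see the scores pacemaker actually computes, something like the following
should work with the 1.0-era tools (assuming ptest is installed alongside
pacemaker):

    # show the allocation scores derived from the live CIB
    ptest -Ls | grep caweb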
>>
>> Now...  In the event of a failure of one server, all the VMs go to the other
>> server.  When I fix the broken server and bring it back online, the VMs do
>> not migrate back automatically because of the scoring I mentioned above.  I
>> wanted this because when the VM goes back, it essentially has to shut down,
>> then reboot on the other node.  I'm trying to avoid the 'shut down' part of
>> it and do a live migration back to the first server.  But, I cannot figure
>> out the exact sequence of events to do this such that pacemaker will not
>> reboot the VM somewhere in the process.  This is my configuration, with one
>> VM called 'caweb':
>>
>> node vmserver1
>> node vmserver2
>> primitive caweb-vd ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/qemu/caweb.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="false" target-role="Started" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10" timeout="30" depth="0"
>> primitive drbd-caweb ocf:linbit:drbd \
>>         params drbd_resource="caweb" \
>>         op monitor interval="15s" \
>>         op start interval="0" timeout="240s" \
>>         op stop interval="0" timeout="100s"
>> ms ms-drbd-caweb drbd-caweb \
>>         meta master-max="1" master-node-max="1" clone-max="2" \
>>         clone-node-max="1" notify="true" target-role="Started"
>> location caweb-prefers-vmserver1 caweb-vd 50: vmserver1
>> colocation caweb-vd-on-drbd inf: caweb-vd ms-drbd-caweb:Master
>> order caweb-after-drbd inf: ms-drbd-caweb:promote caweb-vd:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="2" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="ignore" \
>>         last-lrm-refresh="1276538859"
>> rsc_defaults $id="rsc-options" \
>>         resource-stickiness="100"
>>
>> One thing I tried, in an effort to do a live migration from vmserver2 to
>> vmserver1 and afterward tell pacemaker to 're-acquire' the current state of
>> things without a VM reboot, was:
>>
>> vmserver1# crm resource unmanage caweb-vd
>> vmserver1# crm resource unmanage ms-drbd-caweb
>> vmserver1# drbdadm primary caweb      <-- make dual primary
>>
>> (then back on vmserver2...)
>>
>> vmserver2# virsh migrate --live caweb qemu+ssh://hgvmserver1.local/system
>> vmserver2# drbdadm secondary caweb    <-- disable dual primary
>> vmserver2# crm resource manage ms-drbd-caweb
>> vmserver2# crm resource manage caweb-vd
>> vmserver2# crm resource cleanup ms-drbd-caweb
>> vmserver2# crm resource cleanup caweb-vd
>> vmserver2# crm resource refresh
>> vmserver2# crm resource reprobe
>> vmserver2# crm resource start caweb-vd
>>
>> at this point the VM has live migrated and is still online.
>>
>> [wait 120 seconds for caweb-vd start timeouts to expire]
>>
>> For a moment I thought it had worked, but then pacemaker put the device in
>> an error mode and it was shut down...  After bringing resources back
>> into 'managed' mode, is there any way to tell pacemaker to 'figure things
>> out' without restarting the resources?  Or is this impossible because the VM
>> resource is dependent on the DRBD resource, and pacemaker has trouble figuring out
>> stacked resources without restarting them?
>>
>> Or - does anyone know another way to manually live migrate a
>> pacemaker/VirtualDomain managed VM (with DRBD) without having to reboot the
>> VM after the live migrate?
>>
>> Thanks in advance for any clues!!  BTW, I am using pacemaker 1.0.8 and
>> DRBD 8.3.
>
>
> I know what the problem is; how to solve it is another issue :)
> In order to do a live migration you have to be able to access
> the same storage from both nodes at the time of migration.
> So, you have to add allow-two-primaries to your DRBD definition, and also add
>     options drbd disable_sendpage=1
> to /etc/modprobe.conf
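For concreteness, a rough sketch of what Vadym describes, using the caweb
resource from Erich's config (illustrative only -- adapt it to your own
drbd.conf layout and run "drbdadm adjust caweb" on both nodes after editing):

    # add to the existing "resource caweb { ... }" section of the DRBD config
    net {
        allow-two-primaries;
    }

    # /etc/modprobe.conf on CentOS 5 (the module option mentioned above)
    options drbd disable_sendpage=1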
>
> You don't have much of a choice here (at least I don't know of one) but
> to run DRBD as primary/primary (master-max="2" master-node-max="1")
> all the time, and to hope the cluster will prevent two instances of the
> same KVM guest from running at the same time.
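On the pacemaker side, that presumably means bumping master-max on the
master/slave set and letting VirtualDomain drive the migration itself. An
untested sketch in the same crm syntax as Erich's config:

    ms ms-drbd-caweb drbd-caweb \
            meta master-max="2" master-node-max="1" clone-max="2" \
            clone-node-max="1" notify="true"
    primitive caweb-vd ocf:heartbeat:VirtualDomain \
            params config="/etc/libvirt/qemu/caweb.xml" hypervisor="qemu:///system" \
            meta allow-migrate="true" target-role="Started" \
            op start interval="0" timeout="120s" \
            op stop interval="0" timeout="120s" \
            op monitor interval="10" timeout="30" depth="0"

With allow-migrate="true" and dual-primary DRBD in place, "crm resource
migrate caweb-vd vmserver1" should trigger the VirtualDomain agent's
migrate_to/migrate_from operations rather than a stop/start.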

Has anybody played with this yet:
http://www.linux-kvm.com/content/qemu-kvm-012-adds-block-migration-feature

Technically something like this should make it possible to do a live 
migration even when not using shared storage. I thought this would be big 
news since that could potentially simplify infrastructures a lot and make 
HA setups possible even without throwing lots of money at an expensive SAN.
Unfortunately I haven't read anything about this since the blog entry appeared.
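For what it's worth, libvirt exposes this through virsh migrate's
--copy-storage-all (and --copy-storage-inc) flags, so on a new enough
qemu-kvm/libvirt stack the manual migration above might look roughly like
this -- whether the CentOS 5.5 packages support it is another question:

    # live-migrate and copy the full disk image to the target host
    virsh migrate --live --copy-storage-all caweb qemu+ssh://hgvmserver1.local/system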

Regards,
   Dennis



