[Pacemaker] problem with VM in pacemaker cluster

Yuriy Demchenko demchenko.ya at gmail.com
Wed Apr 10 10:59:28 UTC 2013


Hi,

I've set up a 3-node cluster (2 active nodes + 1 standby for quorum) 
with cman+pacemaker.
Resources: "cxml-clone", a gfs2 filesystem (cloned, runs on both nodes), 
and "testVM" via heartbeat:VirtualDomain (domain xml located on the gfs2 
fs, cLVM disk backend). I've set up an ordering constraint: "cxml-clone" 
is started before "testVM" (symmetrical, so according to the description 
the resources should be stopped in reverse order).
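
For reference, an ordering constraint of this kind would look roughly 
like the fragment below inside the <constraints> section of the CIB. 
This is only a sketch: the id is hypothetical and the attributes are 
assumed from the description above, so the attached cib.xml remains the 
authoritative version.

```xml
<!-- Sketch only: the id below is hypothetical, not copied from cib.xml -->
<rsc_order id="order-cxml-clone-then-testVM"
           first="cxml-clone" first-action="start"
           then="testVM" then-action="start"
           symmetrical="true"/>
<!-- With symmetrical="true", the implied stop order is the reverse:
     testVM stops before cxml-clone. -->
```
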
Manual migration of the VM works fine (pcs resource move testVM 
node-2/node-1): the live migration succeeds and the VM runs 
uninterrupted. But when I reboot the node running the VM, or put it in 
standby, everything crashes: the migration fails and the node is fenced.

From the logs I can see that resource "cxml" is stopped first (or 
simultaneously; at least it does not wait for the VM migration to 
complete), and then the migration fails because the xml is not available.
> Apr 10 14:03:20 node-2 lrmd[2679]:   notice: operation_finished: 
> cxml_stop_0:3282 [ 2013/04/10_14:03:20 INFO: Running stop for 
> /dev/cstore/cxml on /mnt ]
> Apr 10 14:03:20 node-2 lrmd[2679]:   notice: operation_finished: 
> cxml_stop_0:3282 [ 2013/04/10_14:03:20 INFO: Trying to unmount /mnt ]
> Apr 10 14:03:20 node-2 lrmd[2679]:   notice: operation_finished: 
> cxml_stop_0:3282 [ 2013/04/10_14:03:20 INFO: unmounted /mnt successfully ]
> Apr 10 14:03:20 node-2 crmd[2682]:   notice: process_lrm_event: LRM 
> operation cxml_stop_0 (call=77, rc=0, cib-update=37, confirmed=true) ok
> Apr 10 14:03:21 node-2 lrmd[2679]:   notice: operation_finished: 
> testVM_migrate_to_0:3281 [ 2013/04/10_14:03:20 INFO: testvm: Starting 
> live migration to node-1 (using remote hypervisor URI 
> qemu+ssh://node-1/system ). ]
> Apr 10 14:03:21 node-2 lrmd[2679]:   notice: operation_finished: 
> testVM_migrate_to_0:3281 [ error: Requested operation is not valid: 
> domain is already active as 'testvm' ]
> Apr 10 14:03:21 node-2 lrmd[2679]:   notice: operation_finished: 
> testVM_migrate_to_0:3281 [ 2013/04/10_14:03:21 ERROR: testvm: live 
> migration to qemu+ssh://node-1/system  failed: 1 ]
> Apr 10 14:03:21 node-2 crmd[2682]:   notice: process_lrm_event: LRM 
> operation testVM_migrate_to_0 (call=75, rc=1, cib-update=38, 
> confirmed=true) unknown error
> Apr 10 14:03:21 node-2 lrmd[2679]:   notice: operation_finished: 
> testVM_stop_0:3392 [ 2013/04/10_14:03:21 ERROR: Configuration file 
> /mnt/testvm.xml does not exist or is not readable. ]
But wtf?! I've set up the constraint, so "testVM" should be 
stopped/moved first, not "cxml".

What is wrong with my configuration? Am I missing something?

Logs and CIB are attached.

-- 
Yuriy Demchenko

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib.xml
Type: text/xml
Size: 4142 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20130410/f851ffc2/attachment-0003.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node-1.log
Type: text/x-log
Size: 6422 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20130410/f851ffc2/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node-2.log
Type: text/x-log
Size: 6569 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20130410/f851ffc2/attachment-0007.bin>

