[Pacemaker] Question about post-failure behavior during migrate_to
Kazunori INOUE
kazunori.inoue3 at gmail.com
Wed Dec 18 10:56:20 UTC 2013
Hi,
When a node crashed while a VM resource was being migrated, the VM
ended up running on two nodes. [1]
Is this the designed behavior?
[1]
Stack: corosync
Current DC: bl460g1n6 (3232261592) - partition with quorum
Version: 1.1.11-0.4.ce5d77c.git.el6-ce5d77c
3 Nodes configured
8 Resources configured
Online: [ bl460g1n6 bl460g1n8 ]
OFFLINE: [ bl460g1n7 ]
Full list of resources:
prmDummy (ocf::pacemaker:Dummy): Started bl460g1n6
prmVM2 (ocf::heartbeat:VirtualDomain): Started bl460g1n8
# ssh bl460g1n6 virsh list --all
 Id    Name                 State
----------------------------------------------------
 113   vm2                  running
# ssh bl460g1n8 virsh list --all
 Id    Name                 State
----------------------------------------------------
 34    vm2                  running
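For reference, the double start is easy to spot by running the same
virsh check on each online node (the node names and the ssh/virsh
usage are just taken from the outputs above):
# for n in bl460g1n6 bl460g1n8; do \
    echo "== $n =="; ssh $n virsh list --all; done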
[Steps to reproduce]
1) Before the migration: vm2 is running on bl460g1n7 (DC)
Stack: corosync
Current DC: bl460g1n7 (3232261593) - partition with quorum
Version: 1.1.11-0.4.ce5d77c.git.el6-ce5d77c
3 Nodes configured
8 Resources configured
Online: [ bl460g1n6 bl460g1n7 bl460g1n8 ]
Full list of resources:
prmDummy (ocf::pacemaker:Dummy): Started bl460g1n7
prmVM2 (ocf::heartbeat:VirtualDomain): Started bl460g1n7
...snip...
2) Migrate the VM resource:
# crm resource move prmVM2
bl460g1n6 was selected as the migration destination.
Dec 18 14:11:36 bl460g1n7 crmd[6928]: notice: te_rsc_command:
Initiating action 47: migrate_to prmVM2_migrate_to_0 on bl460g1n7
(local)
Dec 18 14:11:36 bl460g1n7 lrmd[6925]: info:
cancel_recurring_action: Cancelling operation prmVM2_monitor_10000
Dec 18 14:11:36 bl460g1n7 crmd[6928]: info: do_lrm_rsc_op:
Performing key=47:5:0:ddf348fe-fbad-4abb-9a12-8250f71b075a
op=prmVM2_migrate_to_0
Dec 18 14:11:36 bl460g1n7 lrmd[6925]: info: log_execute:
executing - rsc:prmVM2 action:migrate_to call_id:33
Dec 18 14:11:36 bl460g1n7 crmd[6928]: info: process_lrm_event:
LRM operation prmVM2_monitor_10000 (call=31, status=1, cib-update=0,
confirmed=true) Cancelled
Dec 18 14:11:36 bl460g1n7 VirtualDomain(prmVM2)[7387]: INFO: vm2:
Starting live migration to bl460g1n6 (using remote hypervisor URI
qemu+ssh://bl460g1n6/system ).
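For reference, prmVM2 is a live-migratable VirtualDomain resource.
A minimal crm-shell definition along the lines below matches what the
logs show (the 10s monitor, live migration over qemu+ssh); the config
path and hypervisor URI here are only illustrative:
  primitive prmVM2 ocf:heartbeat:VirtualDomain \
      params config="/etc/libvirt/qemu/vm2.xml" \
             hypervisor="qemu:///system" \
             migration_transport="ssh" \
      meta allow-migrate="true" \
      op monitor interval="10s"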
3) Then, after "virsh migrate" inside VirtualDomain had completed but
before the migrate_to operation itself finished, I made bl460g1n7
crash.
As a result, vm2 was already running on bl460g1n6, but Pacemaker
also started it on bl460g1n8. [1]
Dec 18 14:11:49 bl460g1n8 crmd[25981]: notice: process_lrm_event:
LRM operation prmVM2_start_0 (call=31, rc=0, cib-update=28,
confirmed=true) ok
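In case it helps with hitting the same window, the reproduction boils
down to something like the following; the sysrq trigger is just one
way to hard-crash the DC, and the sleep length is only a rough guess
at landing after "virsh migrate" finishes but before migrate_to is
reported back to the cluster:
# crm resource move prmVM2
# sleep 5
# ssh bl460g1n7 'echo c > /proc/sysrq-trigger'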
Best Regards,
Kazunori INOUE
-------------- next part --------------
A non-text attachment was scrubbed...
Name: VM-double-start.tar.bz2
Type: application/x-bzip2
Size: 140974 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20131218/772681e6/attachment-0003.bz2>