[Pacemaker] Resource stop during migration
Tim Serong
tserong at novell.com
Fri Aug 27 09:50:19 UTC 2010
On 8/27/2010 at 03:22 PM, Michael Smith <msmith at cbnco.com> wrote:
> Hi,
>
> I have a pacemaker setup using the Xen resource agent and I've found
> something weird during migration: if a VM is in the middle of
> live-migrating from node 1 to node 2, and I stop the resource in crm,
> pacemaker forgets about the migration and immediately thinks the resource
> is stopped, although it doesn't actually call the stop action. Meanwhile,
> the migration continues and the VM ends up running on node 2.
I'd actually suggest opening a bug for that:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> This can cause problems: let's say you put both nodes into standby one
> after the other. The cluster starts migrating a VM from node 1 to node 2,
> then thinks it has stopped the resource when node 2 goes to standby, but the
> migration continues and the VM is left running on node 2.
>
> Later when the nodes are brought out of standby, the cluster starts the VM
> on node 1 and hoses the filesystem.
>
> Is there a way around this? I'm not sure there is a clean way to
> abort a Xen live migration, but even if there were, the cluster isn't
> calling any actions so there'd be no way to trigger the abort.
I don't know offhand if there's a way around this, sorry.
Anyone else?
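Not a cluster-level fix, but as a stopgap one could check by hand that the
domain isn't still running anywhere before bringing nodes out of standby.
A rough, untested sketch in Python. It assumes passwordless ssh between the
nodes and that `xm list <domain>` exits non-zero when the domain doesn't
exist on that node. The function names are made up for illustration:

```python
import subprocess
from typing import Callable, Iterable, List


def xm_domain_present(node: str, domain: str) -> bool:
    """Ask `node` over ssh whether the Xen domain `domain` exists there.

    Assumption: `xm list <domain>` exits 0 when the domain exists and
    non-zero otherwise. ssh itself exits 255 on a connection failure; that
    case is raised as an error rather than being read as "not running".
    """
    result = subprocess.run(
        ["ssh", node, "xm", "list", domain],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 255:
        raise RuntimeError(f"could not reach {node}")
    return result.returncode == 0


def nodes_with_domain(
    domain: str,
    nodes: Iterable[str],
    probe: Callable[[str, str], bool] = xm_domain_present,
) -> List[str]:
    """Return the nodes on which `probe` reports the domain as present."""
    return [node for node in nodes if probe(node, domain)]


if __name__ == "__main__":
    # Node and domain names are taken from the report quoted in this thread.
    leftover = nodes_with_domain("vm-test2", ["xen-test1", "xen-test2"])
    print("still running on:", leftover or "no node")
```

If that reports a node while the cluster believes the resource is stopped,
the leftover domain would have to be shut down by hand before the resource
is started again.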
Regards,
Tim
> I've tried with op_defaults record-pending="false" and "true", and with
> and without the monitor op on the Xen resource. Here's part of the log
> from a run with record-pending="false" and the following Xen primitive:
>
> primitive vm-test2 ocf:heartbeat:Xen \
>     meta allow-migrate="true" target-role="Started" \
>     op monitor interval="10" \
>     params xmfile="/etc/xen/vm/vm-test2"
>
>
> Aug 26 15:55:49 xen-test1 pengine: [5147]: info: complex_migrate_reload: Migrating vm-test2 from xen-test1 to xen-test2
> Aug 26 15:55:49 xen-test1 pengine: [5147]: notice: LogActions: Migrate resource vm-test2 (Started xen-test1 -> xen-test2)
> Aug 26 15:55:52 xen-test1 pengine: [5147]: info: complex_migrate_reload: Migrating vm-test2 from xen-test1 to xen-test2
> Aug 26 15:55:52 xen-test1 pengine: [5147]: notice: LogActions: Migrate resource vm-test2 (Started xen-test1 -> xen-test2)
> Aug 26 15:55:58 xen-test1 lrmd: [5145]: info: rsc:vm-test2:40: migrate_to
>
> Aug 26 15:55:58 xen-test1 crmd: [5148]: info: te_rsc_command: Initiating action 27: migrate_to vm-test2_migrate_to_0 on xen-test1 (local)
>
> Aug 26 15:55:58 xen-test1 crmd: [5148]: info: process_lrm_event: LRM operation vm-test2_monitor_10000 (call=39, status=1, cib-update=0, confirmed=true) Cancelled
>
> Aug 26 15:55:58 xen-test1 Xen[17077]: [17109]: INFO: vm-test2: Starting xm migrate to xen-test2
>
>
> # "crm resource stop vm-test2" was run at this point
>
> Aug 26 15:56:07 xen-test1 crmd: [5148]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=0) : Non-status change
>
> Aug 26 15:56:07 xen-test1 cib: [5144]: info: log_data_element: cib:diff: + <nvpair id="vm-test2-meta_attributes-target-role" name="target-role" value="Stopped" __crm_diff_marker__="added:top" />
>
> Aug 26 15:56:49 xen-test1 Xen[17077]: [17504]: INFO: vm-test2: xm migrate to xen-test2 succeeded.
>
>
> cluster-glue-1.0.5-0.5.1
> corosync-1.2.1-0.5.1
> pacemaker-1.1.2-0.2.1
> resource-agents-1.0.3-0.3.2
>
>
> Thanks,
> Mike