[Pacemaker] Resource stop during migration

Fri Aug 27 09:50:19 UTC 2010

On 8/27/2010 at 03:22 PM, Michael Smith <msmith at cbnco.com> wrote: 
> Hi, 
>  
> I have a pacemaker setup using the Xen resource agent and I've found  
> something weird during migration: if a VM is in the middle of  
> live-migrating from node 1 to node 2, and I stop the resource in crm,  
> pacemaker forgets about the migration and immediately thinks the resource  
> is stopped, although it doesn't actually call the stop action. Meanwhile,  
> the migration continues and the VM ends up running on node 2. 

I'd actually suggest opening a bug for that:

  http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

> This can cause problems: let's say you put both nodes into standby one
> after the other. The cluster starts migrating a VM from node 1 to node 2,
> then considers the resource stopped when node 2 goes into standby, but the
> migration continues and the VM is left running on node 2.
>  
> Later when the nodes are brought out of standby, the cluster starts the VM  
> on node 1 and hoses the filesystem. 
> 
> Is there a way around this? I'm not sure there is a clean way to  
> abort a Xen live migration, but even if there were, the cluster isn't  
> calling any actions so there'd be no way to trigger the abort. 

I don't know offhand if there's a way around this, sorry.
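
As a stopgap only (it does nothing about the cluster's own view of the
resource), a manual check before bringing the nodes out of standby might at
least catch a stray domain. The sketch below is not from any Pacemaker or Xen
tooling: the function name is made up, the sample output is illustrative
rather than taken from your hosts, and it assumes "xm list" prints a header
row followed by one domain per line with the name in the first column. How
you collect that output from each node is left out.

```python
def domain_is_listed(xm_list_output, domain):
    """Return True if `domain` appears in the Name column of `xm list` output.

    Assumption: the first line is the header row and every later line
    starts with the domain name.
    """
    lines = xm_list_output.strip().splitlines()
    for line in lines[1:]:  # skip the assumed header row
        fields = line.split()
        if fields and fields[0] == domain:
            return True
    return False


# Illustrative output only; the values are not from the original report.
SAMPLE_XM_LIST = """\
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1024     2     r-----    123.4
vm-test2                                     3   512     1     -b----     12.3
"""

if __name__ == "__main__":
    if domain_is_listed(SAMPLE_XM_LIST, "vm-test2"):
        print("vm-test2 is still running here; do not start it elsewhere")
```

If the domain shows up on a node the cluster believes is empty, it would need
to be shut down by hand before the nodes are brought back online.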

Anyone else?

Regards,

Tim

> I've tried with op_defaults record-pending="false" and "true", and with  
> and without the monitor op on the Xen resource. Here's part of the log  
> from a run with record-pending="false" and the following Xen primitive: 
>  
> primitive vm-test2 ocf:heartbeat:Xen \ 
> 	meta allow-migrate="true" target-role="Started" \ 
> 	op monitor interval="10" \ 
> 	params xmfile="/etc/xen/vm/vm-test2" 
>  
>  
> Aug 26 15:55:49 xen-test1 pengine: [5147]: info: complex_migrate_reload: Migrating vm-test2 from xen-test1 to xen-test2
> Aug 26 15:55:49 xen-test1 pengine: [5147]: notice: LogActions: Migrate resource vm-test2        (Started xen-test1 -> xen-test2)
> Aug 26 15:55:52 xen-test1 pengine: [5147]: info: complex_migrate_reload: Migrating vm-test2 from xen-test1 to xen-test2
> Aug 26 15:55:52 xen-test1 pengine: [5147]: notice: LogActions: Migrate resource vm-test2        (Started xen-test1 -> xen-test2)
> Aug 26 15:55:58 xen-test1 lrmd: [5145]: info: rsc:vm-test2:40: migrate_to
> Aug 26 15:55:58 xen-test1 crmd: [5148]: info: te_rsc_command: Initiating action 27: migrate_to vm-test2_migrate_to_0 on xen-test1 (local)
> Aug 26 15:55:58 xen-test1 crmd: [5148]: info: process_lrm_event: LRM operation vm-test2_monitor_10000 (call=39, status=1, cib-update=0, confirmed=true) Cancelled
> Aug 26 15:55:58 xen-test1 Xen[17077]: [17109]: INFO: vm-test2: Starting xm migrate to xen-test2
>
> # "crm resource stop vm-test2" was run at this point
>
> Aug 26 15:56:07 xen-test1 crmd: [5148]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=0) : Non-status change
> Aug 26 15:56:07 xen-test1 cib: [5144]: info: log_data_element: cib:diff: + <nvpair id="vm-test2-meta_attributes-target-role" name="target-role" value="Stopped" __crm_diff_marker__="added:top" />
> Aug 26 15:56:49 xen-test1 Xen[17077]: [17504]: INFO: vm-test2: xm migrate to xen-test2 succeeded.
>  
>  
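
For anyone trying to confirm the same race in their own logs, here is a
minimal sketch that flags a target-role="Stopped" change arriving between the
Xen agent's "Starting xm migrate" and "xm migrate ... succeeded" messages. It
assumes one log event per line (unwrapped, unlike mail-quoted logs) and the
message wording seen in the excerpt above; the function and pattern names are
invented for illustration.

```python
import re

# Patterns modelled on the log excerpt; other agent versions may word these differently.
MIGRATE_START = re.compile(r"INFO: (\S+): Starting xm migrate to (\S+)")
MIGRATE_DONE = re.compile(r"INFO: (\S+): xm migrate to (\S+) succeeded")
TARGET_STOPPED = re.compile(
    r'id="([^"]+)-meta_attributes-target-role"[^>]*value="Stopped"'
)


def stops_during_migration(lines):
    """Return (resource, target_node) pairs where a stop was requested mid-migration."""
    in_flight = {}  # resource name -> node it is migrating to
    hits = []
    for line in lines:
        m = MIGRATE_START.search(line)
        if m:
            in_flight[m.group(1)] = m.group(2)
            continue
        m = MIGRATE_DONE.search(line)
        if m:
            in_flight.pop(m.group(1), None)
            continue
        m = TARGET_STOPPED.search(line)
        if m and m.group(1) in in_flight:
            hits.append((m.group(1), in_flight[m.group(1)]))
    return hits


# Three lines from the excerpt above, unwrapped.
SAMPLE_LOG = [
    'Aug 26 15:55:58 xen-test1 Xen[17077]: [17109]: INFO: vm-test2: Starting xm migrate to xen-test2',
    'Aug 26 15:56:07 xen-test1 cib: [5144]: info: log_data_element: cib:diff: + '
    '<nvpair id="vm-test2-meta_attributes-target-role" name="target-role" '
    'value="Stopped" __crm_diff_marker__="added:top" />',
    'Aug 26 15:56:49 xen-test1 Xen[17077]: [17504]: INFO: vm-test2: xm migrate to xen-test2 succeeded.',
]

if __name__ == "__main__":
    for resource, node in stops_during_migration(SAMPLE_LOG):
        print(f"{resource}: stop requested while migrating to {node}")
```

This only detects the window in the log; it does not abort the migration or
change what the cluster believes about the resource.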
> cluster-glue-1.0.5-0.5.1 
> corosync-1.2.1-0.5.1 
> pacemaker-1.1.2-0.2.1 
> resource-agents-1.0.3-0.3.2 
>  
>  
> Thanks, 
> Mike 