[Pacemaker] Migration of "lower" resource causes dependent resources to restart

Mon Apr 9 21:51:21 UTC 2012

----- Original Message -----
> From: "Vladislav Bogdanov" <bubble at hoster-ok.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Wednesday, April 4, 2012 2:39:53 AM
> Subject: Re: [Pacemaker] Migration of "lower" resource causes dependent resources to restart
> 
> 04.04.2012 02:12, Andrew Beekhof wrote:
> > On Fri, Mar 30, 2012 at 7:10 PM, Florian Haas <florian at hastexo.com>
> > wrote:
> >> On Thu, Mar 29, 2012 at 8:35 AM, Andrew Beekhof
> >> <andrew at beekhof.net> wrote:
> >>> On Thu, Mar 29, 2012 at 5:28 PM, Vladislav Bogdanov
> >>> <bubble at hoster-ok.com> wrote:
> >>>> Hi Andrew, all,
> >>>>
> >>>> Pacemaker restarts resources when resource they depend on
> >>>> (ordering
> >>>> only, no colocation) is migrated.
> >>>>
> >>>> I mean that when I do crm resource migrate lustre, I get
> >>>>
> >>>> LogActions: Migrate lustre#011(Started lustre03-left ->
> >>>> lustre04-left)
> >>>> LogActions: Restart mgs#011(Started lustre01-left)
> >>>>
> >>>> I only have one ordering constraint for these two resources:
> >>>>
> >>>> order mgs-after-lustre inf: lustre:start mgs:start
> >>>>
> >>>> This reminds me what have been with reload in a past (dependent
> >>>> resource
> >>>> restart when "lower" resource is reloaded).
> >>>>
> >>>> Shouldn't this be changed? Migration usually means that service
> >>>> is not
> >>>> interrupted...
> >>>
> >>> Is that strictly true?  Always?
> >>
> >> No. Few things are always true. :) However, see below.
> >>
> >>> My understanding was although A thinks the migration happens
> >>> instantaneously, it is in fact more likely to be
> >>> pause+migrate+resume
> >>> and during that time anyone trying to talk to A during that time
> >>> is
> >>> going to be disappointed.
> >>
> >> I tend to be with Vladislav on this one. The thing that most
> >> people
> >> would expect from a "live migration" is that it's interruption
> >> free.
> >> And what allow-migrate was first implemented for (iirc), live
> >> migrations for Xen, does fulfill that expectation. Same thing is
> >> true
> >> for live migrations in libvirt/KVM, and I think anyone would
> >> expect
> >> essentially the same thing from checkpoint/restore migrations
> >> where
> >> they're available.
> >>
> >> So I guess it's reasonable to assume that if one resource
> >> migrates,
> >> dependent resources need not be restarted.
> > 
> > Ok, could someone file a bug requesting the new behaviour please?
> 
> cl#5055

Hey,

I researched issue 5055, http://bugs.clusterlabs.org/show_bug.cgi?id=5055, today and had difficulty coming up with an elegant solution for the problem.  I am going to brain dump a summary of what I ran into.  I haven't been working on Pacemaker long, so maybe one of you will see an angle on the issue that I don't.

The root of the problem is that the policy engine does not calculate migration actions; it only calculates stop and start actions.  Right before the final graph is generated, the policy engine attempts to inject the migration actions into the final results when possible.  Take a look at the stage7() function in pengine/allocate.c to see this happen.

Below are a couple of examples that help illustrate what I mean.

___ Simple migration example
We have resource A, which is being moved from node1 to node2.  After all the actions, location constraints, and order constraints are processed, the graph looks like this.

* A stop on node1
* all_stopped
* A start on node2

The last thing we do, right before the graph is generated and after everything else has been calculated, is attempt to detect reloads and migrations from the current graph state.  In this case the policy engine detects that A is moving to another node and is capable of using the migrate actions.  The migrate_to and migrate_from actions are injected into the graph right before the stop action.  The final result is this.

* A migrate_to on node1
* A migrate_from on node2
* A stop on node1
* all_stopped
* A start on node2 (pseudo action)
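
As a rough illustration of that post-processing step, here is a minimal Python sketch.  It is not Pacemaker's code: the tuple representation of actions and the inject_migration() helper are hypothetical names made up for this example, and it does not model the start action being turned into a pseudo action.

```python
# Illustrative sketch only; this is NOT Pacemaker's data model or API.
# An action is a hypothetical (resource, task, node) tuple.

def inject_migration(actions, resource, source, target):
    """Return a copy of `actions` with migrate_to/migrate_from inserted
    immediately before the stop of `resource` on `source`."""
    stop = (resource, "stop", source)
    if stop not in actions:
        # Nothing to migrate away from; leave the graph untouched.
        return list(actions)
    idx = actions.index(stop)
    migrate = [
        (resource, "migrate_to", source),
        (resource, "migrate_from", target),
    ]
    return actions[:idx] + migrate + actions[idx:]


# Graph as it looks after all constraints have been processed.
graph = [
    ("A", "stop", "node1"),
    (None, "all_stopped", None),
    ("A", "start", "node2"),
]

final = inject_migration(graph, "A", "node1", "node2")
for action in final:
    print(action)
```

The point of the sketch is only that the migrate actions are spliced into an already-ordered list; nothing upstream of that list is recomputed.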

___ Migration example with single order constraint

Take the first example and add another resource, 'B', and the order constraint 'A then B'.

After all the calculations are done, the graph looks like this.

* B stop on node1
* A stop on node1
* all_stopped
* A start on node2
* B start on node1

Just like in the first example, we detect that the migration is possible and inject it into the graph right before A's stop action.

* B stop on node1
* A migrate_to on node1
* A migrate_from on node2
* A stop on node1
* all_stopped
* A start on node2 (pseudo action)
* B start on node1
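
The same idea can be sketched in Python to show why B still restarts.  Again, this is only an illustration under simplifying assumptions: plan_move() and inject_migration() are hypothetical helpers, the two-phase split is a simplification of what the policy engine does, and actions are plain (resource, task, node) tuples.

```python
# Illustrative sketch only; NOT Pacemaker's code. All names are hypothetical.

def plan_move(resource, source, target, dependents):
    """Phase 1: build stop/start actions with 'resource then dependent'
    ordering already applied. `dependents` maps each dependent resource
    to the node it runs on."""
    stops = [(dep, "stop", node) for dep, node in dependents.items()]
    starts = [(dep, "start", node) for dep, node in dependents.items()]
    return (
        stops
        + [(resource, "stop", source),
           (None, "all_stopped", None),
           (resource, "start", target)]
        + starts
    )


def inject_migration(actions, resource, source, target):
    """Phase 2: splice migrate actions in right before the resource's stop.
    The ordering computed in phase 1 is not revisited."""
    stop = (resource, "stop", source)
    if stop not in actions:
        return list(actions)
    idx = actions.index(stop)
    migrate = [
        (resource, "migrate_to", source),
        (resource, "migrate_from", target),
    ]
    return actions[:idx] + migrate + actions[idx:]


graph = plan_move("A", "node1", "node2", {"B": "node1"})
final = inject_migration(graph, "A", "node1", "node2")
for action in final:
    print(action)
```

Because B's stop and start were added in the first phase, before anything knew that A would migrate, they are still present after the migrate actions are spliced in.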

Given the results above, we'd like to see the order constraint 'A then B' not be exercised, since we know the A resource isn't actually starting/stopping.  The problem is that migrations are injected after the order constraint calculations are complete.  The only reason it is safe to inject the migration actions into the graph at the end is that we know no more calculations are going to be performed.  Any attempt to re-calculate the order constraints based on migration actions at this point breaks that guarantee, which will in turn cause all sorts of complications.

If we want order constraints to behave nicely with the migrate actions, the only solution I see is to look into re-architecting how migration actions are calculated by throwing them into the mix with everything else.  Anyone have a perspective on this I may be missing?

-- Vossel