[Pacemaker] bug in ordering syntax?
Frank DiMeo
Frank.DiMeo at bigbandnet.com
Wed Dec 2 22:26:22 UTC 2009
I turned up the pengine logging level while it processes the rsc_order section (log attached). It shows a loop being formed between the world2 and world1 resources, but only for stopping, not for starting.
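
To make "a loop for stopping but not for starting" concrete, here is a toy model in Python. It is only a sketch and not the pengine's actual logic: the names stop_edges and find_cycle are made up for illustration, and it assumes the stop ordering is simply the reverse of the start ordering. Under that assumption an acyclic start order cannot produce a stop loop on its own, so a stop-only loop means some extra stop-side ordering edge is present, like the hypothetical one added at the end.

```python
from collections import defaultdict


def stop_edges(start_edges):
    """Reverse each (first, then) start ordering to get the implied stop ordering."""
    return [(then, first) for first, then in start_edges]


def find_cycle(edges):
    """Return one cycle as a list of nodes (start node repeated at the end), or None."""
    graph = defaultdict(list)
    for before, after in edges:
        graph[before].append(after)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = defaultdict(int)      # every node starts out WHITE
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                # nxt is already on the current path, so we closed a loop
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    # Start ordering from the constraint: world1 first, then world2.
    start = [("world1", "world2")]
    stop = stop_edges(start)

    print("start loop:", find_cycle(start))
    print("stop loop: ", find_cycle(stop))

    # Hypothetical extra stop-side edge (an assumption, not taken from the log).
    print("stop loop with extra edge:", find_cycle(stop + [("world1", "world2")]))
```

In this model the first two checks find no loop; only the last one does, because of the extra edge.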
-Frank
> -----Original Message-----
> From: Frank DiMeo [mailto:Frank.DiMeo at bigbandnet.com]
> Sent: Wednesday, December 02, 2009 2:59 PM
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] bug in ordering syntax?
>
> Here's a two resource version of the same issue. It's easy to see the
> loop here.
>
> -Frank
>
> > -----Original Message-----
> > From: Frank DiMeo [mailto:Frank.DiMeo at bigbandnet.com]
> > Sent: Wednesday, December 02, 2009 2:13 PM
> > To: pacemaker at oss.clusterlabs.org
> > Subject: Re: [Pacemaker] bug in ordering syntax?
> >
> > Here's the output of ptest for the pe-input-***.bz2 file that's
> > created when I put ubuntu_2 into standby and the cluster tries to
> > move my 4 resources from ubuntu_2 to ubuntu_1 (while running the
> > compact ordering syntax with a score of INFINITY).
> >
> > I've converted it to a .png for your viewing pleasure.
> >
> > -Frank
> >
> > > -----Original Message-----
> > > From: Andrew Beekhof [mailto:andrew at beekhof.net]
> > > Sent: Wednesday, December 02, 2009 6:00 AM
> > > To: pacemaker at oss.clusterlabs.org
> > > Subject: Re: [Pacemaker] bug in ordering syntax?
> > >
> > > On Mon, Nov 30, 2009 at 9:19 PM, Frank DiMeo
> > > <Frank.DiMeo at bigbandnet.com> wrote:
> > > > > I'm experimenting with startup sequence and co-location control,
> > > > > and think I may have stumbled across a bug.
> > > >
> > > >
> > > >
> > > > > I have two xml files that I use in my testing as my initial
> > > > > configuration of a two node cluster. I start each node with no
> > > > > configuration, and then use cibadmin to "source in" the xml file.
> > > > > Each file defines two resources as well as a startup order and
> > > > > colocation definition. The only difference between the two files
> > > > > is the syntax I use to specify the startup order.
> > > >
> > > >
> > > >
> > > > When I use the syntax:
> > > >
> > > >
> > > >
> > > > > <rsc_order id="order-1" first="world1" then="world2" score="INFINITY" />
> > > >
> > > >
> > > >
> > > > > Everything works fine. I can put either of the two nodes into
> > > > > standby while resources are running there, and the resources move
> > > > > to the other node as expected.
> > > >
> > > >
> > > >
> > > > However, when I use the syntax:
> > > >
> > > >
> > > >
> > > > - <<rsc_order id="order-1">
> > >
> > > > You're missing a score. Without one it defaults to 0 (which means
> > > > optional).
> > > > However, IIRC, the 1.0.6 schema won't allow you to set a score
> > > > there, so you'll need to apply the following patch:
> > > http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/c8585629629c
> > >
> > > >
> > > > - < <resource_set id="order-1-set-1" sequential="true">
> > > >
> > > > < <resource_ref id="world1" />
> > > >
> > > > < <resource_ref id="world2" />
> > > >
> > > > </resource_set>
> > > >
> > > > </rsc_order>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > Several bad things happen. First, the resources don't move off
> > > > > the node that is put into standby, even though the alternate node
> > > > > is running and able to run the resources.
> > >
> > > Did you remove the other ordering constraint first?
> > >
> > > > > Second, attempting to shut down openais on the node running the
> > > > > resources after attempting a forced move (by putting the node
> > > > > into standby) leaves both the lrmd and pengine processes running
> > > > > (but as children of process 1, init), and the resources continue
> > > > > to run on that node even after openais is stopped.
> > >
> > > I suspect you've a faulty init script there. See other email.
> > >
> > > > > I turned debug on in crmd and in the logs and recorded what
> > > > > happens when I force standby, and I notice that using the first
> > > > > syntax causes te_rsc_command to be executed to send a shut down
> > > > > message to the node where the resources are running (which seems
> > > > > to work), while using the second syntax causes te_pseudo_action
> > > > > to be called in approximately the same place in the log, but no
> > > > > shutdown of resources happens (I can't really tell what this is
> > > > > supposed to be doing).
> > >
> > > > Neither can I - you didn't attach the logs :-)
> > >
> > > _______________________________________________
> > > Pacemaker mailing list
> > > Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pengine_debug.log
Type: application/octet-stream
Size: 11321 bytes
Desc: pengine_debug.log
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20091202/7572724f/attachment-0002.obj>