[Pacemaker] Unexpected resource restarts when node comes online

Jake Smith jsmith at argotec.com
Tue Aug 21 15:56:30 UTC 2012

----- Original Message -----
> From: "Gareth Davis" <Gareth.Davis at ipaccess.com>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Tuesday, August 21, 2012 11:28:53 AM
> Subject: Re: [Pacemaker] Unexpected resource restarts when node comes online
> 
> From the documentation it seems that the default is actually
> interleave=true... which I think is the desired setting, i.e. only wait
> for the local instance rather than all the clones. I've tried with both
> interleave=true and false... it doesn't seem to be the cause of the problem.
> 
> I'll continue with interleave="true" on all clones.
> 
> I've been playing around with ptest and I think the fs1_group is being
> restarted, which in turn restarts NOSFileSystemCluster etc.
> 

I know it's pretty obvious, but the location of your DRBD masters doesn't change between standby and online, does it?

I was thinking it might be a score problem between stickiness and placement/advisory location constraints...
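For reference, a minimal crm shell sketch of setting interleave explicitly on the clone that NOSServiceManager0 is ordered after. It assumes NOSFileSystemCluster is a clone wrapping a single primitive; the primitive name p_fs_example is hypothetical, and the real definitions are in the config.txt linked below. Pacemaker Explained lists false as the default for interleave, so setting it explicitly avoids depending on the default:

```text
# p_fs_example is a hypothetical primitive name; see config.txt for the real one.
clone NOSFileSystemCluster p_fs_example \
        meta interleave="true"
order order_NOSServiceManager0_after_NOSFileSystemCluster inf: NOSFileSystemCluster NOSServiceManager0
```

To check the stickiness vs. location theory, running ptest -sL against the live CIB should print the allocation scores for each resource on each node, which would show whether the DRBD masters' preferred node changes when oamdev-vm11 comes back.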

Jake

> Gareth
> 
> On 21/08/2012 15:40, "David Vossel" <dvossel at redhat.com> wrote:
> 
> >----- Original Message -----
> >> From: "Gareth Davis" <Gareth.Davis at ipaccess.com>
> >> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> >> Sent: Tuesday, August 21, 2012 9:01:39 AM
> >> Subject: [Pacemaker] Unexpected resource restarts when node comes online
> >> 
> >> Hi,
> >> 
> >> Quick bit of background: I've recently updated from pacemaker 1.0 to
> >> 1.1.5 because of an issue where cloned resources were restarted
> >> unexpectedly when any of the nodes went into standby or failed
> >> (https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2153);
> >> 1.1.5 certainly fixes this issue.
> >> 
> >> But now that I've got it all up and running, I've noticed that
> >> returning a node from standby to online triggers a restart of my
> >> application server.
> >
> >I took a quick look at your config. My guess is that the following
> >order constraint is causing the restart of NOSServiceManager0 when the
> >node comes back online.
> >
> >order order_NOSServiceManager0_after_NOSFileSystemCluster inf: NOSFileSystemCluster NOSServiceManager0
> >
> >I'm thinking the interleave clone resource option might help with this.
> >http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch10s02s02.html
> >
> >-- Vossel
> >
> >> I'm afraid the config is complex, involving a couple of DRBD pairs,
> >> four clones, and a GlassFish application server, NOSServiceManager0.
> >> 
> >> Output of crm configure show.
> >> https://dl.dropbox.com/u/5427964/config.txt
> >> 
> >> 
> >> There are 2 nodes in the cluster (oamdev-vm11 and oamdev-vm12); all
> >> the non-cloned resources are running on oamdev-vm12.
> >> 
> >> Putting oamdev-vm11 into standby causes nothing unexpected, but
> >> bringing it back online causes NOSServiceManager0 to be stopped and
> >> started.
> >> 
> >> crm_report output (the time span should include the standby and
> >> online events):
> >> https://dl.dropbox.com/u/5427964/pcmk-Tue-21-Aug-2012.tar.bz2
> >> 
> >> I'm at a bit of a loss as to how to debug this. I suspect I've messed
> >> up the ordering in some way; any pointers?
> >> 
> >> Gareth Davis
> >> 
> >> This message contains confidential information and may be
> >> privileged.
> >> If you are not the intended recipient, please notify the sender
> >> and
> >> delete the message immediately.
> >> 
> >> ip.access Ltd, registration number 3400157, Building 2020,
> >> Cambourne Business Park, Cambourne, Cambridge CB23 6DW, United
> >> Kingdom
> >> 
> >> _______________________________________________
> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> 
> >> Project Home: http://www.clusterlabs.org
> >> Getting started:
> >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://bugs.clusterlabs.org
> >> 
