[Pacemaker] Unexpected resource restarts when node comes online

Tue Aug 21 10:58:29 EDT 2012

----- Original Message -----
> From: "Gareth Davis" <Gareth.Davis at ipaccess.com>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Tuesday, August 21, 2012 10:01:39 AM
> Subject: [Pacemaker] Unexpected resource restarts when node comes online
> 
> Hi,
> 
> Quick bit of background: I've recently updated from pacemaker 1.0 to
> 1.1.5 because of an issue where cloned resources would be restarted
> unexpectedly when any of the nodes went into standby or failed
> (https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2153).
> 1.1.5 certainly fixes this issue.
> 
> But now that I've got it all up and running, I've noticed that
> returning a node from standby to online triggers a restart of my
> application server.
> 
> I'm afraid the config is complex, involving a couple of DRBD pairs,
> four clones, and a glassfish application server, NOSServiceManager0.
> 
> Output of crm configure show.
> https://dl.dropbox.com/u/5427964/config.txt
> 

Just my opinion, but you may be able to make your complex ordering a little easier to read by using a single order statement with an unordered set, rather than multiple order statements that share the same final resource. This will probably not make any difference to your problem, though. For example:

Instead of:
order order_NOSFileSystemCluster_after_fs1_group inf: fs1_group NOSFileSystemCluster
order order_NOSFileSystemCluster_after_portmapCluster inf: portmapCluster NOSFileSystemCluster

Something like:
order order_NOSFileSystemCluster_after_fs1_group_and_portmapCluster inf: ( fs1_group portmapCluster ) NOSFileSystemCluster

Ordering help: "If there are more than two resources, then the
constraint is called a resource set. Ordered resource sets have an
extra attribute to allow for sets of resources whose actions may run
in parallel. The shell syntax for such sets is to put resources in
parentheses."

See full explanation in the documentation here:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-ordering.html

> 
> There are 2 nodes in the cluster (oamdev-vm11 & oamdev-vm12); all the
> non-cloned resources are running on oamdev-vm12.
> 
> Putting oamdev-vm11 into standby causes nothing unexpected, but
> bringing it back online causes NOSServiceManager0 to be stopped and
> started.

I'm not sure whether interleave defaults to true for clones. If it doesn't, that could cause the restart, because anything ordered after a clone then depends on all instances of the clone being started rather than just the instance on the local node. I would add interleave=true to your clones and see if that clears it up.
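
A rough sketch of what I mean, in crm shell syntax. Checking Pacemaker Explained, interleave is listed as defaulting to false, so it has to be set explicitly. I'm assuming here that portmapCluster is one of your clones, and "portmap" is just a placeholder for whatever primitive it wraps, so substitute your real ids:

```
# Option 1: set the meta attribute on an existing clone from the command line
crm resource meta portmapCluster set interleave true

# Option 2: in "crm configure edit", add it to the clone definition
# ("portmap" is a placeholder primitive id, not taken from your config)
clone portmapCluster portmap \
        meta interleave="true"
```

Repeat for each of the four clones, then put oamdev-vm11 into standby and back online to see whether NOSServiceManager0 still gets restarted.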

HTH

Jake

> 
> crm_report output; the time span should include the standby and
> online events.
> https://dl.dropbox.com/u/5427964/pcmk-Tue-21-Aug-2012.tar.bz2
> 
> I'm at a bit of a loss as to how to debug this. I suspect I've messed
> up the ordering in some way. Any pointers?
> 
> Gareth Davis