[Pacemaker] Upgrading to Pacemaker 1.1.7. Issue: sticky resources failing back after reboot
Parshvi
parshvi.17 at gmail.com
Mon Sep 10 08:06:51 UTC 2012
David Vossel <dvossel at ...> writes:
> > Hi,
> > We have upgraded Pacemaker from version 1.0.12 to 1.1.7.
> > The upgrade was done because resources failed to recover after a
> > timeout (monitor|stop[unmanaged]); the logs observed are:
> >
> > WARN: print_graph: Synapse 6 is pending (priority: 0)
> > Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_elem: [Action 103]: Pending (id: SnmpAgent_monitor_5000, loc: CSS-FU-2, priority: 0)
> > Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_elem: * [Input 102]: Pending (id: SnmpAgent_start_0, loc: CSS-FU-2, priority: 0)
> > Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_graph: Synapse 7 is pending (priority: 0)
> >
> > Reading through the forum mails, it was inferred that this issue is
> > fixed in 1.1.7.
> >
> > Platform OS: OEL 5.8
> > Pacemaker Version: 1.1.7
> > Corosync version: 1.4.3
> >
> > Pacemaker and all its dependent packages were built from source
> > (tarball from GitHub).
> > GLib version used for the build: 2.32.2
> >
> > The following issue is observed in Pacemaker 1.1.7:
> > 1) There is a two-node cluster.
> > 2) When the primary node is rebooted (or Pacemaker is restarted), the
> > resources fail over to the secondary node.
> > 3) There are 4 groups of services:
> > 2 groups are not sticky,
> > 1 group is a master/slave multi-state resource,
> > 1 group is STICKY.
> > 4) When the primary node comes back online, even the sticky resources
> > fail back to the primary node (Issue).
> > 5) Now, if the secondary node is rebooted, the resources fail over to
> > the primary node.
> > 6) Once the secondary node is up, only the non-sticky resources fail
> > back. Sticky resources remain on the primary node.
> >
> > 7) Even if the location preference of the sticky resources is set for
> > Node-2 (the secondary node), the sticky resources still fail back to
> > Node-1.
> >
> > We're using Pacemaker 1.0.12 in production. We're facing issues where
> > the monitor operation of IPaddr and other resources times out and
> > Pacemaker does not recover from it (logs shared above).
> >
> > Any help is welcome.
> >
> > PS: Please mention if any logs or configuration need to be shared.
>
> My guess is that this is an issue with node scores for the resources in
> question. Stickiness and location constraints work in a similar way. You
> could really think of resource stickiness as a temporary location
> constraint on a resource that changes depending on which node it is on.
>
> If you have a resource with stickiness enabled and you want the resource
> to stay put, the stickiness score has to outweigh all the location
> constraints for that resource on other nodes. If you are using colocation
> constraints, this becomes increasingly complicated, as a resource's
> per-node location score could change based on the location of another
> resource.
>
> For specific advice on your scenario, there is little we can offer
> without seeing your exact configuration.
>
Hi David,
Thanks for the quick response.
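If I understand the scoring point correctly, it amounts to something like
the following crm shell sketch (hypothetical resource and node names and
illustrative scores, not our actual configuration):

  # rsc_A is sticky and also prefers node-1 with a score of 100
  primitive rsc_A ocf:heartbeat:Dummy \
        meta resource-stickiness="200"
  location loc-rsc_A-node1 rsc_A 100: node-1
  # with stickiness 200 > location score 100, rsc_A stays on node-2
  # after failing over there; with stickiness 50 it would fail back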
I have shared the configuration on the following path:
https://dl.dropbox.com/u/20096935/cib.txt
The issue has been observed for the following group:
1) Rsc_Ms1
2) Rsc_S
3) Rsc_T
4) Rsc_TGroupClusterIP
Colocation: Resources 1), 2) and 3) have been colocated with resource 4).
Location preference: Resource 4) prefers one of the nodes in the cluster.
Ordering: Resources 1), 2) and 3) are started (with no sequential ordering
among them) once resource 4) has started.
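In crm shell terms, I believe this amounts to roughly the following
(constraint IDs, scores and node names here are placeholders; the actual
values are in the cib.txt linked above, and the colocation for the
master/slave resource may in fact target its Master role):

  # resources 1), 2) and 3) are placed with resource 4)
  colocation col-ms1-with-ip inf: Rsc_Ms1 Rsc_TGroupClusterIP
  colocation col-s-with-ip   inf: Rsc_S   Rsc_TGroupClusterIP
  colocation col-t-with-ip   inf: Rsc_T   Rsc_TGroupClusterIP
  # resource 4) prefers one node of the cluster
  location loc-ip-prefer-node2 Rsc_TGroupClusterIP 100: node-2
  # resources 1), 2) and 3) start only after resource 4), with no
  # ordering among themselves
  order ord-ip-before-ms1 inf: Rsc_TGroupClusterIP Rsc_Ms1
  order ord-ip-before-s   inf: Rsc_TGroupClusterIP Rsc_S
  order ord-ip-before-t   inf: Rsc_TGroupClusterIP Rsc_T

If I follow your explanation, the location score on Rsc_TGroupClusterIP is
effectively inherited by the resources colocated with it, so the stickiness
of the sticky group would have to outweigh that score for it to stay put
after a failover.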
Thanks,
Parshvi