[Pacemaker] Upgrading to Pacemaker 1.1.7. Issue: sticky resources failing back after reboot

David Vossel dvossel at redhat.com
Thu Sep 6 11:18:53 EDT 2012


----- Original Message -----
> From: "Parshvi" <parshvi.17 at gmail.com>
> To: pacemaker at clusterlabs.org
> Sent: Thursday, September 6, 2012 2:39:10 AM
> Subject: [Pacemaker] Upgrading to Pacemaker 1.1.7. Issue: sticky resources failing back after reboot
> 
> Hi,
> We have upgraded Pacemaker from version 1.0.12 to 1.1.7.
> The upgrade was done because resources failed to recover after a
> timeout (monitor|stop[unmanaged]); the logs observed are:
> 
> WARN: print_graph: Synapse 6 is pending (priority: 0)
> Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_elem: [Action 103]: Pending (id: SnmpAgent_monitor_5000, loc: CSS-FU-2, priority: 0)
> Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_elem: * [Input 102]: Pending (id: SnmpAgent_start_0, loc: CSS-FU-2, priority: 0)
> Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_graph: Synapse 7 is pending (priority: 0)
> 
> Reading through the forum mails, we inferred that this issue is
> fixed in 1.1.7.
> 
> Platform OS: OEL 5.8
> Pacemaker Version: 1.1.7
> Corosync version: 1.4.3
> 
> Pacemaker and all its dependencies were built from source
> (tarball from GitHub).
> glib version used for build: 2.32.2
> 
> The following issue is observed in Pacemaker 1.1.7:
> 1) There is a two-node cluster.
> 2) When the primary node is rebooted (or Pacemaker is restarted),
>    the resources fail over to the secondary.
> 3) There are 4 groups of services:
>    2 groups are not sticky,
>    1 group is a master/slave multi-state resource,
>    1 group is STICKY.
> 4) When the primary node comes back online, even the sticky
>    resources fail back to the primary node (the issue).
> 5) Now, if the secondary node is rebooted, the resources fail over
>    to the primary node.
> 6) Once the secondary node is up, only the non-sticky resources
>    fail back; the sticky resources remain on the primary node.
> 
> 7) Even if a location preference for the sticky resources is set
>    for Node-2 (the secondary node), the sticky resources still
>    fail back to Node-1.
> 
> We're using Pacemaker 1.0.12 in production. We're facing issues
> where the monitor operation of IPaddr and other resources times out
> and Pacemaker does not recover from it (logs shared above).
> 
> Any help is welcome.
> 
> PS: Please mention if any logs or configuration need to be shared.

My guess is that this is an issue with node scores for the resources in question.  Stickiness and location constraints work in a similar way: you can think of resource stickiness as a temporary location constraint whose node changes depending on where the resource is currently running.

If you have a resource with stickiness enabled and you want the resource to stay put, the stickiness score has to outweigh all the location constraint scores for that resource on other nodes.  If you are using colocation constraints, this becomes increasingly complicated, as a resource's per-node location score can change based on the location of another resource.
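
For illustration, here is a minimal crm shell sketch (resource, node, and score values are hypothetical, not taken from your cluster) where the stickiness outweighs a location preference, so the resource stays where it is after a failover instead of failing back:

  # stickiness of 200 set on the resource itself
  primitive p_vip ocf:heartbeat:IPaddr2 \
      params ip=192.168.1.10 cidr_netmask=24 \
      meta resource-stickiness=200
  # preference of 100 for node2; once p_vip is running on node1,
  # its stickiness (200) on node1 outweighs this 100, so it stays
  location loc_vip_prefers_node2 p_vip 100: node2

If the location score were 500 instead, it would win over the 200 stickiness and the resource would move back as soon as node2 returned.  Also keep in mind that a group's stickiness is the sum of its active members' stickiness, so the effective score of a sticky group depends on how many members are running.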

For specific advice on your scenario, there is little we can offer without seeing your exact configuration.
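
That said, the scores the policy engine actually computed are usually the quickest way to see why a resource moved.  On 1.1.x, something along these lines should print the per-node allocation scores from the live CIB (check crm_simulate --help for the exact options on your build):

  # show allocation scores for every resource on every node
  crm_simulate -sL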

-- Vossel


> Thanks,
> Parshvi
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
