[Pacemaker] Nodes appear UNCLEAN (offline) during Pacemaker upgrade to 1.1.7

Andrew Beekhof andrew at beekhof.net
Sun Nov 25 19:11:16 EST 2012


On Fri, Nov 23, 2012 at 11:47 PM, Parshvi <parshvi.17 at gmail.com> wrote:
> Hi,
> We are upgrading to Pacemaker 1.1.7 and Corosync 1.4.3.
> The previous version was:
> Pacemaker: 1.0.12
> Corosync : 1.2.7
> The issues faced in the older version are:
> 1) Numerous policy engine and crmd crashes, which prevent failed cluster
> resources from recovering.

Did you report any of these?
I can't fix bugs I don't know about.

> 2) Pacemaker logs show the FSM in a pending state; the service comes back in
> sync only after a restart.

As above.

>
> Environment:
> 1) OS: OEL 5.8
> RPMs (packages) for Pacemaker 1.1.7, Corosync 1.4.3 and other dependent packages
> are not available for OEL 5.8. Hence, we have built all packages from source (GitHub).

Did you try the ones at http://clusterlabs.org/rpm-next/ ?

>
> We have a two-node cluster. We have installed the built binaries on both cluster
> nodes. crm_mon shows both nodes as online. All corosync and pacemaker
> processes appear started and running.
>
> Issues faced:
> We have another setup, consisting of a two-node cluster (same as above).
> The package binaries have been installed on both nodes.
> One of the nodes appears as UNCLEAN (offline) and the other node appears as (offline).
> The crmd process continuously respawns until its max respawn count is reached. The DC
> appears as NONE in crm_mon.
>
> I have checked SELinux and the firewall on the nodes (both are disabled).
>
> I have an hb_report of the nodes. I can share it if needed.

Yes please. Not much we can do without it.  Or at least without some
sort of description beyond "the crmd respawns".

> I also created another two-node cluster: one node was from the WORKING cluster and
> the other node was from the NON_WORKING cluster.
> The crm_mon output of that cluster is:
>
> Last updated: Sat Nov 17 19:53:37 2012
> Last change: Sat Nov 17 19:53:27 2012 via crmd on node-112
> Stack: openais
> Current DC: node-112 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 0 Resources configured.
> ============
>
> Node node-122: UNCLEAN (offline)
> Online: [ node-112 ]
>
>
> After some time the UNCLEAN (offline) node appears as offline:
>
> Last updated: Sat Nov 17 20:26:48 2012
> Last change: Sat Nov 17 20:15:38 2012 via cibadmin on node-112
> Stack: openais
> Current DC: node-112 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 0 Resources configured.
> ============
>
> Online: [ node-112 ]
> OFFLINE: [ node-122 ]
>
> I would request the owners to please respond with some input. The old version is
> a concern in our production environment.
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
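
When describing the symptoms beyond "the crmd respawns", it can help to report the per-node state from each crm_mon snapshot. A minimal sketch in Python, assuming the plain-text crm_mon layout quoted above; the function name parse_node_states is hypothetical and not part of Pacemaker:

```python
import re

def parse_node_states(crm_mon_text):
    """Map node name -> state from plain-text crm_mon output.

    Assumes the 1.1.7 layout quoted in this thread; other versions
    may format the node section differently.
    """
    states = {}
    for raw in crm_mon_text.splitlines():
        # Drop any leading email quote markers and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        # Per-node line, e.g. "Node node-122: UNCLEAN (offline)"
        m = re.match(r"Node (\S+): (.+)$", line)
        if m:
            states[m.group(1)] = m.group(2).strip()
            continue
        # Grouped line, e.g. "Online: [ node-112 ]" or "OFFLINE: [ node-122 ]"
        m = re.match(r"(Online|OFFLINE): \[(.*)\]$", line)
        if m:
            for name in m.group(2).split():
                states[name] = m.group(1).lower()
    return states
```

Running it on the two snapshots above would show node-122 moving from "UNCLEAN (offline)" to "offline" while node-112 stays online.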



