[Pacemaker] Nodes appear UNCLEAN (offline) during Pacemaker upgrade to 1.1.7

Parshvi parshvi.17 at gmail.com
Fri Nov 23 12:47:54 UTC 2012


Hi,
We are upgrading to Pacemaker 1.1.7 and Corosync 1.4.3.
The previous version was:
Pacemaker: 1.0.12
Corosync : 1.2.7
The issues faced in the older version are:
1) Numerous policy engine and crmd crashes, which stop failed cluster resources 
from recovering.
2) Pacemaker logs show the FSM in a pending state; the service comes back in 
sync only after a restart.

Environment:
1) OS: OEL 5.8
RPMs (packages) for Pacemaker 1.1.7, Corosync 1.4.3 and their dependencies are 
not available for OEL 5.8. Hence, we have built all packages from source (GitHub).

We have a two-node cluster. We have installed the built binaries on both cluster 
nodes. crm_mon shows both nodes as online. All corosync and pacemaker processes 
appear started and running.

Issues faced:
We have another setup, also a two-node cluster (same as above).
The package binaries have been installed on both nodes.
One of the nodes appears as UNCLEAN (offline) and the other node appears as (offline).
The crmd process respawns continuously until its maximum respawn count is reached. 
The DC appears as NONE in crm_mon.

I have checked SELinux and the firewall on the nodes (both are disabled).

I have an hb_report of the nodes. I can share it if needed.

I also created another two-node cluster: one node was from the WORKING cluster 
and the other node was from the NON_WORKING cluster.
The crm_mon output for this cluster is:

Last updated: Sat Nov 17 19:53:37 2012
Last change: Sat Nov 17 19:53:27 2012 via crmd on node-112
Stack: openais
Current DC: node-112 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Node node-122: UNCLEAN (offline)
Online: [ node-112 ]


After some time, the UNCLEAN (offline) node appears as OFFLINE:

Last updated: Sat Nov 17 20:26:48 2012
Last change: Sat Nov 17 20:15:38 2012 via cibadmin on node-112
Stack: openais
Current DC: node-112 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node-112 ]
OFFLINE: [ node-122 ]
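
To compare the two dumps mechanically, the per-node states can be pulled out 
with a short script. This is only a sketch: it assumes the plain-text crm_mon 
layout shown above (as printed by 1.1.7 here), and parse_node_states is a 
hypothetical helper name, not part of Pacemaker.

```python
import re


def parse_node_states(crm_mon_text):
    """Return {node_name: state} from plain-text crm_mon output.

    Assumes only the two line shapes seen in the dumps above:
      "Node <name>: <state>"          e.g. Node node-122: UNCLEAN (offline)
      "Online: [ a b ]" / "OFFLINE: [ a b ]"
    """
    states = {}
    for raw_line in crm_mon_text.splitlines():
        line = raw_line.strip()

        # Single-node form: the state is everything after the colon.
        single = re.match(r"^Node\s+(\S+):\s+(.+)$", line)
        if single:
            states[single.group(1)] = single.group(2).strip()
            continue

        # Grouped form: every name inside the brackets shares one state.
        grouped = re.match(r"^(Online|OFFLINE):\s+\[(.*)\]$", line)
        if grouped:
            for name in grouped.group(2).split():
                states[name] = grouped.group(1).lower()
    return states


if __name__ == "__main__":
    first_dump = "Node node-122: UNCLEAN (offline)\nOnline: [ node-112 ]\n"
    second_dump = "Online: [ node-112 ]\nOFFLINE: [ node-122 ]\n"
    print(parse_node_states(first_dump))
    print(parse_node_states(second_dump))
```

Running it against both dumps shows node-122 moving from "UNCLEAN (offline)" 
to "offline" while node-112 stays online.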

I would request the maintainers to please respond with some input. The old 
version is a concern in our production environment.





