[Pacemaker] Nodes appear UNCLEAN (offline) during Pacemaker upgrade to 1.1.7
Parshvi
parshvi.17 at gmail.com
Mon Nov 26 06:48:25 UTC 2012
Thanks Andrew for your input.
Andrew Beekhof <andrew at ...> writes:
>
> On Fri, Nov 23, 2012 at 11:47 PM, Parshvi <parshvi.17 at ...> wrote:
> > Hi,
> > We are upgrading to Pacemaker 1.1.7 and Corosync 1.4.3.
> > The previous version was:
> > Pacemaker: 1.0.12
> > Corosync : 1.2.7
> > The issues faced in the older version are:
> > 1) Numerous policy engine and crmd crashes, which stop failed cluster
> > resources from recovering.
>
> Did you report any of these?
> I can't fix bugs I don't know about.
I raised the issue on the mailing list but haven't opened a bug on Bugzilla.
I will file a bug for the issue now.
>
> > 2) Pacemaker logs show the FSM in a pending state; the service comes back
> > in sync only after a restart.
>
> As above.
I raised this issue on the mailing list as well. Will file a bug now.
>
> >
> > Environment:
> > 1) OS: OEL 5.8
> > RPMs (packages) for Pacemaker 1.1.7, Corosync 1.4.3 and their dependencies
> > are not available for OEL 5.8. Hence, we have built all packages from
> > source (GitHub).
>
> Did you try the ones at: http://clusterlabs.org/rpm-next/
Yes, while working on the issue I went to clusterlabs.org for help. I have worked
with the rpm-next packages for Pacemaker 1.1.8 and Corosync 1.4.1.
The nodes come ONLINE, as expected.
I am using the old resource-agents version, 1.0.4. (I didn't find RPMs for the
latest version on clusterlabs.org. Can you suggest where I can find RPMs for the
latest release of resource-agents?)
According to http://upstream-tracker.org/changelogs/pacemaker/1.1.8/changelog.html,
crm has become a separate project. Hence I will be installing the crm CLI now.
>
> >
> > We have a two-node cluster. We have installed the built binaries on both
> > cluster nodes. crm_mon shows both nodes as online. All corosync and
> > pacemaker processes appear started and running.
> >
> > Issues faced:
> > We have another setup, also a two-node cluster (same as above).
> > The package binaries have been installed on both nodes.
> > One of the nodes appears UNCLEAN (offline) and the other node appears (offline).
> > The crmd process continuously respawns until its max respawn count is reached.
> > The DC appears as NONE in crm_mon.
> >
> > I have checked SELinux and the firewall on the nodes (both are disabled).
> >
> > I have an hb_report of the nodes. I can share it if needed.
>
> Yes please. Not much we can do without it. Or at least without some
> sort of description beyond "the crmd respawns".
Will share the hb_report.
>
> > I also created another cluster of 2 nodes: one node from the WORKING cluster
> > and another node from the NON_WORKING cluster.
> > The crm_mon output for that cluster is:
> >
> > Last updated: Sat Nov 17 19:53:37 2012
> > Last change: Sat Nov 17 19:53:27 2012 via crmd on node-112
> > Stack: openais
> > Current DC: node-112 - partition with quorum
> > Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> > 2 Nodes configured, 2 expected votes
> > 0 Resources configured.
> > ============
> >
> > Node node-122: UNCLEAN (offline)
> > Online: [ node-112 ]
> >
> >
> > After some time the UNCLEAN (offline) node appears as OFFLINE:
> >
> > Last updated: Sat Nov 17 20:26:48 2012
> > Last change: Sat Nov 17 20:15:38 2012 via cibadmin on node-112
> > Stack: openais
> > Current DC: node-112 - partition with quorum
> > Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> > 2 Nodes configured, 2 expected votes
> > 0 Resources configured.
> > ============
> >
> > Online: [ node-112 ]
> > OFFLINE: [ node-122 ]
> >
> > I would request the owners to please respond with some input. The old
> > version is a concern in our production environment.
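
For anyone who wants to spot this state automatically, the node lines in the
crm_mon dumps quoted above can be checked with a short script. This is only a
minimal sketch in Python, assuming the lines look exactly like the ones in this
thread ("Node <name>: UNCLEAN (offline)", "Online: [ ... ]", "OFFLINE: [ ... ]").
The helper name parse_node_states is hypothetical, and other Pacemaker versions
may format these lines differently.

```python
import re

# Patterns are assumptions based only on the crm_mon output quoted in
# this thread; they are not an exhaustive description of crm_mon.
NODE_LINE = re.compile(r"^Node\s+(\S+):\s+(.+)$")
GROUP_LINE = re.compile(r"^(Online|OFFLINE):\s+\[\s*(.*?)\s*\]$")


def parse_node_states(crm_mon_text):
    """Hypothetical helper: return a dict mapping node name -> state."""
    states = {}
    for raw in crm_mon_text.splitlines():
        # Drop mail quoting ("> > ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        match = NODE_LINE.match(line)
        if match:
            # Per-node line, e.g. "Node node-122: UNCLEAN (offline)".
            states[match.group(1)] = match.group(2).strip()
            continue
        match = GROUP_LINE.match(line)
        if match:
            # Grouped line, e.g. "Online: [ node-112 ]".
            for name in match.group(2).split():
                states[name] = match.group(1).lower()
    return states


sample = """\
Node node-122: UNCLEAN (offline)
Online: [ node-112 ]
"""
states = parse_node_states(sample)
# Collect the nodes whose reported state starts with UNCLEAN.
unclean = [name for name, state in states.items() if state.startswith("UNCLEAN")]
print(unclean)
```

The text could come from a saved copy of the crm_mon output. The sketch only
reads the text and does not talk to the cluster.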