[Pacemaker] Nodes appear UNCLEAN (offline) during Pacemaker upgrade to 1.1.7

Andrew Beekhof andrew at beekhof.net
Wed Nov 28 19:24:08 EST 2012


On Mon, Nov 26, 2012 at 5:48 PM, Parshvi <parshvi.17 at gmail.com> wrote:
> Thanks Andrew for your input.
> Andrew Beekhof <andrew at ...> writes:
>
>>
>> On Fri, Nov 23, 2012 at 11:47 PM, Parshvi <parshvi.17 at ...> wrote:
>> > Hi,
>> > We are upgrading to Pacemaker 1.1.7 and Corosync 1.4.3.
>> > The previous version was:
>> > Pacemaker: 1.0.12
>> > Corosync : 1.2.7
>> > The issues faced in the older version are:
>> > 1) Numerous policy engine and crmd crashes, which stop failed cluster
>> > resources from recovering.
>>
>> Did you report any of these?
>> I can't fix bugs I don't know about.
> I raised these issues on the mailing list.

Odd. I don't recall seeing them. Sorry.

> I haven't opened a bug on Bugzilla, though. I will file one for the issue now.

great

>> > 2) Pacemaker logs show the FSM in a pending state; the service comes in sync
>> > only after a restart.
>>
>> As above.
> Raised the issue on forum. Will file a bug now.
>>
>> >
>> > Environment:
>> > 1) OS: OEL 5.8
>> > RPMs (packages) for Pacemaker 1.1.7, Corosync 1.4.3 and other dependent
>> > packages are not available for OEL 5.8. Hence, we have built all packages
>> > from source (GitHub).
>>
>> Did you try the ones at: http://clusterlabs.org/rpm-next/
> Yes, while working on the issue I went to clusterlabs.org for help. I have worked
> with the rpm-next packages for Pacemaker 1.1.8 and Corosync 1.4.1.
> The nodes come ONLINE, as expected.
> I am using the old resource-agents version, 1.0.4. (I didn't find RPMs for the
> latest version on clusterlabs. Can you suggest where I can find RPMs for the
> latest release of resource-agents?)

I try to focus on the bits needed to build/install pacemaker.
You could maybe try:

rpmbuild --rebuild \
  http://ftp.iinet.net.au/pub/fedora/linux/releases/17/Everything/x86_64/os/Packages/r/resource-agents-3.9.2-2.fc17.1.x86_64.rpm

> According to http://upstream-tracker.org/changelogs/pacemaker/1.1.8/changelog.html
> crm has become a separate project. Hence I would be installing the crm/cli now.

Correct. I believe they publish rpms somewhere.

>>
>> >
>> > We have a two-node cluster. We have installed the built binaries on both
>> > cluster nodes. crm_mon shows both nodes as online. All corosync and
>> > pacemaker processes appear started and running.
>> >
>> > Issues faced:
>> > We have another setup, also a two-node cluster (same as above).
>> > The package binaries have been installed on both nodes.
>> > One of the nodes appears UNCLEAN (offline) and the other node appears (offline).
>> > The crmd process continuously respawns until its max respawn count is reached.
>> > DC appears as NONE in crm_mon.
>> >
>> > I have checked SELinux and the firewall on the nodes (both are disabled).
>> >
>> > I have an hb_report of the nodes. I can share it if needed.
>>
>> Yes please. Not much we can do without it.  Or at least without some
>> sort of description beyond "the crmd respawns".
> Will share the hb_report.
>>
>> > I also created another two-node cluster: one node was from the WORKING
>> > cluster and the other node was from the NON_WORKING cluster.
>> > The crm_mon output for this cluster is:
>> >
>> > Last updated: Sat Nov 17 19:53:37 2012
>> > Last change: Sat Nov 17 19:53:27 2012 via crmd on node-112
>> > Stack: openais
>> > Current DC: node-112 - partition with quorum
>> > Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> > 2 Nodes configured, 2 expected votes
>> > 0 Resources configured.
>> > ============
>> >
>> > Node node-122: UNCLEAN (offline)
>> > Online: [ node-112 ]
>> >
>> >
>> > After some time the UNCLEAN(offline) node appears offline:
>> >
>> > Last updated: Sat Nov 17 20:26:48 2012
>> > Last change: Sat Nov 17 20:15:38 2012 via cibadmin on node-112
>> > Stack: openais
>> > Current DC: node-112 - partition with quorum
>> > Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> > 2 Nodes configured, 2 expected votes
>> > 0 Resources configured.
>> > ============
>> >
>> > Online: [ node-112 ]
>> > OFFLINE: [ node-122 ]
>> >
>> > I would request the owners to please respond with some input. The old
>> > version is a concern in our production environment.
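For anyone who wants to watch for this condition while gathering logs, here is a minimal sketch (not part of Pacemaker) that pulls per-node states out of crm_mon text output such as the dumps quoted above. It assumes the 1.1.7-style layout shown there ("Node <name>: <state>", "Online: [ ... ]", "OFFLINE: [ ... ]"); the function names parse_node_states and unclean_nodes are made up for illustration, and other crm_mon versions may format these lines differently.

```python
import re

# Sample lines taken from the crm_mon output quoted in this thread.
SAMPLE = """\
Current DC: node-112 - partition with quorum
2 Nodes configured, 2 expected votes
============

Node node-122: UNCLEAN (offline)
Online: [ node-112 ]
"""

# Per-node lines, e.g. "Node node-122: UNCLEAN (offline)"
NODE_LINE = re.compile(r"^Node\s+(\S+):\s+(.+)$")
# Grouped lines, e.g. "Online: [ node-112 ]" or "OFFLINE: [ node-122 ]"
GROUP_LINE = re.compile(r"^(Online|OFFLINE):\s+\[(.*)\]$")


def parse_node_states(text):
    """Map each node name to the state crm_mon printed for it (hypothetical helper)."""
    states = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = NODE_LINE.match(line)
        if match:
            # Keep the state string exactly as printed, e.g. "UNCLEAN (offline)".
            states[match.group(1)] = match.group(2).strip()
            continue
        match = GROUP_LINE.match(line)
        if match:
            # Every name inside the brackets shares the group's state.
            for name in match.group(2).split():
                states[name] = match.group(1).lower()
    return states


def unclean_nodes(text):
    """Return the sorted names of nodes whose state mentions UNCLEAN."""
    return sorted(
        name for name, state in parse_node_states(text).items()
        if "UNCLEAN" in state
    )


if __name__ == "__main__":
    print(parse_node_states(SAMPLE))
    print(unclean_nodes(SAMPLE))
```

The input is meant to be raw one-shot output (for example from `crm_mon -1`), not the `>`-quoted text from an email.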



