[ClusterLabs] Unable to add 'NodeX' to cluster: node is already in a cluster

Scott Greenlese swgreenl at us.ibm.com
Thu Jun 29 16:33:06 EDT 2017


Tomas,

Yes, that was it.

[root at zs95KL corosync]# pcs cluster destroy
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
[root at zs95KL corosync]#


[root at zs93kl corosync]# pcs cluster node add zs95KLpcs1,zs95KLpcs2
zs95kjpcs1: Corosync updated
zs93KLpcs1: Corosync updated
zs95KLpcs1: Succeeded
Synchronizing pcsd certificates on nodes zs95KLpcs1...
zs95KLpcs1: Success

Restaring pcsd on the nodes in order to reload the certificates...
zs95KLpcs1: Success
[root at zs93kl corosync]#
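
(Next on my end, assuming the node should rejoin right away, is just starting
cluster services on it and checking status; a sketch, not yet run:)

pcs cluster start zs95KLpcs1
pcs status nodes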

Thank you very much for this quick fix.

- Scott

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie,
N.Y.
  INTERNET:  swgreenl at us.ibm.com




From:	Tomas Jelinek <tojeline at redhat.com>
To:	users at clusterlabs.org
Date:	06/29/2017 12:13 PM
Subject:	Re: [ClusterLabs] Unable to add 'NodeX' to cluster: node is
            already in a cluster



Hi Scott,

It looks like some of the cluster configuration files still exist on your
node 'zs95KLpcs1'. Try running "pcs cluster destroy" on that node. This
will delete all cluster configuration files on the node, so make sure it
is the right node before running the command. Then you should be able to
add the node to your cluster.
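
Something like this, with the destroy run on the node to be wiped first and
the add run afterwards from one of the current cluster nodes; the node names
are just the ones from this thread:

# on zs95KLpcs1 itself
pcs cluster destroy

# then, from a node that is still in the cluster
pcs cluster node add zs95KLpcs1,zs95KLpcs2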


Regards,
Tomas



On 29.6.2017 at 17:32, Scott Greenlese wrote:
> Hi all...
>
> When I try to add a previously removed cluster node back into my
> pacemaker cluster, I get the following error:
>
> [root at zs93kl]# pcs cluster node add zs95KLpcs1,zs95KLpcs2
> Error: Unable to add 'zs95KLpcs1' to cluster: node is already in a cluster
>
> The node I am adding was recently removed from the cluster, but
> apparently the removal
> was incomplete.
>
> I am looking for some help to thoroughly remove zs95KLpcs1 from this (or
> any other)
> cluster that this host may be a part of.
>
>
> Background:
>
> I had removed the node (zs95KLpcs1) from my 3-node, single-ring-protocol
> pacemaker cluster while that node (which happens to be a KVM on System Z
> Linux host) was deactivated / shut down due to relentless, unsolicited
> STONITH events. My thought was that some issue with the ring0 interface
> (on vlan1293) was causing the cluster to initiate fence (power off)
> actions just minutes after the node joined the cluster. That's why I went
> ahead and deactivated it.
>
> The first procedure I used to remove zs95KLpcs1 was flawed, because I
> forgot that there's an issue with
> attempting to remove an unreachable cluster node on the older pacemaker
> code:
>
> [root at zs95kj ]# date;pcs cluster node remove zs95KLpcs1
> Tue Jun 27 18:28:23 EDT 2017
> Error: pcsd is not running on zs95KLpcs1
>
> I then followed this procedure (courtesy of Tomas and Ken in this user
> group):
>
> 1. run 'pcs cluster localnode remove <nodename>' on all remaining nodes
> 2. run 'pcs cluster reload corosync' on one node
> 3. run 'crm_node -R <nodename> --force' on one node
>
> My execution:
>
> I made the mistake of manually removing the stanza for the target node
> (zs95KLpcs1) from the corosync.conf file before executing the above
> procedure:
>
> [root at zs95kj ]# vi /etc/corosync/corosync.conf
>
> Removed this stanza:
>
> node {
>         ring0_addr: zs95KLpcs1
>         nodeid: 3
> }
>
> I then followed the recommended steps ...
>
> [root at zs95kj ]# pcs cluster localnode remove zs95KLpcs1
> Error: unable to remove zs95KLpcs1 ### I assume this was because I
> manually removed the stanza (above)
>
> [root at zs93kl ]# pcs cluster localnode remove zs95KLpcs1
> zs95KLpcs1: successfully removed!
> [root at zs93kl ]#
>
> [root at zs95kj ]# pcs cluster reload corosync
> Corosync reloaded
> [root at zs95kj ]#
>
> [root at zs95kj ]# crm_node -R zs95KLpcs1 --force
> [root at zs95kj ]#
>
>
> [root at zs95kj ]# pcs status |less
> Cluster name: test_cluster_2
> Last updated: Tue Jun 27 18:39:14 2017 Last change: Tue Jun 27 18:38:56
> 2017 by root via crm_node on zs95kjpcs1
> Stack: corosync
> Current DC: zs93KLpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
> partition with quorum
> 45 nodes and 227 resources configured
>
> Online: [ zs93KLpcs1 zs95kjpcs1 ]
>
>
> This seemed to work well; at least, I'm showing only the two cluster nodes.
>
> Later on, once I was able to activate zs95KLpcs1 (the former cluster
> member), I did what I thought I should do to tell that node that it's no
> longer a member of the cluster:
>
> [root at zs95kj ]# cat neuter.sh
> ssh root at zs95KL "/usr/sbin/pcs cluster localnode remove zs95KLpcs1"
> ssh root at zs95KL "/usr/sbin/pcs cluster reload corosync"
> ssh root at zs95KL "/usr/sbin/crm_node -R zs95KLpcs1 --force"
>
> [root at zs95kj ]# ./neuter.sh
> zs95KLpcs1: successfully removed!
> Corosync reloaded
> [root at zs95kj ]#
>
>
> Next, I followed a procedure to convert my current 2-node, single-ring
> cluster to RRP, which seems to be running well, and the corosync ring
> status now looks like this:
>
> [root at zs93kl ]# for host in zs95kjpcs1 zs93KLpcs1 ; do ssh $host
> "hostname;corosync-cfgtool -s"; done
> zs95kj
> Printing ring status.
> Local node ID 2
> RING ID 0
> id = 10.20.93.12
> status = ring 0 active with no faults
> RING ID 1
> id = 10.20.94.212
> status = ring 1 active with no faults
>
> zs93kl
> Printing ring status.
> Local node ID 5
> RING ID 0
> id = 10.20.93.13
> status = ring 0 active with no faults
> RING ID 1
> id = 10.20.94.213
> status = ring 1 active with no faults
> [root at zs93kl ]#
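>
> (For reference, with RRP each node stanza in corosync.conf carries both
> ring addresses; for the node being added it should end up roughly like
> this sketch, where the nodeid is only a guess:)
>
> node {
>         ring0_addr: zs95KLpcs1
>         ring1_addr: zs95KLpcs2
>         nodeid: 3
> }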
>
>
> So now, when I try to add zs95KLpcs1 (and the second ring interface,
> zs95KLpcs2) to the RRP config,
> I get the error:
>
> [root at zs93kl]# pcs cluster node add zs95KLpcs1,zs95KLpcs2
> Error: Unable to add 'zs95KLpcs1' to cluster: node is already in a cluster
>
>
> I re-ran the node removal procedures and also deleted
> /etc/corosync/corosync.conf on the target node zs95KLpcs1, but nothing
> I've tried resolves my problem.
>
> I checked to see if zs95KLpcs1 exists in any "corosync.conf" file on the
> 3 nodes, and it does not.
>
> [root at zs95kj corosync]# grep zs95KLpcs1 *
> [root at zs95kj corosync]#
>
> [root at zs93kl corosync]# grep zs95KLpcs1 *
> [root at zs93kl corosync]#
>
> [root at zs95KL corosync]# grep zs95KLpcs1 *
> [root at zs95KL corosync]#
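>
> (I'm guessing pcs looks at more than just corosync.conf, so I also want to
> check the other usual cluster state locations on the target node; default
> paths assumed, sketch only:)
>
> # anything still present here could be what makes pcs think the node
> # is already in a cluster
> ls -l /etc/corosync/corosync.conf /var/lib/pacemaker/cib/ /var/lib/corosync/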
>
> Thanks in advance ..
>
> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
> INTERNET: swgreenl at us.ibm.com
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



