[Pacemaker] Remove a "ghost" node

Sean Lutner sean at rentul.net
Thu Nov 7 12:45:21 EST 2013


I have a confusing situation that I'm hoping to get help with. Last night, after configuring STONITH on my two-node cluster, a "ghost" node suddenly appeared in my cluster. I'd like to understand the best way to remove this node from the configuration.

I'm using the fence_ec2 device for STONITH. I dropped the script on each node, registered the device with stonith_admin -R -a fence_ec2, and confirmed the registration with both of the following:

# stonith_admin -I
# pcs stonith list

I then configured STONITH following the Clusters from Scratch guide:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_example.html

Here are my commands:
# pcs cluster cib stonith_cfg
# pcs -f stonith_cfg stonith create ec2-fencing fence_ec2 ec2-home="/opt/ec2-api-tools" pcmk_host_check="static-list" pcmk_host_list="ip-10-50-3-122 ip-10-50-3-251" op monitor interval="300s" timeout="150s" op start start-delay="30s" interval="0"
# pcs -f stonith_cfg stonith
# pcs -f stonith_cfg property set stonith-enabled=true
# pcs -f stonith_cfg property
# pcs cluster push cib stonith_cfg

After that, STONITH appeared to be functioning, but a new node was listed in the pcs status output:

# pcs status
Last updated: Thu Nov  7 17:41:21 2013
Last change: Thu Nov  7 04:29:06 2013 via cibadmin on ip-10-50-3-122
Stack: cman
Current DC: ip-10-50-3-122 - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, unknown expected votes
11 Resources configured.


Node ip-10-50-3-1251: UNCLEAN (offline)
Online: [ ip-10-50-3-122 ip-10-50-3-251 ]

Full list of resources:

 ClusterEIP_54.215.143.166      (ocf::pacemaker:EIP):   Started ip-10-50-3-122
 Clone Set: EIP-AND-VARNISH-clone [EIP-AND-VARNISH]
     Started: [ ip-10-50-3-122 ip-10-50-3-251 ]
     Stopped: [ EIP-AND-VARNISH:2 ]
 ec2-fencing    (stonith:fence_ec2):    Stopped 

I have no idea where the node marked UNCLEAN came from, though its name is clearly a typo of a real cluster node's name.

The only command I ran with the bad node ID was:

# crm_resource --resource ClusterEIP_54.215.143.166 --cleanup --node ip-10-50-3-1251

Is there any way that command could have caused the node to be added?

I tried running pcs cluster node remove ip-10-50-3-1251, but that failed because the node doesn't exist and therefore has no pcsd running. Is there a way I can safely remove this ghost node from the cluster? I can provide logs from Pacemaker or Corosync as needed.