[Pacemaker] Remove a "ghost" node
Sean Lutner
sean at rentul.net
Fri Nov 8 01:59:35 UTC 2013
On Nov 7, 2013, at 8:34 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>
> On 8 Nov 2013, at 4:45 am, Sean Lutner <sean at rentul.net> wrote:
>
>> I have a confusing situation that I'm hoping to get help with. Last night, after configuring STONITH on my two-node cluster, I suddenly had a "ghost" node in my cluster. I'm looking to understand the best way to remove this node from the config.
>>
>> I'm using the fence_ec2 device for STONITH. I dropped the script on each node, registered the device with stonith_admin -R -a fence_ec2 and confirmed the registration with both
>>
>> # stonith_admin -I
>> # pcs stonith list
>>
>> I then configured STONITH per the Clusters from Scratch doc
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_example.html
>>
>> Here are my commands:
>> # pcs cluster cib stonith_cfg
>> # pcs -f stonith_cfg stonith create ec2-fencing fence_ec2 ec2-home="/opt/ec2-api-tools" pcmk_host_check="static-list" pcmk_host_list="ip-10-50-3-122 ip-10-50-3-251" op monitor interval="300s" timeout="150s" op start start-delay="30s" interval="0"
>> # pcs -f stonith_cfg stonith
>> # pcs -f stonith_cfg property set stonith-enabled=true
>> # pcs -f stonith_cfg property
>> # pcs cluster push cib stonith_cfg
>>
>> After that I saw that STONITH appears to be functioning, but a new node is listed in the pcs status output:
>
> Do the EC2 instances have fixed IPs?
> I didn't have much luck with EC2 because every time they came back up it was with a new name/address which confused corosync and created situations like this.
The IPs persist across reboots as far as I can tell. I thought the problem was due to stonith being enabled but not working, so I removed the stonith_id and disabled stonith. After that I restarted pacemaker and cman on both nodes and things started as expected, but the ghost node is still there.
Someone else working on the cluster exported the CIB, removed the node and then imported the CIB. They used this process http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-config-updates.html
Even after that, the ghost node is still there. Would it work to run pcs cluster cib > /tmp/cib-temp.xml, edit the node out of the config, and then run pcs cluster push cib /tmp/cib-temp.xml?
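For reference, here is a minimal sketch of a command-line alternative to hand-editing the exported XML. It is an assumption, not a confirmed fix for this cluster. The helper name remove_ghost_node and the RUN variable are made up for illustration. The sketch assumes that crm_node --remove accepts a node name on this Pacemaker version and stack (older releases may expect a node ID or may not support it under cman), and that the ghost appears as a <node> entry in the nodes section and a <node_state> entry in the status section of the CIB. It only prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch only: remove a stale node entry by name. Dry run by default.
# RUN defaults to "echo", so the commands are printed, not executed.
# Export RUN="" (empty) to execute them for real, at your own risk.
RUN="${RUN-echo}"

# remove_ghost_node is a hypothetical helper name, not part of pacemaker or pcs.
remove_ghost_node() {
    ghost="$1"
    [ -n "$ghost" ] || { echo "usage: remove_ghost_node NODE" >&2; return 1; }

    # Ask pacemaker to forget the node (assumes --remove takes a node name here).
    $RUN crm_node --force --remove "$ghost"

    # Delete the matching entry from the nodes section of the CIB.
    $RUN cibadmin --delete -o nodes --xml-text "<node uname=\"$ghost\"/>"

    # Delete the leftover entry from the status section of the CIB.
    $RUN cibadmin --delete -o status --xml-text "<node_state uname=\"$ghost\"/>"
}
```

Calling remove_ghost_node ip-10-50-3-1251 in the default dry-run mode prints the three commands so they can be reviewed before anything touches the cluster.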
I may have to go back to the drawing board on a fencing device for the nodes. Are there any other recommendations for a cluster on EC2 nodes?
Thanks very much
>
>>
>> # pcs status
>> Last updated: Thu Nov 7 17:41:21 2013
>> Last change: Thu Nov 7 04:29:06 2013 via cibadmin on ip-10-50-3-122
>> Stack: cman
>> Current DC: ip-10-50-3-122 - partition with quorum
>> Version: 1.1.8-7.el6-394e906
>> 3 Nodes configured, unknown expected votes
>> 11 Resources configured.
>>
>>
>> Node ip-10-50-3-1251: UNCLEAN (offline)
>> Online: [ ip-10-50-3-122 ip-10-50-3-251 ]
>>
>> Full list of resources:
>>
>> ClusterEIP_54.215.143.166 (ocf::pacemaker:EIP): Started ip-10-50-3-122
>> Clone Set: EIP-AND-VARNISH-clone [EIP-AND-VARNISH]
>> Started: [ ip-10-50-3-122 ip-10-50-3-251 ]
>> Stopped: [ EIP-AND-VARNISH:2 ]
>> ec2-fencing (stonith:fence_ec2): Stopped
>>
>> I have no idea where the node that is marked UNCLEAN came from, though its name is clearly a typo of a proper cluster node's name.
>>
>> The only command I ran with the bad node ID was:
>>
>> # crm_resource --resource ClusterEIP_54.215.143.166 --cleanup --node ip-10-50-3-1251
>>
>> Is there any possible way that could have caused the node to be added?
>>
>> I tried running pcs cluster node remove ip-10-50-3-1251, but that failed because there is no such node and thus no pcsd to contact. Is there a way I can safely remove this ghost node from the cluster? I can provide logs from pacemaker or corosync as needed.
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org