[Pacemaker] unknown third node added to a 2 node cluster?

Mon Oct 13 01:51:35 UTC 2014

On 11 Oct 2014, at 1:35 am, Brian J. Murrell (brian) <brian at interlinx.bc.ca> wrote:

> On Wed, 2014-10-08 at 12:39 +1100, Andrew Beekhof wrote:
>> On 8 Oct 2014, at 2:09 am, Brian J. Murrell (brian) <brian-SquOHqY54CVWr29BmMi2cA at public.gmane.org> wrote:
>> 
>>> Given a 2 node pacemaker-1.1.10-14.el6_5.3 cluster with nodes "node5"
>>> and "node6" I saw an "unknown" third node being added to the cluster,
>>> but only on node5:
>> 
>> Is either node using dhcp?
> 
> Yes, they both are.  The server is the ISC DHCP server (on EL6) and the
> address pool is much more plentiful than the node count.  That is all
> just to say that the DHCP server serving these nodes abides by the DHCP
> RFC's recommendation to allow clients to continue to use addresses they
> have already been assigned when making a renewal request.  And indeed,
> give them the same address they had previously after a lease expiry, as
> long as the pool is not constrained and the address is not needed to
> satisfy a request from a different machine.
> 
>> I would guess node6 got a new IP address
> 
> These nodes are using the ISC DHCP client.  That DHCP client logs in the
> same log (/var/log/messages) as was posted in my prior message when it
> renews a lease with messages such as:
> 
> Oct 10 05:56:19 node6 dhclient[1026]: DHCPREQUEST on eth0 to 10.14.80.6 port 67 (xid=0x4f11c576)
> Oct 10 05:56:19 node6 dhclient[1026]: DHCPACK from 10.14.80.6 (xid=0x4f11c576)
> Oct 10 05:56:20 node6 dhclient[1026]: bound to 10.14.82.141 -- renewal in 8546 seconds.
> 
> In the logs that I pasted the messages from in my previous message, such
> messages don't even exist because the nodes are not left up long enough
> to even get to a lease expiry.  These are test nodes and so are
> rebooted frequently.
> 
> TL;DR: I am quite certain the node did not get a new/different address.

Even the same address can be a problem. The brief window while the lease is being renewed can screw up corosync.
Never ever use DHCP for a cluster node. Ever. Really, never.
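
For anyone who wants to check the "address never changed" claim against a log, here is a minimal Python sketch that collects the addresses reported in dhclient "bound to" lines of the form quoted above. The function name and the regular expression are illustrative assumptions based only on those three sample lines, not on any documented dhclient log format.

```python
import re

# Pattern assumed from the sample lines quoted in this thread, e.g.
# "Oct 10 05:56:20 node6 dhclient[1026]: bound to 10.14.82.141 -- renewal in 8546 seconds."
BOUND_RE = re.compile(
    r"(?P<host>\S+) dhclient\[\d+\]: bound to (?P<addr>\S+) -- renewal in \d+ seconds"
)


def bound_addresses(lines):
    """Return {host: [addresses in order of first appearance]} (hypothetical helper)."""
    seen = {}
    for line in lines:
        match = BOUND_RE.search(line)
        if match is None:
            continue  # not a "bound to" line
        addrs = seen.setdefault(match.group("host"), [])
        if match.group("addr") not in addrs:
            addrs.append(match.group("addr"))
    return seen


if __name__ == "__main__":
    sample = [
        "Oct 10 05:56:19 node6 dhclient[1026]: DHCPACK from 10.14.80.6 (xid=0x4f11c576)",
        "Oct 10 05:56:20 node6 dhclient[1026]: bound to 10.14.82.141 -- renewal in 8546 seconds.",
    ]
    for host, addrs in bound_addresses(sample).items():
        status = "stable" if len(addrs) == 1 else "CHANGED"
        print(host, addrs, status)
```

A host with more than one address in its list was handed a different lease at some point. A single address only shows that the lease was stable; as noted above, it does not rule out trouble during the renewal window itself.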

> 
>> (or that corosync decided to bind to a different one)
> 
> Bind to a different what?  Address?

Yes. That is what nodeids are calculated from.
Different nodeid == different address

>  As in binding to an address that
> was not even configured on the machine?

localhost is the most common one
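
To make the address-to-nodeid relationship concrete, here is a rough Python sketch. It assumes that, when no nodeid is configured, corosync takes the four bytes of the bound IPv4 address and reads them as a little-endian 32-bit integer. The byte order is an assumption on my part; it is consistent with the value 16777343 that is commonly reported when corosync binds to 127.0.0.1. The function name is hypothetical, and the `clear_high_bit` parameter is meant to mirror the `clear_node_high_bit` totem option.

```python
import ipaddress


def auto_nodeid(addr: str, clear_high_bit: bool = False) -> int:
    """Approximate the nodeid derived from an IPv4 address.

    Assumption: the address bytes (network order) are read as a
    little-endian 32-bit integer. Hypothetical helper, not corosync code.
    """
    packed = ipaddress.IPv4Address(addr).packed  # 4 bytes, network order
    nodeid = int.from_bytes(packed, "little")
    if clear_high_bit:
        nodeid &= 0x7FFFFFFF  # keep the value positive as a signed 32-bit int
    return nodeid


if __name__ == "__main__":
    # A bind to localhost yields a nodeid unrelated to the real interface address.
    print(auto_nodeid("127.0.0.1"))  # 16777343
    print(auto_nodeid("10.14.82.141"))
```

Under that assumption, a node that briefly binds to 127.0.0.1 announces itself with a nodeid its peer has never seen, which would look like an unknown extra node.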

> 
> b.
