[Pacemaker] getnameinfo() vs uname()

Andrew Beekhof andrew at beekhof.net
Thu Aug 30 22:43:20 EDT 2012


On Wed, Aug 29, 2012 at 8:57 PM, Vladislav Bogdanov
<bubble at hoster-ok.com> wrote:
> 29.08.2012 13:33, Andrew Beekhof wrote:
>> On Wed, Aug 29, 2012 at 4:22 PM, Vladislav Bogdanov
>> <bubble at hoster-ok.com> wrote:
>>> Hi,
>>>
>>> It looks like pacemaker (current master)
>>
>> "current master" changes quite rapidly, could you be specific?
>
> c72f5ca
>
>>
>>> does not always work nicely on
>>> top of corosync2 if one doesn't have /etc/hosts with all cluster nodes
>>> in it, where short form of name goes before the long one (so
>>> gethostbyaddr() and getnameinfo() return the short one).
>>
>> I noticed a different issue related to this, but I need to know
>> exactly which version you had before I can answer properly.

Ok...

Pacemaker doesn't actually care about FQDN vs short names.
Short names are arguably nicer to look at, but the only thing that
really matters is that when node A looks up its own name, the
answer is consistent with the answer /other/ nodes get when they look
up node A.

The problem to date is that local lookups have used uname(3P) while
remote lookups use some other method (like getnameinfo(3)).
So I think the first step to fixing this mess is to have everyone
using the same mechanism - for corosync 2.x clusters[1] that will
almost certainly be the corosync_node_name() function you spotted.
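
To make the mismatch concrete, here is a minimal sketch in C. It assumes
node names are compared as exact strings. local_nodename() and
names_agree() are hypothetical helpers for illustration, not Pacemaker
functions, and example.com is a placeholder domain.

```c
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: copy the local node name, as reported by
 * uname(), into buf. Returns 0 on success, -1 on failure or if buf
 * is too small to hold the name. */
static int local_nodename(char *buf, size_t len)
{
    struct utsname u;

    if (uname(&u) < 0) {
        return -1;
    }
    if (strlen(u.nodename) >= len) {
        return -1;
    }
    strcpy(buf, u.nodename);
    return 0;
}

/* Hypothetical helper, assuming exact string comparison: a short
 * name and an FQDN for the same host do NOT agree. */
static int names_agree(const char *local, const char *remote)
{
    return strcmp(local, remote) == 0;
}
```

With these, names_agree("pcmk-1", "pcmk-1.example.com") is false, which
is the situation where one side holds the short name and the other side
holds the FQDN.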

If no nodelist[2] is specified in corosync.conf, we use getnameinfo()
on the address corosync is bound to - possibly with your amendment
below.
If there is a node list, we will look for a name in the 'ring0_addr'
or 'name' fields.
If those fields are missing or contain IP addresses, we fall back to
getnameinfo() as per the "no nodelist" case.
If none of those work, I guess we fall back to uname() and hope for the best.
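
The fallback order above could be sketched roughly as follows. This is
an illustration only, not the planned implementation: choose_node_name(),
usable() and is_ip_literal() are hypothetical names, the candidates are
assumed to have been fetched already (NULL when absent), and trying
'ring0_addr' before 'name' is an assumption, since the order between the
two is not settled above.

```c
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Return 1 if s parses as an IPv4 or IPv6 address literal. */
static int is_ip_literal(const char *s)
{
    struct in_addr v4;
    struct in6_addr v6;

    return inet_pton(AF_INET, s, &v4) == 1
        || inet_pton(AF_INET6, s, &v6) == 1;
}

/* A candidate is usable if present, non-empty and not an IP address. */
static int usable(const char *s)
{
    return s != NULL && s[0] != '\0' && !is_ip_literal(s);
}

/* Pick the first usable candidate in the order described above.
 * uname_name is the last resort and is returned unchecked. */
static const char *choose_node_name(const char *ring0_addr,
                                    const char *name,
                                    const char *dns_name,
                                    const char *uname_name)
{
    if (usable(ring0_addr)) {
        return ring0_addr;      /* nodelist: ring0_addr holds a name */
    }
    if (usable(name)) {
        return name;            /* nodelist: explicit 'name' field */
    }
    if (usable(dns_name)) {
        return dns_name;        /* getnameinfo() on the bound address */
    }
    return uname_name;          /* hope for the best */
}
```

For example, choose_node_name("192.168.1.10", NULL, "pcmk-1", "host")
skips the IP address in ring0_addr and returns "pcmk-1".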


I'm going to make this the first thing I do after 1.1.8 comes out
(we're waiting on http://bugs.clusterlabs.org/show_bug.cgi?id=5044 and
some final CTS runs).
If someone wants to help out before then, I would certainly not complain :)

-- Andrew

[1] We will implement equivalent functions for the other cluster types.
[2] The nodelist section looks something like:
nodelist {
    node {
        nodeid: 1
        ring0_addr: pcmk-1
        quorum_votes: 1
    }
    node {
        nodeid: 2
        ring0_addr: pcmk-2
        quorum_votes: 2
    }
}



>>
>>> I tried to run
>>> test cluster with stub /etc/hosts but fully functional name server, and
>>> I see that pacemaker includes long nodenames (fqdn) into nodelist, while
>>> expecting them to be equal to what uname() returns for the local node.
>>> After I created needed entries in /etc/hosts everything began to work.
>>> From getaddrinfo manpage, NI_NOFQDN flag should help to avoid this
>>> behavior.
>
> s/getaddrinfo/getnameinfo/
>
> Actually it doesn't. At least not always.
> Problem is that hostname (nodename) may be either fqdn (like anaconda
> tries to set) or contain only host part. And getnameinfo() is not
> consistent here (as in EL6), it strips domainname of a local system with
> leading dot if local hostname is FQDN, but returns FQDN which
> corresponds to address being searched if hostname is host-only.
>
> So, I tried the following patch and it works perfectly for me (hostnames are
> host-only, and DNS is correctly configured, so hostname -f returns FQDN).
>
> diff -urNp a/lib/cluster/corosync.c b/lib/cluster/corosync.c
> --- a/lib/cluster/corosync.c    2012-08-29 07:32:57.000000000 +0000
> +++ b/lib/cluster/corosync.c    2012-08-29 07:33:54.730099738 +0000
> @@ -207,7 +207,15 @@ static char *corosync_node_name(cmap_han
>                  addrlen = sizeof(struct sockaddr_in);
>              }
>
> -            if (getnameinfo((struct sockaddr *)addrs[0].address,
> addrlen, buf, sizeof(buf), NULL, 0, 0) == 0) {
> +            if (getnameinfo((struct sockaddr *)addrs[0].address,
> addrlen, buf, sizeof(buf), NULL, 0, NI_NAMEREQD) == 0) {
> +                char *p = buf;
> +                while (*p) {
> +                    if (*p == '.') {
> +                        *p = '\0';
> +                        break;
> +                    }
> +                    p++;
> +                }
>                  crm_notice("Inferred node name '%s' for nodeid %u from
> DNS", buf, nodeid);
>
>                  if(corosync_name_is_valid("DNS", buf)) {
>
>
> Now I do not see FQDNs in nodelist.
> Grrr, line wrapping...
>
>>> Additionally, NI_NAMEREQD flag should probably be also used.
>
> This one still applies. Otherwise getnameinfo can return string
> representation of IP address if it cannot resolve it.

That's not a big deal, corosync_name_is_valid() will detect this and
refuse to use it.

>
> Btw, NI_MAXHOST should be used instead of INET6_ADDRSTRLEN for buf there.
>
>
>
