[Pacemaker] RFC: Any interest in 2.0.0 betas?

Mon Nov 5 08:26:46 UTC 2012

05.11.2012 09:28, Andrew Beekhof wrote:
...
>> But you can guess it, as admins usually name nodes the same way. If not
>> - that is the admins' problem.
> 
> No, it's the problem of developers that get yelled at by admins :)

:)

> 
>>
>>>
>>>> Something tells me this would provide better backwards
>>>> compatibility, while the visible result for the discussed use case will
>>>> be exactly the same. I know at least one cluster (not mine) which will
>>>> be broken if we just strip everything at the first dot - it uses long
>>>> hostnames (and this is the default for a freshly installed redhat/fedora
>>>> if you enter an FQDN in the anaconda prompt when installing a node).
>>>
>>> How do they configure corosync.conf / cluster.conf though?
>>
>> That is for corosync1. But when/if they decide to migrate to corosync2 -
> 
> Rephrase?
> 

I mean: imagine that somebody has a working cluster based on corosync1
and wants to migrate to corosync2 (with a quick 5-minute restart).
corosync.conf either has a memberlist with node IP addresses in the
interface clause with udpu, or just uses mcast without an explicit node
list. Cluster nodes have unames in FQDN format. The CIB has a number of
location constraints which refer to those unames. The admin changes the
necessary minimum in corosync.conf (adds votequorum and probably
copy-pastes the memberlist into a nodelist, leaving the IP addresses
there). I would say that is the natural way to do such a migration.

If you just blindly strip everything after the first dot, then that
setup will be severely broken. Location constraints will not work and
the CIB will have duplicate entries for all nodes: one is the FQDN
(which remains there from the corosync1-based setup), the other is the
new stripped name. With my proposal it should start cleanly after the
upgrade without any modifications to the CIB. With that proposal you can
guess the remote node's uname with a good chance of being correct (unless
cluster members have been set up differently with regard to uname, but I
would say that such a setup is brain-dead). And admins will get the
expected result - they had it configured that way and they still have it
configured the same way. Nothing changed, everything works.
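
To make the difference concrete, here is a rough sketch in Python (chosen
only for brevity, the real thing would be C). The function names are made
up for illustration and none of this is actual Pacemaker code. It assumes
the FQDNs have already been obtained from DNS and only shows the string
handling: how the local domain is derived and how a remote name is guessed.

```python
def strip_at_first_dot(fqdn):
    """Naive approach: keep only the first label of the name."""
    return fqdn.rstrip(".").split(".", 1)[0]


def local_domain(local_uname, local_fqdn):
    """Return the part of the local FQDN that follows the local uname.

    Returns "" when the uname already is the FQDN, and None when the FQDN
    does not begin with the uname at a label boundary (node and DNS
    disagree about the name).
    """
    uname = local_uname.rstrip(".")
    fqdn = local_fqdn.rstrip(".")
    if fqdn == uname:
        return ""
    if fqdn.startswith(uname + "."):
        return fqdn[len(uname) + 1:]
    return None


def guess_remote_uname(remote_fqdn, domain):
    """Strip the local node's domain suffix from a remote FQDN.

    Assumption for this sketch: if there is no usable domain, or the
    remote name is in a different domain, the FQDN is returned unchanged.
    """
    fqdn = remote_fqdn.rstrip(".")
    if domain and fqdn.endswith("." + domain):
        return fqdn[: -(len(domain) + 1)]
    return fqdn


remote = "host238.some.very.long.domain.name.com."

# Cluster with short unames: both approaches agree.
short = local_domain("host232", "host232.some.very.long.domain.name.com.")
print(guess_remote_uname(remote, short))   # host238
print(strip_at_first_dot(remote))          # host238

# Cluster with FQDN unames (the migration case): only the domain-based
# guess still matches the names already stored in the CIB.
full = local_domain("host232.some.very.long.domain.name.com",
                    "host232.some.very.long.domain.name.com.")
print(guess_remote_uname(remote, full))    # host238.some.very.long.domain.name.com
print(strip_at_first_dot(remote))          # host238
```

The second case is the one that breaks with blind stripping: the CIB
still refers to the long names, while the stripped name is a new node.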

And one more interesting issue arises - what will be used as the node
name for multi-ring clusters? Even if corosync.conf has names in the
nodelist instead of addresses, which one will be used? ring0? I think it
would be natural to look at all ring-specific names/addresses and choose
the one which matches the local uname (with a reverse DNS lookup for
addresses and maybe a double DNS lookup for names - name->address->fqdn).
After that you can guess remote unames based on the ring id and the
domain name obtained, with the method I propose, from the address for
that ring and the local uname. That double lookup (but in the reverse
order - address->fqdn->address) is common for mail servers, btw.
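
A sketch of that ring selection, again in Python and again with made-up
names (pick_ring, remote_uname, the example addresses and host names are
all hypothetical). It assumes each node has one address per ring, in ring
order, and takes the resolver as a parameter so the logic can be tried
without real DNS; the default resolver uses the standard getnameinfo call.

```python
import socket


def reverse_lookup(address):
    """Resolve an IP address to a name, or return None if that fails."""
    try:
        host, _service = socket.getnameinfo((address, 0), socket.NI_NAMEREQD)
    except OSError:
        return None
    return host


def pick_ring(local_uname, local_ring_addrs, resolve=reverse_lookup):
    """Find the first ring whose reverse-resolved name begins with uname.

    Returns (ring_id, domain), or None if no ring matches.
    """
    uname = local_uname.rstrip(".")
    for ring_id, address in enumerate(local_ring_addrs):
        fqdn = resolve(address)
        if fqdn is None:
            continue
        fqdn = fqdn.rstrip(".")
        if fqdn == uname:
            return ring_id, ""
        if fqdn.startswith(uname + "."):
            return ring_id, fqdn[len(uname) + 1:]
    return None


def remote_uname(remote_ring_addrs, ring_id, domain, resolve=reverse_lookup):
    """Guess a remote uname from its address on the chosen ring."""
    fqdn = resolve(remote_ring_addrs[ring_id])
    if fqdn is None:
        return None
    fqdn = fqdn.rstrip(".")
    if domain and fqdn.endswith("." + domain):
        return fqdn[: -(len(domain) + 1)]
    return fqdn


# Fake PTR data standing in for DNS (hypothetical addresses and names).
ptr = {
    "192.168.1.232": "stor232.backend.example.com.",
    "10.0.0.232": "host232.cluster.example.com.",
    "192.168.1.238": "stor238.backend.example.com.",
    "10.0.0.238": "host238.cluster.example.com.",
}

ring_id, domain = pick_ring("host232", ["192.168.1.232", "10.0.0.232"], ptr.get)
print(ring_id, domain)  # 1 cluster.example.com
print(remote_uname(["192.168.1.238", "10.0.0.238"], ring_id, domain, ptr.get))  # host238
```

Here ring0 is skipped because its name does not match the local uname,
and the remote name is then taken from the same ring that matched locally.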

The more I think about it the more I believe that is the right way to
go. IMHO it is the most universal method.

>> they will have a broken cluster (because location constraints will not
>> work either).
>>
>>> The stripping only applies when people put IP addresses in there (or
>>> use multicast without a node list).
>>>
>>> If people put node names we will use them unmodified.
>>> Adding a node list is recommended by upstream corosync, so it makes
>>> sense for us to use it as the official way to use a non-default naming
>>> scheme.
>>>
>>>>
>>>> Also, the way I propose provides better flexibility
>>>
>>> It feels more fragile to me.  It's going to break really badly if some
>>> nodes use FQDN and some don't.
>>
>> That would be the side effect of having a zoo in the network instead of
>> a well-defined structure. Not your problem.
>>
>> Well, that should not work for 1.1.8 too...
>>
>> And, does that work for corosync1?
> 
> I'm not changing corosync1.  It's still just uname(2)
> 
>>
>>>
>>>> - assume that one sets the
>>>> hostname using two labels from the FQDN - node01.cluster01 (or n01.c01)
>>>> instead of just node01 or n01. The FQDN itself could be
>>>> n01.c01.some.location.domain.com. That could be done just to add safety
>>>> for shell actions - the hostname is usually shown in the shell prompt (I
>>>> recall many cases when I issued a command on a different host from the
>>>> one I thought I was on). The same applies to cluster commands. If visible
>>>> node names have some (administrator-controlled) hints, the cluster could
>>>> be safer to operate. And this way should not cause any breakage - cluster
>>>> nodes are usually named the same way and no additional configuration is
>>>> involved.
>>>>
>>>> I can develop a patch for that if you want. It would introduce one global
>>>> var (domain name), and will have one extra call to uname() and three or
>>>> fewer calls to string-handling functions.
>>>>
>>>>>
>>>>> On Fri, Oct 26, 2012 at 9:57 PM, Vladislav Bogdanov
>>>>> <bubble at hoster-ok.com> wrote:
>>>>>> 26.10.2012 13:38, Vladislav Bogdanov wrote:
>>>>>>> 26.10.2012 12:43, Andrew Beekhof wrote:
>>>>>>> ...
>>>>>>>>> Maybe also set it forcibly to uname if uname contains a full label
>>>>>>>>> found in the DNS name?
>>>>>>>>
>>>>>>>> Run that past me again?
>>>>>>>
>>>>>>> I mean that if an IP address resolves to an FQDN, and that FQDN begins
>>>>>>> with what the uname call returns (so both the node itself and DNS agree
>>>>>>> on a node name for the node with the given IP address), then that value
>>>>>>> from uname could be safely used directly.
>>>>>>
>>>>>> Ah, that is for local node only...
>>>>>> For remote nodes I would strip FQDN part which begins right at that dot
>>>>>> where FQDN of local node and its uname differ.
>>>>>>
>>>>>> my_ring_address == 10.0.0.XXX
>>>>>> my_uname() == "host232"
>>>>>> getnameinfo(my_ring_address) == host232.some.very.long.domain.name.com.
>>>>>>
>>>>>> my_node_name = "host232"
>>>>>> my_domain = "some.very.long.domain.name.com."
>>>>>>
>>>>>> his_ring_address == 10.0.0.YYY
>>>>>> getnameinfo(his_ring_address) == host238.some.very.long.domain.name.com.
>>>>>>
>>>>>> strstr("host238.some.very.long.domain.name.com.", my_domain) != NULL
>>>>>>
>>>>>> his_node_name = "host238"
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> To illustrate:
>>>>>>> ring_address == 10.0.0.XXX
>>>>>>> uname() == "host232"
>>>>>>> getnameinfo(ring_address) == host232.some.very.long.domain.name.com.
>>>>>>>
>>>>>>> then "host232" could be safely used as a node name (but not "host23" and
>>>>>>> not "host232.s")
>>>>>>>
>>>>>>> Of course, it would be even safer if getaddrinfo("host232") or
>>>>>>> getaddrinfo("host232.some.very.long.domain.name.com.") returns
>>>>>>> 10.0.0.XXX, so an additional check may be introduced.
>>>>>>>
>>>>>>> That is normal for "correct" static DNS setups, where the PTR record is
>>>>>>> consistent with what the node has configured as its hostname internally.
>>>>>>>
>>>>>>> That is also what I have for DHCP-based static address assignments
>>>>>>> (central configuration place for a whole cluster network), where a node
>>>>>>> usually sets (or at least can be configured to set) its name to what the
>>>>>>> DHCP server says. And the DHCP server is usually set up to update A and
>>>>>>> PTR records in the DNS zone.
>>>>>>>
>>>>>>> Also, that should work correctly when the FQDN is used as a uname (long
>>>>>>> hostname), like redhat setups usually do.
>>>>>>>
>>>>>>> Anyway, if the FQDN does not begin with uname, then DNS info should be
>>>>>>> used for the node name (like it is now), possibly with that "strip" hack.
>>>>>>> That could be useful for multi-ring setups, I think.
>>>>>>>
>>>>>>> Vladislav
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org