[Pacemaker] CoroSync's UDPu transport for public IP addresses?
Jan Friesse
jfriesse at redhat.com
Wed Jan 14 16:34:53 UTC 2015
Dmitry,
> Yes, now I have the clear experiment. Sorry, I misinformed you about
> "adding new UDPU member" - when I use DNS names in ringX_addr, I don't see
This is good to know
> such messages (for now). But, anyway, DNS names in ringX_addr do not seem
> to work, and no relevant messages are in the default logs. Maybe add some
> validation for ringX_addr?
>
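A check of that kind can be done outside corosync. What follows is only a rough sketch in Python, not something corosync provides: the function name `ring_addrs` and the regular expression are assumptions made for illustration, and it only understands the simple `ringX_addr: value` lines shown in this thread. It lists every ringX_addr value and says whether it is an IP literal or a name that still has to be resolved.

```python
import ipaddress
import re

# Matches lines such as "ring0_addr: node1". Hypothetical helper pattern
# that covers only the simple form used in this thread.
RING_ADDR_RE = re.compile(r"^\s*(ring\d+_addr)\s*:\s*(\S+)\s*$", re.MULTILINE)


def ring_addrs(conf_text):
    """Return (key, value, kind) tuples; kind is "ip" or "hostname"."""
    found = []
    for key, value in RING_ADDR_RE.findall(conf_text):
        try:
            ipaddress.ip_address(value)  # raises ValueError for non-IP text
            kind = "ip"
        except ValueError:
            kind = "hostname"
        found.append((key, value, kind))
    return found


if __name__ == "__main__":
    sample = """
    nodelist {
        node {
            ring0_addr: node1
        }
        node {
            ring0_addr: 188.166.54.190
        }
    }
    """
    for key, value, kind in ring_addrs(sample):
        print(f"{key}: {value} ({kind})")
```

Every entry reported as "hostname" is one whose resolution is worth checking by hand on each node.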
> I have resolvable DNS names:
>
> root@node1:/etc/corosync# ping -c1 -W100 node1 | grep from
> 64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms
>
This is the problem. Resolving node1 to a loopback address (127.0.1.1 in
your output) is simply wrong. Names you want to use in corosync.conf should
resolve to the interface address. I believe the other nodes have a similar
setting (so node2 resolved on node2 is again a loopback address).
Please try to fix this problem first and let's see if this solves the
issue you are hitting.
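
The check I mean can be sketched like this. It is an illustration only, not part of corosync: `loopback_names` and its `resolve` parameter are hypothetical names, and the default resolver (`socket.gethostbyname`) only covers IPv4. Run on each node, it reports the names that resolve to a loopback address.

```python
import ipaddress
import socket


def loopback_names(names, resolve=socket.gethostbyname):
    """Return {name: address} for names that resolve to a loopback address.

    `resolve` is injectable so the logic can be tried without real DNS
    (an assumption of this sketch, not a corosync feature).
    """
    bad = {}
    for name in names:
        address = resolve(name)  # the default raises socket.gaierror on failure
        if ipaddress.ip_address(address).is_loopback:
            bad[name] = address
    return bad


if __name__ == "__main__":
    # Static table mirroring the ping output quoted in this mail, so the
    # demo runs without touching the network.
    table = {
        "node1": "127.0.1.1",
        "node2": "188.166.54.190",
        "node3": "128.199.116.218",
    }
    for name, address in loopback_names(table, resolve=table.__getitem__).items():
        print(f"{name} resolves to loopback address {address}")
```

Calling `loopback_names(["node1", "node2", "node3"])` without the `resolve` argument uses the system resolver instead of the static table. Any name it reports should be pointed at the interface address before it is used in ring0_addr.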
Regards,
Honza
> root@node1:/etc/corosync# ping -c1 -W100 node2 | grep from
> 64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms
>
> root@node1:/etc/corosync# ping -c1 -W100 node3 | grep from
> 64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms
>
>
> With corosync.conf below, nothing works:
> ...
> nodelist {
>   node {
>     ring0_addr: node1
>   }
>   node {
>     ring0_addr: node2
>   }
>   node {
>     ring0_addr: node3
>   }
> }
> ...
> Jan 14 10:47:44 node1 corosync[15061]: [MAIN ] Corosync Cluster Engine ('2.3.3'): started and ready to provide service.
> Jan 14 10:47:44 node1 corosync[15061]: [MAIN ] Corosync built-in features: dbus testagents rdma watchdog augeas pie relro bindnow
> Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
> Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] The network interface [a.b.c.d] is now up.
> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync configuration map access [0]
> Jan 14 10:47:44 node1 corosync[15062]: [QB ] server name: cmap
> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync configuration service [1]
> Jan 14 10:47:44 node1 corosync[15062]: [QB ] server name: cfg
> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Jan 14 10:47:44 node1 corosync[15062]: [QB ] server name: cpg
> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync profile loading service [4]
> Jan 14 10:47:44 node1 corosync[15062]: [WD ] No Watchdog, try modprobe <a watchdog>
> Jan 14 10:47:44 node1 corosync[15062]: [WD ] no resources configured.
> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync watchdog service [7]
> Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Using quorum provider corosync_votequorum
> Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
> Jan 14 10:47:44 node1 corosync[15062]: [MAIN ] Corosync Cluster Engine exiting with status 20 at service.c:356.
>
>
> But with IP addresses specified in ringX_addr, everything works:
> ...
> nodelist {
>   node {
>     ring0_addr: 104.236.71.79
>   }
>   node {
>     ring0_addr: 188.166.54.190
>   }
>   node {
>     ring0_addr: 128.199.116.218
>   }
> }
> ...
> Jan 14 10:48:28 node1 corosync[15155]: [MAIN ] Corosync Cluster Engine ('2.3.3'): started and ready to provide service.
> Jan 14 10:48:28 node1 corosync[15155]: [MAIN ] Corosync built-in features: dbus testagents rdma watchdog augeas pie relro bindnow
> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] The network interface [a.b.c.d] is now up.
> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded: corosync configuration map access [0]
> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: cmap
> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded: corosync configuration service [1]
> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: cfg
> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: cpg
> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded: corosync profile loading service [4]
> Jan 14 10:48:28 node1 corosync[15156]: [WD ] No Watchdog, try modprobe <a watchdog>
> Jan 14 10:48:28 node1 corosync[15156]: [WD ] no resources configured.
> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded: corosync watchdog service [7]
> Jan 14 10:48:28 node1 corosync[15156]: [QUORUM] Using quorum provider corosync_votequorum
> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: votequorum
> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: quorum
> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member {a.b.c.d}
> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member {e.f.g.h}
> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member {i.j.k.l}
> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] A new membership (m.n.o.p:80) was formed. Members joined: 1760315215
> Jan 14 10:48:28 node1 corosync[15156]: [QUORUM] Members[1]: 1760315215
> Jan 14 10:48:28 node1 corosync[15156]: [MAIN ] Completed service synchronization, ready to provide service.
>
>
> On Mon, Jan 5, 2015 at 6:45 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>
>> Dmitry,
>>
>>
>>> Sure, in the logs I see "adding new UDPU member {IP_ADDRESS}" (so DNS names
>>> are definitely resolved), but in practice the cluster does not work, as I
>>> said above. So validation of ringX_addr in corosync.conf would be very
>>> helpful.
>>
>> That's weird, because once the DNS name is resolved, corosync works only
>> with the IP address. This means the code path is exactly the same with an
>> IP address or with a DNS name. Do you have logs from corosync?
>>
>> Honza
>>
>>
>>>
>>> On Fri, Jan 2, 2015 at 2:49 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>>>
>>>> Dmitry,
>>>>
>>>>
>>>>> No, I meant that if you pass a domain name in ring0_addr, there are no
>>>>> errors in logs, corosync even seems to find nodes (based on its logs), and
>>>>> crm_node -l shows them, but in practice nothing really works. A verbose
>>>>> error message would be very helpful in such case.
>>>>>
>>>>
>>>> This sounds weird. Are you sure that the DNS names really map to the
>>>> correct IP address? In the logs there should be something like "adding
>>>> new UDPU member {IP_ADDRESS}".
>>>>
>>>> Regards,
>>>> Honza
>>>>
>>>>
>>>>> On Tuesday, December 30, 2014, Daniel Dehennin <daniel.dehennin at baby-gnu.org> wrote:
>>>>>
>>>>>> Dmitry Koterov <dmitry.koterov at gmail.com> writes:
>>>>>>
>>>>>>> Oh, seems I've found the solution! There were at least two mistakes in my
>>>>>>> corosync.conf (BTW the logs did not report any errors, so my conclusion
>>>>>>> is based on my experiments only).
>>>>>>>
>>>>>>> 1. nodelist.node MUST contain only IP addresses. No hostnames! They
>>>>>>> simply do not work, "crm status" shows no nodes. And no warnings are in
>>>>>>> the logs regarding this.
>>>>>>>
>>>>>>
>>>>>> You can add name like this:
>>>>>>
>>>>>> nodelist {
>>>>>>   node {
>>>>>>     ring0_addr: <public-ip-address-of-the-first-machine>
>>>>>>     name: node1
>>>>>>   }
>>>>>>   node {
>>>>>>     ring0_addr: <public-ip-address-of-the-second-machine>
>>>>>>     name: node2
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> I used it on Ubuntu Trusty with udpu.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> --
>>>>>> Daniel Dehennin
>>>>>> Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>>>>>> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>