[Pacemaker] [corosync] CoroSync's UDPu transport for public IP addresses?

Dmitry Koterov dmitry.koterov at gmail.com
Fri Jan 16 19:55:09 UTC 2015


Great, it works! Thank you.

It would be extremely helpful if this information were included in the
default corosync.conf as comments:
- that totem.interface may be omitted, and is even better omitted, when
using UDPu
- that the quorum section must not be empty, and that the default
quorum.provider could be corosync_votequorum (but not empty).
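
To make the two points above concrete, here is a minimal Python sketch of
such a pre-flight check. It is only an illustration based on what came up in
this thread, not corosync's own validation: parse_conf, check_conf and
SAMPLE_CONF are hypothetical names, and the parser only understands the
simple "section { key: value }" layout.

```python
# A sample configuration following the advice in this thread:
# udpu transport, no totem.interface section, non-empty quorum.provider.
SAMPLE_CONF = """\
totem {
    version: 2
    transport: udpu
    # no interface section: the local address is taken from nodelist
}

nodelist {
    node {
        ring0_addr: node1
    }
    node {
        ring0_addr: node2
    }
}

quorum {
    provider: corosync_votequorum
}
"""


def parse_conf(text):
    """Parse 'section { key: value }' text.

    Returns (items, sections): items is a list of (dotted.key, value)
    pairs, sections is the set of dotted section paths that were opened.
    Repeated sections such as nodelist.node yield repeated keys.
    """
    path, items, sections = [], [], set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("{"):
            path.append(line[:-1].strip())
            sections.add(".".join(path))
        elif line == "}":
            path.pop()
        else:
            key, _, value = line.partition(":")
            items.append((".".join(path + [key.strip()]), value.strip()))
    return items, sections


def check_conf(text):
    """Return a list of problems found; an empty list means none."""
    items, sections = parse_conf(text)
    values = dict(items)  # last value wins; fine for single-valued keys
    problems = []
    if not values.get("quorum.provider"):
        problems.append("quorum.provider is empty; consider corosync_votequorum")
    if values.get("totem.transport") == "udpu":
        if "totem.interface" in sections:
            problems.append("totem.interface is present; with udpu it can be omitted")
        if not any(key == "nodelist.node.ring0_addr" for key, _ in items):
            problems.append("udpu needs nodelist.node entries with ring0_addr")
    return problems
```

Calling check_conf(SAMPLE_CONF) returns an empty list. With the provider
line removed or an interface section added, it returns one message per
problem.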

This would help novices install and launch corosync right away.
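
Since the root cause in the quoted discussion below was a node name that
resolved to a loopback address through /etc/hosts, a novice could also run
a small check like this on each node before starting corosync. It is a
sketch only: loopback_names is a hypothetical helper, it looks at IPv4
resolution only, and the resolve parameter exists just so the function can
be tried without touching real DNS.

```python
import ipaddress
import socket


def loopback_names(names, resolve=socket.gethostbyname):
    """Return {name: address} for every name that resolves to loopback.

    `resolve` maps a host name to an IPv4 address string; by default the
    system resolver is used, which also consults /etc/hosts.
    """
    bad = {}
    for name in names:
        address = resolve(name)
        # is_loopback is true for the whole 127.0.0.0/8 range,
        # so 127.0.1.1 is caught as well as 127.0.0.1.
        if ipaddress.ip_address(address).is_loopback:
            bad[name] = address
    return bad
```

For example, loopback_names(["node1", "node2", "node3"]) run on node1
should return an empty dict once the stray /etc/hosts entry is gone. Any
name that still appears in the result should not be used as a ring0_addr
value on that machine.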


On Fri, Jan 16, 2015 at 7:31 PM, Jan Friesse <jfriesse at redhat.com> wrote:

> Dmitry Koterov wrote:
>
>>
>>>> such messages (for now). But, anyway, DNS names in ringX_addr do not
>>>> seem to work, and no relevant messages are in default logs. Maybe add
>>>> some validations for ringX_addr?
>>>>
>>>> I have resolvable DNS names:
>>>>
>>>> root at node1:/etc/corosync# ping -c1 -W100 node1 | grep from
>>>> 64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms
>>>>
>>>>
>>> This is the problem. Resolving node1 to a loopback address (127.0.1.1 in
>>> the output above) is simply wrong. Names you want to use in corosync.conf
>>> should resolve to the interface address. I believe the other nodes have a
>>> similar setting (so node2 resolved on node2 is again a loopback address).
>>>
>>>
>> Wow! What a shame! How could I miss it... So you're absolutely right,
>> thanks: that was the cause, an entry in /etc/hosts. On some machines I
>> removed it manually, but on others I didn't. Now I do it automatically
>> with sed -i -r "/^.*[[:space:]]$host([[:space:]]|\$)/d" /etc/hosts in the
>> initialization script.
>>
>> I apologize for the mess.
>>
>> So now I have only one place in corosync.conf where I need to specify a
>> plain IP address for UDPu: totem.interface.bindnetaddr. If I specify
>> 0.0.0.0 there, I get the message "Service engine 'corosync_quorum'
>> failed to load for reason 'configuration error: nodelist or
>> quorum.expected_votes must be configured!'" in the logs (BTW it does not
>> say that I made a mistake in bindnetaddr). Is there a way to completely
>> untie from IP addresses?
>>
>
> You can just remove the whole interface section. Corosync will find the
> correct address from the nodelist.
>
> Regards,
>   Honza
>
>
>
>>
>>
>>> Please try to fix this problem first and let's see if this will solve
>>> the issue you are hitting.
>>>
>>> Regards,
>>>    Honza
>>>
>>>> root at node1:/etc/corosync# ping -c1 -W100 node2 | grep from
>>>> 64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms
>>>>
>>>> root at node1:/etc/corosync# ping -c1 -W100 node3 | grep from
>>>> 64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms
>>>>
>>>>
>>>> With corosync.conf below, nothing works:
>>>> ...
>>>> nodelist {
>>>>    node {
>>>>      ring0_addr: node1
>>>>    }
>>>>    node {
>>>>      ring0_addr: node2
>>>>    }
>>>>    node {
>>>>      ring0_addr: node3
>>>>    }
>>>> }
>>>> ...
>>>> Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync Cluster Engine
>>>> ('2.3.3'): started and ready to provide service.
>>>> Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync built-in
>>>> features: dbus testagents rdma watchdog augeas pie relro bindnow
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing transport
>>>> (UDP/IP Unicast).
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing
>>>> transmit/receive security (NSS) crypto: aes256 hash: sha1
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] The network interface
>>>> [a.b.c.d] is now up.
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>> corosync configuration map access [0]
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cmap
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>> corosync configuration service [1]
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cfg
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>> corosync cluster closed process group service v1.01 [2]
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cpg
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>> corosync profile loading service [4]
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [WD    ] No Watchdog, try
>>>> modprobe <a watchdog>
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [WD    ] no resources
>>>> configured.
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>> corosync watchdog service [7]
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Using quorum provider
>>>> corosync_votequorum
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Quorum provider:
>>>> corosync_votequorum failed to initialize.
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine
>>>> 'corosync_quorum' failed to load for reason 'configuration error:
>>>> nodelist or quorum.expected_votes must be configured!'
>>>> Jan 14 10:47:44 node1 corosync[15062]:  [MAIN  ] Corosync Cluster Engine
>>>> exiting with status 20 at service.c:356.
>>>>
>>>>
>>>> But with IP addresses specified in ringX_addr, everything works:
>>>> ...
>>>> nodelist {
>>>>    node {
>>>>      ring0_addr: 104.236.71.79
>>>>    }
>>>>    node {
>>>>      ring0_addr: 188.166.54.190
>>>>    }
>>>>    node {
>>>>      ring0_addr: 128.199.116.218
>>>>    }
>>>> }
>>>> ...
>>>> Jan 14 10:48:28 node1 corosync[15155]:  [MAIN  ] Corosync Cluster Engine
>>>> ('2.3.3'): started and ready to provide service.
>>>> Jan 14 10:48:28 node1 corosync[15155]:  [MAIN  ] Corosync built-in
>>>> features: dbus testagents rdma watchdog augeas pie relro bindnow
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] Initializing transport
>>>> (UDP/IP Unicast).
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] Initializing
>>>> transmit/receive security (NSS) crypto: aes256 hash: sha1
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] The network interface
>>>> [a.b.c.d] is now up.
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>> corosync configuration map access [0]
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cmap
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>> corosync configuration service [1]
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cfg
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>> corosync cluster closed process group service v1.01 [2]
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cpg
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>> corosync profile loading service [4]
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [WD    ] No Watchdog, try
>>>> modprobe <a watchdog>
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [WD    ] no resources
>>>> configured.
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>> corosync watchdog service [7]
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QUORUM] Using quorum provider
>>>> corosync_votequorum
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>> corosync vote quorum service v1.0 [5]
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: votequorum
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>> corosync cluster quorum service v0.1 [3]
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: quorum
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
>>>> {a.b.c.d}
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
>>>> {e.f.g.h}
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
>>>> {i.j.k.l}
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] A new membership
>>>> (m.n.o.p:80) was formed. Members joined: 1760315215
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QUORUM] Members[1]: 1760315215
>>>> Jan 14 10:48:28 node1 corosync[15156]:  [MAIN  ] Completed service
>>>> synchronization, ready to provide service.
>>>>
>>>>
>>>> On Mon, Jan 5, 2015 at 6:45 PM, Jan Friesse <jfriesse at redhat.com>
>>>> wrote:
>>>>
>>>>> Dmitry,
>>>>>
>>>>>> Sure, in logs I see "adding new UDPU member {IP_ADDRESS}" (so DNS
>>>>>> names are definitely resolved), but in practice the cluster does not
>>>>>> work, as I said above. So validations of ringX_addr in corosync.conf
>>>>>> would be very helpful in corosync.
>>>>>>
>>>>>
>>>>> that's weird. Because as long as DNS is resolved, corosync works only
>>>>> with IP. This means, code path is exactly same with IP or with DNS. Do
>>>>> you have logs from corosync?
>>>>>
>>>>> Honza
>>>>>
>>>>>
>>>>>
>>>>>> On Fri, Jan 2, 2015 at 2:49 PM, Jan Friesse <jfriesse at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Dmitry,
>>>>>>>
>>>>>>>> No, I meant that if you pass a domain name in ring0_addr, there are
>>>>>>>> no errors in logs, corosync even seems to find nodes (based on its
>>>>>>>> logs), and crm_node -l shows them, but in practice nothing really
>>>>>>>> works. A verbose error message would be very helpful in such case.
>>>>>>>>
>>>>>>> This sounds weird. Are you sure that DNS names really maps to correct
>>>>>>> IP address? In logs there should be something like "adding new UDPU
>>>>>>> member {IP_ADDRESS}".
>>>>>>>
>>>>>>> Regards,
>>>>>>>    Honza
>>>>>>>
>>>>>>>
>>>>>>>> On Tuesday, December 30, 2014, Daniel Dehennin <
>>>>>>>> daniel.dehennin at baby-gnu.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Dmitry Koterov <dmitry.koterov at gmail.com> writes:
>>>>>>>>>
>>>>>>>>>> Oh, seems I've found the solution! At least two mistakes were in my
>>>>>>>>>> corosync.conf (BTW logs did not say about any errors, so my
>>>>>>>>>> conclusion is based on my experiments only).
>>>>>>>>>>
>>>>>>>>>> 1. nodelist.node MUST contain only IP addresses. No hostnames! They
>>>>>>>>>> simply do not work, "crm status" shows no nodes. And no warnings
>>>>>>>>>> are in logs regarding this.
>>>>>>>>>>
>>>>>>>>> You can add a name like this:
>>>>>>>>>
>>>>>>>>>       nodelist {
>>>>>>>>>         node {
>>>>>>>>>           ring0_addr: <public-ip-address-of-the-first-machine>
>>>>>>>>>           name: node1
>>>>>>>>>         }
>>>>>>>>>         node {
>>>>>>>>>           ring0_addr: <public-ip-address-of-the-second-machine>
>>>>>>>>>           name: node2
>>>>>>>>>         }
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> I used it on Ubuntu Trusty with udpu.
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Daniel Dehennin
>>>>>>>>> Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>>>>>>>>> Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>
>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss at corosync.org
>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>
>>>
>>
>