[Pacemaker] [corosync] CoroSync's UDPu transport for public IP addresses?

Wed Jan 14 13:43:41 EST 2015

>
> > such messages (for now). But, anyway, DNS names in ringX_addr seem not
> > working, and no relevant messages are in default logs. Maybe add some
> > validations for ringX_addr?
> >
> > I have resolvable DNS names:
> >
> > root@node1:/etc/corosync# ping -c1 -W100 node1 | grep from
> > 64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms
> >
>
> This is the problem. Resolving node1 to a loopback address (127.0.1.1
> here) is simply wrong. Names you want to use in corosync.conf should
> resolve to the interface address. I believe the other nodes have a
> similar setting (so node2 resolved on node2 is again a loopback address).
>

Wow, what a shame! How could I miss it... You're absolutely right,
thanks: the cause was an entry in /etc/hosts. On some machines I had
removed it manually, but on others I hadn't. Now I remove it automatically
with sed -i -r "/^.*[[:space:]]$host([[:space:]]|\$)/d" /etc/hosts in the
initialization script.
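
For illustration, here is a minimal sketch of that cleanup, run against a
temporary copy instead of the real /etc/hosts so it can be tried safely.
The function name strip_host_entries and the sample entries are made up
for this example; it assumes GNU sed (for -i and -r) and a host name
without regex metacharacters.

```shell
#!/bin/sh
# Delete every line of a hosts-format file that lists the given name as a
# whole whitespace-delimited field. Illustrative helper, not part of corosync.
strip_host_entries() {
    host=$1
    file=$2
    # Same expression as in the initialization script; -r enables
    # extended regular expressions (GNU sed).
    sed -i -r "/^.*[[:space:]]$host([[:space:]]|\$)/d" "$file"
}

# Try it on a throwaway file with sample entries.
tmp=$(mktemp)
printf '%s\n' \
    '127.0.0.1 localhost' \
    '127.0.1.1 node1' \
    '188.166.54.190 node2' > "$tmp"
strip_host_entries node1 "$tmp"
cat "$tmp"      # the node1 line is gone, the other two remain
rm -f "$tmp"
```

Because the pattern requires whitespace or end of line after the name, an
entry for a longer name such as node10 is left alone.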

I apologize for the mess.
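
For anyone hitting the same thing, a pre-flight check along these lines
would probably have caught it. This is only a sketch: the function names
are hypothetical, it assumes a glibc system where getent is available,
and it looks only at the first address the resolver returns.

```shell
#!/bin/sh
# Return 0 if the address is a loopback address (127.0.0.0/8 or ::1).
is_loopback() {
    case $1 in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Resolve each name through the system resolver (getent consults
# /etc/hosts and DNS as configured in nsswitch.conf) and complain about
# names that do not resolve or that resolve to a loopback address.
check_ring_names() {
    status=0
    for name in "$@"; do
        addr=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
        if [ -z "$addr" ]; then
            echo "$name: does not resolve" >&2
            status=1
        elif is_loopback "$addr"; then
            echo "$name: resolves to loopback address $addr" >&2
            status=1
        else
            echo "$name: $addr"
        fi
    done
    return $status
}

# Example (run on every node before starting corosync):
#   check_ring_names node1 node2 node3
```

It has to be run on each node separately, since the bad entry is local to
the machine whose own name is mapped to loopback.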

So now there is only one place in corosync.conf where I need to specify a
plain IP address for UDPu: totem.interface.bindnetaddr. If I specify
0.0.0.0 there, I get the message "Service engine 'corosync_quorum'
failed to load for reason 'configuration error: nodelist or
quorum.expected_votes must be configured!'" in the logs (BTW, it does not
say that the mistake is in bindnetaddr). Is there a way to avoid
hard-coding IP addresses completely?

> Please try to fix this problem first and let's see if this will solve
> issue you are hitting.
>
> Regards,
>   Honza
>
> > root@node1:/etc/corosync# ping -c1 -W100 node2 | grep from
> > 64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms
> >
> > root@node1:/etc/corosync# ping -c1 -W100 node3 | grep from
> > 64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms
> >
> >
> > With corosync.conf below, nothing works:
> > ...
> > nodelist {
> >   node {
> >     ring0_addr: node1
> >   }
> >   node {
> >     ring0_addr: node2
> >   }
> >   node {
> >     ring0_addr: node3
> >   }
> > }
> > ...
> > Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync Cluster Engine
> > ('2.3.3'): started and ready to provide service.
> > Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync built-in
> > features: dbus testagents rdma watchdog augeas pie relro bindnow
> > Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing transport
> > (UDP/IP Unicast).
> > Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing
> > transmit/receive security (NSS) crypto: aes256 hash: sha1
> > Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] The network interface
> > [a.b.c.d] is now up.
> > Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
> > corosync configuration map access [0]
> > Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cmap
> > Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
> > corosync configuration service [1]
> > Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cfg
> > Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
> > corosync cluster closed process group service v1.01 [2]
> > Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cpg
> > Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
> > corosync profile loading service [4]
> > Jan 14 10:47:44 node1 corosync[15062]:  [WD    ] No Watchdog, try
> > modprobe <a watchdog>
> > Jan 14 10:47:44 node1 corosync[15062]:  [WD    ] no resources configured.
> > Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
> > corosync watchdog service [7]
> > Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Using quorum provider
> > corosync_votequorum
> > Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Quorum provider:
> > corosync_votequorum failed to initialize.
> > Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine
> > 'corosync_quorum' failed to load for reason 'configuration error:
> > nodelist or quorum.expected_votes must be configured!'
> > Jan 14 10:47:44 node1 corosync[15062]:  [MAIN  ] Corosync Cluster Engine
> > exiting with status 20 at service.c:356.
> >
> >
> > But with IP addresses specified in ringX_addr, everything works:
> > ...
> > nodelist {
> >   node {
> >     ring0_addr: 104.236.71.79
> >   }
> >   node {
> >     ring0_addr: 188.166.54.190
> >   }
> >   node {
> >     ring0_addr: 128.199.116.218
> >   }
> > }
> > ...
> > Jan 14 10:48:28 node1 corosync[15155]:  [MAIN  ] Corosync Cluster Engine
> > ('2.3.3'): started and ready to provide service.
> > Jan 14 10:48:28 node1 corosync[15155]:  [MAIN  ] Corosync built-in
> > features: dbus testagents rdma watchdog augeas pie relro bindnow
> > Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] Initializing transport
> > (UDP/IP Unicast).
> > Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] Initializing
> > transmit/receive security (NSS) crypto: aes256 hash: sha1
> > Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] The network interface
> > [a.b.c.d] is now up.
> > Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
> > corosync configuration map access [0]
> > Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cmap
> > Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
> > corosync configuration service [1]
> > Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cfg
> > Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
> > corosync cluster closed process group service v1.01 [2]
> > Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cpg
> > Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
> > corosync profile loading service [4]
> > Jan 14 10:48:28 node1 corosync[15156]:  [WD    ] No Watchdog, try
> > modprobe <a watchdog>
> > Jan 14 10:48:28 node1 corosync[15156]:  [WD    ] no resources configured.
> > Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
> > corosync watchdog service [7]
> > Jan 14 10:48:28 node1 corosync[15156]:  [QUORUM] Using quorum provider
> > corosync_votequorum
> > Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
> > corosync vote quorum service v1.0 [5]
> > Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: votequorum
> > Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
> > corosync cluster quorum service v0.1 [3]
> > Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: quorum
> > Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
> > {a.b.c.d}
> > Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
> > {e.f.g.h}
> > Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
> > {i.j.k.l}
> > Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] A new membership
> > (m.n.o.p:80) was formed. Members joined: 1760315215
> > Jan 14 10:48:28 node1 corosync[15156]:  [QUORUM] Members[1]: 1760315215
> > Jan 14 10:48:28 node1 corosync[15156]:  [MAIN  ] Completed service
> > synchronization, ready to provide service.
> >
> >
> > On Mon, Jan 5, 2015 at 6:45 PM, Jan Friesse <jfriesse at redhat.com> wrote:
> >
> >> Dmitry,
> >>
> >>
> >>> Sure, in logs I see "adding new UDPU member {IP_ADDRESS}" (so DNS names
> >>> are definitely resolved), but in practice the cluster does not work,
> >>> as I said above. So validations of ringX_addr in corosync.conf would
> >>> be very helpful in corosync.
> >>
> >> That's weird, because once the DNS names are resolved, corosync works
> >> only with IP addresses. This means the code path is exactly the same
> >> with IP addresses as with DNS names. Do you have logs from corosync?
> >>
> >> Honza
> >>
> >>
> >>>
> >>> On Fri, Jan 2, 2015 at 2:49 PM, Jan Friesse <jfriesse at redhat.com> wrote:
> >>>
> >>>> Dmitry,
> >>>>
> >>>>
> >>>>> No, I meant that if you pass a domain name in ring0_addr, there are
> >>>>> no errors in logs, corosync even seems to find nodes (based on its
> >>>>> logs), and crm_node -l shows them, but in practice nothing really
> >>>>> works. A verbose error message would be very helpful in such a case.
> >>>>>
> >>>>
> >>>> This sounds weird. Are you sure that the DNS names really map to the
> >>>> correct IP addresses? In the logs there should be something like
> >>>> "adding new UDPU member {IP_ADDRESS}".
> >>>>
> >>>> Regards,
> >>>>   Honza
> >>>>
> >>>>
> >>>>> On Tuesday, December 30, 2014, Daniel Dehennin <daniel.dehennin at baby-gnu.org> wrote:
> >>>>>
> >>>>>> Dmitry Koterov <dmitry.koterov at gmail.com> writes:
> >>>>>>
> >>>>>>> Oh, seems I've found the solution! At least two mistakes were in my
> >>>>>>> corosync.conf (BTW the logs did not report any errors, so my
> >>>>>>> conclusion is based on my experiments only).
> >>>>>>>
> >>>>>>> 1. nodelist.node MUST contain only IP addresses. No hostnames! They
> >>>>>>> simply do not work, "crm status" shows no nodes. And there are no
> >>>>>>> warnings in the logs regarding this.
> >>>>>>>
> >>>>>>
> >>>>>> You can add a name like this:
> >>>>>>
> >>>>>>      nodelist {
> >>>>>>        node {
> >>>>>>          ring0_addr: <public-ip-address-of-the-first-machine>
> >>>>>>          name: node1
> >>>>>>        }
> >>>>>>        node {
> >>>>>>          ring0_addr: <public-ip-address-of-the-second-machine>
> >>>>>>          name: node2
> >>>>>>        }
> >>>>>>      }
> >>>>>>
> >>>>>> I used it on Ubuntu Trusty with udpu.
> >>>>>>
> >>>>>> Regards.
> >>>>>>
> >>>>>> --
> >>>>>> Daniel Dehennin
> >>>>>> Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> >>>>>> Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>>>
> >>>>> Project Home: http://www.clusterlabs.org
> >>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>>> Bugs: http://bugs.clusterlabs.org
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> >
> >
>
> _______________________________________________
> discuss mailing list
> discuss at corosync.org
> http://lists.corosync.org/mailman/listinfo/discuss
>