[Pacemaker] Multicast pitfalls? corosync [TOTEM ] Retransmit List:

Beo Banks beo.banks at googlemail.com
Fri Feb 14 08:23:39 EST 2014


Hi Stefan,

it seems more stable now, but after about two minutes the issue is back again.
Hopefully it isn't a bug, because I can reproduce it:
node2 receives only unicast after sequence 256...

node1:

omping 10.0.0.22 10.0.0.21

10.0.0.22 :   unicast, seq=257, size=69 bytes, dist=0, time=0.666ms
10.0.0.22 : multicast, seq=257, size=69 bytes, dist=0, time=0.677ms
10.0.0.22 :   unicast, seq=258, size=69 bytes, dist=0, time=0.600ms
10.0.0.22 : multicast, seq=258, size=69 bytes, dist=0, time=0.610ms
10.0.0.22 :   unicast, seq=259, size=69 bytes, dist=0, time=0.693ms
10.0.0.22 : multicast, seq=259, size=69 bytes, dist=0, time=0.702ms
10.0.0.22 :   unicast, seq=260, size=69 bytes, dist=0, time=0.674ms
10.0.0.22 : multicast, seq=260, size=69 bytes, dist=0, time=0.685ms
10.0.0.22 :   unicast, seq=261, size=69 bytes, dist=0, time=0.658ms
10.0.0.22 : multicast, seq=261, size=69 bytes, dist=0, time=0.669ms
10.0.0.22 :   unicast, seq=262, size=69 bytes, dist=0, time=0.834ms
10.0.0.22 : multicast, seq=262, size=69 bytes, dist=0, time=0.845ms
10.0.0.22 :   unicast, seq=263, size=69 bytes, dist=0, time=0.666ms
10.0.0.22 : multicast, seq=263, size=69 bytes, dist=0, time=0.677ms
10.0.0.22 :   unicast, seq=264, size=69 bytes, dist=0, time=0.675ms
10.0.0.22 : multicast, seq=264, size=69 bytes, dist=0, time=0.687ms
10.0.0.22 : waiting for response msg
10.0.0.22 : server told us to stop
^C
10.0.0.22 :   unicast, xmt/rcv/%loss = 264/264/0%, min/avg/max/std-dev = 0.542/0.663/0.860/0.035
10.0.0.22 : multicast, xmt/rcv/%loss = 264/264/0%, min/avg/max/std-dev = 0.553/0.675/0.876/0.035

node2:

10.0.0.21 : multicast, seq=251, size=69 bytes, dist=0, time=0.703ms
10.0.0.21 :   unicast, seq=252, size=69 bytes, dist=0, time=0.714ms
10.0.0.21 : multicast, seq=252, size=69 bytes, dist=0, time=0.725ms
10.0.0.21 :   unicast, seq=253, size=69 bytes, dist=0, time=0.662ms
10.0.0.21 : multicast, seq=253, size=69 bytes, dist=0, time=0.672ms
10.0.0.21 :   unicast, seq=254, size=69 bytes, dist=0, time=0.662ms
10.0.0.21 : multicast, seq=254, size=69 bytes, dist=0, time=0.673ms
10.0.0.21 :   unicast, seq=255, size=69 bytes, dist=0, time=0.668ms
10.0.0.21 : multicast, seq=255, size=69 bytes, dist=0, time=0.679ms
10.0.0.21 :   unicast, seq=256, size=69 bytes, dist=0, time=0.674ms
10.0.0.21 : multicast, seq=256, size=69 bytes, dist=0, time=0.687ms
10.0.0.21 :   unicast, seq=257, size=69 bytes, dist=0, time=0.618ms
10.0.0.21 :   unicast, seq=258, size=69 bytes, dist=0, time=0.659ms
10.0.0.21 :   unicast, seq=259, size=69 bytes, dist=0, time=0.705ms
10.0.0.21 :   unicast, seq=260, size=69 bytes, dist=0, time=0.682ms
10.0.0.21 :   unicast, seq=261, size=69 bytes, dist=0, time=0.760ms
10.0.0.21 :   unicast, seq=262, size=69 bytes, dist=0, time=0.665ms
10.0.0.21 :   unicast, seq=263, size=69 bytes, dist=0, time=0.711ms
^C
10.0.0.21 :   unicast, xmt/rcv/%loss = 263/263/0%, min/avg/max/std-dev = 0.539/0.661/0.772/0.037
10.0.0.21 : multicast, xmt/rcv/%loss = 263/256/2%, min/avg/max/std-dev = 0.583/0.674/0.786/0.033
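
A rough way to narrow this down further (just a sketch; I'm assuming br1 is the
bridge carrying the cluster interface on the KVM hosts, using the same sysfs
paths that come up later in this thread):

# on each KVM host: does the bridge still forward multicast after ~2 minutes?
tcpdump -ni br1 'multicast and udp'

# current IGMP snooping / querier state of the bridge
cat /sys/devices/virtual/net/br1/bridge/multicast_snooping
cat /sys/devices/virtual/net/br1/bridge/multicast_querier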




2014-02-14 9:59 GMT+01:00 Stefan Bauer <stefan.bauer at cubewerk.de>:

> you have to disable all offloading features (rx, tx, tso...)
>
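> A minimal sketch of what that could look like (exact feature names vary by
> NIC and driver; ethtool -k lists what is available):
>
> ethtool -k eth0     # show the currently enabled offloads
> ethtool -K eth0 rx off tx off tso off gso off gro off
> ethtool -K eth1 rx off tx off tso off gso off gro off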
>
> Kind regards
>
> Stefan Bauer
> --
> Cubewerk GmbH
> Herzog-Otto-Straße 32
> 83308 Trostberg
> 08621 - 99 60 237
> HRB 22195 AG Traunstein
> GF Stefan Bauer
>
> On 14.02.2014 at 09:40, "Beo Banks" <beo.banks at googlemail.com> wrote:
>
> ethtool -K eth0 tx off
> ethtool -K eth1 tx off
>
> same result... the retransmit issue is still there
>
>
> 2014-02-14 9:31 GMT+01:00 Beo Banks <beo.banks at googlemail.com>:
>
>> I have also tried this suggestion:
>>
>> "No more delay when you disable multicast snooping on the host:"
>>
>> echo 0 > /sys/devices/virtual/net/br1/bridge/multicast_router
>> echo 0 > /sys/devices/virtual/net/br1/bridge/multicast_snooping
>>
>>
>> 2014-02-14 9:28 GMT+01:00 Beo Banks <beo.banks at googlemail.com>:
>>
>> @Jan and Stefan
>>>
>>> must I set it for both bridges,
>>> eth1 (br1) and eth0 (br0), and on the host or on the guest?
>>>
>>>
>>> 2014-02-14 9:06 GMT+01:00 Jan Friesse <jfriesse at redhat.com>:
>>>
>>> Beo,
>>>> are you experiencing a cluster split? If the answer is no, then you don't need
>>>> to do anything; maybe the network buffer is just full. But if the answer is yes,
>>>> try reducing the MTU size (netmtu in the configuration) to a value like 1000.
>>>>
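>>>> A rough sketch of that setting (assuming the totem section of corosync.conf;
>>>> on a cman-managed RHEL 6 cluster the corresponding totem option is set in
>>>> cluster.conf instead):
>>>>
>>>> totem {
>>>>         version: 2
>>>>         # ... keep the existing interface/mcast settings ...
>>>>         netmtu: 1000
>>>> }
>>>>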
>>>> Regards,
>>>>   Honza
>>>>
>>>> Beo Banks wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a fresh 2-node cluster (KVM host1 -> guest = NodeA | KVM host2 ->
>>>>> guest = NodeB) and it seems to work, but from time to time I get a lot of
>>>>> errors like
>>>>>
>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 196 198 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 197 199
>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 197 199 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 196 198
>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 196 198 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 197 199
>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 197 199 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 196 198
>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 196 198 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 197 199
>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 197 199 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 196 198
>>>>> I am using the latest RHEL 6.5 version.
>>>>>
>>>>> I have also already tried to solve the issue with
>>>>> echo 1 > /sys/class/net/virbr0/bridge/multicast_querier (host system)
>>>>> but no luck...
>>>>>
>>>>> I have disabled iptables and SELinux... same issue.
>>>>>
>>>>> How can I solve it?
>>>>>
>>>>> Thanks, Beo
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>