[ClusterLabs] Can Bonding Cause a Broadcast Storm?

Andrei Borzenkov arvidjaar at gmail.com
Tue Nov 15 22:45:02 EST 2016


16.11.2016 02:48, Eric Robinson wrote:
> Mode 1. No special switch configuration. Spanning tree not enabled. I
> have 100+ Linux servers, all of which use bonding. The network has
> been stable for 10 years. No changes recently. However, this is the
> second time that we have seen high latency and traced it down to the
> behavior of one particular server. I'm wondering if there is
> something about bonding that could result in a temporary bridge
> loop.
> 

A bonding interface does not forward packets between its members, so I
cannot imagine how it could cause a loop on the external switches.

I'd rather look at MAC aging on the switches. If the server was shut off
and its MAC was forgotten by the switches, any packets sent to this server
would effectively be broadcast (flooded out of every port as unknown
unicast) until the sending hosts have aged out its MAC in turn and start
sending ARP.

So if there was high-volume unicast traffic to this server, this may
explain what you see.
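
To illustrate the flooding I mean, here is a minimal Python sketch of a
learning switch with MAC aging. It is a toy model and not any vendor's
implementation: the class name LearningSwitch, the port numbers, the
host names used as MACs and the 300-second aging time are all
assumptions chosen for illustration.

```python
class LearningSwitch:
    """Toy learning switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports, aging_seconds=300):
        self.ports = list(ports)
        self.aging_seconds = aging_seconds  # assumed value, for illustration only
        self.table = {}  # MAC -> (port, time last seen as a source)

    def _expire(self, now):
        # Forget every MAC that has not been seen as a source recently.
        self.table = {
            mac: (port, seen)
            for mac, (port, seen) in self.table.items()
            if now - seen < self.aging_seconds
        }

    def receive(self, src_mac, dst_mac, in_port, now):
        """Return the list of ports the frame is sent out of."""
        self._expire(now)
        self.table[src_mac] = (in_port, now)  # learn / refresh the sender
        entry = self.table.get(dst_mac)
        if entry is None:
            # Unknown destination: flood to every port except the ingress one.
            return [p for p in self.ports if p != in_port]
        out_port = entry[0]
        return [] if out_port == in_port else [out_port]


if __name__ == "__main__":
    sw = LearningSwitch(ports=[1, 2, 3, 4])
    sw.receive("server", "client", in_port=1, now=0)           # server is learned on port 1
    print(sw.receive("client", "server", in_port=2, now=10))   # forwarded to port 1 only
    # The server goes silent; its entry ages out while the client keeps sending.
    print(sw.receive("client", "server", in_port=2, now=400))  # flooded to all other ports
    # The server comes back and transmits again, so it is relearned.
    sw.receive("server", "client", in_port=1, now=410)
    print(sw.receive("client", "server", in_port=2, now=420))  # forwarded to port 1 only
```

In this model the flooding lasts exactly as long as the server stays
silent while others keep sending to it, and stops as soon as the server
transmits a frame again.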

> ________________________________
> From: Jeremy Voorhis <jvoorhis at gmail.com>
> Sent: Tuesday, November 15, 2016 2:13:59 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Can Bonding Cause a Broadcast Storm?
> 
> What bonding mode are you using? Some modes require additional
> configuration from the switch to avoid flooding. Also, is spanning
> tree enabled on the switches?
> 
> On Tue, Nov 15, 2016 at 1:26 PM Eric Robinson
> <eric.robinson at psmnv.com> wrote:
>
> If a Linux server with bonded interfaces attached to different
> switches is rebooted, is it possible that a bridge loop could result
> for a brief period? We noticed that one of our 100 Linux servers
> became unresponsive and appears to have rebooted. (The cause has not
> been determined.) A couple of minutes afterwards, we saw a gigantic
> spike in traffic on all switches in the network that lasted for about
> 7 minutes, causing latency and packet loss on the network. Everything
> was still reachable, but slowly. The condition stopped as soon as the
> Linux server in question became reachable again.
> 
> -- Eric Robinson
> 
> 
> _______________________________________________ Users mailing list:
> Users at clusterlabs.org<mailto:Users at clusterlabs.org> 
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs:
> http://bugs.clusterlabs.org
> 




