[Pacemaker] [Semi-OT] To bridge or to bond?
Michael Schwartzkopff
misch at schwartzkopff.org
Sun May 6 09:13:31 UTC 2012
> Hi all,
>
> please excuse (and ignore) this mail if you think it's not appropriate for
> this list or too much of a FAQ.
>
> We had our servers all connected via one gigabit switch and used bonds to
> get 2 Gbit/s links for each of them (using DRBD and Pacemaker/Corosync to
> keep our data distributed and services/machines up and running).
> As the switch constitutes a SPOF, we wanted to eliminate this and put a
> second gigabit switch into the rack.
> Now I/we can't use the real bonding modes anymore, only fail-over, tlb and
> alb. We don't really like the idea of fail-over because that means going
> back to 1 Gbit/s data rates. Using tlb we get nearly 2 Gbit/s total
> throughput with 1 Gbit/s per connection, so that looks nice throughput-wise.
> But for simple ICMP pings, 50-90% of pings are lost, probably because the
> switches keep re-learning the MAC addresses. Also, some TCP connections
> seem to stall due to this. Not really a nice situation when
> desktop virtualization and terminal servers are used in this scenario.
>
> My questions:
> Is there something obvious I missed in the above configuration? (*)
> Would it improve the situation stability- and performance-wise if I used
> bridges instead of bonds to connect to the switches and let STP do its job?
> Would that work with clusters and DRBD?
> Obviously the cleanest solution would be to use two stackable switches and
> make sure that they still do their job when one fails. But that is out of
> the question due to the prices attached to those switches.
>
> Thanks for your input on this and have a nice remaining weekend,
>
> Arnold
>
> (*) I haven't yet looked into the switches' configuration to see whether
> they have special options for such a scenario...
Hi,
please check whether your switches are 802.3ad compatible. If the switches do
not support 802.3ad across a stack, you have to stay with active-passive.
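If they do support it, the Linux side is a mode 4 (802.3ad/LACP) bond. A
minimal sketch, assuming the interfaces are eth0/eth1, the address is a
placeholder, and both switch ports belong to the same logical switch:

  # load the bonding driver in LACP mode; miimon polls link state every 100 ms
  modprobe bonding mode=802.3ad miimon=100 lacp_rate=fast
  ip addr add 192.0.2.10/24 dev bond0
  ip link set bond0 up
  ifenslave bond0 eth0 eth1    # enslave both NICs to bond0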
But even load balancing does not provide a real 50-50 traffic distribution.
Traffic is load-balanced according to MAC addresses or to layer 3/4 parameters
(IP address and TCP port). Be aware of what that means in a mostly switched
environment, or in one where all traffic comes in via a single router.
See: http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding
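The distribution policy is selected with the xmit_hash_policy parameter
described in that document. A small sketch via the bonding sysfs interface,
assuming the bond is called bond0:

  # layer2 (default): hash on MAC addresses only -- one peer always maps to
  #   the same slave, so a single client never gets more than 1 Gbit/s
  # layer3+4: hash on IP addresses and TCP/UDP ports -- spreads flows, but
  #   any single TCP connection is still limited to one slave
  echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy
  cat /sys/class/net/bond0/bonding/xmit_hash_policy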
Make sure that the switches speak 802.3ad across a switch stack. Cisco Nexus
does virtual port channel (vPC), but that is perhaps not the cheapest option.
Also be sure which traffic-distribution mode the switch supports.
Please test modes 5 (balance-tlb) and 6 (balance-alb) carefully. There may be
problems when the slaves are connected to two different switches. tcpdump is
your friend.
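One thing worth watching with tlb/alb, assuming the slaves are eth0/eth1:
which source MAC each slave uses towards a given peer, since flapping MACs
are exactly what makes the switches re-learn and drop pings.

  # -e prints the ethernet header, -n suppresses name resolution
  tcpdump -e -n -i eth0 icmp
  tcpdump -e -n -i eth1 icmp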
Please also check whether link errors are detected reliably by the bonding
module.
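A quick check, assuming MII link monitoring is enabled and the bond is bond0:
pull a cable and see whether the driver notices.

  cat /proc/net/bonding/bond0                       # "MII Status" per slave
  echo 100 > /sys/class/net/bond0/bonding/miimon    # poll link state every 100 ms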
Considering all of the above, think about sticking with plain active-backup!
Do you really need more than 1 Gbit/s?
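That would be the same sketch as above with only the mode changed; primary=
is optional and just an example:

  modprobe bonding mode=active-backup miimon=100 primary=eth0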
The Linux bridging module only speaks Spanning Tree, not Rapid Spanning Tree,
so a topology change means outages of around 30 seconds. That is too long for
most cluster applications like DRBD, Corosync, etc.; you would have to tune
the timeouts there.
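If you go the bridge route anyway, the forward delay is the main knob. A
sketch with brctl, assuming a bridge named br0; shrinking the timers too far
in a larger STP topology can cause loops, so treat this as a starting point:

  brctl stp br0 on        # classic STP only, the kernel bridge has no RSTP
  brctl setfd br0 4       # forward delay; default 15 s, outage is roughly 2x this
  brctl setmaxage br0 6   # age out stale topology information faster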
Greetings,
--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München
Tel: (0163) 172 50 98
Fax: (089) 620 304 13