Re: High-volume HAProxy deployment tips?

From: Willy Tarreau <>
Date: Fri, 19 Dec 2008 07:08:24 +0100

On Fri, Dec 19, 2008 at 12:13:18AM +0100, Benoit wrote:
> Willy Tarreau wrote:
> >>As for us, we had to double the haproxy boxes to get past the 40Mbps of
> >>http traffic ....
> >>
> >
> >What type of traffic do you have and what type of network cards do you
> >have?
> >The smallest machines I have (fanless AMD Geode 500 MHz, 2.5W of average
> >power consumption) already support 100 Mbps and up to 2000 connections/s
> >with a dirt cheap VIA Rhine NIC. So there's definitely a problem with your
> >setup!
> >
> The server is an HP blade with 4 Broadcom Corporation NetXtreme II BCM5708S
> ports, bonded two by two.
> It has two dual-core Xeon 5150 CPUs @ 2.66GHz and 4GB of memory, and the
> system is dedicated to haproxy.
> The CPU load is very, very low; however, at 40 or 50Mbps
> connection times start to go through the roof
> (up to 3/4s sometimes, instead of 180ms of total processing time), and
> activating the backup server fixed this,
> so it isn't a problem with the backends.

OK, I understand better now. At a customer's site, we've been using HP blades with BNX2 NICs too, and we disabled them. The packet loss is terrible on those crappy NICs. Please run "ethtool -S eth0 | grep discard" and you'll see what I'm talking about. We had 2 tg3 NICs in the same blades, and those show no problem. If you run tcpdump on your system, you'll find a lot of SACKs indicating lost segments.
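To watch those counters over time rather than eyeballing the grep output, the same filter can be sketched in Python. This is only an illustration: it assumes the usual "name: value" layout that "ethtool -S" prints, and the counter names and values in SAMPLE are made up for the example, not taken from a real BNX2.

```python
import re

def discard_counters(ethtool_stats):
    """Return every counter whose name contains 'discard'.

    `ethtool_stats` is the text printed by `ethtool -S <iface>`.
    """
    counters = {}
    for line in ethtool_stats.splitlines():
        # Statistic lines look like "     name: 12345"; the header line
        # ("NIC statistics:") has no number and is skipped.
        match = re.match(r"\s*(\S[^:]*):\s*(\d+)\s*$", line)
        if match and "discard" in match.group(1).lower():
            counters[match.group(1)] = int(match.group(2))
    return counters

# Hypothetical sample output; names and values are for illustration only.
SAMPLE = """NIC statistics:
     rx_bytes: 123456789
     rx_discards: 0
     rx_fw_discards: 4821
     tx_bytes: 987654321
"""

print(discard_counters(SAMPLE))
```

On a live box the text would come from running the command (for instance through subprocess.run), and a counter that keeps growing between two samples is the sign of ongoing loss.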

I have tested these NICs with an RHEL5 kernel and with a 2.6.25. The former showed a very high loss rate, the latter less. You can reduce the loss rate by playing with interrupt assignment between NICs and CPUs, though you'll never make the losses disappear. I think the firmware and/or the NIC is in fact completely buggy.
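For reference, on Linux the interrupt assignment is done by writing a hexadecimal CPU bitmask to /proc/irq/<n>/smp_affinity as root, where <n> is the IRQ number listed for the NIC in /proc/interrupts. A small sketch of how such a mask is built follows; the helper name is made up for the example, and it assumes a machine with at most 32 CPUs (larger masks are written as comma-separated 32-bit groups).

```python
def smp_affinity_mask(cpus):
    """Build the hex bitmask for /proc/irq/<n>/smp_affinity.

    Bit i of the mask is set when CPU i is allowed to handle the IRQ.
    Assumes at most 32 CPUs, so a single hex group is enough.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("CPU index out of range: %r" % (cpu,))
        mask |= 1 << cpu
    return format(mask, "x")

# Pin one NIC to CPU 0 and another to CPU 1, or spread over CPUs 2 and 3.
print(smp_affinity_mask([0]))
print(smp_affinity_mask([1]))
print(smp_affinity_mask([2, 3]))
```

The resulting string is what gets written to the smp_affinity file of the NIC's IRQ. If an irqbalance daemon is running, it may overwrite manual settings.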

Once we moved to the tg3, we got about 5% of one CPU for approx 100 Mbps of traffic, which is what we should expect from such a machine. Moreover, we no longer had any requests in the 3s range (TCP retransmits).

I hope for your sake that you have different NICs in the machine. If you don't, try to get an Intel e1000 NIC to plug into the machine. They show excellent performance for very low CPU usage.

> Apart from this, it's standard HTTP traffic; however, I have a few ACL
> matches done within haproxy to redirect to different
> backends.

The ACLs don't eat much. A typical haproxy setup eats about 15% user and 85% system. Even if you add hundreds of ACLs and double the user-space processing, you'll see figures like 30% user and 70% system, which means that the performance drop will be about (85-70)/85 = 17.6%. That's why you can safely do a lot of layer 7 processing without hurting performance too much.
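The arithmetic above can be checked with a few lines of Python. The sketch assumes, as the reasoning does, that on a saturated CPU the throughput is proportional to the share of time left for system (kernel network) work; the function name is made up for the example.

```python
def throughput_drop(system_before, system_after):
    """Relative throughput loss when the system share of CPU time shrinks.

    Assumes throughput scales with the share of CPU time spent in system
    (kernel network) work once the CPU is saturated.
    """
    return (system_before - system_after) / system_before

# 15% user / 85% system before, 30% user / 70% system after.
drop = throughput_drop(85, 70)
print(f"{drop:.1%}")  # prints 17.6%
```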

Willy