Hello Matthieu,
On Wed, Nov 07, 2007 at 02:41:57PM +0100, Matthieu Huguet wrote:
> Hello,
>
> In the documentation, there is an example of TCP tuning for 2.4 kernels.
>
>
>
> Are there a lot of differences for 2.6 kernels?
Normally not many. However, I think I should refresh the tuning guide even for 2.4 since it does not contain much of the feedback gathered from experience with real traffic.
> Does anyone have examples or guidelines for 2.6 and Haproxy ?
If you use a 2.6 kernel, use a *very* recent one. The main reason I avoided 2.6 for a long time is the buggy O(1) scheduler used before 2.6.23, which can sometimes cause very long pauses. Kernels from around 2.6.18/2.6.20 onwards have seen improvements that reduce the issue, but it used to be common to observe pauses of several seconds on a machine under medium load (above 50%). I've had very bad feedback about 2.6.8 and 2.6.9 as shipped with one Fedora release, and possibly RHEL4. Ideally, you should go for 2.6.23, which replaces the old scheduler with a correct one, but if you stick to 2.6.20 or 2.6.22.x, it should be good enough for many situations.
To achieve a high performance level, you should also ensure that all ip_conntrack modules are unloaded, and that HyperThreading technology is disabled if you use Intel processors.
> We are using Haproxy on a 1000Mbps Ethernet link and have some trouble above 3000 simultaneous connections: connection establishment is sometimes very long (10 or 20 sec).
Generally, this is caused by packet drops, for instance in the accept() path, due to a failure to create a session (see below).
> The load balancer and backends are not CPU/memory overloaded, so we are looking for problems on TCP configuration on the load balancer.
> net.ipv4.netfilter.ip_conntrack_max=65536
Bad, very bad. First, ip_conntrack is loaded, and second, it has a very small limit. With the parameters below, whatever you do, a conntrack entry will last at least 30 seconds (even more if you include the 10 seconds in the CLOSE state). So a limit of 65536 entries limits you to 65536/30 ~= 2000 connections per second. Since haproxy is a proxy, you get two conntrack entries per connection (one on the client side, one on the server side), which means an effective limit of about 1000 connections/s, or less once the CLOSE state is counted. You should increase this value by an order of magnitude. Last time I had to tweak it, I found that I could manage 1.2 million connections per 512 MB of RAM.
Also, in 2.4, there was a "hashsize" parameter on the ip_conntrack module. I don't remember if it still exists on 2.6, but keep it very high too (at least 1/10 of the ip_conntrack_max).
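The conntrack arithmetic above can be sketched as follows. This is only a rough estimate: the function names are made up for illustration, and the 40-second figure assumes the 30-second timeout plus the 10 seconds in the CLOSE state.

```python
# Rough sizing helpers for the conntrack table.
# These names are illustrative only; they are not kernel or haproxy APIs.

def max_conn_rate(conntrack_max, entry_lifetime_s, entries_per_conn=2):
    """Sustained connections/s before the conntrack table fills up.

    entries_per_conn=2 reflects a proxy: one entry on the client side,
    one on the server side.
    """
    return conntrack_max / (entry_lifetime_s * entries_per_conn)


def min_hashsize(conntrack_max):
    """Rule of thumb from this mail: hashsize at least 1/10 of the max."""
    return conntrack_max // 10


if __name__ == "__main__":
    for lifetime in (30, 40):  # 40 = 30 s timeout + 10 s in CLOSE (assumed)
        rate = max_conn_rate(65536, lifetime)
        print("lifetime %2d s: about %d connections/s" % (lifetime, rate))
    bigger = 65536 * 10  # "an order of magnitude" larger
    print("conntrack_max=%d -> hashsize >= %d" % (bigger, min_hashsize(bigger)))
```

With the current limit of 65536 this gives roughly 1090 connections/s for a 30-second lifetime and roughly 820 connections/s for 40 seconds.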
> net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait=30
> net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait=15
> net.ipv4.tcp_max_syn_backlog = 4096
I suggest that you set this value to at least haproxy's maxconn plus a small margin. It defines how many connection requests may be pending at a time. Under harsh network conditions, it is possible to reach a very high value here, and all SYNs above this limit are dropped.
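As a small sketch of that rule, the snippet below reads the maxconn values from a haproxy configuration text and suggests a backlog. The 10% margin, the choice of the largest maxconn found, and the function name are assumptions made for the example, not recommendations from haproxy itself.

```python
import re


def suggest_syn_backlog(haproxy_cfg_text, margin_percent=10):
    """Return the largest maxconn found plus a margin (10% is an assumption)."""
    values = [
        int(v)
        for v in re.findall(r"^\s*maxconn\s+(\d+)", haproxy_cfg_text, flags=re.MULTILINE)
    ]
    if not values:
        raise ValueError("no maxconn directive found")
    maxconn = max(values)
    return maxconn + maxconn * margin_percent // 100


# Hypothetical configuration used only for the example.
SAMPLE_CFG = """
global
    maxconn 4000

listen web
    maxconn 3000
"""

if __name__ == "__main__":
    print("net.ipv4.tcp_max_syn_backlog = %d" % suggest_syn_backlog(SAMPLE_CFG))
```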
> net.ipv4.tcp_timestamps = 0
If there is another firewall between the client and the server that is able to randomize sequence numbers (e.g. pf on OpenBSD, PIX, FWSM, ...), then you should enable tcp_timestamps so that it won't drop random connections when some clients reuse the same source port too fast.
> net.ipv4.tcp_rmem = 16384 65536 524288
> net.ipv4.tcp_wmem = 16384 349520 699040
Those values can induce very large memory consumption, or a failure to allocate memory in some situations. They define the per-socket buffer sizes (min, default and max, in bytes). They are good when you have a small number of sockets with a high data rate (e.g. samba) but not necessarily when you have a large number of clients with a low bitrate each. I usually limit them to something around 4, 8 and 16 kB respectively. You may want to check.
I also usually set this:
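To give an idea of the orders of magnitude, here is a rough worst-case estimate. It assumes every socket fills both its receive and send buffers up to the configured max, and that each proxied connection uses two sockets (client side and server side). Real usage is normally far lower since the kernel sizes buffers dynamically; the function name is made up for the example.

```python
def worst_case_buffer_bytes(connections, rmem_max, wmem_max, sockets_per_conn=2):
    """Upper bound on socket buffer memory if every buffer reaches its max."""
    return connections * sockets_per_conn * (rmem_max + wmem_max)


if __name__ == "__main__":
    conns = 3000  # the connection count mentioned in the question
    for label, rmax, wmax in (
        ("current max values", 524288, 699040),
        ("16 kB max values", 16384, 16384),
    ):
        total = worst_case_buffer_bytes(conns, rmax, wmax)
        print("%-20s %8.1f MB worst case" % (label, total / (1024.0 * 1024.0)))
```

With the max values quoted above, 3000 connections could in theory pin several gigabytes, while a 16 kB max keeps the same bound under 200 MB.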
Also, check net.ipv4.tcp_mem. It contains the min, pressure and max thresholds, expressed in pages (4 kB on x86), for the memory allocated to TCP. If it's too low, you may never run out of memory but still drop packets. If the box runs only haproxy and nothing else, I usually set the values to 1/8, 3/16 and 1/4 of the RAM size (expressed in kB, then divided by 4 because of the page size). For instance, with 256 MB RAM: 8192, 12288, 16384.
Last, try with nbproc=1. nbproc was first implemented to circumvent per-process FD limits, but in many cases it can have more negative effects than positive ones. On a dual-processor machine, you can already nearly saturate both processors with nbproc=1 because 85-95% of the time is spent in the system, and having more haproxy processes does not bring much except scheduling troubles and cache thrashing between CPUs.
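The tcp_mem rule of thumb above (1/8, 3/16 and 1/4 of the RAM, counted in pages) can be computed like this. It is a minimal sketch that assumes 4 kB pages and a box dedicated to haproxy; the function name is only illustrative.

```python
def tcp_mem_for_ram(ram_mb, page_size=4096):
    """Return the three tcp_mem values, in pages, for a given RAM size in MB."""
    total_pages = ram_mb * 1024 * 1024 // page_size
    return (
        total_pages // 8,       # 1/8 of the RAM
        total_pages * 3 // 16,  # 3/16 of the RAM
        total_pages // 4,       # 1/4 of the RAM
    )


if __name__ == "__main__":
    # Reproduces the 256 MB example given above.
    print("net.ipv4.tcp_mem = %d %d %d" % tcp_mem_for_ram(256))
```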
Hoping this helps,
Willy
--
EXOSEC - ZAC des Metz - 3 Rue du petit robinson - 78350 JOUY EN JOSAS
N°Indigo: 0 825 075 510 - Accueil: +33 1 30 67 60 65 - Fax: +33 1 72 89 80 19
Site web : http://www.exosec.fr/
Received on 2007/11/07 22:42