Problem with haproxy under testload

From: Valentino Volonghi <dialtone#gmail.com>
Date: Thu, 19 Feb 2009 11:04:21 -0800



Hi, I've been trying to use haproxy 1.3.15.7 in front of a couple of erlang mochiweb servers in EC2.

The server alone can deal with about 3000 req/sec and I can hit it directly with ab or siege or tsung and see a similar result.

I then tried putting nginx in front of the system and it was able to reach about the same numbers, although it couldn't really improve performance as much as I expected and instead increased latency quite a lot.

I then went on to try haproxy, but when I use ab to benchmark with 100k connections at a concurrency of 1000, after about 30k requests I see haproxy jump to 100% CPU usage. I looked at an strace of what's going on and there are many EADDRNOTAVAIL errors, which I suppose means that the local ports are exhausted, even though I increased the available port range with sysctl.

haproxy configuration is the following:

global

     maxconn 25000
     user haproxy
     group haproxy

defaults
     log global
     mode    http
     option  dontlognull
     option httpclose
     option forceclose
     option forwardfor
     maxconn 25000
     timeout connect      5000
     timeout client       2000
     timeout server       10000
     timeout http-request 15000
     balance roundrobin

listen adserver
     bind :80
     server ad1 127.0.0.1:8000 check inter 10000 fall 50 rise 1

     stats enable
     stats uri /lb?stats
     stats realm Haproxy\ Stats
     stats auth admin:pass
     stats refresh 5s

Reading this list's archives, I think I have some of the symptoms explained in
these mails:

http://www.formilux.org/archives/haproxy/0901/1670.html
This is caused by connect() failing with EADDRNOTAVAIL, after which haproxy considers the server down.

http://www.formilux.org/archives/haproxy/0901/1735.html
I think I'm seeing exactly the same issue here.

A small strace excerpt:

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 18
fcntl64(18, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(18, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(18, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
close(18)

or

recv(357, 0x9c1acb8, 16384, MSG_NOSIGNAL) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(0, EPOLL_CTL_ADD, 357, {EPOLLIN, {u32=357, u64=357}}) = 0

The last one mostly to show that I'm using epoll, in fact speculative epoll,
but even turning it off doesn't solve the issue.
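
To get a feel for how often each error shows up in a longer trace, something like the following could tally the errno names. This is only a sketch: it assumes the strace output has been saved with one syscall per line, and the helper name count_errnos is made up for illustration.

```python
import re
from collections import Counter

# Matches the "= -1 ENAME" part that strace prints for a failed syscall.
ERRNO_RE = re.compile(r"=\s*-1\s+(E[A-Z0-9]+)")

def count_errnos(lines):
    """Count failed syscalls per errno name in an iterable of strace lines."""
    counts = Counter()
    for line in lines:
        match = ERRNO_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample input: lines taken from the excerpts above.
sample = [
    'socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 18',
    'connect(18, {sa_family=AF_INET, sin_port=htons(8000), '
    'sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL '
    '(Cannot assign requested address)',
    'recv(357, 0x9c1acb8, 16384, MSG_NOSIGNAL) = -1 EAGAIN '
    '(Resource temporarily unavailable)',
    'epoll_ctl(0, EPOLL_CTL_ADD, 357, {EPOLLIN, {u32=357, u64=357}}) = 0',
]

for name, n in count_errnos(sample).most_common():
    print(name, n)
```

On a real trace the list would be replaced by the lines of the saved strace file.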

Interestingly, if I use mode tcp instead of mode http this doesn't
happen, but since tcp mode doesn't forward the client IP address (and I can't
patch an EC2 kernel) I can't use it.

The ulimit-n shown by haproxy is 50k sockets, well above maxconn and well above
the 30k where it breaks.

sysctl.conf has the following settings:

# the following stops low-level messages on console
kernel.printk = 4 4 1 7
fs.inotify.max_user_watches = 524288
# some spoof protection

net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
# General gigabit tuning:

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500

vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 25000 65535
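
As a rough back-of-the-envelope check on the port range above, the sketch below estimates how many new connections per second one source address can sustain towards a single destination such as 127.0.0.1:8000. It assumes that every closed connection keeps its local port in TIME_WAIT for 60 seconds and that those ports are not reused in the meantime; both are assumptions, not something measured here, and the function names are purely illustrative.

```python
def usable_ports(low, high):
    """Number of local ports in an inclusive ip_local_port_range."""
    return high - low + 1

def max_sustained_rate(low, high, time_wait_s=60):
    """Connections per second to ONE destination before ports run out.

    Assumes each port stays unusable for time_wait_s seconds after the
    connection is closed (assumed TIME_WAIT duration, no port reuse).
    """
    return usable_ports(low, high) / time_wait_s

# Values from net.ipv4.ip_local_port_range = 25000 65535
ports = usable_ports(25000, 65535)
rate = max_sustained_rate(25000, 65535)
print(ports, "ports,", round(rate), "new connections/sec sustained")
```

If those assumptions hold, a benchmark running at a few thousand requests per second against a single backend address would use up the whole range within seconds, whatever maxconn and ulimit-n are set to.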

Everything runs on Ubuntu 8.04 with kernel 2.6.21.7. Is there anything that I'm
getting spectacularly wrong? Do you need more strace output?

Received on 2009/02/19 20:04
