Hypermail

From: Willy Tarreau <w#1wt.eu>
Date: Thu, 19 Feb 2009 22:48:17 +0100

Hi Valentino,

On Thu, Feb 19, 2009 at 11:04:21AM -0800, Valentino Volonghi wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi, I've been trying to use haproxy 1.3.15.7 in front of a couple of
> erlang mochiweb servers in EC2.
>
> The server alone can deal with about 3000 req/sec and
> I can hit it directly with ab or siege or tsung and see a
> similar result.
>
> I then tried using nginx in front of the system and it was
> about to reach about the same numbers although apparently
> it couldn't really improve performance as much as I expected
> and instead it increases latency quite a lot.
>
> I then went on to try with haproxy but when I use ab to
> benchmark with 100k connection with 1000 concurrency
> after 30k requests I see haproxy jumping to 100% CPU usage.
> I tried looking into a strace of what's going on and there are
> many EADDRNOTAVAIL errors which I suppose means that
> ports are finished,

You're correct, this is typically what happens in such a case. The increase of CPU usage is generally caused by two factors :

connection retries, which act as a traffic multiplier
immediate closure which causes the load injection tool to immediately send a new connection instead of having to wait a few ms for the application.

> even though I increased the available ports with sysctl.

could you try to reduce the number of ports to see if the problem still shows up after 30k reqs or if it shows up before ? Use 5k ports for instance. I'm asking this because I see that you have 25k for maxconn, and since those numbers are very close, we need to find which one triggers the issue.

> haproxy configuration is the following:
>
> global
> maxconn 25000
> user haproxy
> group haproxy
>
> defaults
> log global
> mode http
> option dontlognull
> option httpclose
> option forceclose
> option forwardfor
> maxconn 25000
> timeout connect 5000
> timeout client 2000
> timeout server 10000
> timeout http-request 15000
> balance roundrobin
>
> listen adserver
> bind :80
> server ad1 127.0.0.1:8000 check inter 10000 fall 50 rise 1
>
> stats enable
> stats uri /lb?stats
> stats realm Haproxy\ Stats
> stats auth admin:pass
> stats refresh 5s

Since you have stats enabled, could you check in the stats how many sessions are still active on the frontend when the problem happens ?

> Reading this list archives I think I have some of the symptoms
> explained in
> these mails:
>
> http://www.formilux.org/archives/haproxy/0901/1670.html

oops, I don't remember having read this mail, and the poor guy seems not to have got any response!

> This is caused by connect() failing for EADDRNOTAVAIL and thus considers
> the server down.

I'm not that sure about this one. There's no mention of any CPU usage, and traditionally this is the symptom of the ip_conntrack module being loaded on the machine.

> http://www.formilux.org/archives/haproxy/0901/1735.html
> I think I'm seeing exactly the same issue here.

There could be a common cause, though in John's test, it was caused by a huge client timeout, and maybe the client did not correctly respond to late packets after the session was closed (typically if a firewall is running on the client machines), which causes the sessions to last for as long as the client timeout.

> A small strace excerpt:
>
> socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 18
> fcntl64(18, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
> setsockopt(18, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> connect(18, {sa_family=AF_INET, sin_port=htons(8000),
> sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot
> assign requested address)
> close(18)

This one is a nice capture of the problem. The fact that the FD is very low (18) also makes me think that you will see very few connections on haproxy on the stats page, meaning that some resource in the system is exhausted (likely source ports).

> recv(357, 0x9c1acb8, 16384, MSG_NOSIGNAL) = -1 EAGAIN (Resource
> temporarily unavailable)
> epoll_ctl(0, EPOLL_CTL_ADD, 357, {EPOLLIN, {u32=357, u64=357}}) = 0

This was is not related, it's normal behaviour : running in non-blocking mode, you try to read, there's nothing (EAGAIN), so you register this FD for polling.

> The last one mostly to show that I'm using epoll, in fact speculative
> epoll, but even turning it off doesn't solve the issue.

Agreed it's speculative epoll which produces the two lines above. If you just use epoll, you'll see epoll_ctl(EPOLL_CTL_ADD), then epoll_wait() then read(). But this will not change anything to the connect() issue the system is reporting.

> An interesting problem is that if I use mode tcp instead of mode http
> this doesn't
> happen, but since it doesn't forward the client IP address (and I
> can't patch
> an EC2 kernel) I can't do it.

this is because your load tester uses HTTP keep-alive by default, which tcp mode does not affect. In http mode with "option httpclose" and "option forceclose", keep-alive is disabled. If you comment out those two lines, the problem will disappear but only the first request in each session will have its x-forwarded-for header added.

> ulimit-n showed by haproxy is 50k sockets, well above maxconn and well
> above
> the 30k wehere it breaks.

ulimit-n is slightly above 2*maxconn (you need two FDs per connection, as well as a few for listeners and server health checks).

> sysctl.conf has the following settings:
>
> # the following stops low-level messages on console
> kernel.printk = 4 4 1 7
> fs.inotify.max_user_watches = 524288
> # some spoof protection
> net.ipv4.conf.default.rp_filter=1
> net.ipv4.conf.all.rp_filter=1
> # General gigabit tuning:
> net.core.rmem_max = 33554432
> net.core.wmem_max = 33554432
> net.ipv4.tcp_rmem = 4096 16384 33554432
> net.ipv4.tcp_wmem = 4096 16384 33554432
> net.ipv4.tcp_mem = 786432 1048576 26777216

This one above is wrong (though it's not our problem here). It says that you'll use a minimum of 3 GB, a default of 4 GB and a max of 100 GB for the TCP stack. The unit here is pages, not bytes.

> net.ipv4.tcp_max_tw_buckets = 360000
> net.core.netdev_max_backlog = 2500
> vm.min_free_kbytes = 65536
> vm.swappiness = 0
> net.ipv4.ip_local_port_range = 25000 65535

Could you check net.ipv4.tcp_tw_reuse, and set it to 1 if it's zero ?

> Everything runs on an ubuntu 8.04 with 2.6.21.7. Is there anything
> that I get
> spectacularly wrong? Do you need more strace output?

more strace output will not show anything beyond your capture above. Running a "netstat -an" (or the faster ss -an if you have it) would give a lot of information about the state of those 30k sockets. And please don't forget to check the stats page. You can even save the page, as it's self contained in a single HTML file.

Regards,
Willy Received on 2009/02/19 22:48

Re: Problem with haproxy under testload