[ANNOUNCE] haproxy 1.3.15.1 and haproxy-1.3.14.5

From: Willy Tarreau <w#1wt.eu>
Date: Sun, 25 May 2008 23:39:07 +0200


Hi all!

I've updated haproxy 1.3.14 and 1.3.15 with the few pending fixes, namely :

The buffer flush bug would occasionally cause truncated stats reports, which is kind of weird, especially for CSV output, which is generally consumed by automated monitoring/reporting scripts. I think I had already hit this bug myself: I once got a truncated stats page, but I thought my browser had caused it.

The second problem was weird too, but less likely to happen. I noticed during performance testing that I got very long response times when the CPU was saturated processing 80000 sockets at gigabit speed (40000 concurrent connections pushing data from the server to the client). After a long analysis, it turned out that the speculative I/O poller (sepoll) was very efficient under such a workload, and almost all the sessions were processed without any polling. The problem was that for 40000 speculative events processed, only a few poll events were processed (about 100), so the listen sockets were only rarely returned by epoll_wait().

Connection times of up to 40 seconds have been observed when fetching the stats page on a 3.4 GHz Pentium 4 under such a load, which is clearly unacceptable! Interestingly, at only 10000 connections (20000 sockets), the problem was less noticeable. It seems this is because the CPU had just enough time to process all speculative events before the server had time to push new data to all sockets, so that the number of speculative sockets remained moderate.

The first solution I found to this problem was to allow epoll_wait() to return as many events as have been processed during speculative I/O. That way it cannot starve, and response times have gone down from 40 seconds to about 1 second. This is not fantastic, but it should help maintainers of big download sites who see their CPU reach 100%.
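
To illustrate the idea only (this is not the actual haproxy code), here is a minimal sketch in C. The function name poll_event_budget() and both bounds are hypothetical; the lower bound of 100 merely echoes the "about 100" poll events mentioned above.

```c
/* Hypothetical bounds, chosen for illustration only. */
#define MIN_POLL_EVENTS 100
#define MAX_POLL_EVENTS 65536

/*
 * Return how many events to request from the kernel poller on this
 * iteration. The budget follows the number of speculative events that
 * were just processed, so that polled descriptors (such as listeners)
 * get a comparable share of the work and cannot starve.
 */
int poll_event_budget(int spec_processed)
{
    int budget = spec_processed;

    if (budget < MIN_POLL_EVENTS)
        budget = MIN_POLL_EVENTS;   /* always leave room for a few events */
    if (budget > MAX_POLL_EVENTS)
        budget = MAX_POLL_EVENTS;   /* keep the events array bounded */
    return budget;
}
```

In such a sketch, the returned value would be passed as the maxevents argument of epoll_wait(), with an events array sized for at least MAX_POLL_EVENTS entries.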

I find it important to go further, but it will require more invasive changes. The principle is to apply some prioritization to file descriptors. A listener or a socket waiting for a connect() to complete would have a high priority, while a streamer would have a lower priority. Some work has begun in this direction in the master branch, consisting of detecting streamers. I really want the response time to remain very low (on the order of milliseconds) even when the machine is saturated, as long as it is not dropping packets, of course.
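
The prioritization principle could look roughly like the sketch below. It is only an assumption about one possible shape, not the work in the master branch: the fd_role enum, the ready_fd structure and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical classification of a file descriptor. */
enum fd_role { FD_LISTENER, FD_CONNECTING, FD_STREAMER };

struct ready_fd {
    int fd;
    enum fd_role role;
};

/* Listeners and sockets waiting for connect() are served first. */
int is_high_priority(enum fd_role role)
{
    return role == FD_LISTENER || role == FD_CONNECTING;
}

/*
 * Reorder the ready list in place so that high-priority descriptors come
 * first, keeping the relative order inside each class. Returns the number
 * of high-priority entries. This simple version is quadratic in the worst
 * case; separate queues per class would avoid the shifting.
 */
size_t move_high_priority_first(struct ready_fd *fds, size_t n)
{
    size_t high = 0;

    for (size_t i = 0; i < n; i++) {
        if (!is_high_priority(fds[i].role))
            continue;
        struct ready_fd tmp = fds[i];
        /* Shift the low-priority entries in [high, i) one slot right. */
        for (size_t j = i; j > high; j--)
            fds[j] = fds[j - 1];
        fds[high++] = tmp;
    }
    return high;
}
```

With the ready list ordered this way, a saturated process would accept new connections and complete pending connects before pushing more data to streamers.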

Also, since no regression was reported between 1.3.14 and 1.3.15, I now consider the latter the recommended branch (but 1.3.14 remains maintained).

Last, I recently noticed that one user of 1.2.17 had trouble building this version on FreeBSD. It then came to my attention that a fix for this problem had been sitting in the tree for more than a year, along with another fix. So I decided to release 1.2.18 for those still using 1.2. Judging by the number of build reports, there are not many of them!

I've rebuilt all three versions on Linux/x86 and Solaris/Sparc.

Please find updates here :

   http://haproxy.1wt.eu/download/1.3/src/
   http://haproxy.1wt.eu/download/1.3/bin/

Regards,
Willy
