Re: Source code questions on HAproxy performance

From: Willy Tarreau <w#1wt.eu>
Date: Wed, 11 Feb 2009 23:54:53 +0100


Hi Babu,

On Tue, Feb 10, 2009 at 04:55:06PM +0530, Babu N wrote:
> Hi,
>
> I am exploring haproxy code to understand how high performance
> numbers are realized. Could you please clarify a few queries. If this
> is not the correct alias, please let me know.

It is the right list.

> I am reproducing some information from http://haproxy.1wt.eu/ and
> marking my queries in blue:
> - O(1) event checker on systems that allow it (currently only
> Linux with HAProxy 1.2), allowing instantaneous detection of any
> event on any connection among tens of thousands.
> >> Is this achieved by using epoll ?

Yes, epoll on Linux, and kqueue on FreeBSD/OpenBSD.
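
For readers of the archive, here is a minimal sketch of the epoll pattern referred to above. It is not taken from the haproxy sources; the helper names watch_for_read() and first_ready_fd() are hypothetical and only illustrate how the kernel hands back the ready descriptors without the caller scanning every registered FD.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (not from haproxy): register fd for read events. */
static int watch_for_read(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: wait up to timeout_ms and return the first ready
 * descriptor, or -1 if nothing became ready (or on error). Only ready
 * descriptors are reported, so the cost does not depend on how many
 * idle descriptors are registered. */
static int first_ready_fd(int epfd, int timeout_ms)
{
    struct epoll_event ready[16];
    int n = epoll_wait(epfd, ready, 16, timeout_ms);

    return n > 0 ? ready[0].data.fd : -1;
}
```

A caller would create the instance with epoll_create1(0), register each connection once with watch_for_read(), and then loop on epoll_wait(). A kqueue version would follow the same shape with kevent().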

> - event aggregation : timing resolution is adapted to match the
> system scheduler's resolution. This allows many events to be
> processed at once without having to sleep when we're sure that we
> would have woken up immediately. This also leaves a large performance
> margin with virtually no degradation of response time when the CPU
> usage approaches 100%.
> >> Could you please point me in the code where this adaptation is done ?

I think this one is gone. The principle is the following: I observed in the past that when we called select() with a certain timer value on Linux 2.2, the value was often rounded down, which frequently caused wakeups for nothing because the timer had not yet elapsed. So I added the scheduler's period as a safety margin when calling select(). For instance, when I knew select() would be called with a 2 ms delay on a 100 Hz scheduler, I added the 10 ms tick to the 2 ms and passed 12 ms to select(), so that I could be sure it wouldn't return too early.

This observation led to a second one: at that time, select() wakeups were very expensive in haproxy, because tasks were woken up and put back to sleep, and the whole FD map had to be checked and rebuilt. I figured that if I could force select() to wait slightly longer, it could collect more events at once and wake up less often. But while working on improving this, I observed that it was not necessary, because when the machine is heavily loaded, select() is already called less often and gathers more events at once.

Now that select() is only used in compatibility mode, all those tricks have become worthless.
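
The safety margin described above (2 ms requested on a 100 Hz scheduler becoming 12 ms) can be sketched as follows. This is only an illustration of the arithmetic, not the code that was once in haproxy; the function name padded_timeout() and its parameters are assumptions.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper: add one scheduler tick to the requested delay so
 * that a select() which rounds the timer down does not return early.
 * hz is the assumed scheduler frequency, e.g. 100 for a 10 ms tick. */
static struct timeval padded_timeout(long delay_ms, long hz)
{
    assert(delay_ms >= 0 && hz > 0);

    long tick_ms = 1000 / hz;            /* 10 ms at 100 Hz */
    long total_ms = delay_ms + tick_ms;  /* requested delay + margin */
    struct timeval tv;

    tv.tv_sec = total_ms / 1000;
    tv.tv_usec = (total_ms % 1000) * 1000;
    return tv;
}
```

The resulting struct timeval would be passed as the last argument of select().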

> - reduced footprint for frequently and randomly accessed memory
> areas such as the file descriptor table which uses 4 bitmaps. This
> reduces the number of CPU cache misses and memory prefetching time.
> >> The "file descriptor table" referred above is kernel's table ? Or
> is it the fdtab memory in haproxy.c ? How is reduced footprint achieved ?

It was the FD_SETs I was referring to here (using select()). Only one bit is needed to indicate whether an FD is being monitored or not, contrary to many poll-based implementations where a struct is needed for each FD. While this is still true for select(), it has become pretty irrelevant.
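
To put rough numbers on the one-bit-per-FD argument, the sketch below compares the memory needed by a bitmap with that of an array of struct pollfd. The helper names are hypothetical and the code is not from haproxy; it only uses the standard FD_SET macros and sizeof.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/select.h>

/* Bytes needed to track nfds descriptors with one bit each. */
static size_t bitmap_bytes(size_t nfds)
{
    return (nfds + 7) / 8;
}

/* Bytes needed when every descriptor gets its own struct pollfd. */
static size_t pollfd_bytes(size_t nfds)
{
    return nfds * sizeof(struct pollfd);
}

/* Checking whether an FD is monitored touches a single bit of the set. */
static int is_monitored(int fd, fd_set *set)
{
    return FD_ISSET(fd, set) != 0;
}
```

For 1024 descriptors, one bitmap takes 128 bytes, while the pollfd array takes 1024 * sizeof(struct pollfd) bytes, so far fewer cache lines are touched when descriptors are accessed at random.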

> - kernel TCP splicing
> >> I searched for "splice" to find the code which makes use of
> kernel TCP splicing. Is kernel TCP splicing not used in haproxy.c ?

Yes, you should have found a call to tcpsplice(), which has to be enabled at build time. In current releases, only Alexandre Cassen's implementation is supported (the one on l7sw.org). But with the upcoming 1.3.16, kernel 2.6's TCP splicing will be supported too (in fact it already works, but the code is still in beta).

Hoping this helps,
Willy