
From: Vincent Bernat <bernat#luffy.cx>
Date: Wed, 04 May 2011 16:54:06 +0200

Hi!

I have tried to benchmark HAProxy 1.4.15 using two Spirent Avalanche appliances. The HAProxy box has the following specifications:

HP Proliant DL380 G5
2 Xeon E5405 @ 2.00GHz
4 cores per CPU
4 GB RAM
Linux kernel 2.6.35
conntrack has been disabled
2 Intel Gigabit controller using igb driver
32bit kernel, 32bit userland

driver: igb
version: 2.1.0-k2
firmware-version: 1.2-1

These are multiqueue cards, and their interrupts are spread across the 8 cores for both TX and RX.

http://www.intel.com/products/server/adapters/pro1000pt-dualport/pro1000pt-dualport-overview.htm

The setup is fairly simple. The HAProxy box is connected to a Nortel 5530 switch using an active/active bond (balance-xor on the Linux side, MLT on the Nortel side). Both Avalanches are also connected to this switch using 2 links each. One of them acts as a reflector (web server). Each link is mapped to a set of clients (for the regular Avalanche) or acts as a set of servers (for the reflector).

Offloading is enabled.

rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on

MTU is set to 1500 (no jumbo frames)

╭─────────────────────────────────────────────────╮ │ 5530 switch ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ │ │ └┬┘ └┬┘ └┬┘ └┬┘ └┬┘ └┬┘ └─┘ └─┘ │ ╰────────────────┼───┼────┼───┼───┼───┼───────────╯

                  │   │    │   │   │   └──────────────┐
                  │   │    │   │   └──────────────┐   │
 ╭────────────────┼───┼─╮╭─┼───┼────────────────╮ │   │
 │ Avalanche     ┌┴┐ ┌┴┐││┌┴┐ ┌┴┐ Avalanche     │ │   │
 │ (clients)     └─┘ └─┘││└─┘ └─┘ (reflector)   │ │   │
 ╰──────────────────────╯╰──────────────────────╯ │   │
                                                  │   │
                                       ╭──────────┼───┼─╮
                                       │ HAProxy ┌┴┐ ┌┴┐│
                                       │         └─┘ └─┘│
                                       ╰────────────────╯

The Avalanche simulates 256 clients on each port, targeting the 4 IP addresses that are configured in HAProxy. The reflector simulates 4 web servers, 2 on each port. Those servers serve 1 KB pages. Here is my HAProxy configuration:

global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        user haproxy
        group haproxy
        nbproc 1
        daemon
        stats socket /var/run/haproxy.socket

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        option  splice-auto
        retries 3
        option  redispatch
        contimeout      5s
        clitimeout      50s
        srvtimeout      50s

listen poolbench
        bind    172.31.200.10:80
        bind    172.31.201.10:80
        bind    172.31.202.10:80
        bind    172.31.203.10:80
        mode    http
        option  splice-response
        stats   enable
        option  httpchk /
        option  dontlog-normal
        option  log-health-checks
        balance roundrobin
        server  real1 172.31.208.2:80
        server  real2 172.31.209.2:80
        server  real3 172.31.210.2:80
        server  real4 172.31.211.2:80

Build options :

   TARGET  = linux26
   CPU     = generic
   CC      = gcc
   CFLAGS  = -O2 -g -fno-strict-aliasing
   OPTIONS = USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1 USE_REGPARM=1 USE_PCRE=1

Default settings :
   maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :

      sepoll : pref=400,  test result OK
       epoll : pref=300,  test result OK
        poll : pref=200,  test result OK
      select : pref=150,  test result OK

Total: 4 (4 usable), will use sepoll.

With this configuration, I get 10 000 HTTP req/s. The haproxy process takes 100% CPU. Changing "maxconn" or disabling splice does not change anything. If I use 6 haproxy processes, I can get to 30 000 HTTP req/s. Each haproxy process takes 100% CPU in this case. Moreover, I am pretty sure that the Avalanche is not the bottleneck, since we can bench more than 120 000 HTTP req/s with the same setup. I have also tried to pin haproxy to one CPU (with taskset) and I still get 10 000 HTTP req/s.

Now, if I look at http://haproxy.1wt.eu/#perf, I can see that I should be able to achieve 40 000 HTTP req/s. This is four times what I am able to achieve. What is wrong with my setup? Why does enabling or disabling splice not affect my results? Is there a way to obtain the 2.6.27-wt5 kernel used for the tests?
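
To put these figures side by side, here is a quick back-of-the-envelope calculation. It is only a sketch that uses the numbers quoted in this mail; the variable names are mine and nothing here is measured beyond what is stated above.

```python
# Figures quoted in this mail (HTTP requests per second).
single_process_rps = 10_000   # one haproxy process at 100% CPU
six_process_rps = 30_000      # six haproxy processes, each at 100% CPU
published_rps = 40_000        # figure read from the HAProxy performance page
processes = 6

# Scaling efficiency: measured throughput relative to perfect linear scaling.
scaling_efficiency = six_process_rps / (processes * single_process_rps)

# Gap between the published figure and the single-process measurement.
gap_factor = published_rps / single_process_rps

print(f"scaling efficiency with {processes} processes: {scaling_efficiency:.0%}")
print(f"published figure vs. measured: {gap_factor:.0f}x")
```

In other words, six processes deliver only half of what linear scaling would give, which suggests that some shared resource is limiting throughput in addition to the per-process CPU cost.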

A side question now. Enabling the use of multiple processes would make it possible to leverage the power of modern multi-core machines (we now get 6 cores per CPU on recent servers). However, this is discouraged. One drawback is the inability to get reliable stats. Is this problem being worked on? We could spawn a master process that exposes the stats socket. This master would collect stats from the other processes using the same protocol as on the socket, but over pipes. The stats would then be aggregated and sent back to the client.
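
A minimal sketch of the aggregation step only, assuming each process returns its stats in the CSV format of the stats socket ("show stat"), with a header line starting with "# " and columns such as pxname, svname, stot, bin and bout. The function name `aggregate_stats` and the sample figures are hypothetical, and how the master would actually read from the pipes is left out.

```python
import csv
import io


def aggregate_stats(csv_outputs, counters=("stot", "bin", "bout")):
    """Sum selected counters per (pxname, svname) across several CSV dumps.

    Hypothetical helper: only cumulative counters are meaningful to sum;
    gauges, maxima and status fields would need different handling.
    """
    totals = {}
    for text in csv_outputs:
        # Drop the leading "# " of the header so DictReader sees plain names.
        reader = csv.DictReader(io.StringIO(text.lstrip("# ")))
        for row in reader:
            key = (row["pxname"], row["svname"])
            entry = totals.setdefault(key, dict.fromkeys(counters, 0))
            for name in counters:
                value = row.get(name) or "0"  # empty or missing fields count as zero
                entry[name] += int(value)
    return totals


# Made-up sample output from two processes, for illustration only.
proc1 = (
    "# pxname,svname,stot,bin,bout\n"
    "poolbench,FRONTEND,100,2000,50000\n"
    "poolbench,real1,25,500,12500\n"
)
proc2 = (
    "# pxname,svname,stot,bin,bout\n"
    "poolbench,FRONTEND,80,1600,40000\n"
    "poolbench,real1,20,400,10000\n"
)

merged = aggregate_stats([proc1, proc2])
print(merged[("poolbench", "FRONTEND")])
```

Keying on (pxname, svname) keeps frontends, backends and individual servers separate, so the aggregated output could be re-emitted in the same CSV layout that clients of the socket already expect.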

Thanks for any insight on the performance part.

Received on 2011/05/04 16:54

Bench of haproxy