Performance problems with 1.3.20

From: James Hartshorn <james.hartshorn#openx.org>
Date: Wed, 12 Aug 2009 12:50:00 -0700


Hi,

We run HAProxy on Amazon EC2 for HTTP load balancing. On August 11 we upgraded seven of our load balancers, serving two of our products, to 1.3.20: four servers for one product were upgraded from 1.3.15.8, and three servers for the other product were upgraded from 1.3.18. We kept the config files the same.

We finished replacing the load balancers by 2300 UTC on August 11. At about 0900 UTC on August 12, the first cluster (the one upgraded from 1.3.15.8) started showing performance problems severe enough to set off our monitoring systems; response times were several seconds. Logging on to one of the load balancers, I saw normal CPU and memory usage, but netstat -anp listed more than 30k lines, the majority in TIME_WAIT state. For background, the load balancers each point to the same pool of about 60 servers, which at the time were handling about 20-30 sessions per server, with the servers reporting about 80 requests per second (nominally 60% of peak). At this point we put the old load balancers back into production and found that they still work fine. At around 1200 UTC on August 12, a nearly identical state occurred on the other set of load balancers (the ones upgraded from 1.3.18).
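For anyone trying to reproduce the connection-state tally described above, here is a minimal sketch. It is not part of HAProxy or of the original report; it assumes the Linux net-tools netstat column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, plus a PID/Program name column with -p), and the function name count_tcp_states is hypothetical.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP socket states from Linux `netstat -an` or `netstat -anp` text.

    Assumption: the net-tools layout, where the sixth whitespace-separated
    field of every tcp/tcp6 row is the connection state (LISTEN,
    ESTABLISHED, TIME_WAIT, ...). Header, udp and unix rows are skipped.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only tcp/tcp6 rows carry a state in the sixth column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states
```

On the load balancer itself, the input could come from subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout (Python 3.7 or later), printing count_tcp_states(...).most_common(). As a rough cross-check, assuming Linux, the steady-state number of sockets in TIME_WAIT is about the rate of locally closed connections multiplied by the TIME_WAIT interval (fixed at 60 seconds), so about 30,000 such sockets would correspond to roughly 500 connections per second being actively closed on that host.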

If anyone can see any issues, please let me know.

I have pasted a representative haproxy.cfg file below:

# this config needs haproxy-1.1.28 or haproxy-1.2.1

global
    #log 127.0.0.1 local0 info
    #log 127.0.0.1 local1 notice
    #log loghost local0 info

    maxconn 75000
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon
    #debug
    #quiet

defaults
    #log global
    mode http
    #option httplog
    option dontlognull
    option redispatch

    retries 3
    maxconn 75000
    contimeout 5000
    clitimeout 50000
    srvtimeout 2000

frontend openx *:80
    #log global
    maxconn 75000
    option forwardfor
    default_backend openx_ec2_hosted_http

backend openx_ec2_hosted_http
    mode http
    #balance roundrobin
    balance leastconn
    option abortonclose
    option httpclose
    #remove the line below if not 1.3.20
    #option httpchk HEAD /health.chk
    timeout queue 500
    #option forceclose

    server crt.hosted.bigd04 10.252.102.128:80 check maxconn 150 weight 2
    ...
    server crt.hosted.d03 10.252.203.175:80 check maxconn 50
    ...
    server crt.hosted.d75 10.209.81.155:80 check maxconn 30

frontend openx_ssl *:443
    #log global
    mode tcp
    maxconn 75000
    option forwardfor
    default_backend openx_ec2_hosted_ssl

backend openx_ec2_hosted_ssl
    mode tcp
    #balance roundrobin
    balance leastconn
    option abortonclose
    option httpclose
    #option forceclose

    server crt.hosted.bigd04-ssl 10.252.102.128:443 check maxconn 150
    ...
    server crt.hosted.d03-ssl 10.252.203.175:443 check maxconn 30
Received on 2009/08/12 21:50