Hypermail

From: Michael Fortson <mfortson#gmail.com>
Date: Fri, 16 Jan 2009 13:56:38 -0800

I have simplified this testing using a default cfg file (content-sw-sample.cfg).

Basically, it seems like heavily loading HAProxy causes some sort of networking resource exhaustion on the server.

Box 1:
haproxy

Boxes 2-4:
nginx (set up as one of the backends in the default config)

Running all tests from box 1:
ab -n 10000 -c 100 "box1:81/images/layout/blank.gif"

This test completes in less than 1 second, multiple times in a row. Eventually, however, it gets stuck. The really odd things, though, are that when the haproxy tests from Box1 are stuck:

hitting some of the Box2-4 cluster from Box1 will also fail (but not necessarily all -- some boxes might still be accessible)
hitting Boxes 2-4 from any other machine on the network works fine

It's like the route between Box1 and specific machines is getting stuck, and after a minute or two things get back to normal.

On Thu, Jan 15, 2009 at 10:40 PM, Michael Fortson <mfortson#gmail.com> wrote:
> Our server (a recent 8-core xeon server) is able to handle a few
> rounds of benchmarks, but then connections just seem to start hanging
> somewhere -- they don't show up in haproxy_stats. At the same time,
> every server listed on the stats page starts going red (even though
> they are clearly fine -- you can hit them directly without problem).
> It does not seem to be memory-related (no bulging swap file at any
> rate).
>
> My config file is as follows:
> ________
> global
> log 127.0.0.1 local0 notice
> #log 127.0.0.1 local1 alert
>
> daemon
> nbproc 2
> maxconn 32000
> ulimit-n 65536
> # and uid, gid, daemon, maxconn...
>
> defaults
> # setup logging
> log global
> option tcplog
> option httplog
> option dontlognull
>
> # setup statistics pages
> stats enable
> stats uri /haproxy_stats
> stats refresh 10
> stats auth xxxxxxxxxx:xxxxxxxxxxx
>
>
> mode http
> retries 3
> option abortonclose
> option redispatch
>
>
> timeout connect 80s # time in queue
> timeout client 30s # data from client (browser)
> timeout server 80s # data from server (mongrel)
>
> frontend rails *:7000
> # set connection limits
> maxconn 700
> default_backend mongrels
>
> backend mongrels
> fullconn 80
>
> option httpchk
> option abortonclose
> contimeout 4000
>
> balance roundrobin
>
> server webapp01-101 10.2.0.3:8101 minconn 1 maxconn 5 check
> inter 2s rise 1 fall 2
> server webapp01-102 10.2.0.3:8102 minconn 1 maxconn 5 check
> inter 2s rise 1 fall 2
> <snip> lots more of those over several machines... </snip>
>
> ## this is the service I'm testing against
> frontend webservers *:81
> mode tcp
> default_backend nginx
>
> backend nginx
> mode tcp
> balance leastconn
> contimeout 1000
> server webapp01-http 10.2.0.3:8080 check
> server webapp02-http 10.2.0.4:8080 check
> server webapp04-http 10.2.0.17:8080 check
> server webapp05-http 10.2.0.18:8080 check
> ___________
>
> the test I'm running is:
> ab -n 10000 -c 20 "nowhere.com:81/images/layout/blank.gif"
>
> ... from another machine on the LAN. It runs fine (and pretty fast)
> several times in a row, but then something happens; it'll hang
> somewhere in the test, and none of the requests will show up on the
> haproxy stats page. Sometimes Apache (which feeds into haproxy for its
> balancer for Rails requests) starts showing 503s as well.
>
>
> I'm wondering if I have some bad combination of haproxy configuration,
> or if I should look elsewhere for the cause... the system is running
> FC7.
>
> cheers
>
Received on 2009/01/16 22:56

Re: strange behavior under benchmarking