Re: nbproc>1, ksoftirqd and response time - just can't get it

From: Dmitriy Samsonov <dmitriy.samsonov#gmail.com>
Date: Tue, 19 Jul 2011 04:25:29 +0400


Hi!

>
> Fine, this is a lot better now. Since you're running at 2000 concurrent
> connections, the impact on the cache is noticeable (at 32kB per connection
> for haproxy, it's 64MB of RAM possibly touched each second, maybe only 16MB
> since requests are short and fit in a single page). Could you recheck at
> only 250 concurrent connections in total (125 per ab) ? This usually is
> the optimal point I observe. I'm not saying that it should be your target,
> but we're chasing the issues :-)
>

Using one or two clients with ab2 -c 250 -H 'Accept-Encoding: None' -n 100000000 http://testhost I get:
Requests per second:    25360.45 [#/sec] (mean)

Connection Times (ms)
              min  mean[+/-sd] median   max

Connect:        0    6  94.8      3    9017
Processing:     0    4  16.7      3    5002
Waiting:        0    0  10.3      0    5002
Total:          0   10  96.3      6    9022
Percentage of the requests served within a certain time (ms)
  50%      6
  66%      7
  75%      8
  80%      8
  90%      9
  95%     10
  98%     11
  99%     13
 100%   9022 (longest request)

Maximum session rate is 49000 (exact value).

Also, I've found the reason why the SYN packets were lost. It looks like it happened because of slow interrupt handling (the same ksoftirqd/0 at 100%): ifconfig eth0 was reporting dropped packets. I enlarged the RX ring with ethtool (-G sets the ring, -g only queries it) to 2040 entries, and the dropped packets are gone, at least according to ifconfig.
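
For reference, this is roughly what I did; the interface name and ring size are just the ones from my box, and the hardware maximum should be checked first:

  # show current and maximum supported ring sizes
  ethtool -g eth0

  # enlarge the RX ring (-G sets, -g only queries)
  ethtool -G eth0 rx 2040

  # check that the drop counter stops increasing
  ifconfig eth0 | grep -i drop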

Also, I've upgraded the kernel from 2.6.38-r6 (Gentoo) to 2.6.39-r3 (Gentoo) - nothing changed at all. The haproxy version is 1.4.8.

Altering somaxconn didn't change anything either.
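
Just to be explicit about what I tried there (the 10000 value comes from your suggestion; the init script path is simply how haproxy is managed on this Gentoo box):

  # raise the accept queue limit, then restart haproxy so it picks it up
  sysctl -w net.core.somaxconn=10000
  /etc/init.d/haproxy restart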

Only changing the affinity of the IRQs/haproxy affects the system: the maximum rate changes from 25k to 49k. And that's it...
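
In case it matters, the affinity changes were done roughly like this (the IRQ number and CPU choices below are only examples, not the exact values from this box):

  # find the IRQ(s) used by eth0
  grep eth0 /proc/interrupts

  # pin the NIC interrupt to CPU0 (the value is a hex CPU mask)
  echo 1 > /proc/irq/53/smp_affinity

  # pin haproxy to another core, e.g. CPU1
  taskset -pc 1 $(pidof haproxy)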

I'm including the sysctl -a output, but I think all of this happens because of some trouble with the bnx2 driver - I just don't see any other explanation for why 70-80 Mbps saturates haproxy and the interrupt handling (lost packets!). I have the option to try a 'High Performance 1000PT Intel Network Card'; could it be any better, or should I try to find a solution for the current configuration?
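
To back the bnx2 suspicion, I'm also watching the driver's own counters, something like this (the exact statistic names vary per driver; on bnx2 I'm looking at the rx discard counters):

  # per-driver statistics; a growing rx discard counter points at the NIC/ring
  ethtool -S eth0 | grep -i discard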

My final goal is to handle DDoS attacks with the most flexible and robust filtering available. Haproxy is already helping me stay alive under ~8-10k DDoS bots (I'm using two servers and DNS RR in production), but the attackers are not sleeping and I expect the attacks to continue with more bots. I bet they will stop at 20-25k bots. Such a botnet would generate roughly a 500k/s session rate and ~1 Gbps of bandwidth, so I was dreaming of handling it on this one server, with two bonded NICs giving me 2 Gbps for traffic :)

> > Typical output of one of two ab2 running is:
> >
> > Server Software:
> > Server Hostname:        nohost
> > Server Port:            80
> >
> > Document Path:          /
> > Document Length:        0 bytes
> >
> > Concurrency Level:      1000
> > Time taken for tests:   470.484 seconds
> > Complete requests:      10000000
> > Failed requests:        0
> > Write errors:           0
> > Total transferred:      0 bytes
> > HTML transferred:       0 bytes
> > Requests per second:    21254.72 [#/sec] (mean)
> > Time per request:       47.048 [ms] (mean)
> > Time per request:       0.047 [ms] (mean, across all concurrent requests)
> > Transfer rate:          0.00 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >               min  mean[+/-sd] median   max
> > Connect:        0   34 275.9     11   21086
>
> This one means there is packet loss on SYN packets. Some requests
> take up to 4 SYN to pass (0+3+6+9 seconds). Clearly something is
> wrong, either on the network or more likely net.core.somaxconn.
> You have to restart haproxy after you change this default setting.
>
> Does "dmesg" say anything on either the clients or the proxy machine ?
>
> > Processing:     0   13  17.8     11     784
> > Waiting:        0    0   0.0      0       0
> > Total:          2   47 276.9     22   21305
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%     22
> >   66%     26
> >   75%     28
> >   80%     30
> >   90%     37
> >   95%     41
> >   98%     47
> >   99%    266
> >  100%  21305 (longest request)
> >
> > Typical output of vmstat is:
> > dex9 ipv4 # vmstat 1
> > procs -----------memory---------- ---swap-- -----io---- -system--
> > ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
> > wa
> >  1  0      0 131771328  46260  64016    0    0     2     1  865  503  1  6
> > 94  0
> >  1  0      0 131770688  46260  64024    0    0     0     0 40496 6323  1  9
> > 90  0
>
> OK, so 1% user, 9% system, 90% idle, 0% wait at 40k int/s. Since this is
> scaled to 100% for all cores, it means that we're saturating a core in the
> system (which is expected with short connections).
>
> I don't remember if I asked you what version of haproxy and what kernel you
> were using. Possibly that some TCP options can improve things a bit.
>
> > Also, I've checked version of NIC's firmware:
> > dex9 ipv4 # ethtool -i eth0
> > driver: bnx2
> > version: 2.0.21
> > firmware-version: 6.2.12 bc 5.2.3
> > bus-info: 0000:01:00.0
>
> OK, let's hope it's fine. I remember having seen apparently good results
> with version 4.4 from what I recall, so this one should be OK.
>
> > Moreover, I've tried launching two ab2 localy:
> > dex9 ipv4 # ab2 -c 1000 -H 'Accept-Encoding: None' -n 10000000
> > http://localweb/
> > This is ApacheBench, Version 2.3 <$Revision: 655654 $>
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Licensed to The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking inbet.cc (be patient)
> > Completed 1000000 requests
> > Completed 2000000 requests
> > ^C
> >
> > Server Software:
> > Server Hostname:        localweb
> > Server Port:            80
> >
> > Document Path:          /
> > Document Length:        0 bytes
> >
> > Concurrency Level:      1000
> > Time taken for tests:   104.583 seconds
> > Complete requests:      2141673
> > Failed requests:        0
> > Write errors:           0
> > Total transferred:      0 bytes
> > HTML transferred:       0 bytes
> > Requests per second:    20478.13 [#/sec] (mean)
> > Time per request:       48.833 [ms] (mean)
> > Time per request:       0.049 [ms] (mean, across all concurrent requests)
> > Transfer rate:          0.00 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >               min  mean[+/-sd] median   max
> > Connect:        0   38 352.7      6   21073
> > Processing:     0   10  49.7      7   14919
> > Waiting:        0    0   0.0      0       0
> > Total:          1   48 365.8     13   21078
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%     13
> >   66%     19
> >   75%     26
> >   80%     35
> >   90%     36
> >   95%     37
> >   98%     39
> >   99%     67
> >  100%  21078 (longest request)
> >
> > Two such ab2 processes are running both at 100% and saturating haproxy to
> > 100%. 'Cur' session rate is also around 40-44k/s.
>
> Fine, so those are the exact same numbers, with the same issue with packet
> losses.
>
> > Should I get rid of dell r410 and replace it with Core i5?:)) Being serious,
> > is there any other tips or tricks I can try? To see those amazing 100k/s
> > session rate?
>
> Two things to test first as indicated above :
>  1) retest with less concurrency from ab to see if things improve
>  2) increase /proc/sys/net/core/somaxconn to 10000 or so
>
> Next, if things don't get any better, please post the output of sysctl -a.
>
> Hmmm please also note that when reaching 300k connections/s on the core i5,
> it was done with 10Gb NICs which have an extremely low latency and nice TCP
> stateless optimizations. I'm used to see much better results with them than
> with gig NICs even at sub-gig rate. But anyway, more than 100k is expected
> from such a machine.
>
> For instance, I'm attaching a capture of a test I caught one year ago on my
> PC (Core 2 duo 2.66 GHz at that time), and which exhibits 212ksess/s. I
> think it was a bench of TCP connections, not HTTP sessions, but still even
> if we double the number of TCP packets exchanged over the wire, we should
> still see more than 100k on this machine.
>
> Regards,
> Willy
>
