RE: Too large of timeouts

From: John Lauro <john.lauro#covenanteyes.com>
Date: Wed, 28 Jan 2009 20:03:56 -0500

> -----Original Message-----
> From: Willy Tarreau [mailto:w#1wt.eu]
>
> Hi John,
>
> On Wed, Jan 28, 2009 at 10:57:40AM -0500, John Lauro wrote:
> However, there's a workaround for this. You can tell haproxy that
> you want the connection to the server to be closed early, once the
> request has been sent. This is achieved by "option forceclose".

This is mode tcp, not mode http. I may have missed stating that in the message I sent. The documentation implies that forceclose is for HTTP; will it work in mode tcp? Also, the client can send more than one request, so I don't want to close early; it might not be safe for this app. Setting the timeouts to 24 seconds all but eliminated that symptom for this app, but better options may be useful for other protocols with longer timeouts.
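
For reference, this is roughly what I ended up with (a trimmed sketch, not the full config; the listen name, addresses and server names are placeholders):

    listen myapp 0.0.0.0:5000
        mode tcp
        balance roundrobin
        timeout connect 5s
        # 24 seconds was enough to clear dead client connections
        # without cutting off slow but live clients for this app
        timeout client 24s
        timeout server 24s
        server app1 10.0.0.11:5000 check
        server app2 10.0.0.12:5000 check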

> The following then happens just after the request is sent :
>
> 1) proxy says FIN to server => proxy-server connection goes FIN_WAIT1
> 2) server ACKs the FIN => proxy-server connection goes FIN_WAIT2
> 3) the server finally responds and closes => proxy-server connection
> is closed
> 4) proxy says FIN to client => client-proxy connection goes FIN_WAIT1
> 5) client says FIN to proxy => client-proxy connection is closed
>
> Even in the case the client is dead, the server connection is closed
> before we wait for the client, so the remaining dead conns lie on the
> client side and not on the server side.
>
> One thing that could be improved would be to support two timeouts
> per side : normal and fast. We would always use the normal timeout,
> unless we have sent a shutdown notification, in which case we would
> wait for the fast timeout. This can often be useful because it would
> speed up the wiping up of dead connections, but it's dangerous for
> a lot of protocols, including HTTP sometimes if servers take time
> to respond.

Thank you for the details. It's been a while since I looked at the TCP states and it seems like something is missing; I will look it over more later when I have a chance to think about it. Wouldn't the above idea of dual timeouts be safe as long as you can somehow verify that the last bits of data the proxy received were completely sent to the other side prior to closing? I don't know enough about the APIs to know if that is reasonably possible (not a massive CPU or other resource drain) or not, but TCP-wise it should be.
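
To sketch what I mean in config terms (purely hypothetical directives; nothing like this exists in haproxy today):

    # normal timeout while the connection is fully established
    timeout client 24s
    # hypothetical: shorter timeout once a FIN has been sent to the
    # client, applied only after the last received data was confirmed
    # completely sent to that side
    timeout client-fin 5s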

> > That caused the number of tracked connections to run high, but that's
> > not really the problem. The main problem is: after a while under heavy
> > load, the error count on the backend line would increase, but none of
> > the servers would show any increase in errors or warnings, and none of
> > the sessions for servers or backend were at the max. The connection
> > failure rate would start slow and quickly speed up. I also noticed
> > higher latency prior to failure, and the CPU seems to go to 100% at
> > the same time instead of the normal few %.
>
> If your CPU goes high, I suspect you're on a system which does not
> support a fast poller or that you have not enabled a scalable polling
> mechanism at haproxy build time. Could you please run "haproxy -vv" so
> that we try to find what is missing here ?

I think the CPU load spiked quickly to 100%, and not gradually. That said, I was more worried about minimizing downtime and didn't have much time to watch vmstat just prior to failure, etc... It would run at 100% CPU without problems for a short time.
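
For what it's worth, checking the poller is easy enough (a sketch; the TARGET value depends on the platform):

    # list build options and available pollers
    haproxy -vv
    # rebuild with epoll support on a Linux 2.6 kernel
    make TARGET=linux26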

> > I am working ok for now, but am a little concerned about the backend
> > errors where it didn't try a server and didn't log anywhere as to why
> > and AFAIK didn't reach any limits.
>
> If the problem happens a lot, it is possible that you have some
> firewalls between the client and haproxy which expire the session
> before it expires on haproxy (this too is a common problem in
> multi-level architectures). If you can't increase your firewall
> timeouts to cover the client's or haproxy's, then you can enable
> "option tcpka" on haproxy so that it sends keepalives. But warning, it
> is the system which decides the keepalive interval. On linux for
> instance, it's two hours by default, so your firewall must not expire
> a live session before that delay.

That is entirely possible, as the clients are all over the internet... Probably not more than 1000 connections a minute, so it's not tons of traffic, but I was having the problem every 10 minutes at peak time. I did implement it on a pair of servers, so hb_standby or hb_takeover did allow for fast switching between servers. Restarting haproxy would sometimes help, sometimes not.
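
In case it helps anyone else, enabling that would look something like this (a sketch; 7200 seconds is the Linux default Willy mentions, and lowering it so probes start before the firewall's idle timeout is just an illustration):

    # in the haproxy listen/backend section
    option tcpka

    # on Linux, the kernel decides when keepalives start (default 7200s);
    # e.g. start probing after 10 minutes of idle instead:
    sysctl -w net.ipv4.tcp_keepalive_time=600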

# ulimit -a

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 6400
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 120000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 6400
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Hmmm... I just noticed the max user processes and pending signals. Is it possible I was reaching one of those limits? Haproxy was reporting several thousand current connections when it was failing. After fixing the timeouts, the cur max has barely gone over 1000...
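
If it is a limit, I would guess the relevant ones are open files and haproxy's global maxconn rather than processes, since haproxy runs as a single process by default and each proxied connection uses two descriptors. Something like this in the global section (values illustrative):

    global
        # upper bound on concurrent connections haproxy will accept
        maxconn 20000
        # fd limit; should exceed 2 x maxconn plus a margin, and must
        # stay under the open files ulimit
        ulimit-n 40050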

> Regards,
> Willy

Thank you for all your work on this project and the help on making it work!

Received on 2009/01/29 02:03

This archive was generated by hypermail 2.2.0 : 2009/01/29 02:15 CET