
From: Willy Tarreau <w#1wt.eu>
Date: Thu, 29 Jan 2009 07:26:01 +0100

On Wed, Jan 28, 2009 at 08:03:56PM -0500, John Lauro wrote:
> > On Wed, Jan 28, 2009 at 10:57:40AM -0500, John Lauro wrote:
> > However, there's a workaround for this. You can tell haproxy that
> > you want the connection to the server to be closed early, once the
> > request has been sent. This is achieved by "option forceclose".
>
> This is mode tcp and not mode http. I may have missed stating that in
> the message I sent.

> The documentation implies that forceclose is for http. Will it work
> for mode tcp?

No, it won't.

> Also, the client can send more than
> one request, so I don't want to close early, so it might not be safe for
> this app???

Not necessarily.

> Setting the timeouts to 24 seconds all but eliminated that
> symptom for this app, but better options may be useful for other
> protocols with longer timeouts.

Yes, I think that implementing the fast timeouts for the half-closed case might help for pure TCP cases.

(...)
> Thank you for the details. It's been a while since I looked at
> the TCP states, and it seems like something is missing. I will look
> it over more later when I have a chance to think about it. Wouldn't
> the above idea of dual timeouts be safe as long as you can somehow
> verify that the other side received the last bits of data the proxy
> received before closing?

It should, but keep in mind that you don't need to be sure the other end received the data, and you can't be sure either. Anyway, the goal of the fast timeout is to speed up closing when *one* side wants to close and the other does not respond.

> I don't know enough about the
> APIs to know if that is reasonably possible (not a massive CPU or other
> resource drain) or not, but it should be TCP-wise.

TCP handles the shutdown notification itself. You send data, the buffer is full, the client does not seem to read it, and during that time, the slow timeout runs. When the server finally closes, we would switch to the faster timeout. When it expires, we send a shutdown to the client. TCP takes care of sending it after the data, so in normal situations, there's no risk of truncating data, as the client will only ACK the FIN once it has received all the data sent before it.
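
A minimal sketch of the dual-timeout idea described above, written in Python rather than haproxy's own C. The names `SLOW_TIMEOUT`, `FAST_TIMEOUT` and `pick_timeout` are hypothetical and the values are only illustrative; nothing here reflects how haproxy names or implements it.

```python
# Hypothetical values in seconds; these are not haproxy defaults.
SLOW_TIMEOUT = 300.0   # inactivity timeout while both sides are still open
FAST_TIMEOUT = 10.0    # shorter timeout once one side has closed


def pick_timeout(client_closed: bool, server_closed: bool) -> float:
    """Return the inactivity timeout to apply to a proxied session."""
    if client_closed or server_closed:
        # Half-closed: one side wants to finish, so wait less for the other.
        return FAST_TIMEOUT
    return SLOW_TIMEOUT
```

With these assumed values, a session where the server has closed but the client is still silent would be given 10 seconds instead of 300 before the proxy sends its own shutdown.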

So while you don't know whether the other side has ACKed data, you don't care because TCP is responsible for handling this.
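
To illustrate that TCP queues the shutdown behind the data, here is a small Python sketch over the loopback interface. The helper names `receive_all` and `send_then_shutdown` are made up for this example. It only shows the ordering guarantee in a normal situation, and says nothing about what happens when the peer never reads.

```python
import socket
import threading


def receive_all(listener: socket.socket, out: list) -> None:
    """Accept one connection and read until the peer's FIN shows up as EOF."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:          # empty read: EOF, seen only after all data
                break
            chunks.append(chunk)
    out.append(b"".join(chunks))


def send_then_shutdown(payload: bytes) -> bytes:
    """Send payload, half-close the write side, return what the peer read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    received = []
    reader = threading.Thread(target=receive_all, args=(listener, received))
    reader.start()

    sender = socket.create_connection(listener.getsockname())
    sender.sendall(payload)
    sender.shutdown(socket.SHUT_WR)    # the FIN is queued behind the data
    reader.join()
    sender.close()
    listener.close()
    return received[0]
```

The sender never checks whether its data was ACKed before calling `shutdown()`, yet the reader only sees end-of-file after it has read the whole payload.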

(...)
> I think the CPU load spiked quickly to 100%, and not gradually.

That reminds me of something. Whenever I've encountered this, it was because I had reached the limit of file descriptors. Have you configured a "ulimit-n" entry in the global section? If so, you should remove it, as it is automatically computed. You should try to increase the global maxconn parameter though.
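
As a rough way to reason about the file descriptor limit, the Python sketch below assumes that each proxied connection needs two descriptors (one socket to the client, one to the server). The function names, the default of one listener and the `margin` of 100 are arbitrary assumptions for illustration; haproxy's own automatic computation is not reproduced here.

```python
import resource


def estimated_fds(maxconn: int, listeners: int = 1, margin: int = 100) -> int:
    """Rough need: client socket + server socket per connection, plus extras."""
    return 2 * maxconn + listeners + margin


def fits_fd_limit(maxconn: int) -> bool:
    """Compare the estimate with this process's soft open-files limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return estimated_fds(maxconn) <= soft
```

Under these assumptions, a maxconn of 10000 would need a little over 20000 descriptors.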

> That
> said, I was more worried about minimizing downtime, and didn't have much
> time to watch vmstat just prior to failure, etc... It would run at
> 100% CPU without problems for a short time.

I certainly understand, even though that's not the expected behaviour!

(...)
> That is entirely possible, as the clients are all over the internet...
> Probably not more than 1000 connections a minute, so it's not tons
> of traffic, but I was having the problem every 10 minutes at peak time.

Well, 1000 connections a minute means up to 10000 connections after 10 minutes if none of them gets closed. You really need to ensure your config is properly tuned. Also, check that your ip_local_port_range is large enough to support a high number of connections. It is possible that the system is refusing to establish new connections to the server due to the lack of source ports.
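
The arithmetic and the port range check above can be sketched in Python as follows. The function names are hypothetical. The path is the usual Linux location for this sysctl, and the example assumes the file holds two whitespace-separated integers (low and high port).

```python
PORT_RANGE_PATH = "/proc/sys/net/ipv4/ip_local_port_range"


def ports_available(range_text: str) -> int:
    """Count the source ports in a 'low high' range, both ends included."""
    low, high = (int(field) for field in range_text.split())
    return high - low + 1


def accumulated_connections(per_minute: int, minutes: int) -> int:
    """Worst case: no connection is closed during the whole period."""
    return per_minute * minutes


def read_port_range(path: str = PORT_RANGE_PATH) -> str:
    """Read the raw sysctl value (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

For instance, `ports_available(read_port_range())` can be compared with `accumulated_connections(1000, 10)` to see whether the source ports could plausibly run out on the connections towards a single server address and port.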

> I did implement on a pair of servers, so hb_standby or hb_takeover
> did allow for fast switching between servers. Restarting haproxy
> would sometimes help, sometimes not.

"sometimes not" makes me worry here. Could you check that there is no "ip_conntrack" or "nf_conntrack" module loaded on the machine?
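
One possible way to check for those modules on Linux is to look at /proc/modules, where the first field of each line is a module name. The sketch below is in Python and the function name is made up for illustration. It would not detect connection tracking that is built into the kernel rather than loaded as a module.

```python
CONNTRACK_MODULES = {"ip_conntrack", "nf_conntrack"}


def loaded_conntrack_modules(proc_modules_text: str) -> set:
    """Return the conntrack module names found in /proc/modules content."""
    names = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return names & CONNTRACK_MODULES
```

It can be called as `loaded_conntrack_modules(open("/proc/modules").read())`; an empty set means neither module is listed.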

> # ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 6400
> max locked memory (kbytes, -l) 32
> max memory size (kbytes, -m) unlimited
> open files (-n) 120000
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) unlimited
> cpu time (seconds, -t) unlimited
> max user processes (-u) 6400
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
> Hmmm... I just noticed the max user processes and pending
> signals. Is it possible I was reaching one of those
> limits?

No, haproxy uses only one process and no signals.

> haproxy was reporting several thousand
> current connections when it was failing. After fixing the
> timeout, the cur max has barely gone over 1000...

So this would really confirm a lack of proper closure from the clients.

Regards,
Willy

Received on 2009/01/29 07:26

Re: Too large of timeouts