This is a relatively new setup (under a week old), but I had problems yesterday as load increased.
The problem was reproducible in both the latest 1.2 (I tried 1.2 after the problems started) and 184.108.40.206. With large values for clitimeout (and srvtimeout, but to a lesser extent), I ran into some problems.
The server would close the connection (leaving it in CLOSE_WAIT on the haproxy machine, I think), yet it would still be counted as a connection for a long time. It would be fully closed on the actual servers.
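One way to confirm the CLOSE_WAIT suspicion on the haproxy machine is to tally sockets by TCP state. The following is only a minimal sketch, assuming a Linux host where /proc/net/tcp is readable; the function name count_tcp_states is made up for illustration and is not part of haproxy.

```python
from collections import Counter

# Linux kernel TCP state codes as shown in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp.

    Hypothetical helper: takes the file contents as a string so it can be
    run against a saved snapshot as well as the live file.
    """
    counts = Counter()
    # The first line is the column header; each later line is one socket.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        # fields: [0] slot, [1] local addr, [2] remote addr, [3] state code
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts
```

Feeding it the contents of /proc/net/tcp (for example, open("/proc/net/tcp").read()) and watching the CLOSE_WAIT count while load ramps up would show whether half-closed sockets really are piling up until the timeout expires.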
That caused the number of tracked connections to run high, but that's not really the problem. The main problem is that after a while under heavy load, the conn count under errors on the backend line would increase, but none of the servers would show any increase in errors or warnings, and none of the sessions for the servers or the backend were at the max. The connection failure rate would start slow and quickly speed up. I also noticed higher latency prior to failure, and the CPU seemed to go to 100% at the same time, instead of the normal few percent.
It's as if I was reaching some limit, but from what I could tell no limit was being reached.
After much head scratching and several failures, I set both clitimeout and srvtimeout to 24000, which is closer to when the server will close a connection after being idle (previously clitimeout was 30 min). I'm just curious what limit I may have reached, and why it didn't seem to be logged decently anywhere. Current connections are probably at about 10% of what I was seeing before adjusting the timeouts down, but there is nothing in the log, etc.
I would expect that if the server side closes, haproxy would close its client side, and so it wouldn't hurt to have an extra-long timeout. Obviously that's not the case. I am working OK for now, but am a little concerned about the backend errors where it didn't try a server, didn't log anywhere as to why, and AFAIK didn't reach any limits.
Received on 2009/01/28 16:57