Re: [haproxy] Help tracking down a possible HAProxy outage...

From: hernan <hernan.silberman#gmail.com>
Date: Mon, 1 Dec 2008 21:14:56 -0800


On Sat, Nov 29, 2008 at 12:51 PM, Willy Tarreau <w#1wt.eu> wrote:
> On Fri, Nov 28, 2008 at 03:59:52PM -0800, hernan wrote:
>> Hey folks, I need some help analyzing an application outage this afternoon.
>> Here's what I see in my haproxy log. Client IP addresses have been
>> obscured and I've added my comments in [[]]'s:
>>
>> --------------------------------------------------------
>> Nov 28 16:21:18 127.0.0.1 haproxy[21447]:
>> 0.0.0.0:4408[28/Nov/2008:16:21:18.451] http app/<NOSRV> 0/0/0/45/151
>> 200 484 - - ----
>> 1999/1999/290/0 0/0 "POST /mysite/graniteamf/amf HTTP/1.1"
>> Nov 28 16:21:18 127.0.0.1 haproxy[21447]:
>> 0.0.0.0:50333[28/Nov/2008:16:21:18.603] http app/<NOSRV> 0/0/0/23/110
>> 200 484 - - ----
>> 1999/1999/290/0 0/0 "POST /mysite/graniteamf/amf HTTP/1.1"
>> Nov 28 16:21:18 127.0.0.1 haproxy[21447]:
>> 0.0.0.0:52577[28/Nov/2008:16:21:18.713] http app/<NOSRV> 0/0/0/42/146
>> 200 485 - - ----
>> 1999/1999/290/0 0/0 "POST /mysite/graniteamf/amf HTTP/1.1"
>> Nov 28 16:21:18 127.0.0.1 haproxy[21447]:
>> 0.0.0.0:50602[28/Nov/2008:16:21:18.860] http app/<NOSRV> 0/0/0/79/135
>> 200 7836 - - ----
>> 1999/1999/290/0 0/0 "POST /mysite/graniteamf/amf HTTP/1.1"
> ^^^^
> suspicious
>
>> [[ Looks like either no requests came for several minutes here which is
>> unlikely, or haproxy stopped accepting connections at this point. ]]
>>
>> Nov 28 16:29:42 127.0.0.1 haproxy[21447]:
>> 0.0.0.0:51556[28/Nov/2008:16:29:42.355] smartfox sf/<NOSRV>
>> -1/1/0/-1/61 0 92 - - ----
>> 2000/0/0/0 0/0 "<BADREQ>"
> ^^^^
> suspicious
>
> You likely have a global "maxconn" set to 2000, or a per-frontend maxconn
> set to 2000.

You're right, there was a "maxconn 2000" declared in the defaults section. Thanks for helping dissect the log entries. I have to assume that we did in fact reach 2000 connections, and we're investigating this. Since we started looking, the high-water mark has been around 300 connections, so whatever happened was an anomaly (in our code, in our usage patterns, or due to some malicious third party).
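
For anyone who wants to check their own logs for the same symptom, here is a rough sketch of a scanner. It assumes that each HTTP log record sits on a single line (the records quoted in this thread are wrapped by the mailer) and that the group of four slash-separated numbers before the two queue counters starts with the process-wide concurrent connection count, which is how I read the "1999/1999/290/0" that Willy flagged. The names connection_counts and near_limit and the 95% margin are my own, purely illustrative choices.

```python
import re

# Assumed layout: four slash-separated connection counters followed by the
# two queue counters, e.g. "1999/1999/290/0 0/0". The first counter is taken
# to be the process-wide number of concurrent connections.
CONN_RE = re.compile(r"\b(\d+)/(\d+)/(\d+)/(\d+) (\d+)/(\d+)\b")


def connection_counts(line):
    """Return the four connection counters of a log line, or None."""
    match = CONN_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups()[:4])


def near_limit(lines, maxconn=2000, margin=0.95):
    """Collect (counters, line) pairs whose first counter is within
    `margin` of `maxconn`. Both defaults are illustrative assumptions."""
    threshold = maxconn * margin
    hits = []
    for line in lines:
        counts = connection_counts(line)
        if counts is not None and counts[0] >= threshold:
            hits.append((counts, line))
    return hits
```

Feeding it the records from before and after the restart should flag only the former, since 1999 and 2000 are above the 1900 threshold and 3 is not.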

>> [[ The above log record is the only sign of life in the server and users
>> called en masse to complain about a service outage. ]]
>>
>> [[ HAProxy was restarted at this point. ]]
>>
>> Nov 28 16:34:39 127.0.0.1 haproxy[4403]: Proxy smartfox started.
>> Nov 28 16:34:39 127.0.0.1 haproxy[4403]: Proxy http started.
>> Nov 28 16:34:39 127.0.0.1 haproxy[4403]: Proxy registration started.
>> Nov 28 16:34:39 127.0.0.1 haproxy[4403]: Proxy sf started.
>> Nov 28 16:34:39 127.0.0.1 haproxy[4403]: Proxy static started.
>> Nov 28 16:34:39 127.0.0.1 haproxy[4403]: Proxy app started.
>> [normal healthy operation after the restart]
>> Nov 28 16:34:39 127.0.0.1 haproxy[4403]:
>> 0.0.0.0:50523[28/Nov/2008:16:34:39.908] http app/<NOSRV> 0/0/0/1/26
>> 200 447 - - ----
>> 3/3/3/0 0/0 "POST /mysite/graniteamf/amf HTTP/1.1"
>> Nov 28 16:34:39 127.0.0.1 haproxy[4403]:
>> 0.0.0.0:50522[28/Nov/2008:16:34:39.895] http app/<NOSRV> 4/0/1/38/68
>> 200 485 - - ----
>> 3/3/3/0 0/0 "POST /mysite/graniteamf/amf HTTP/1.1"
>> --------------------------------------------------------
>>
>> All of the individual services behind HAProxy are monitored closely and the
>> only alerts I received were those that tested the application through
>> HAProxy. There didn't appear to be any network issues getting to the
>> HAProxy host. On the HAProxy host there were no OS errors in the system
>> logs and the machine looks effectively idle and healthy as usual.
>>
>> HAProxy's maxconn is set really high, I don't think we were even close to
>> the limit.
>
> Please double-check this, as the logs look too suspicious.
>
>> HAProxy version info is "HA-Proxy version 1.3.14.6 2008/06/21" running on
>> Linux (CentOS 5).
>
> You should upgrade to 1.3.14.10, as quite a few annoying bugs have been fixed
> since 1.3.14.6, one of which is related to the server timeout which could be
> ignored under some circumstances. This might cause dead connections to accumulate
> in presence of server errors, up to the point maxconn is reached.

The "recommended version" at http://haproxy.1wt.eu/ is 1.3.15.6. Shouldn't I just upgrade to that version, or is it not considered production-ready?

>> Let me know if you can suggest additional things to look for in my HAProxy
>> configuration. I've been looking at the host and I don't see anything that
>> strikes me as out of the ordinary. I can share my haproxy.cfg if needed.
>
> it would obviously help.
>
> Please double-check your maxconn settings, and also consider upgrading. The
> upgrade might fix one part of the problem and mask the config limit I suspect
> though, so if you do, please keep a copy of your config so that we can study
> it.

Thanks for the response, Willy. I've been on this list for a while now and I'm always impressed at your level of engagement with your users.

hernan

> Regards,
> Willy
Received on 2008/12/02 06:14
