Hypermail

From: Willy Tarreau <w#1wt.eu>
Date: Thu, 14 Jul 2011 22:07:57 +0200

Hi,

On Thu, Jul 14, 2011 at 10:21:56AM +0000, gidot wrote:
> Thanks Willy,
>
> I managed to fix the problem with the log. It's due to my entry in /etc/
> syslogd.conf. This thread http://www.serverphorums.com/read.php?
> 10,127228,127867 helped me out :).
>
> Here is the excerpt of my haproxy.log. Hope someone can enlighten me if there's
> anything obvious from this log that can help me to troubleshoot my problem.
> Here's from grep'ing 504:
>
> Jul 13 20:37:57 localhost haproxy[98507]: 213.47.109.71:51261 [13/
> Jul/2011:20:37:07.967] webjailfarm webjailfarm/wj01 7/0/0/-1/+50009 504 +194 -
> - sHVN 62/62/52/12/0 0/0 "GET /main.php?location=war HTTP/1.1"
(...)

All of these logs indicate that the server is simply not responding within 50 seconds. As frustrating as this can be, this is something quite common when servers get overloaded or when they try to access a locked resource. The two following ones however are more concerning :

> Jul 13 20:41:25 localhost haproxy[98507]: 188.123.218.31:4180 [13/
> Jul/2011:20:40:35.132] webjailfarm webjailfarm/wj08 2/0/0/-1/+50005 504 +194 -
> - sHVN 78/78/61/4/0 0/0 "GET /images/gamefavicons.png HTTP/1.1"
> Jul 13 20:43:36 localhost haproxy[98507]: 217.246.8.81:2284 [13/
> Jul/2011:20:42:46.796] webjailfarm webjailfarm/wj05 1/0/0/-1/+50003 504 +194 -
> - sHVN 56/56/46/0/0 0/0 "GET /emptyicon.gif HTTP/1.1"

I think that such resources are purely static and have no reason not to be quickly delivered. Is there any possibility that the same servers are accessed via other backend sections, or even directly without passing via haproxy ? I'm asking because what I suspect is that the server's connection limit is reached due to other activity, but the listening socket is not yet saturated, so our request lies in the server's backlog until a connection is released so that a process (or thread) can process the pending request (which did not happen in time here).

> And some others:
>
> Jul 13 20:38:08 localhost haproxy[98507]: 89.228.101.118:51199 [13/
> Jul/2011:20:38:08.893] webjailfarm webjailfarm/wj05 0/0/-1/-1/+1 503 +212 - -
> SCDN 44/44/37/7/+3 0/0 "POST /login.php HTTP/1.1"

The connection was referencing a server which was already detected as DOWN (hence the "D" flag), so the health checks have noticed the event. The connection was redispatched onto another server (wj05) but the connection failed there. It could be the same thing as above, but with the backlog full, so the system is rejecting extra connections instead of queuing them. It could also be that you restarted the server and the connections were attempted while the port was not bound.

> Jul 13 20:38:14 localhost haproxy[98507]: 188.101.27.150:61567 [13/
> Jul/2011:20:38:14.883] webjailfarm webjailfarm/wj06 13/0/1/-1/+14 502 +204 - -
> SHVN 53/53/47/7/0 0/0 "GET /js/scriptaculous.js?load=effects,slider HTTP/1.1"

The 502s normally indicate that the server broke the connection without responding. This can be the consequence of a server restart as well as it can indicate dying processes.

> At the moment we're still having problems with clients receiving 502 and 504
> errors. It was quiet for the first few days after we have tuned the box, but
> since 2 days ago, they're back.

If you check your stats page, you should see that your servers state are changing a lot. A server must not flap, it must have a steady state. In my opinion, the fact that they're seen down is not the cause of the problem but one of the consequences : something is blocking your servers or making them process requests slowly and at one point they can't even process health checks anymore. Requests are aborted on timeouts and checks fail, causing the server to be marked "down".

This is normally what happens when servers connection limit gets overflown. You may want to try to increase your MaxClients or equivalent. Be careful though, as this can imply a higher memory usage.

Another solution people generally like is to split dynamic/static contents, which is called content switching. You build a farm out of a very fast and scalable server such as nginx and send the static requests there. You keep the rest on current servers, the load should drop quite a bit.

> Btw, I tried to run the command echo "show sess" | socat stdio /var/run/haproxy/
> haproxy.sock, and some entries show that it's not forwarding to any server
> (none). Is this normal?
>
> [/root] # echo "show sess" | socat stdio /var/run/haproxy/haproxy.sock
> 0x800fbfc00: proto=tcpv4 src=178.190.178.184:52094 fe=webjailfarm
> be=webjailfarm srv=<none> ts=02 age=46s calls=1 rq
> [f=501000h,l=0,an=0eh,rx=3s,wx=,ax=] rp[f=001000h,l=0,an=00h,rx=,wx=,ax=] s0=
> [7,18h,fd=55,ex=] s1=[0,0h,fd=-1,ex=] exp=3s
(...)

It is normal for connections which have not yet sent a full request. In your case, the request buffer is empty so nothing was received from the client. Until you don't see too many of them, you don't have to worry, those can be port probes or similar. If you have enabled keep-alive on the client side, it is also possible that those are simply second requests of a single connection, and that the client maintains it just in case it needs it.

Regards,
Willy Received on 2011/07/14 22:07

Re: HAProxy - "504 Gateway Timeout" error.