Re: haproxy alleatory fails on config reload

From: Willy Tarreau <w#1wt.eu>
Date: Thu, 29 May 2008 22:32:06 +0200


Hi Pablo, Hi Krzysztof,

On Thu, May 29, 2008 at 10:11:41PM +0200, Pablo Escobar wrote:
> El Thursday 29 May 2008 21:55:20 Krzysztof Oledzki escribió:
> > On Thu, 29 May 2008, Pablo Escobar wrote:
> > > Hi Krzysztof,
> >
> > Hola Pablo, ;)
> >
> Hi again,
>
> I have just arrived home but still on the keyboard :)

I'm just replying quickly, I'm exhausted.

> > > Right now I am testing with version 1.3.14.2 but I have the same problem.
> > > I have done around 5 reloads without problem but finally I got the 503
> > > error.
> >
> > I have asked for 1.3.15 because it has many improvement in both logging
> > and redispatching.
> >
> > > On my last 503 error (which also was 1 minute long) I got this on haproxy
> > > log. As I said this only happens randomly as far as I know.
> >
> > There has to be an easy explanation. What we have to do is to track this
> > problem end eventually fix it.
> >
> > > I have also attached my haproxy config. ????????Maybe the problem is that I
> > > am using "option httpchk HEAD /" ? ????maybe I would get better results
> > > using "option httpchk HEAD /check.txt HTTP/1.0"?
> >
> > There should be no noticeable difference.
> >
> > > comments are in spanish, sorry :)
> >
> > No hay problema! ;)
> >
> > > I have also attached a image of the haproxy stats. Sorry I can´t get the
> > > full webpage on a image. I you want I can send you the url and password.
> > > At least you can see that the CHK colum is 0 on all my backends (thats is
> > > what Willy told that is important to check)
> >
> > Hm, stats looks OK: CHK/Conn/Retr/Redis all are zeros.
> >
> > > Sorry I couldnt send the 503 error logs. I have a really big syslog with
> > > many "503" on it. I tried to do a regex to take just the 503 error lines
> > > but it didn´t work.
> >
> > Something like
> > egrep "/[0-9\+]+ 503 " /var/log/haproxy
> > should do the trick.
> >
> > > I think after 12 hours at the lab isn´t the best moment to do
> > > regex´s :) Tomorrow in the morning I will try again.
> >
> > OK, please send logs from 1.3.15(.1) if that's possible.
> >
> Ok, I am going to use 1.3.15 again I will send you the logs.
>
>
> > Best regards,
> >
> > Krzysztof Ol??dzki
>
> On my last mail I forgot to tell you that I am running haproxy on debian etch
> x86_64 with kernel 2.6.18-6-amd64 #1 SMP
>
> where can I look for the req/sec on haproxy?? I cant find it on the 1.3.14.2
> stats web. At least you can see my web traffic stats here:
> http://bioinfo.cipf.es/webstats/bioinfo/awstats.bioinfo.cipf.es.html
>
> all that traffic goes through haproxy.
>
> here is the output of your regex. I can only see 503 errors for trac backend
> but I get the 503 error on all backends.
>
> May 29 15:47:20 127.0.0.1 haproxy[7298]: 127.0.0.1:45523
> [29/May/2008:15:47:20.914] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 2/2/0/0/0 0/0 "GET /trac/ HTTP/1.1"

I'm a bit puzzled here :
  "sC" means connection timeout
  "DN" means the connection was sent to a down server. But the total time is zero (/0 before 503 above).

So on the one hand, it means that the server was down (while you seem to see none down), and on the other hand it means that a null timeout was reached immediately when the request came in.

This smells awfully like a bug, but I see nothing like that between 1.3.14.2 and current. It would be good to test with checks disabled on server gen29 in the trac backend. That way we will know if this has anything to do with the checks or not. For this, you need to remove the "check" option on the server line.

Also, I've got very few feedback on x86_64. It is possible that some bug triggers on this architecture. You should try to rebuild haproxy in 32-bit (gcc -m32 ?) and see if the bug disappears.

> May 29 15:47:21 127.0.0.1 haproxy[7298]: 127.0.0.1:45574
> [29/May/2008:15:47:21.783] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 2/2/0/0/0 0/0 "GET /trac/ HTTP/1.1"
> May 29 15:47:22 127.0.0.1 haproxy[7298]: 127.0.0.1:45612
> [29/May/2008:15:47:22.240] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 5/5/0/0/0 0/0 "GET /trac/ HTTP/1.1"
> May 29 16:18:32 127.0.0.1 haproxy[8410]: 127.0.0.1:39391
> [29/May/2008:16:18:32.960] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 4/4/0/0/0 0/0 "GET /isacghtrac/ HTTP/1.1"
> May 29 16:18:33 127.0.0.1 haproxy[8410]: 127.0.0.1:39442
> [29/May/2008:16:18:33.688] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 3/3/0/0/0 0/0 "GET /isacghtrac/ HTTP/1.1"
> May 29 16:18:34 127.0.0.1 haproxy[8410]: 127.0.0.1:39514
> [29/May/2008:16:18:34.742] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 2/2/0/0/0 0/0 "GET /isacghtrac/ HTTP/1.1"
> May 29 16:18:35 127.0.0.1 haproxy[8410]: 127.0.0.1:39548
> [29/May/2008:16:18:35.242] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 4/4/0/0/0 0/0 "GET /isacghtrac/ HTTP/1.1"
> May 29 16:18:35 127.0.0.1 haproxy[8410]: 127.0.0.1:39583
> [29/May/2008:16:18:35.736] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 3/3/0/0/0 0/0 "GET /isacghtrac/ HTTP/1.1"
> May 29 16:18:36 127.0.0.1 haproxy[8410]: 127.0.0.1:39610
> [29/May/2008:16:18:36.209] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 2/2/0/0/0 0/0 "GET /isacghtrac/ HTTP/1.1"
> May 29 16:18:36 127.0.0.1 haproxy[8410]: 127.0.0.1:39649
> [29/May/2008:16:18:36.615] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 2/2/0/0/0 0/0 "GET /isacghtrac/ HTTP/1.1"
> May 29 16:24:20 127.0.0.1 haproxy[8410]: 127.0.0.1:52215
> [29/May/2008:16:24:20.602] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 4/4/0/0/0 0/0 "GET /gepas4trac/ HTTP/1.1"
> May 29 16:24:21 127.0.0.1 haproxy[8410]: 127.0.0.1:52252
> [29/May/2008:16:24:21.344] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 4/4/0/0/0 0/0 "GET /gepas4trac/ HTTP/1.1"
> May 29 16:24:21 127.0.0.1 haproxy[8410]: 127.0.0.1:52298
> [29/May/2008:16:24:21.935] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 5/5/0/0/0 0/0 "GET /gepas4trac/ HTTP/1.1"
> May 29 16:24:22 127.0.0.1 haproxy[8410]: 127.0.0.1:52327
> [29/May/2008:16:24:22.295] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 2/2/0/0/0 0/0 "GET /gepas4trac/ HTTP/1.1"
> May 29 16:24:22 127.0.0.1 haproxy[8410]: 127.0.0.1:52354
> [29/May/2008:16:24:22.663] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 3/3/0/0/0 0/0 "GET /gepas4trac/ HTTP/1.1"
> May 29 16:32:11 127.0.0.1 haproxy[8410]: 127.0.0.1:36760
> [29/May/2008:16:32:11.578] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 6/6/0/0/0 0/0 "GET /isacghtrac/browser HTTP/1.1"
> May 29 16:32:16 127.0.0.1 haproxy[8410]: 127.0.0.1:37288
> [29/May/2008:16:32:16.535] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 3/3/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:32:19 127.0.0.1 haproxy[8410]: 127.0.0.1:37534
> [29/May/2008:16:32:19.462] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 3/3/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:32:33 127.0.0.1 haproxy[8410]: 127.0.0.1:38559
> [29/May/2008:16:32:33.316] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 17/17/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:32:44 127.0.0.1 haproxy[8410]: 127.0.0.1:39605
> [29/May/2008:16:32:44.179] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 4/4/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:32:45 127.0.0.1 haproxy[8410]: 127.0.0.1:39722
> [29/May/2008:16:32:45.624] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 3/3/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:34:00 127.0.0.1 haproxy[8410]: 127.0.0.1:45752
> [29/May/2008:16:34:00.747] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 32/32/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:34:02 127.0.0.1 haproxy[8410]: 127.0.0.1:45824
> [29/May/2008:16:34:02.192] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 34/34/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:34:03 127.0.0.1 haproxy[8410]: 127.0.0.1:45906
> [29/May/2008:16:34:03.124] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 31/31/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:34:40 127.0.0.1 haproxy[8410]: 127.0.0.1:47740
> [29/May/2008:16:34:40.549] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 29/29/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:34:41 127.0.0.1 haproxy[8410]: 127.0.0.1:47916
> [29/May/2008:16:34:41.876] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 19/19/0/0/0 0/0 "GET /isacghtrac HTTP/1.1"
> May 29 16:35:42 127.0.0.1 haproxy[8410]: 127.0.0.1:33523
> [29/May/2008:16:35:42.916] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 6/6/0/0/0 0/0 "GET /isacghtrac/wiki HTTP/1.1"
> May 29 16:35:43 127.0.0.1 haproxy[8410]: 127.0.0.1:33596
> [29/May/2008:16:35:43.667] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 4/4/0/0/0 0/0 "GET /isacghtrac/wiki HTTP/1.1"
> May 29 16:35:44 127.0.0.1 haproxy[8410]: 127.0.0.1:33661
> [29/May/2008:16:35:44.153] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 5/5/0/0/0 0/0 "GET /isacghtrac/wiki HTTP/1.1"
> May 29 16:35:44 127.0.0.1 haproxy[8410]: 127.0.0.1:33700
> [29/May/2008:16:35:44.379] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 1/1/0/0/0 0/0 "GET /isacghtrac/wiki HTTP/1.1"
> May 29 16:35:44 127.0.0.1 haproxy[8410]: 127.0.0.1:33736
> [29/May/2008:16:35:44.583] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 2/2/0/0/0 0/0 "GET /isacghtrac/wiki HTTP/1.1"
> May 29 16:35:44 127.0.0.1 haproxy[8410]: 127.0.0.1:33761
> [29/May/2008:16:35:44.780] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCDN
> 9/9/0/0/0 0/0 "GET /isacghtrac/wiki HTTP/1.1"
> May 29 16:58:21 127.0.0.1 haproxy[8410]: 127.0.0.1:55617
> [29/May/2008:16:58:21.123] http-in trac/<NOSRV> 0/-1/-1/-1/0 503 212 - - sCIN
> 4/4/0/0/0 0/0 "GET /gepas4trac/browser HTTP/1.1"
> May 29 18:23:31 127.0.0.1 haproxy[9297]: 127.0.0.1:45056
> [29/May/2008:18:23:31.537] http-in gen20/gen20 0/0/0/39/40 503 2132 - - --NI
> 3/3/0/0/0 0/0 "GET /peio/?q=aggregator/sources/1 HTTP/1.1"
>
> I am going to change haproxy 1.3.15 binary and I will send the logs.

Yes please. As Krzysztof said it, there are finer grained information in 1.3.15. Also, when you send the stats, you don't need to send an image, simply save the page itself (Ctrl-S). It's self-contained, we will all be able to parse it.

> many thanks for your help :)

You're welcome. You seem to be triggering a nasty bug that surely needs to be fixed.

Regards,
Willy Received on 2008/05/29 22:32

This archive was generated by hypermail 2.2.0 : 2008/05/29 22:45 CEST