Re: haproxy alleatory fails on config reload

From: Pablo Escobar <pescobar#cipf.es>
Date: Fri, 30 May 2008 01:22:56 +0200


Hi Krzysztof and Willy,

I have compiled haproxy 1.3.15.1 for x86_64

I found with dmesg some of this errors:
ip_conntrack: table full, dropping packet.

So I tougth I had found the problem but I was wrong. I doubled the ip_conntrack_max value and the dmesg error dissapeared but I still get the same problem. Around 5 reloads ok and then a wrong reload wich takes all my websites offline for EXACTLY 1 MIN.

connecting to port 81 I see every backend up. connecting on port 80 I get 503 error on every backend. exactly when the stats web arrives to "uptime 1:01min" I get every backend up on port 80. I am sure the 1min is not random. I have tried it 2 times with the same result.

Other strange thing is that I am getting this on logs on every reload. on both good reload and wrong reloads

May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy trac started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy trac started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy gen21 started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy gen21 started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy txema started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy txema started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy ibm3 started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy ibm3 started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy gen29_backend started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy gen29_backend started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy phylemon_backend started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy phylemon_backend started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy vmware-web started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy vmware-web started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy tomcat-gen26 started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy tomcat-gen26 started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy gen20 started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy gen20 started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy babelomics_backend started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy babelomics_backend started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy gepas_backend started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy gepas_backend started.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy trac.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy trac.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy trac failed to enter pause 
mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy trac failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy gen21.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy gen21.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen21 failed to enter pause 
mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen21 failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy txema.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy txema.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy txema failed to enter pause 
mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy txema failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy ibm3.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy ibm3.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy ibm3 failed to enter pause 
mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy ibm3 failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy gen29_backend.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy gen29_backend.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen29_backend failed to enter 
pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen29_backend failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy phylemon_backend.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy phylemon_backend.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy phylemon_backend failed to 
enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy phylemon_backend failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy vmware-web.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy vmware-web.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy vmware-web failed to enter 
pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy vmware-web failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy tomcat-gen26.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy tomcat-gen26.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy tomcat-gen26 failed to enter 
pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy tomcat-gen26 failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy gen20.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy gen20.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen20 failed to enter pause 
mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen20 failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy babelomics_backend.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy babelomics_backend.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy babelomics_backend failed to 
enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy babelomics_backend failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy gepas_backend.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy gepas_backend.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gepas_backend failed to enter 
pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gepas_backend failed to enter pause mode.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy http-in.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Pausing proxy http-in.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Some proxies refused to pause, 
performing soft stop now.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Some proxies refused to pause, performing soft stop now.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy trac in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy trac in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy gen21 in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy gen21 in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy txema in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy txema in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy ibm3 in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy ibm3 in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy gen29_backend in 0 
ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy gen29_backend in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy phylemon_backend in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy phylemon_backend in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy vmware-web in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy vmware-web in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy tomcat-gen26 in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy tomcat-gen26 in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy gen20 in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy gen20 in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy babelomics_backend in 
0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy babelomics_backend in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy gepas_backend in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy gepas_backend in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy http-in in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Stopping proxy http-in in 0 ms.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy trac stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy trac stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen21 stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen21 stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy txema stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy txema stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy ibm3 stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy ibm3 stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen29_backend stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen29_backend stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy phylemon_backend stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy phylemon_backend stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy vmware-web stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy vmware-web stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy tomcat-gen26 stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy tomcat-gen26 stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen20 stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gen20 stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy babelomics_backend stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy babelomics_backend stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gepas_backend stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy gepas_backend stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy http-in stopped.
May 30 00:47:58 127.0.0.1 haproxy[27763]: Proxy http-in stopped.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy http-in started.
May 30 00:47:58 127.0.0.1 haproxy[27776]: Proxy http-in started.


other strange thing is that in the wrong reload I got this on the log:

May 30 00:47:42 127.0.0.1 haproxy[27763]: 127.0.0.1:35292 [30/May/2008:00:47:42.500] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34106 - - PR-- 0/0/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:42 127.0.0.1 haproxy[27763]: 127.0.0.1:35297 [30/May/2008:00:47:42.990] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34084 - - PR-- 1/1/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:43 127.0.0.1 haproxy[27763]: 127.0.0.1:35310 [30/May/2008:00:47:43.507] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34053 - - PR-- 1/1/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:43 127.0.0.1 haproxy[27763]: 127.0.0.1:35315 [30/May/2008:00:47:43.622] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34055 - - PR-- 2/2/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:44 127.0.0.1 haproxy[27763]: 127.0.0.1:35328 [30/May/2008:00:47:44.109] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34056 - - PR-- 1/1/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:44 127.0.0.1 haproxy[27763]: 127.0.0.1:35335 [30/May/2008:00:47:44.555] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34056 - - PR-- 2/2/0/0/0 0/0 "GET /stats HTTP/1.1"

30/May/2008:00:47:44

---------- HERE STOPS LOGIN FOR AROUND ONE MINUTE     ---------------------
---------- THEN CONTINUES LOGGING BUT TIMESTAMP IS 1SEC LATER ----------- May 30 00:47:45

May 30 00:47:45 127.0.0.1 haproxy[27763]: 127.0.0.1:35338 [30/May/2008:00:47:45.035] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34056 - - PR-- 0/0/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:45 127.0.0.1 haproxy[27763]: 127.0.0.1:35355 [30/May/2008:00:47:45.559] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34059 - - PR-- 2/2/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:46 127.0.0.1 haproxy[27763]: 127.0.0.1:35371 [30/May/2008:00:47:46.043] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34060 - - PR-- 3/3/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:46 127.0.0.1 haproxy[27763]: 127.0.0.1:35373 [30/May/2008:00:47:46.283] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34060 - - PR-- 2/2/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:46 127.0.0.1 haproxy[27763]: 127.0.0.1:35376 [30/May/2008:00:47:46.545] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34060 - - PR-- 1/1/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:47 127.0.0.1 haproxy[27763]: 127.0.0.1:35389 [30/May/2008:00:47:47.217] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34060 - - PR-- 5/5/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:47 127.0.0.1 haproxy[27763]: 127.0.0.1:35406 [30/May/2008:00:47:47.915] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34061 - - PR-- 2/2/0/0/0 0/0 "GET /stats HTTP/1.1" May 30 00:47:48 127.0.0.1 haproxy[27763]: 127.0.0.1:35415 [30/May/2008:00:47:48.371] http-in http-in/<STATS> 0/-1/-1/-1/0 200 34096 - - PR-- 3/3/0/0/0 0/0 "GET /stats HTTP/1.1"

¿¿maybe can affect that I am using syslog-ng?? I dont think so but.....

The output of ethtool is this:

Settings for eth0:

        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes


Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

Not sure if is possible what you mention " duplex mismatch between the haproxy server and the switch". I will try to explore it deeper tomorrow at the lab.

The 503 error logs that I sent on my last mail I think was ok. I mean, the backend was really down and the 503 error was right. I just added the "trac" backend today and testing it I turned it off. The strange thing is that those are the only 503 errors on my log. On the 1 minute downtime I dont get any 503 error on my haproxy log (I can see the 503 just on my browser).

I thik tomorrow I will try to compile latests haproxy version for x86 and do some testing....

many thanks again for all your help.

I go to bed now. enough for one day :)

best wishes,
Pablo.

El Thursday 29 May 2008 22:40:59 Willy Tarreau escribió:
> On Thu, May 29, 2008 at 06:00:18PM +0200, Pablo Escobar wrote:
> > Hi,
> >
> > Many thanks for your answer Willy, really helpful as always :)
> >
> > After your indications I tried to increase the check interval and reduce
> > the rise parameter and I have been testing during this week. For three
> > days it hasn´t failed but today suddenly the 503 error came back.
> >
> > I havent found that the check column is growing in noany of the backends.
> >
> > I have a quite strange behaviour. If I directly connect to the haproxy
> > listening on port 81 I can see that all my backeds are up but I get the
> > 503 error for around 10 -15 seconds after the reload. This happens
> > randomly.
> >
> > Also if a backend is down when I do the reload I get the 503 on all my
> > backend, not just the down backend. ¿is this the normal behaviour?
>
> No, but it sounds to me like a network problem on the proxy's network
> interface. Having several backends down at start time means that some
> health-checks have failed at least once. The fact that you get them
> more often when you reload may simply be because during a reload you
> already have traffic on the interface, while if you start from scratch
> you don't have any traffic yet so the health-checks have more chances
> to pass.
>
> This would also explain why you get them all down if you reload during
> a problem : if you see one backend down, it means the problem is happening
> again. Then you reload everything and get all checks to fail on startup.
> This could be caused by a duplex mismatch between the haproxy server and
> the switch (most often a 100 Mbps switch forced to 100full with a server
> set to autoneg).
>
> Please check "ethtool -i eth0" on the server. I would not be surprized
> to see "100 Half" here. Also, "dmesg" might be showing links going UP
> and DOWN.
>
> You may as well be using spanning tree on your local switches, with a
> failing link causing occasionnal STP recalculations during which no
> traffic can pass.
>
> > Right now I have a reverse proxy using apache´s mod_proxy which sends all
> > my inbound http traffic to the haproxy listening on the same machine on
> > port 81. ¿maybe this can be affecting?
>
> no, I see no reason for that to cause any problem.
>
> > I am doing it in this way because having a
> > apache´s vhost which processes all my http traffic let me apply
> > mod_rewrite, mod_security and mod_cband to all my traffic.
> >
> > If you are interested I can send you my vhost config and my haproxy
> > config.
> >
> > many thanks in advance for any help.
> >
> > Pablo.
> >
> > p.d. I havent forgot about the docs to get snmpd working with perl
> > support. Sorry for the delay, late weeks I have been overloaded with
> > work. I promise to send it this week.
>
> No emergency, keep calm :-)
> We're all used to disappear for weeks and re-appear here on the list.
> It seems we all have a ton of work :-)
>
> Regards,
> Willy

-- 
Pablo Escobar Lopez
Head of Infrastructure & IT Support
Bioinformatics Department
Centro de Investigación Príncipe Felipe (CIPF)
Tfn: (34) 96 328 96 80 ext: 1004
http://bioinfo.cipf.es
Received on 2008/05/30 01:22

This archive was generated by hypermail 2.2.0 : 2008/05/30 01:31 CEST