Re: haproxy randomly fails on config reload

From: Pablo Escobar <pescobar#cipf.es>
Date: Thu, 29 May 2008 19:22:36 +0200


I am doing some testing right now and I have found something strange.

On my last reload I got a 503 error. I couldn't reach any of my backends, but if I connect directly to haproxy on port 81 it shows all backends up (green on every backend, yet the 503 errors persist). Once the backends reach "1min UP" status, the 503 disappears. I have no 1-minute timeout anywhere in my haproxy config. All backends use "inter 5s rise 2", so I don't understand this 1-minute wait until every backend is up and serving again.
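
For reference, a simplified sketch of the kind of backend definition I mean (server names, addresses and the balance setting are placeholders, not my real config). With "inter 5s rise 2" I would expect a healthy backend to be marked UP roughly 10 seconds after the new process starts, not after a minute:

listen websites 0.0.0.0:80
        balance roundrobin
        # placeholder servers: rise 2 checks at inter 5s = ~10s until seen UP
        server web1 192.168.0.11:80 check inter 5s rise 2
        server web2 192.168.0.12:80 check inter 5s rise 2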

I also tried adding this to my haproxy init.d script, to see whether the problem was related to apache, with no luck :(

haproxy_reload()
{

        # start a new haproxy and tell the old process (-sf) to finish
        # serving its current sessions and then exit
        $HAPROXY -f "$CONFIG" -p $PIDFILE -D $EXTRAOPTS -sf $(cat $PIDFILE) \
                || return 2

        # only reload apache if the haproxy reload itself succeeded
        if [ $? -eq 0 ]
        then
                /etc/init.d/apache2 reload
        fi

        return 0

}

I am using haproxy 1.3.15. I am going to try 1.3.14.2 because I have run out of ideas. I will report back to the list if I have any luck.

best wishes,
Pablo.

On Friday 23 May 2008 22:15:32, Willy Tarreau wrote:
> On Thu, May 22, 2008 at 02:43:01PM +0200, Pablo Escobar wrote:
> > Hi to the list,
> >
> > I am having a strange issue when I try to reload haproxy's config. For
> > every 5 attempts to reload the config, 1 or 2 usually fail. Most of
> > the time haproxy keeps its current state and reloads the new config, but
> > randomly it does a restart instead of a reload, so I get a 503 error on all
> > my websites until the "rise 2" is satisfied on all my backends and
> > everything comes back up again.
> >
> > The strangest thing about the failure is that it is random (maybe a bug??).
> > Maybe I reload the config 5 times without any problem and then it suddenly fails.
> >
> > I need to add new acls quite often, and it's a pain to get a 503 error each
> > time I need a new acl. I think the config is fine, because once "rise 2"
> > is satisfied and the 503 error disappears, my new acls work correctly.
> >
> > I am running haproxy version 1.3.15 on debian etch x86_64.
> >
> > In my /etc/init.d/haproxy file I have this function to reload, as
> > explained in the docs.
> >
> > haproxy_reload()
> > {
> >         $HAPROXY -f "$CONFIG" -p $PIDFILE -D $EXTRAOPTS -sf $(<$PIDFILE) \
> >                 || return 2
> >
> >         return 0
> > }
> >
> > I also tried this function, which comes with the Debian lenny packages and
> > which I think is the same:
> >
> > haproxy_reload()
> > {
> >         $HAPROXY -f "$CONFIG" -p $PIDFILE -D $EXTRAOPTS -sf $(cat $PIDFILE) \
> >                 || return 2
> >
> >         return 0
> > }
> >
> >
> > I verified that $PIDFILE points to /var/run/haproxy.pid in
> > both /etc/init.d/haproxy and /etc/haproxy/haproxy.cfg, and it contains the
> > right PID for the haproxy process.
> >
> > Any ideas or suggestions about how to solve this problem?? It is really
> > annoying. Right now what I am doing is adding new acls at night when the
> > web traffic is lower. It's a dirty solution and not useful when someone
> > needs a new acl "for yesterday".

>
> I think that it's not a reload problem, but a health-check problem. When
> the new process starts, it sends a first health-check to the servers,
> and the result of *this* health check indicates the initial server state.
> This is to ensure that we don't start with dead servers and that we don't
> take too much time detecting valid servers.
>
> If your servers regularly fail several health checks, it is quite possible
> that upon a restart they all fail and you have to wait for rise*inter to
> get the service again. This should also show up in the stats with
> constantly growing values in the "check" column.
>
> What you can do is to increase the check interval in order to give them
> more time to respond. But be careful, it also means that if one still
> fails, it will take more time to be seen UP again. In this case, the
> solution is to reduce the "rise" parameter. You can also set "rise" to
> zero but use a slowstart instead. That way, as soon as a check succeeds,
> the server will be seen as UP, but will take a slowly growing load. If
> all servers are in the same situation, they will still get all the load,
> but if only a few are in this situation, they will get so little load
> compared to others that they will have the time to stabilize.
>
> Hoping this helps,
> Willy
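
In case it helps anyone else hitting this, my reading of Willy's suggestion above translates into server lines roughly like these (names, addresses and times are placeholders, and I have not tested these exact values yet):

        # a longer inter gives slow backends more time to answer each check,
        # rise 1 marks a server UP after a single successful check, and
        # slowstart ramps its load up gradually instead of at full weight
        server web1 192.168.0.11:80 check inter 10s rise 1 slowstart 60s
        server web2 192.168.0.12:80 check inter 10s rise 1 slowstart 60s
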
-- 
Pablo Escobar Lopez
Head of Infrastructure & IT Support
Bioinformatics Department
Centro de Investigación Príncipe Felipe (CIPF)
Tfn: (34) 96 328 96 80 ext: 1004
http://bioinfo.cipf.es
Received on 2008/05/29 19:22
