
From: Willy Tarreau <w#1wt.eu>
Date: Wed, 3 Dec 2008 16:32:20 +0100

On Wed, Dec 03, 2008 at 03:57:55PM +0100, Krzysztof Oledzki wrote:
> >>It must only do that after the retry counter has expired on the
> >>first server. In fact, we might change the behaviour to support
> >>multiple redispatches with a counter (just like retry) and set
> >>the retry counter to only 1 when we are redispatching. It's
> >>probably not that hard.
> >
> >I must admit that I haven't looked at the haproxy code and don't know
> >anything about retry counters. But whatever fixes these 503's is most
> >welcome here! :)
>
> I think it is doable, I'll look into it. However, it is not going to solve
> your problem as in case of failure there is no server haproxy can switch
> to. Even if there are backup servers, it takes some time to activate them.

It should take about fastinter*fall, which should be short enough for the redispatches to try several servers before haproxy switches to the backups.
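The redispatch scheme quoted above (exhaust the retry counter on the first server, then allow several redispatches with the retry counter set to 1 each) can be modelled roughly as follows. This is a hypothetical Python sketch, not haproxy code: `pick_server`, `connect` and `max_redispatches` are assumed names, and "retry counter set to 1" is interpreted here as a single connection attempt per redispatch target.

```python
from typing import Callable, Optional, Sequence


def pick_server(
    servers: Sequence[str],
    connect: Callable[[str], bool],
    retries: int = 3,
    max_redispatches: int = 2,
) -> Optional[str]:
    """Hypothetical model of 'retry first, then redispatch with a counter'.

    servers[0] is the server the request is bound to (e.g. via stickiness);
    the remaining entries are candidates for redispatching.
    """
    if not servers:
        return None
    first, others = servers[0], servers[1:]
    # One initial attempt plus `retries` retries on the bound server,
    # so a sticky session only moves once that server has really failed.
    for _ in range(1 + retries):
        if connect(first):
            return first
    # Then redispatch to at most `max_redispatches` other servers,
    # with the retry counter reduced to a single attempt each (assumption).
    for server in others[:max_redispatches]:
        if connect(server):
            return server
    # Nothing answered: this is where the client would get a 503.
    return None
```

With `retries=2` and servers `["a", "b", "c"]` where only `c` accepts, the attempts are `a, a, a, b, c`. The point of the small per-server counter is that several servers can be tried within roughly the fastinter*fall window, before the backups would be activated.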

> >Yes. Maybe there should be a way to limit this behaviour only to the
> >case of failover to backup. These people may not want redispatching to
> >happen between the "primary" servers when one responds slowly or throws
> >errors temporarily but I'm sure most of them would also like seamless
> >failover to the backups when *all* primaries have failed.
>
> Indeed, "emergency redispatch to backups" is one of my
> yet-unfinished-patches I'm going to clean and publish, eventually.

I honestly find this behaviour *very* dangerous and undesirable. Quite frankly, haproxy is most often used with stickiness, and prematurely switching to another server is one of the worst things it can do. If you need to load-balance stateless static servers, why not use LVS instead? Maybe I'm missing some use cases, but it generally feels like the wrong thing to do.

> >This is especially true when you depend on sticky sessions (stateful
> >webservers) because the drop of a server will kill the users that were
> >bound to that server.
>
> How about triggering the fastinter mode if there were N failures (tcp rst,
> 4xx/5xx codes) *in a row*? First successfully serviced request should
> clear this counter.

That's what I suggested too, except that 4xx codes should be excluded (they are triggered by the client), and 5xx codes reported by the server should be excluded as well: if the server is a healthy proxy relaying errors from an unhealthy server behind it, haproxy would always be running in fastinter mode.

However, using haproxy-generated errors to trigger fastinter is a good thing IMHO. If haproxy detects a few failures on real traffic, it should speed up its checks.
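The consecutive-failure rule discussed above (count only failures haproxy itself detects, ignore 4xx and server-reported 5xx, clear the counter on the first successful request) could be modelled roughly like this. It is a hypothetical Python sketch, not haproxy code: `FailureStreak`, the outcome labels and the default `threshold` are assumptions; only `inter` and `fastinter` mirror real haproxy server check parameters.

```python
from dataclasses import dataclass

# Failures haproxy itself observes on real traffic (hypothetical labels).
PROXY_FAILURES = {"conn_refused", "conn_reset", "timeout"}


@dataclass
class FailureStreak:
    threshold: int = 3  # N failures in a row before speeding up checks
    streak: int = 0

    def record(self, outcome: str) -> None:
        if outcome == "ok":
            # First successfully serviced request clears the counter.
            self.streak = 0
        elif outcome in PROXY_FAILURES:
            self.streak += 1
        # Any other outcome (e.g. "http_4xx", "http_5xx") is ignored:
        # 4xx is client-caused, and a 5xx relayed by a healthy proxy
        # must not keep the checks pinned to fastinter.

    def check_interval(self, inter_ms: int, fastinter_ms: int) -> int:
        # Use the fast check interval only while the streak is long enough.
        return fastinter_ms if self.streak >= self.threshold else inter_ms
```

In this sketch, ignored outcomes neither extend nor break a streak; whether a relayed 5xx should instead count as a "successfully serviced request" and reset it is a design choice the thread leaves open.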

Regards,
Willy

Received on 2008/12/03 16:32

Re: Avoid 503 during failover to backup?