Re: Avoid 503 during failover to backup?

From: Krzysztof Oledzki <ole#ans.pl>
Date: Wed, 3 Dec 2008 17:04:06 +0100 (CET)

On Wed, 3 Dec 2008, Willy Tarreau wrote:

> On Wed, Dec 03, 2008 at 03:57:55PM +0100, Krzysztof Oledzki wrote:
>>>> It must only do that after the retry counter has expired on the
>>>> first server. In fact, we might change the behaviour to support
>>>> multiple redispatches with a counter (just like retry) and set
>>>> the retry counter to only 1 when we are redispatching. It's
>>>> probably not that hard.
>>>
>>> I must admit that I haven't looked at the haproxy code and don't know
>>> anything about retry counters. But whatever fixes these 503's is most
>>> welcome here! :)
>>
>> I think it is doable, I'll look into it. However, it is not going to solve
>> your problem, as in case of failure there is no server haproxy can switch
>> to. Even if there are backup servers, it takes some time to activate them.
>
> It should take fastinter*fall, which should be short enough for the
> redispatches to try several servers before switching to them.

If all servers send TCP RST, then this time may often be too short, as AFAIK there is only a 1s delay between retries.
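
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It is not haproxy code: the function names and the example values are made up for illustration. It assumes that a refused connection fails instantly, that retries are spaced 1s apart, and that a server is marked down after roughly fastinter*fall.

```python
def retry_window_ms(retries, retry_delay_ms=1000):
    """Approximate time spent retrying when every attempt is refused at once (TCP RST)."""
    return retries * retry_delay_ms


def detection_time_ms(fastinter_ms, fall):
    """Approximate time for `fall` failed checks spaced `fastinter_ms` apart."""
    return fastinter_ms * fall


def backups_ready_in_time(retries, fastinter_ms, fall, retry_delay_ms=1000):
    """True if the checks can mark the servers down before the retries run out."""
    return detection_time_ms(fastinter_ms, fall) <= retry_window_ms(retries, retry_delay_ms)


# Hypothetical settings: 3 retries, 3 failed checks needed to mark a server down.
print(backups_ready_in_time(retries=3, fastinter_ms=500, fall=3))
print(backups_ready_in_time(retries=3, fastinter_ms=2000, fall=3))
```

With a short fastinter the checks finish inside the retry window. With a longer one the retries are exhausted first and the client still gets a 503.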

>>> Yes. Maybe there should be a way to limit this behaviour only to the
>>> case of failover to backup. These people may not want redispatching to
>>> happen between the "primary" servers when one responds slowly or throws
>>> errors temporarily but I'm sure most of them would also like seamless
>>> failover to the backups when *all* primaries have failed.
>>
>> Indeed, "emergency redispatch to backups" is one of my
>> yet-unfinished-patches I'm going to clean and publish, eventually.
>
> I honestly find this behaviour *very* dangerous and undesirable. Quite
> honestly, haproxy is most often used with stickiness, and prematurely
> switching to another server is one of the worst things to do.

It depends on the cost of such switching. If it means losing the client's session, then it is indeed unacceptable.

> If you need to load-balance stateless static servers, why not use LVS
> instead ? Maybe I'm missing some use cases, but it's a general feeling
> of doing the wrong thing.

Except that there are several things you are not able to do with LVS: TCP retries, header insertion/deletion, URL rewriting, etc.

>>> This is especially true when you depend on sticky sessions (stateful
>>> webservers) because the drop of a server will kill the users that were
>>> bound to that server.
>>
>> How about triggering the fastinter mode if there were N failures (tcp rst,
>> 4xx/5xx codes) *in a row*? First successfully serviced request should
>> clear this counter.
>
> That's what I suggested too, except that 4xx should be avoided (triggered
> by the client),

Indeed, wrong idea.

> and the 5xx reported by the server should be avoided too
> because if the server is a healthy proxy reporting errors from an unhealthy
> server, it will always be running fastinter.

This can be option-controlled. Not all people run a proxy behind another proxy. ;)

> However, using haproxy-generated errors to trigger fastinter is a good thing
> IMHO. If it detects a few failures on real traffic, it should speed-up checks.

Ack.
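
The idea discussed above could be sketched roughly like this in Python. This is only an illustration, not haproxy code: the class, the outcome labels and the count_server_5xx option are hypothetical names, and leaving the counter untouched on a 4xx is my assumption.

```python
class FailureTracker:
    """Counts consecutive failures seen on real traffic for one server."""

    def __init__(self, threshold, count_server_5xx=False):
        self.threshold = threshold                # N failures in a row
        self.count_server_5xx = count_server_5xx  # hypothetical option
        self.failures = 0

    def record(self, outcome):
        """Record one request outcome and return True if checks should speed up."""
        if outcome == "ok":
            # First successfully serviced request clears the counter.
            self.failures = 0
        elif outcome == "proxy_error":
            # Error generated by the proxy itself (e.g. connection refused).
            self.failures += 1
        elif outcome == "server_5xx" and self.count_server_5xx:
            # Server-reported 5xx counts only when explicitly enabled.
            self.failures += 1
        # Anything else (e.g. "client_4xx") is ignored: it is triggered by the client.
        return self.use_fastinter()

    def use_fastinter(self):
        return self.failures >= self.threshold


tracker = FailureTracker(threshold=3)
for outcome in ["proxy_error", "client_4xx", "proxy_error", "proxy_error", "ok"]:
    print(outcome, tracker.record(outcome))
```

Keeping the server 5xx case behind an option covers the proxy-behind-a-proxy setup without forcing that behaviour on everybody else.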

Best regards,

                         Krzysztof Olędzki

Received on 2008/12/03 17:04

This archive was generated by hypermail 2.2.0 : 2008/12/03 17:16 CET