Re: Changes in the check timeouts

From: Willy Tarreau <w#1wt.eu>
Date: Fri, 15 Feb 2008 21:35:33 +0100


On Fri, Feb 15, 2008 at 09:10:15PM +0100, Krzysztof Oledzki wrote:
> >No, that's not all. This one, I've fixed it this morning.
>
> Sorry about that. :(

Oh please don't be sorry about that. Otherwise, I'm sorry too because I reviewed the code and found it to be OK ;-)

> >The second problem is the way the check connect timeout is computed. 1)
> >it is complex and hard to understand from a user standpoint. 2) it's
> >bogus since if some timeouts are not set, they are still added to <now>.
> >That's what my patch addressed, which is more a functional issue than a
> >technical issue :
> >
> >original code does :
> >
> > if (timeout.check)
> > timeout = min(inter, timeout.connect)
> > else
> > timeout = inter
>
> And timeout += timeout.check *if* connection was successfully established.
> So, we still allow for a server to process a request (it may take some
> time) but detect dead servers sooner.
>
> >It's hard to understand why timeout.check has an impact here and how it's
> >used.
>
> Because timeout.check is and additional timeout available after a
> connections was established.

I know, but from a purely functional point of view, it's hard to explain to people why they have to set timeout.check to some random value *if* they want timeout.connect to be considered. I think it is simply not logical. People would rather use timeout.connect if it is set, and not condition its use on the existence of timeout.check.

> >The simpler method (and IMHO more logical) is the following one :
> >
> > if (timeout.connect)
> > timeout = timeout.connect
> > else
> > timeout = inter
> >
> > if (timeout.check)
> > bound (timeount, timeout.check)
>
> Hm... timeout.check is was not supposed to bound timeount but rather to
> give a possibility to have short timeout for connect and additional
> timeout for response.

Oh, I'm perfectly aware of that. I was suggesting a simpler yet useful usage : set timeout.check to the maximum acceptable time for a check, including connect. That way, users know that health checks will never be delayed by more that timeout.check+inter, which is easier to evaluate than timeout.check?min(timeout.connect,inter):inter+n*timeout.check+inter.

> As you have suggested, some people have large
> timeout.connect for queue/tarpit, others have large inter.

I know, and that's why I think that caping them timeout.check is a simpler method.

> Maybe it is not
> very easy to understand, but imho this is the only clean solution and is
> documented.

But I can adjust the doc if we agree on an even simpler mechanism :-)

> Alternatively we may simply use timeout.connect alone and
> force people to use "timeout queue" and "timeout tarpit" instead of large
> timeout.connect but it looks too invasive.

The problem is not that it's invasive, it's that it will strongly break existing setups.

Hmmm, speaking of existing setups, now I see your point. The fact is that my proposal will silently change the connect timeout of existing setups as long as timeout.check is not set :-/

Well, what would you think of the following then :

If no timeout.check is set, use inter (exactly like now) if timeout.check is set, use it as the maximum time for the whole test to succeed, which means both the connect timeout, and the rest. At least, if I'm sat in front of the keyboard as a user, I understand that if I write "timeout check", then I expect a check to fail after that amount of time waiting for it to succeed.

That way we remove timeout.connect from the equation, as it is not very much an important part of the check anyway, and we stay backwards compatible with existing setups without getting grey hair trying to understand how all of those variables interact.

What do you think ?

Willy Received on 2008/02/15 21:35

This archive was generated by hypermail 2.2.0 : 2008/02/15 21:46 CET