Re: [PATCH] Don't increment server connections too much + fix retries

From: Willy Tarreau <w#1wt.eu>
Date: Mon, 4 Feb 2008 21:37:50 +0100


On Mon, Feb 04, 2008 at 02:17:01AM +0100, Krzysztof Oledzki wrote:
> >From 9af493edbe56489524f57c22d58aca0584931c8a Mon Sep 17 00:00:00 2001
> From: Krzysztof Piotr Oledzki <ole#ans.pl>
> Date: Mon, 4 Feb 2008 02:10:09 +0100
> Subject: [BUG] Don't increment server connections too much + fix retries
>
> Commit 98937b875798e10fac671d109355cde29d2a411a while fixing
> one bug introduced another one. With "retries 4" and
> "option redispatch" haproxy tries to connect 4 times to
> one server server and 1 time to a second one. However
> logs showed 5 connections to the first server (the
> last one was counted twice) and 2 to the second.
>
> This patch also fixes srv->retries and be->retries increments.
>
> Now I get: 3 retries and 1 error in a first server (4 cum_sess)
> and 1 error in a second server (1 cum_sess) with:
> retries 4
> option redispatch
>
> and: 4 retries and 1 error (5 cum_sess) with:
> retries 4
>
> So, the number of connections effectively served by a server is:
> srv->cum_sess - srv->failed_conns - srv->retries

Fine, I'm applying this fix too. Good catch!

> However, IMHO the fix from 98937b875798e10fac671d109355cde29d2a411a
> is wrong. Now be->cum_sess < SUM(servers->cum_sess) as with 1 sesssion
> processed by a backed I get 5 "sessions" processed by servers. This
> is very... abnormal. IMHO we should only increment srv->cum_sess
> once and only when a connection was successfully established but we
> need an additional variable: srv->conns.

No in fact it is not abnormal, I would say "unexpected". At Exosec, we saw a problem by which a server could have more errors than connections, which is absurd. In some cases, the connection error count could rise with the connection count still at zero. Finally, it made sense to count by "connections" what was sent on the wire. If you check a firewall logs after haproxy, you would see a lot of connection attempts for a failing server, so it was not normal not to count them.

I think that as usual when we are torn between one use and another one for a same variable, it means that in fact we need two variables. Also, I'd like that we don't have to break everything the day we will get keep-alive to work.

So why not consider something like this (I'm thinking loudly, critics welcome) :

> The final situation is going to look like this:
> be->cum_sess: connections (sessions) sent from a frontend to a backend
> be->failed_conns: unhandled connections (sessions) by a backend
> be->cum_sess - be->failed_conns: properly handled connections (sessions) by a backend

Now for the backend, it should become easier to distinct between backend-specific stats and simple sums. Basically, everything that comes from a frontend is counted as a transaction and is backend-global. Everything relative to connections is a sum of all server's stats.

Maybe I'm over-simplifying, but it appears more clear to me that way.

I'm CC'ing our software engineer, Emeric, with whom we've spent some time on this subject without reaching a solution which satisfies either of us. He has developped a tool to produce graphs from the stats, and got grey hair trying to solve this puzzle.

Best regards,
Willy Received on 2008/02/04 21:37

This archive was generated by hypermail 2.2.0 : 2008/02/04 22:30 CET