Re: [PATCH]: Spread checks

From: Willy Tarreau <>
Date: Sat, 29 Sep 2007 13:54:48 +0200

On Sat, Sep 29, 2007 at 01:14:35PM +0200, Krzysztof Oledzki wrote:
> On Sat, 29 Sep 2007, Willy Tarreau wrote:
> >On Sun, Sep 23, 2007 at 10:44:15PM +0200, Krzysztof Oledzki wrote:
> >>
> >>
> >>On Tue, 18 Sep 2007, Willy Tarreau wrote:
> >>
> >>>On Tue, Sep 18, 2007 at 11:35:43AM +0200, Krzysztof Oledzki wrote:
> >>>>I noticed that each server receives all checks in a very short (<1ms)
> >>>>time (attached checklog2 file). I think that having 10s for 48 (16*3)
> >>>>tests it is possible to reduce both servers' and loadbalancer stress a
> >>>>little, by running each test every 10s/48 = ~ 0.2s.
> >>>
> >>>Yes, and this was even worse in the past because all servers for a same
> >>>group were checked at the same instant. There were people doing
> >>>load-balancing on same machines with multiple ports who got regular load
> >>>peaks during the checks. So I have spread them apart within one backend.
> >>>
> >>>However, the problem still remains if you share the same server between
> >>>many instances. I'm not sure how I could improve this. Maybe add a
> >>>per-backend start delay for the checks, which would be equal to
> >>>min_inter/#backends. As an alternative right now, you can rotate your
> >>>servers within different backends.
> >>>
> >>>I think I could also add a global "spread-check" parameter allowing us to
> >>>add some random time between all checks in order to spread them apart. It
> >>>would take a percentage parameter adding or removing that many percent to
> >>>the interval after each check.
> >>
> >>Attached patch implements per-server start delay in a different way.
> >>Checks are now spread globally - not locally to one backend. It also makes
> >>them started faster - IMHO there is no need to add a 'server->inter' when
> >>calculating first execution.
> >
> >the reason for the server->inter is that there are people using 10 server
> >entries which all point to the same machine on 10 different ports. The
> >"inter" gives a hint about how often we expect the checks to be sent.
> OK, next check is going to be after server->inter, so no problem IMHO.
> Only the first one (for each server) is going to be executed faster. This
> is important, because when you restart haproxy and some servers are down,
> haproxy may send them connections and I see no reason to delay checks so
> much.

Yes, I agree. When some servers are checked only every 30 seconds, this can be a bit nasty. I intended to have two speeds for checks: a fast one used during transitions and a normal one. Basically, upon startup, or just after one failed health check or one success on a failed server, it would switch to fast checks (e.g. inter 1000 instead of inter 10000). That would get rid of all these annoyances, and it would also detect failures faster.
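
A minimal Python sketch of the two-speed idea, under stated assumptions: the class name `ServerCheck`, the `confirm` threshold of 2 identical results, and the rule for going back to the normal interval are all hypothetical. The mail only says when fast checks would start, not when they would stop, and this is not HAProxy code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerCheck:
    inter_ms: int = 10000        # normal interval ("inter 10000" in the mail)
    fast_inter_ms: int = 1000    # fast interval ("inter 1000" in the mail)
    confirm: int = 2             # ASSUMPTION: identical results needed to settle
    up: bool = True              # last confirmed state
    transition: bool = True      # startup counts as a transition
    last: Optional[bool] = None  # previous check result
    streak: int = 0              # consecutive identical results

    def record(self, ok: bool) -> int:
        """Record one check result and return the delay (ms) before the next check."""
        self.streak = self.streak + 1 if ok == self.last else 1
        self.last = ok
        # A result that contradicts the confirmed state starts a transition.
        if not self.transition and ok != self.up:
            self.transition = True
        # Enough identical results in a row settle the state again.
        if self.transition and self.streak >= self.confirm:
            self.up = ok
            self.transition = False
        return self.fast_inter_ms if self.transition else self.inter_ms
```

With these assumptions, a freshly started server is checked at the fast interval until two consecutive results agree, and a single failure on a settled server switches it back to fast checks.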

> >>Calculations were moved from cfgparse.c to
> >>checks.c. There is a new function start_checks() and now it is not called
> >>when haproxy is started in MODE_CHECK.
> >>
> >>With this patch it is also possible to set a global 'spread-check'
> >>parameter. It takes a percentage value (1..50, probably something near
> >>5..10 is a good idea) so haproxy adds or removes that many percent to the
> >>original interval after each check. My test shows that with 18 backends,
> >>54 servers total and 10000ms/5% it takes about 45m to mix them completely.
> >
> >I think that we should be *very* careful when subtracting a random
> >percentage. It is very easy to go down to zero that way, and have a server
> >bombed by health-checks.
> It is not possible as spread-check accepts only 1..50, so in a worst case
> this time should be (server->inter/2)+1.

OK, fine. I thought I saw something along the lines of "inter + random(100) - 50", but maybe I just confused it with something else. In this case, I think it would be natural to accept 0 for spread-check as well: it would simply disable the feature in a more convenient way.
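
A small Python sketch of a percentage-based jitter with the 0..50 range discussed here. The exact formula used by the patch is not shown in this thread, so the function name `next_check_delay` and the rounding below are assumptions made only for illustration.

```python
import random
from typing import Optional


def next_check_delay(inter_ms: int, spread_pct: int,
                     rng: Optional[random.Random] = None) -> int:
    """Return inter_ms adjusted by a random amount of at most spread_pct percent.

    ASSUMPTION: spread_pct is limited to 0..50, and 0 disables the jitter.
    """
    if not 0 <= spread_pct <= 50:
        raise ValueError("spread_pct must be within 0..50")
    if spread_pct == 0:
        return inter_ms
    source = rng if rng is not None else random
    max_dev = inter_ms * spread_pct // 100      # largest allowed deviation
    return inter_ms + source.randint(-max_dev, max_dev)
```

Because the deviation is bounded by 50% of the interval, the delay in this sketch can never drop below half of `inter_ms`, which is the property that rules out the "down to zero" case.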

> >BTW, I'm suddenly thinking about something: when I build the servers map,
> >I use all the server weights and arrange them so that all their occurrences
> >are as far as possible from each other, while respecting their exact
> >weight. It works very well. I'm realizing that we need exactly the same
> >principle for the checks. The "weight" here is simply the frequency at
> >which the servers must be checked, which is 1/interval. So by using the
> >exact same functions as are used to build the servers map, we could build a
> >health-check map that way :
> >
> >   foreach srv in all_servers :
> >       weight(srv) = max(all inter) / inter(srv)
> >
> >   build_health_map(all_servers)
> >
> >Afterwards, we would just have to cycle through this map to start the
> >checks. It would even remove the need for one timer per server in the
> >timers table. The immediate advantage is that they would all be spread
> >apart upon startup, and we would not need the random anymore (even though
> >it would not harm to keep the option).
> >
> >What do you think about this ?
> I need to think about this for a moment

Of course. I explained it quickly and it's not easy. I developed the algorithm using a small program which drew the map for various values, but I don't know where I put it.
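
To make the health-check map idea more concrete, here is a rough Python sketch. The interleaving used is a generic smooth weighted round-robin, swapped in as a stand-in because the actual server-map functions are not shown in this thread. The rounding of weights and the slot period `max_inter / len(map)` are assumptions as well.

```python
from typing import Dict, List


def build_health_map(inters_ms: Dict[str, int]) -> List[str]:
    """Build a check schedule where each server appears weight(srv) times.

    weight(srv) = max(all inter) / inter(srv), as in the pseudocode quoted above.
    ASSUMPTION: weights are rounded to integers, with a minimum of 1.
    """
    max_inter = max(inters_ms.values())
    weights = {name: max(1, round(max_inter / inter))
               for name, inter in inters_ms.items()}
    total = sum(weights.values())
    score = dict.fromkeys(weights, 0)
    schedule = []
    for _ in range(total):
        # Every server gains its weight, the highest score gets the slot
        # and then pays back the total, which spaces its next occurrence.
        for name, weight in weights.items():
            score[name] += weight
        pick = max(score, key=score.get)
        score[pick] -= total
        schedule.append(pick)
    return schedule


# Hypothetical servers: "a" is checked twice as often as "b" and "c".
health_map = build_health_map({"a": 2000, "b": 4000, "c": 4000})
slot_period_ms = 4000 / len(health_map)  # one full cycle lasts max(all inter)
```

Cycling through `health_map` with one check per slot would then need a single timer instead of one per server, at the cost of a map whose length grows with the ratio between the largest and smallest intervals.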

> - I'm currently traveling and it
> takes a lot of patience to work with ss over GPRS. ;)


> Anyway, server weights are afaik 1..255 and server->inter are 0..10K or
> even worse.

Yes, that's a good point. In fact, we can limit the interval to some reasonably small minimum (e.g. not below 50 ms). But even with that, some people would still use a 1 minute interval, leading to a disproportion of 60/0.05 = 1200. This is not enormous, but it is still quite big.
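
The arithmetic above can be checked with a few lines of Python. The helper name `check_weight` is hypothetical, and the 50 ms floor is only the example value from this mail.

```python
MIN_INTER_MS = 50  # example floor from the mail, not an actual HAProxy limit


def check_weight(inter_ms: int, max_inter_ms: int,
                 min_inter_ms: int = MIN_INTER_MS) -> float:
    """Return max_inter / inter after clamping inter to the minimum interval."""
    clamped = max(inter_ms, min_inter_ms)
    return max_inter_ms / clamped


# A 50 ms server next to a 1 minute server gives the 1200:1 ratio from the mail.
ratio = check_weight(50, 60_000)
# Anything faster than the floor is clamped, so the ratio cannot grow further.
clamped_ratio = check_weight(10, 60_000)
```

Both values come out at 1200, well above the 1..255 range mentioned for server weights, which is the mismatch being pointed out.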

> I'm not sure if this could work here. Please also note that my
> solution is quite simple and after one week I can say it works pretty well
> when used with at least small spread-check.

Oh, I have no problem with that. I was just thinking out loud. I think I will merge your patch in 1.3.12, but that does not prevent us from thinking about further evolutions ;-)
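
For reference, the globally spread start delays discussed at the top of this thread (10s for 48 checks, so roughly one check every 0.2s) can be sketched as follows. The function name `initial_check_delays` and the even spacing are assumptions, since the patch itself is not reproduced here.

```python
from typing import List


def initial_check_delays(n_servers: int, min_inter_ms: int) -> List[int]:
    """Return the first-check delay (ms) of each server, spread over min_inter_ms.

    ASSUMPTION: servers are numbered globally, across all backends.
    """
    if n_servers <= 0:
        return []
    return [i * min_inter_ms // n_servers for i in range(n_servers)]


# 16 backends with 3 servers each and a 10s interval, as in the original report.
delays = initial_check_delays(48, 10_000)
```

In this sketch the first check of every server starts within the first interval, and consecutive servers are about 208 ms apart.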

Willy