
From: Krzysztof Oledzki <ole#ans.pl>
Date: Sat, 29 Sep 2007 17:26:51 +0200 (CEST)

On Sat, 29 Sep 2007, Willy Tarreau wrote:

> On Sat, Sep 29, 2007 at 01:14:35PM +0200, Krzysztof Oledzki wrote:

>>
>>
>> On Sat, 29 Sep 2007, Willy Tarreau wrote:
>>
>>> On Sun, Sep 23, 2007 at 10:44:15PM +0200, Krzysztof Oledzki wrote:
>>>>
>>>>
>>>> On Tue, 18 Sep 2007, Willy Tarreau wrote:
>>>>
>>>>> On Tue, Sep 18, 2007 at 11:35:43AM +0200, Krzysztof Oledzki wrote:
>>>>>> I noticed that each server receives all checks in a very short (<1ms) time
>>>>>> (attached checklog2 file). I think that having 10s for 48 (16*3) tests it
>>>>>> is possible to reduce both servers' and loadbalancer stress a little, by
>>>>>> running each test every 10s/48 = ~ 0.2s.
>>>>>
>>>>> Yes, and this was even worse in the past because all servers for a same
>>>>> group were checked at the same instant. There were people doing load-balancing
>>>>> on same machines with multiple ports who got regular load peaks during the
>>>>> checks. So I have spread them apart within one backend.
>>>>>
>>>>> However, the problem still remains if you share the same server between
>>>>> many instances. I'm not sure how I could improve this. Maybe add a per-backend
>>>>> start delay for the checks, which would be equal to min_inter/#backends. As an
>>>>> alternative right now, you can rotate your servers within different backends.
>>>>>
>>>>> I think I could also add a global "spread-check" parameter allowing us to add
>>>>> some random time between all checks in order to spread them apart. It would
>>>>> take a percentage parameter adding or removing that many percent to the interval
>>>>> after each check.
>>>>
>>>> Attached patch implements per-server start delay in a different way.
>>>> Checks are now spread globally - not locally to one backend. It also makes
>>>> them started faster - IMHO there is no need to add a 'server->inter' when
>>>> calculating first execution.
>>>
>>> The reason for the server->inter is that there are people using 10 server
>>> entries which all point to the same machine on 10 different ports. The "inter"
>>> gives a hint about how often we expect the checks to be sent.
>>
>> OK, next check is going to be after server->inter, so no problem IMHO.
>> Only the first one (for each server) is going to be executed faster. This
>> is important, because when you restart haproxy and some servers are down,
>> haproxy may send them connections and I see no reason to delay checks so
>> much.

>
> Yes I agree. When some servers are checked every 30 seconds, this can be a
> bit nasty. I intended to have two speeds for checks, a fast one used during
> transitions and a normal one. Basically, upon startup, or just after one
> failed health check or one success on a failed server, it would switch to
> fast checks (eg: inter 1000 instead of inter 10000). It would ensure that
> we could get rid of all these annoying things. Also, it would detect failures
> faster.

I was also thinking about this for a while. It may also be a good idea to add a restart-delay parameter, so that when haproxy is restarted the new process may be able to finish most of the checks before replacing the old process. Anyway, I'm not sure if it can be easily done...

>>>> Calculations were moved from cfgparse.c to
>>>> checks.c. There is a new function start_checks() and now it is not called
>>>> when haproxy is started in MODE_CHECK.
>>>>
>>>> With this patch it is also possible to set a global 'spread-check'
>>>> parameter. It takes a percentage value (1..50, probably something near
>>>> 5..10 is a good idea) so haproxy adds or removes that many percent to the
>>>> original interval after each check. My test shows that with 18 backends,
>>>> 54 servers total and 10000ms/5% it takes about 45m to mix them completely.
>>>
>>> I think that we should be *very* careful when subtracting random percentage.
>>> It is very easy to go down to zero that way, and have a server bombed by
>>> health-checks.
>>
>> It is not possible as spread-check accepts only 1..50, so in a worst case
>> this time should be (server->inter/2)+1.

>
> OK fine. I thought I saw something in the style of "inter + random(100) - 50"
> but maybe I just confused it with something else. In this case, I find it normal
> that we would accept 0 for the spread-check. It would simply disable it but in
> a more convenient way.

This value is a percentage, and the calculation is done with ints (not floats), so:

+ if (global.spread_check) {

Do something only when spread_check is enabled (>0).

+ rv = s->inter*global.spread_check/100;

Calculate the spread-check value for a server from the global percentage. In the corner cases (spread-check=50) it gives, for example:

1*50/100 = 0 (rounded down to int)
2*50/100 = 1
3*50/100 = 1 (rounded down to int)
4*50/100 = 2
etc

+ rv -= (int) (2*rv*(rand()/(RAND_MAX+1.0)));

Get a random value in range 0..(2*rv-1) and subtract it from rv (rv -> random value -> resulting rv):

0 -> 0 -> 0
1 -> 0..1 -> +1..0
2 -> 0..3 -> +2..-1
etc

+ tv_ms_add(&t->expire, &t->expire, s->inter+rv);

Final value according to s->inter (s->inter -> resulting delay):

1 -> 1
2 -> 2..3
3 -> 3..4
4 -> 3..6
5 -> 4..7
6 -> 4..9
etc

As you may have noticed, it even prefers larger values over smaller ones, which should not be noticeable with more reasonable intervals like 1000ms. I spent some time on this so I hope it works in all corner cases. BTW: I hope no one is crazy enough to set up such small values! ;)
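
The ranges above can be double-checked with a small standalone sketch. It is not the patch itself: spread_offset(), next_check_delay() and delay_bounds() are hypothetical names used only for illustration, and the random draw is passed in as a parameter u (assumed uniform in [0, 1)) so that the corner cases can be tested deterministically.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: offset in ms added to the check interval.
 * inter is the interval in ms, spread_check a percentage (0..50),
 * u a uniform draw in [0, 1). Mirrors the integer arithmetic quoted above.
 */
int spread_offset(int inter, int spread_check, double u)
{
    int rv = 0;

    assert(spread_check >= 0 && spread_check <= 50);
    assert(u >= 0.0 && u < 1.0);

    if (spread_check) {
        rv = inter * spread_check / 100;  /* integer division rounds down */
        rv -= (int)(2 * rv * u);          /* subtract 0..(2*rv-1) */
    }
    return rv;
}

/* Delay in ms until the next check, drawing u from rand(). */
int next_check_delay(int inter, int spread_check)
{
    double u = rand() / (RAND_MAX + 1.0); /* uniform in [0, 1) */

    return inter + spread_offset(inter, spread_check, u);
}

/* Smallest and largest delay that next_check_delay() can return. */
void delay_bounds(int inter, int spread_check, int *lo, int *hi)
{
    int rv = inter * spread_check / 100;

    *hi = inter + rv;                     /* u = 0: nothing subtracted */
    *lo = inter + (rv > 0 ? 1 - rv : 0);  /* largest subtraction is 2*rv-1 */
}
```

With inter=4 and spread-check=50, delay_bounds() gives 3..6, which matches the list above; with 10000ms/5% it gives 9501..10500, so the delay never gets anywhere near zero.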

Allowing spread-check=0 only makes sense if we set the default value to something > 0. Maybe this could indeed be a good idea; with an initial value like 5% it should not break anything.

>>> BTW, I'm suddenly thinking about something: when I build the servers map,
>>> I use all the server weights and arrange them so that all their occurrences
>>> are as far as possible from each other, while respecting their exact weight.
>>> It works very well. I'm realizing that we need exactly the same principle
>>> for the checks. The "weight" here is simply the frequency at which the servers
>>> must be checked, which is 1/interval. So by using the exact same functions as
>>> is used to build the servers map, we could build a health-check map that way :
>>>
>>>     foreach srv in all_servers :
>>>         weight(srv) = max(all inter) / inter(srv)
>>>
>>>     build_health_map(all_servers)
>>>
>>> Afterwards, we would just have to cycle through this map to start the checks.
>>> It would even remove the need for one timer per server in the timers table.
>>> The immediate advantage is that they would all be spread apart upon startup,
>>> and we would not need the random anymore (even though it would not harm to
>>> keep the option).
>>>
>>> What do you think about this ?
>>
>> I need to think about this for a moment

>
> Of course. I explained it quickly and it's not easy. I developed the algorithm
> using a small program which used to draw the map for various values, but I don't
> know where I put it.
>

>> - I'm currently traveling and it
>> takes a lot of patience to work with ss over GPRS. ;)

>
> :-)
>

>> Anyway, server weights are afaik 1..255 and server->inter are 0..10K or
>> even worse.

>
> Yes, that's a good point. In fact, we can limit interval to some reasonable
> small values (eg: not below 50 ms). But even with that, some people would still
> use a 1 minute interval, leading to disproportions of 60/.05 = 1200. This is
> not enormous, but quite big yet.
>

>> I'm not sure if this could work here. Please also note that my
>> solution is quite simple and after one week I can say it works pretty well
>> when used with at least small spread-check.

>
> Oh I have no problem with that. I was just thinking loudly. I think I will
> merge your patch in 1.3.12, but it does not prevent us from thinking about
> evolutions ;-)

Of course, all I was trying to say is that it may not be worth spending too much time on this problem. The goal was to prevent sending checks simultaneously and IMHO it was achieved.
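
For reference, the arithmetic from the start of this thread (48 checks over 10s, so one check roughly every 0.2s) can be sketched as below. This is only an illustration of spreading the first checks evenly over a time span, not the code from the patch: first_check_delay() and its parameters are hypothetical names.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from the patch: delay in ms before the
 * first check of the server at position pos (0-based) out of nbchk checked
 * servers, when the first checks are spread evenly over span_ms.
 */
int first_check_delay(int span_ms, int pos, int nbchk)
{
    assert(span_ms >= 0);
    assert(nbchk > 0 && pos >= 0 && pos < nbchk);

    /* widen to long long so span_ms * pos cannot overflow an int */
    return (int)((long long)span_ms * pos / nbchk);
}
```

With span_ms=10000 and nbchk=48, consecutive servers start their first check about 208ms apart instead of all within the same millisecond.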

Best regards,

Krzysztof Olędzki

Received on 2007/09/29 17:26

Re: [PATCH]: Spread checks