Re: [PATCH] [RFC] Decrease server health based on http responses / events, version 2

From: Krzysztof Olędzki <ole#ans.pl>
Date: Tue, 15 Dec 2009 23:29:33 +0100


On 2009-12-15 23:07, Willy Tarreau wrote:
> Hi Krzysztof,

Hi Willy,

> I forgot to check the list and did not see your mail.
>
> On Fri, Dec 11, 2009 at 10:33:56AM +0100, Krzysztof Piotr Oledzki wrote:

>> Changes in this version:
>>  - documentation
>>  - close race between a started check and health analysis event
>>  - don't force fastinter if it is not set
>>  - better names for options
>>  - layer4 support
>>
>> Possibility to set/change default options is going to be implemented in
>> a different patch.

>
> OK I'm fine with this. Let's keep your work in small chunks for
> easier handling.
>
>> diff --git a/doc/configuration.txt b/doc/configuration.txt
>> index ef768f8..bdb5c78 100644
>> --- a/doc/configuration.txt
>> +++ b/doc/configuration.txt
>> @@ -4642,6 +4642,13 @@ cookie <value>
>>    the same cookie value, and it is in fact somewhat common between normal and
>>    backup servers. See also the "cookie" keyword in backend section.
>>  
>> +error_limit <count>

>
> "error-limit" please :-)

Sure. Would you like a new patch with that change?

> Well, overall this looks pretty good. I have two new questions.
> You said we'd have default settings for the action and consecutive
> errors so that we could enable a default behaviour by just setting
> the "observe XXX" statement. I still have in mind my stupid choice
> of the "stats" keyword where we have lots of default settings that
> are all used once we enable any of them, making it difficult to
> change them once set in a defaults instance. For instance, if you
> put "stats uri /stats" in a defaults intance, you have stats enabled
> everywhere.
>
> I don't think the same issue could happen here because from my
> understanding we still have to enable the behaviour by using
> "observe XXX". But I'd like that you take some time to check for
> a few corner cases we could fall into in case we've overseen
> something. For instance, when supporting settings in defaults
> instance, we don't want a simple default setting of the number
> of consecutive errors to enable response checking for all servers
> of all backends. We just want this to set the default value.
>
> Maybe I'm not very clear, please tell me if so.

Yes, it is exacly how you described it - we'll be able to change defaults without enabling the feature itself.

Currently I'm thinking about something like:
> server default [error-limit XX] [on-error XX] [inter XX] [fastinter XX] [downinter XX] [rise XX] [fail XX]
+ maybe some other options we might find useful.

> The second point is that I've got reports of people checking
> very large numbers of servers (say 500) at a very high rate
> (10-30 ms) because they prefer to fail a request quickly than
> to send a reqeust to a server which might return bad contents.
> (For this reason, I think your patch will interest them :-))
> But those people are generally experiencing high CPU usages
> because of the checks alone. I'm realizing that from the
> start, I've never considered the checks to be on the critical
> path. In such situations it can really be a critical path, and
> we must keep in mind that we should keep processing low. I've
> not seen anything expensive in your patch, but it's just a
> reminder to help you make choices in the future if required.

This patch should not make healht checks more CPU hungry, however failed requests cost now little more. Hopefully, if health analyses feature is not enabled, it is one more call to health_adjust() which simply returns in the first condition.

> Feel free to tell me when you'd like your patch to be merged.
> In my opinion it's already in good shape.

I have just sent the v3 version, if you accept the changes I think we can merge it now, of course with respect to s/error_limit/error-limit/g.

Best regards,

                        Krzysztof Olędzki Received on 2009/12/15 23:29

This archive was generated by hypermail 2.2.0 : 2009/12/15 23:45 CET