Re: [PATCH] [MEDIUM] Health check reporting code rework + health logging, v2

From: David Birdsong <david.birdsong#gmail.com>
Date: Fri, 25 Sep 2009 14:36:18 -0700


On Fri, Sep 25, 2009 at 2:28 PM, Willy Tarreau <w#1wt.eu> wrote:
> On Fri, Sep 25, 2009 at 10:59:48PM +0200, Krzysztof Oledzki wrote:

>> >     if (s->proxy->options2 & PR_O2_LOGHCHKS &&
>> >-        (((s->state & SRV_RUNNING) && (s->result & SRV_CHK_ERROR)) ||
>> >-         (!(s->state & SRV_RUNNING) && (s->result & SRV_CHK_RUNNING)) ||
>> >+        (((s->health != 0) && (s->result & SRV_CHK_ERROR)) ||
>> >+         ((s->health != s->rise + s->fall - 1) && (s->result & SRV_CHK_RUNNING)) ||
>> >         ((s->state & SRV_GOINGDOWN) && !(s->result & SRV_CHK_DISABLE)) ||
>> >         (!(s->state & SRV_GOINGDOWN) && (s->result & SRV_CHK_DISABLE))))
>> >         {
>>
>> Indeed, it is a good idea to log such situations. Your condition looks much
>> better. ;)
>

> Well, I would not say that, both require some amount of brain cycles
> to parse :-)
>
>> >A workaround could be to report the number of checks left before UP
>> >or DOWN, which depends on the server status and health. If UP with
>> >a failed check, we report "(health-rise) checks left before going down",
>> >and if we have a DOWN state with a successful check, we could report
>> >"(rise-1-health) checks left before going up". That's just an idea.
>>
>> Looks very reasonable, I'll try to implement it.
>

> nice.
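For what it's worth, the "checks left" idea quoted above could look
roughly like the sketch below. It takes the two expressions from the
quoted text at face value. The helper name and signature are made up for
illustration and are not haproxy code, and the thread does not settle
whether health should be read before or after the check updates it, so
the sketch just uses whatever value the caller passes in.

```c
/* Hypothetical helper, not part of haproxy: returns the number of
 * further checks needed before the server changes state, using the
 * two expressions proposed in the quoted text.
 *   is_up  : non-zero if the server is currently considered UP
 *   health : current health counter, as supplied by the caller
 *   rise   : configured number of successes needed to go UP
 */
static int checks_left(int is_up, int health, int rise)
{
    if (is_up)
        return health - rise;       /* "(health-rise) checks left before going down" */

    return rise - 1 - health;       /* "(rise-1-health) checks left before going up" */
}
```

With rise 2 and health 4 on an UP server this gives 2, and with rise 3
and health 0 on a DOWN server it also gives 2.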
>
>> >Hmm I've just tested with "http-check disable-on-404", and the
>> >404 are reported as failed checks while the health increases.
>> >Here again it's not easy because a 404 is a success if the server
>> >is UP, and a failure if the server is down.
>>
>> For me, a 404 is always a failed check, even with disable-on-404, as it disables
>> balancing to that server. Maybe with disable-on-404 it is not fatal - a
>> proxy may decide that a server is up, but still...
>

> Well, it is a special case defined as a success in haproxy's specs.
> If you set up disable-on-404, AND the server returns 404, it means
> it is still alive but does not want to receive any new session. This
> is an admin-defined maintenance mode, which of course, excludes the
> ability to detect a failure on 404. It's almost only used with
> dynamic servlets tuned to respond like this, but I know there are
> a few users running with it on purely static sites in order to ease
> content deployment.

Aye, we use 404s to mark a server up or down in a static environment.
Basically, local daemons make load-based decisions to add or remove the
static health check file, which attracts traffic to the host or repels it.
>
>> >I think that computing a "direction" variable at the top of the function
>> >should help a lot, it would contain -1 if the check is a failure, 1 if
>> >it's a success, or 0 if it has no impact.
>> >
>> >I think that a success is designated by :
>> >
>> > (SRV_RUNNING && CHK_DISABLE) || (health < rise+fall-1 && CHK_RUNNING)
>> >
>> >A failure should be :
>> >
>> > (!SRV_RUNNING && CHK_DISABLE) || (health != 0 && CHK_ERROR)
>> >
>> >I'm not sure for the last one if it takes into account the check
>> >timeout (maybe we don't have CHK_ERROR in this case).
>>
>> I need to think a little about it, but generally it should work.
>

> I think so, give or take a +/- 1 on the tests.
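To make the "direction" proposal concrete, here is a minimal sketch of
the two quoted conditions as a single function. The flag values and the
struct are placeholders chosen for illustration (the real definitions
are in haproxy's headers and are not shown in this thread), and the
function name is hypothetical. As noted in the quoted text, it is
unclear whether a check timeout sets CHK_ERROR, so that case is not
handled specially here.

```c
/* Placeholder bit values for illustration only; not haproxy's real ones. */
enum { SRV_RUNNING = 0x1 };                       /* server state bit  */
enum { SRV_CHK_ERROR   = 0x1,                     /* check result bits */
       SRV_CHK_RUNNING = 0x2,
       SRV_CHK_DISABLE = 0x4 };

/* Hypothetical, cut-down stand-in for the server structure. */
struct srv_sketch {
    int state;
    int result;
    int health;
    int rise;
    int fall;
};

/* Returns 1 if the check counts as a success, -1 if it counts as a
 * failure, and 0 if it has no impact, following the two expressions
 * proposed in the quoted text. */
static int check_direction(const struct srv_sketch *s)
{
    int running = (s->state & SRV_RUNNING) != 0;
    int disable = (s->result & SRV_CHK_DISABLE) != 0;

    /* (SRV_RUNNING && CHK_DISABLE) || (health < rise+fall-1 && CHK_RUNNING) */
    if ((running && disable) ||
        (s->health < s->rise + s->fall - 1 && (s->result & SRV_CHK_RUNNING)))
        return 1;

    /* (!SRV_RUNNING && CHK_DISABLE) || (health != 0 && CHK_ERROR) */
    if ((!running && disable) ||
        (s->health != 0 && (s->result & SRV_CHK_ERROR)))
        return -1;

    return 0;
}
```

The success test runs first, so a result that matched both expressions
would count as a success. That ordering is a choice made in this sketch,
not something decided in the thread.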
>
>> However,
>> as I stated above - I don't want to call it a succeeded check, but if
>> you insist we may add a third case - "conditionally succeeded". What do you
>> think?
>

> I'm perfectly happy with the third case, and that would reflect the
> special message you already have. In fact, here's what I get on this
> situation :
>

> Sep 25 21:42:17 localhost haproxy[25439]: Health check for server Thousand_HTTP/127.0.0.001 succeeded, reason: Layer7 check conditionally passed, code: 404, check duration: 0ms, UP/DOWN status: 3/6.
>

> I already don't find it ambiguous at all given the "conditionally
> passed" message. But in my opinion, this is only acceptable if we
> take into account that the server is being monitored for this (which
> CHK_DISABLE already does). I would absolutely not want to see a
> success on a 404 if disable-on-404 is not set, of course !
>

> Regards,
> Willy
>
>
>
Received on 2009/09/25 23:36
