Re: Sending requests to servers that are marked as down?

From: Willy Tarreau <w#1wt.eu>
Date: Sun, 30 Nov 2008 08:52:40 +0100


Hello Kai,

On Sun, Nov 30, 2008 at 01:28:05AM +0000, Kai Krueger wrote:
> > From your description, it looks like this is what is happening. However,
> >this must be independent of the LB algo. What might be happening though,
> >is that the last server to be identified as failing gets all the requests.
> >
> Is there a way to try and find out if this is the case? I think at those
> times, there were still other backends up.

You can only check this by carefully analysing the logs. You will see exactly when a server goes down, and when requests are sent to it.

> Below I have a log which I think captures the effect. (Similar config
> as previously, but some maxconn and check inter settings tuned to make
> it easier to trigger this and it was run on a different machine)

Ah, exactly the thing I wanted to see ;-)

> Nov 30 00:21:31 aiputerlx haproxy[10400]: Proxy ROMA started.
> Nov 30 00:21:45 aiputerlx haproxy[10400]: 127.0.0.1:51993
> [30/Nov/2008:00:21:39.125] ROMA ROMA/mat 0/0/0/2339/6830 200 8909913 - -
> ---- 3/3/3/1/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"

This one was before the last check.

> Nov 30 00:21:47 aiputerlx haproxy[10400]: Server ROMA/mat is DOWN. 2
> active and 0 backup servers left. 2 sessions active, 0 requeued, 0
> remaining in queue.
> Nov 30 00:21:55 aiputerlx haproxy[10400]: 127.0.0.1:52006
> [30/Nov/2008:00:21:45.544] ROMA ROMA/mat 0/0/0/4754/9698 200 8909913 - -
> ---- 4/4/4/1/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
> Nov 30 00:21:56 aiputerlx haproxy[10400]: 127.0.0.1:52009
> [30/Nov/2008:00:21:46.908] ROMA ROMA/mat 2/0/0/3830/8578 200 8909913 - -
> ---- 3/3/3/1/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"

These last two were sent before the server was seen as down.

> Nov 30 00:22:03 aiputerlx haproxy[10400]: 127.0.0.1:52014
> [30/Nov/2008:00:21:48.399] ROMA ROMA/mat 0/6843/0/3320/14661 200 8909913
> - - ---- 2/2/2/0/0 0/1 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30
> HTTP/1.0"

This one was sent after the server was marked as down. Strange.

> Nov 30 00:22:05 aiputerlx haproxy[10400]: 127.0.0.1:52001
> [30/Nov/2008:00:21:44.025] ROMA ROMA/mat 0/0/13/10559/21837 200 8938528
> - - ---- 1/1/1/0/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
> Nov 30 00:22:18 aiputerlx haproxy[10400]: 127.0.0.1:51998
> [30/Nov/2008:00:21:42.519] ROMA ROMA/Quiky 0/0/130/3176/35580 200 8938503
> - - ---- 0/0/0/0/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"

These were also sent before the server went down. I don't understand why they were not counted in the DOWN message ("2 sessions active" should have been "4").

> Nov 30 00:22:26 aiputerlx haproxy[10400]: Server ROMA/mat is UP. 3
> active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
>
> According to the log, Server ROMA/mat is DOWN at 00:21:47, however the
> 4th request which was sent at 00:21:48 was still queued to ROMA/mat
> which at that time was still down, as it is not up till 00:22:26. It
> might have received a valid health check message and might have been
> down going up. Am I perhaps misreading the logs here?

No, you're reading the logs correctly, and I don't understand it either. There are two problems above IMHO: a request was dispatched to a server which had already been marked down, and the session count reported in the DOWN message does not match the requests that were still in flight.

Could you try to change the LB algo from leastconn to roundrobin and reproduce the test above? If only leastconn triggers the issue, then there might be a bug in the leastconn handling of set_server_down(). Maybe sometimes a server is not removed from the tree.
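
For the test it should just be a matter of changing the balance keyword and reloading, roughly like this (I'm guessing at your section name and layout, since only the server lines were posted):

    listen ROMA
        # temporarily replace "balance leastconn" for the test
        balance roundrobin
        # ... rest of the section unchanged ...

If the problem disappears with roundrobin, we know where to look.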

> >>Is there a way to prevent this from happening?
> >>
> >
> >Not really since the connections are already established, it's too late.
> >You can shorten your srvtimeout though. It will abort the requests earlier.
> >Yours are really huge (20mn for HTTP is really huge). But anyway, once
> >connected to the server, the request is immediately sent, and even if you
> >close, you only close one-way so that server will still process the
> >request.
> >
> The backend servers have to query a database to generate the response
> and some of these replies can occasionally take up to 10 - 20 minutes to
> process. This is why we chose the very long request timeout so that even
> these can be handled.

The problem is that you have large timeouts for all requests. Can't you identify some requests which might take long, and switch them to a distinct backend section with a large timeout?
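
For example, something along these lines (the section names, ACL and timeout values are only a sketch, and I'm assuming the slow requests can be recognised by their URL, here the /api/0.5/map calls from your log):

    frontend api
        bind :8080                     # whatever you listen on today
        mode http
        # send the known-slow map calls to a backend with a long srvtimeout
        acl slow_req path_beg /api/0.5/map
        use_backend roma_slow if slow_req
        default_backend roma_fast

    backend roma_slow
        mode http
        srvtimeout 1200000             # 20mn, only for the slow requests
        # ... same server lines as today ...

    backend roma_fast
        mode http
        srvtimeout 30000               # 30s for everything else
        # ... same server lines ...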

> >In fact, what I wanted to add in the past, was the ability to either
> >modulate weights depending on the error ratios, or speed-up health
> >checks when an error has been detected in a response, so that a failing
> >server can be identified faster.
> That would probably be useful to detect these problems and take the
> backend server offline faster. I had a similar idea and tried hacking
> the source code a bit. I ended up adding a call to set_server_down() in
> process_srv() of proto_http.c if the reply is a 500 error. It seemed to
> work so far, but I haven't understood the code well enough to know if
> this is even closely valid or if it has nasty race conditions or other
> problems. Is this a safe thing to do?

For it to work, you need to set s->health = s->rise before calling the function (otherwise it does nothing as you might have noticed). I don't see any reason for this to be unsafe.
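
Roughly like this, at the place where you detect the 500 (just a sketch from memory of the 1.3 tree, so please double-check the names against your sources):

    /* in process_srv() in proto_http.c, once the 500 response has been
     * identified on this session; t->srv is the server that answered
     * (field and function names as in 1.3.x, check your version) */
    if (t->srv) {
        t->srv->health = t->srv->rise;  /* otherwise set_server_down() does nothing */
        set_server_down(t->srv);
    }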

> >Also, is it on purpose that your "inter"
> >parameter is set to 10 seconds ? You need at least 20 seconds in your
> >config to detect a faulty server. Depending on the load or your site,
> >this might impact a huge number of requests. Isn't it possible to set
> >shorter intervals ? I commonly use 1s, and sometimes even 100ms on some
> >servers (with a higher fall parameter). Of course it depends on the work
> >performed on the server for each health-check.
> >
> It could probably be reduced, somewhat, although if there is a queue
> even a short interval may see a lot of failed requests,
> as it can drain away the entire queue in the mean time.

You shouldn't have a queue on the server itself, otherwise haproxy's health checks could fail. You should only use haproxy's queuing mechanism as you do with the servers' maxconn parameters. However, you should ensure that these values are low enough to let a health check pass through, since health-checks bypass the queue. So if your server can process 10 concurrent requests, having maxconn 9 leaves one slot for a health-check, and this one can be fast.
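
For instance, on a server that can take 10 parallel requests (the numbers and check parameters below are only an illustration):

    # keep one of the 10 slots free so a health check is never stuck
    # behind queued requests
    server mat 79.143.240.199:80 maxconn 9 check inter 1s fall 3 rise 2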

> >There are two problems to replay a request. The first one is that the
> >request is not in the buffer anymore once the connection is established
> >to the server. We could imagine mechanisms to block it under some
> >circumstances. The second problem is that as you say, haproxy cannot
> >know which requests are safe to replay. HTTP defines idempotent methods
> >such as GET, PUT, DELETE, ... which are normally safe to replay.
> >Obviously now GET is not safe anymore, and with cookies it's even
> >more complicated because a backend server can identify a session
> >for a request and make that session's state progress for each request.
> >Also imagine what can happen if the user presses Stop, clicks a different
> >link, and by this time, haproxy finally gives up on former request and
> >replays it on another server. You can seriously affect the usability of
> >a site or even its security (you don't want a login page to be replayed
> >past the logout link for instance).
> >
> >So there are a lot of complex cases where replaying is very dangerous.
> >
> It is true that in many cases it is dangerous to requeue requests, but
> it would be nice if there were a configuration parameter with which one
> could tell haproxy that in this particular case one knows for sure that
> requeueing is safe.

This could be selected via ACLs for instance. The real problem right now is still to keep the request in the buffer once it has already been sent.

> >I'd be more tempted by adding an option to not return anything upon
> >certain types of errors, so that the browser can decide itself whether
> >to replay or not. Oh BTW you can already do that using "errorfile".
> >Simply return an empty file for 502, 503 and 504, and your clients
> >will decide whether they should replay or not when a server fails to
> >respond.
> >
> Well, I had hoped to use haproxy to mask these errors and provide a
> higher availability and not shift the burden of retrying to the client.

I understand your point of view, but HTTP was designed so that it's up to the client to decide what to do in abnormal conditions. In particular, HTTP 1.1 is very clear on this. There are a lot of situations where a server or a client cannot complete an operation and must break the connection without sending data so that the other end knows.

I think that in your case it's different, because you don't use real browsers; you use HTTP as a transport protocol between various systems. In this case, haproxy abstracts TCP problems from the HTTP layer, but the HTTP errors still have to be handled by the client, just as it would have to handle TCP errors if it were talking TCP directly.
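
By the way, if you want to try the errorfile trick quoted above, it is only a matter of pointing the 50x codes to an empty file (the paths below are just an example):

    # an empty errorfile makes haproxy close the connection without
    # returning anything, so the client sees the failure and can retry
    errorfile 502 /etc/haproxy/errors/empty.http
    errorfile 503 /etc/haproxy/errors/empty.http
    errorfile 504 /etc/haproxy/errors/empty.http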

I'm still noting your need for replayable requests. It might be easier to implement in future versions than it is right now. I still think there's little chance we'll see this before keep-alive though...

> >>P.S. In case it helps, I have attached the configuration we use with
> >>haproxy (version 1.3.15.5, running on freeBSD)
> >>
> >
> >Thanks, that's a good reflex, people often forget that important part ;-)
> >
> :-)
>
> Here are the things that changed in the config that generated the log
> file above and it was this time run on a linux box
>
> server mat localhost:80 maxconn 6 weight 10 check inter 1s rise 30 fall 1
> server Quiky 68.49.216.76:80 maxconn 1 weight 10 check inter 10s rise 3 fall 1
> server mat 79.143.240.199:80 maxconn 1 weight 10 check inter 10s rise 3 fall 1

Look, you have two servers called "mat". That explains a lot of things in the log above! Some of the requests were for the first one, others for the second one. Could you please recheck with different names for all of them?
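
Something like this, for instance (the second "mat" gets a new name; "mat2" is just a placeholder, any unique name will do):

    server mat    localhost:80      maxconn 6 weight 10 check inter 1s  rise 30 fall 1
    server Quiky  68.49.216.76:80   maxconn 1 weight 10 check inter 10s rise 3  fall 1
    server mat2   79.143.240.199:80 maxconn 1 weight 10 check inter 10s rise 3  fall 1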

Regards,
Willy
