Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1

From: Willy Tarreau <w#1wt.eu>
Date: Mon, 15 Mar 2010 18:43:59 +0100


Hi Erik,

On Mon, Mar 15, 2010 at 10:27:38AM +0100, Erik Gulliksson wrote:
> Hi Willy,
>
> Thanks for your elaborate answer.
>
> > Did you observe anything special about the CPU usage ? Was it lower
> > than with 1.3 ? If so, it would indicate some additional delay somewhere.
> > If it was higher, it could indicate that the Transfer-encoding parser
> > takes too many cycles but my preliminary tests proved it to be quite
> > efficient.
>
> I did not notice anything special about CPU usage. It seems to be
> around 2-4% with both versions. When checking munin-graphs, this
> morning I did however notice that the counter "connection resets
> received" from "netstat -s" was increasing a lot more with 1.4.
>
> This led me to look at the log more closely, and there seem to be a
> lot of new errors that look something like this:
> w.x.y.z:4004 [15/Mar/2010:09:50:51.190] fe_xxx be_yyy/upload-srvX
> 0/0/0/-1/62 502 391 - PR-- 9/6/6/3/0 0/0 "PUT /dav/filename.ext
> HTTP/1.1"

Interesting ! It looks like haproxy has aborted because the server
returned an invalid response. You can check that using socat on the
stats socket. For instance :

   echo "show errors" | socat stdio unix-connect:/var/run/haproxy.stat

If you don't get anything, then it's something else :-/

> This is only for a few of the PUT requests, most requests seem to get
> proxied successfully. I will try to reproduce this in a more
> controlled lab setup where I can sniff HTTP-headers to see what is
> actually sent in the request.

That would obviously help too :-)

> > No, I've run POST requests (very similar to PUT), except that there
> > was no Transfer-Encoding in the requests. It's interesting that you're
> > doing that in the request, because Apache removed support for TE:chunked
> > a few years ago because there were no users. Also, most of my POST tests
> > were not performance related.
>
> Interesting. We do use Apache for parts of this application on the
> backend side, although PUT requests are handled by an in-house
> developed Erlang application.

OK.

> > A big part has changed: in previous versions, haproxy did not care
> > at all about the payload. It only saw headers. Now with keepalive
> > support, it has to find requests/responses bounds and as such must
> > parse the transfer-encoding and content-lengths. However, transfer
> > encoding is nice to components such as haproxy because it's very
> > cheap. Haproxy reads a chunk size (one line), then forwards that
> > many bytes, then reads a new chunk size, etc... So this is really
> > a cheap operation. My tests have shown no issue at gigabit/s speeds
> > with just a few bytes per chunk.
> >
> > I suspect that the application tries to use the chunked encoding
> > to simulate bidirectional access. In this case, it might be
> > waiting for data pending in the kernel buffers which were sent by
> > haproxy with the MSG_MORE flag, indicating that more data are
> > following (and so you should observe low CPU usage).
> >
> > Could you please do a small test : in src/stream_sock.c, please
> > comment out line 616 :
> >
> >   615                          /* this flag has precedence over the rest */
> >   616                     //     if (b->flags & BF_SEND_DONTWAIT)
> >   617                                  send_flag &= ~MSG_MORE;
> >
> > It will unconditionally disable use of MSG_MORE. If this fixes the
> > issue for you, I'll probably have to add an option to disable this
> > packet merging for very specific applications.
>
> I tried to comment out the line above as instructed, but it made no
> noticeable change. As stated above, I will try to reproduce the problem
> in a lab setup. This may be an issue with our application rather than
> haproxy.

OK, thanks for testing ! So MSG_MORE does not seem to be the culprit.

Best regards,
Willy

Received on 2010/03/15 18:43

This archive was generated by hypermail 2.2.0 : 2010/03/15 19:00 CET