Re: This good-old keep-alive discussion again

From: Krzysztof Olędzki <ole#ans.pl>
Date: Thu, 01 Sep 2011 22:55:48 +0200


On 2011-09-01 22:15, Julien Vehent wrote:
>
> On Thu, 01 Sep 2011 03:09:56 +0200, Krzysztof Olędzki wrote:
>>
>> Your concern is very valid and I think this is a moment where you
>> should take advantage of HAProxy, so you came to the right place. ;)
>> Each active session on HAProxy does not cost too much (much less than
>> on an http server), so you may use "http-server-close" mode. You will
>> provide keep-alive to clients and only to clients - http requests
>> between the LB and the http server(s) will be handled without keep-alive.
>>
>> HAProxy also gives you the possibility to transparently distribute
>> requests (and load) between two or more servers, without additional
>> DNS records.
>>
>
> As a matter of fact, I've been around for some time :) We are already
> using haproxy but not as a web front end. Our architecture will look
> like this once I add the secondary lighttpd:
>
> keep       +----------+
> alive ---->| lighttpd |---+                  +---> tomcat
>            +----------+   |   +---------+    +---> tomcat
>                           +-->| haproxy |----+
>                           +-->|         |    +---> tomcat
>            +----------+   |   +---------+    +---> tomcat
> keep  ---->| lighttpd |---+
> alive      +----------+
>
> We do not do any keepalive past lighttpd. Everything is non-keepalive
> because we need the x-forwarded-for header on each request for tomcat.
> (that's yet another story)

With http-server-close you will still get a proper x-forwarded-for header.
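For illustration, something along these lines should do it (a minimal sketch in haproxy 1.4 style; the frontend/backend and server names are just placeholders):

    defaults
        mode http
        option http-server-close   # keep-alive towards clients, close towards servers
        option forwardfor          # insert X-Forwarded-For with the client address

    frontend www
        bind :80
        default_backend tomcats

    backend tomcats
        balance roundrobin
        server tomcat1 10.0.0.1:8080 check
        server tomcat2 10.0.0.2:8080 check

Every request forwarded to the backend carries the header, even though the client-side connection stays open.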

> Anyway, the reason why I don't want to put haproxy in front of
> everything right now is that we have a shitload of rewrite rules and
> it will take me forever to convert all of that into HAProxy ACL
> syntax.

Sure, but passing the same traffic through several proxies looks very suboptimal to me, especially since HAProxy was designed to be a proxy and lighttpd was not.

> Plus, whether it's lighttpd or haproxy, the memory impact on the kernel
> is going to be the same.

Kernel - yes; userspace - no. Remember that a proxy needs some memory for each active connection, and that is much, much more memory than the kernel requires.

> Right now, I'm considering putting a keepalive timeout at 5 seconds.
> Maybe even less.

From my POV, a 5s keepalive timeout looks very reasonable.
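If the keep-alive ends up being handled by haproxy itself (http-server-close mode), the knob would be something like this (values only as an example):

    defaults
        timeout http-keep-alive 5s   # idle time allowed between two requests on a client connection
        timeout client          30s

On the lighttpd side the equivalent, as far as I remember, is server.max-keep-alive-idle = 5.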

>>> As a side question: do you know where I can find the information
>>> regarding connection size and conntrack size in the kernel? (other
>>> than printf sizeof(sk_buff) :p).
>>
>> Conntrack has nothing to do with sk_buff. However, you are able to
>> find this information with for example:
>> # egrep "(#|connt)" /proc/slabinfo
>>
>
> Very nice! I'll read about slabinfo. Here is the result from the
> current lighttpd server:
>
> # name              <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> ip_conntrack_expect       0      0    136   28    1 : tunables  120   60    8 : slabdata     0     0     0
> ip_conntrack          23508  39767    304   13    1 : tunables   54   27    8 : slabdata  3059  3059    15

304 bytes? Which kernel version?
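As a quick sanity check you can multiply the columns yourself - num_objs * objsize gives the raw slab usage (it ignores per-page overhead and the extends mentioned below):

    # 39767 * 304 bytes ~= 11.5 MiB just for the ip_conntrack objects
    awk '/conntrack/ {printf "%-22s %10.1f KiB\n", $1, $3*$4/1024}' /proc/slabinfo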

>> It should be around 208 bytes on x86 and 264 bytes on x86_64 (2 x
>> longer pointers), but this is not all. Each conntrack can have some
>> additional data attached, which is known as "extends". Currently
>> there
>> are 5 possible extends:
>> - helper - struct nf_conn_help: 16 bytes (x86) / 24 bytes (x86_64)
>> - nat - struct nf_conn_nat: 16 bytes (x86) / 24 bytes (x86_64)
>> - acct - struct nf_conn_counter: 16 bytes
>> - ecache - struct nf_conntrack_ecache: 16 bytes
>> - zone - struct nf_conntrack_zone: 2 bytes
>>
>> So, as you can see, in the worst case there can be 66 / 82 more bytes
>> allocated with each conntrack, and this goes into a kmalloc-x slab that
>> rounds it up to 2^n bytes.
>>
>
> That's for conntrack only? This is pretty low (346 bytes max per
> connection), I thought conntrack would consume more than that.

No, conntracks are rather cheap and decent hardware can handle even 500K of them without much trouble. However, as they are hashed and collected into buckets, you may waste much more memory than the number of conntracks might indicate, especially since under heavy load you really need to bump the nf_conntrack hashsize so as not to saturate your CPU.
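Roughly like this (names differ a bit between the old ip_conntrack and the newer nf_conntrack code, so treat this as a sketch):

    # current number of tracked connections and the limit
    sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

    # hash table size - keep max/hashsize small (a few entries per bucket)
    cat /sys/module/nf_conntrack/parameters/hashsize
    echo 131072 > /sys/module/nf_conntrack/parameters/hashsize
    sysctl -w net.netfilter.nf_conntrack_max=524288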

> What about the sk_buff and other structure? I didn't dive in the
> networking layer for some time. I don't need an exact number, but just
> an idea of how much memory we are talking about.

The size of an sk_buff depends on the MTU, the driver used and several other factors. With MTU=1500, you typically fit into 2048 or 4096 bytes.
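You can check the sk_buff related slabs the same way - the head structures live in their own slab, while the data for MTU=1500 frames typically lands in the generic 2048-byte slab (size-2048 or kmalloc-2048, depending on the allocator):

    egrep "skbuff" /proc/slabinfo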

Best regards,

                        Krzysztof Olędzki
