Re: haproxy & linux firewall (netfilter)

From: Willy Tarreau <w#1wt.eu>
Date: Sat, 20 Oct 2007 15:31:01 +0200


On Sat, Oct 20, 2007 at 02:54:00PM +0200, Krzysztof Oledzki wrote:
> >Quite interesting, it reminds me of the old days when I put netfilter-based
> >firewalls in production for the first time.... I got 10% drops because
> >at that time it would not accept a SYN during TIME_WAIT.
>
> This is exactly what I get, but I managed to work around it temporarily by
> allowing haproxy to set up a pool of addresses used in roundrobin mode:
>
> backend some-name
> mode http
> balance roundrobin
> cookie SERVERID insert indirect nocache
>
> retries 4
> redispatch
>
> source 192.168.150.11
> source 192.168.150.12
> source 192.168.150.13
> source 192.168.150.14
> source 192.168.150.15
> source 192.168.150.16
> source 192.168.150.17
> source 192.168.150.18
> source 192.168.150.19
>
> server (...)

I assume that you put *one* source address per server entry.

> It helped _a lot_, but it still did not resolve this problem completely -
> I still get about 1% unsuccessful connections. Unfortunately Linux can
> still pick a port that is waiting in TIME_WAIT, even if there are other
> "free" ports.

Yes, I know about this problem. This is why, in my HTTP traffic generator, I manage the source IPs and ports myself, so that I can reuse them in LRU order (the one that has been idle the longest goes first).
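
A minimal sketch of such an LRU-style allocator follows. It is an illustration only, not the traffic generator's actual code: the class name `LruSourcePool`, its methods and the port range are hypothetical, and the addresses are borrowed from the quoted configuration. Free (IP, port) pairs sit in a queue; a pair that has just been closed goes to the back, so it rests as long as possible before being reused.

```python
from collections import deque
from itertools import product


class LruSourcePool:
    """Hand out (ip, port) pairs, reusing the least recently released pair first."""

    def __init__(self, ips, first_port, last_port):
        # Every pair starts out free. Iterating ports in the outer loop makes
        # consecutive connections alternate between the source addresses.
        ports = range(first_port, last_port + 1)
        self._free = deque((ip, port) for port, ip in product(ports, ips))

    def acquire(self):
        # The pair at the front has been idle the longest.
        # Raises IndexError if every pair is currently in use.
        return self._free.popleft()

    def release(self, pair):
        # A pair that was just closed goes to the back of the queue.
        self._free.append(pair)


pool = LruSourcePool(["192.168.150.11", "192.168.150.12"], 10000, 10001)
a = pool.acquire()
pool.release(a)
b = pool.acquire()  # not the pair that was just released
```

A caller would bind its socket to the returned pair before connecting, and release the pair once the connection is closed.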

> >I remember having worked with Jozsef precisely on the part which was
> >changed above, and I'm not sure that those changes are enough.
>
> Did your work get pushed into the kernel?

Yes, and in fact you're using it ;-)

$ grep -iA3 willy /usr/src/linux-2.6.20/net/ipv4/netfilter/ip_conntrack_proto_tcp.c

I had to add several states to the FSM and to fix several transitions too. We worked hard on it with Jozsef, because it's very tempting to reject non-conforming traffic, but we must refrain from doing so. I used to grab logs and captures on the production system, analyze them, reproduce the problem, and propose fixes. Jozsef is a very nice person to work with BTW.

> >In fact, what is strange is that the TCP stack on the peer accepts the
> >SYN. I'm very used to encountering this problem when testing firewalls
> >for instance. You simply chain an HTTP client, a firewall which randomizes
> >ISN (PIX or OpenBSD) and an HTTP server.
>
> No, there is no firewall which randomizes ISN, only Linux & Windows. Both
> ISN and port randomization are performed by my Linux server (IP stack
> feature).

What is very strange is that Linux uses random increments, so your ISNs should not wrap in a matter of a few seconds.

> >The common problem is that once you have rolled over the range of source
> >ports, the traffic falls down to a very low rate, and you observe this :
>
> With newest kernels (src port randomization code is there) this problem
> may appear _much_ faster as there is no need to roll over to hit
> previously used port. This is the reason why "source pool" only made this
> less likely to happen.

I think there has been another change to randomize ISNs, otherwise I cannot explain what you get!

> >1. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
> >2. C <--[ACK(ACK=Z)]---------- FW <--[ACK(ACK=Z)]---------- S
> >3. C ---[RST(SEQ=Z+1)]-------> FW ---[RST(SEQ=Z+1)]-------> S
> > ( 3 seconds delay )
> >4. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
> >5. C <--[SYN/ACK(ACK=X+1)]---- FW <--[SYN/ACK(ACK=Y+1)]---- S
> >6. C ---[ACK(SEQ=X+1)]-------> FW ---[ACK(SEQ=Y+1)]-------> S
> >
> >The reason is S getting a SYN with a SEQ lower than what it has in its
> >table for a TIME_WAIT session. Thus it naturally just sends an ACK to
> >remind its peer where it was last time, but the peer obviously refuses
> >a simple ACK in response to a SYN, then sends an RST which definitely
> >terminates the session on S. When C retries its SYN, S is happy and
> >accepts it.
> >
> >The two solutions I know to this problem are :
> > 1) enable PAWS (echo 1 > tcp_timestamps)
> > This is the cleaner method as it was invented exactly for that
> > problem of ISNs rolling over in too short a time. It requires
> > both the client, the server and the firewall to support it,
> > though. But while the real problem would be on the firewall,
> > we can note that those which are able to randomize ISNs generally
> > support PAWS.
>
> Yes, I'm using timestamps, maybe this explains why my Windows server
> accepts such connections.

Maybe (I said *maybe*) Linux completely randomizes the ISNs when timestamps are enabled? You may want to retry with timestamps disabled. Anyway, I think it is time to implement PAWS in netfilter :-/
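
To make the role of timestamps concrete, here is a rough model of the decision an end host makes when a SYN arrives for a connection still in TIME_WAIT. It is a simplified sketch of the general rule (accept if the sequence number has moved forward, or, with timestamps, if the timestamp has), not code taken from any kernel or from netfilter; the function and parameter names are hypothetical.

```python
def seq_after(a, b):
    """True if 32-bit value a is later than b (serial-number arithmetic)."""
    return 0 < ((a - b) & 0xFFFFFFFF) < 0x80000000


def syn_reopens_time_wait(syn_seq, last_rcv_nxt, syn_tsval=None, last_tsval=None):
    """Simplified model: may this SYN recycle the TIME_WAIT entry?"""
    # Without timestamps, the new ISN must be beyond the old connection's data.
    if seq_after(syn_seq, last_rcv_nxt):
        return True
    # With timestamps on both sides, a newer timestamp is enough,
    # even when the new ISN is lower than the old sequence numbers.
    if syn_tsval is not None and last_tsval is not None:
        return seq_after(syn_tsval, last_tsval)
    # Otherwise the SYN is answered with a plain ACK, as in step 2 of the trace.
    return False
```

In this model a randomized ISN that lands below the old sequence space is refused without timestamps but accepted with them. A connection tracker that only compares sequence numbers would still drop the same SYN.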

> > 2) disable randomization on the firewall. This is also a solution,
> > but it more often hides a real problem than fixes it. In fact,
> > while adding random breaks compliance with the RFC (which clearly
> > states that ISNs must monotonically grow), it also highlights a
> > real problem with the way the connections are handled in the
> > whole chain.
>
> Like I said - there is no randomization on the firewall. Strictly speaking
> - there is no other device (except a L2 switch) between haproxy (Linux)
> and the Windows servers.

OK, I was speaking about the general case, not yours in particular :-)

> >In your case, you fixed the firewall, which was the first one to
> >block. But I'm surprised that the server accepts your SYNs. Maybe
> >it's because the TCP stack is different (windows). As Patrick said
> >in the discussion, it would be better to add PAWS to netfilter
> >(and 8 more bytes aren't that much of a problem, considering the
> >current size of the session table).
> >
> >I'll see if the patches are also relevant to my 2.4-based kernels
> >(since I still get quite a higher performance with 2.4 than 2.6).
>
> OK. BTW: what do you think about this "source pool" idea? Initially I
> thought that it was only a workaround for a bug that exists outside
> haproxy, but since I have already mentioned this patch, I have started
> wondering whether such functionality may be useful. If so, I can clean
> this patch up and push it to you.

Since I've already implemented it in another program, I know that when you do this, you also need to manage the source ports yourself. And this does eat a lot of memory (16 bits per IP/port pair). For instance, with 50000 source ports and 20 IP addresses, you consume 2 MB of RAM (50000 x 20 x 2 bytes). I had already written this somewhere in my TODO list, then realized that it would not work well in multi-process mode (unless the ports were allocated by steps, like I do in the traffic generator).
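
The arithmetic above, and one possible reading of "allocated by steps", can be sketched as follows. Both helpers are hypothetical names used for illustration only. The second assumes that "by steps" means each process takes every Nth port, so that processes never need to coordinate; the message does not spell out the exact scheme.

```python
def table_bytes(num_ips, num_ports, bits_per_entry=16):
    """Memory needed to track one entry per (IP, port) pair."""
    return num_ips * num_ports * bits_per_entry // 8


def ports_for_process(first_port, last_port, nproc, index):
    """Ports owned by process `index` when ports are dealt out in steps of `nproc`."""
    return range(first_port + index, last_port + 1, nproc)


size = table_bytes(num_ips=20, num_ports=50000)  # 2,000,000 bytes, i.e. about 2 MB
mine = list(ports_for_process(1024, 1033, nproc=4, index=1))
```

With four processes, each one owns a disjoint quarter of the range, at the price of each process having fewer ports available before it must reuse one.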

I'm not really sure this is worth doing. In your case, the bug is between Linux and the firewall which runs on it (netfilter). Enabling timestamps precisely to fix this problem is not supposed to make it worse!

Regards,
Willy