Re: haproxy & linux firewall (netfilter)

From: Willy Tarreau <w#1wt.eu>
Date: Sat, 20 Oct 2007 07:56:04 +0200


Hi Krzysztof,

On Sat, Oct 20, 2007 at 12:21:49AM +0200, Krzysztof Oledzki wrote:
> Hello,
>
> This is maybe not strictly haproxy related but I believe that it is worth
> to notice that recently there were two quite important fixes that can
> dramatically improve performance of haproxy installed on a linux server
> with conntrack enabled, especially on the most recent kernels (2.6.22+?)
> that have tcp port randomisation feature implemented:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=17311393f969090ab060540bd9dbe7dc885a76d5
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bc34b841556aad437baf4199744e55500bfa2088
>
> If any of you are interested, there is a full thread describing the
> problem:
> http://marc.info/?t=119081130100010&r=2&w=4
> http://marc.info/?t=119081130100010&r=1&w=4

Quite interesting, it reminds me of the old days when I put netfilter-based firewalls in production for the first time... I got 10% drops because at that time it would not accept a SYN during TIME_WAIT. I remember having worked with Joszef precisely on the part which was changed above, and I'm not sure that those changes are enough.

In fact, what is strange is that the TCP stack on the peer accepts the SYN. I'm very used to encountering this problem when testing firewalls, for instance. You simply chain an HTTP client, a firewall which randomizes ISNs (PIX or OpenBSD) and an HTTP server. The common problem is that once you have rolled over the range of source ports, the traffic falls to a very low rate, and you observe this:

  1. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
  2. C <--[ACK(ACK=Z)]---------- FW <--[ACK(ACK=Z)]---------- S
  3. C ---[RST(SEQ=Z+1)]-------> FW ---[RST(SEQ=Z+1)]-------> S ( 3 seconds delay )
  4. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
  5. C <--[SYN/ACK(ACK=X+1)]---- FW <--[SYN/ACK(ACK=Y+1)]---- S
  6. C ---[ACK(SEQ=X+1)]-------> FW ---[ACK(SEQ=Y+1)]-------> S

The reason is S getting a SYN with a SEQ lower than what it has in its table for a TIME_WAIT session. Thus it naturally just sends an ACK to remind its peer where it was last time, but the peer obviously refuses a simple ACK in response to a SYN, then sends an RST which definitely terminates the session on S. When C retries its SYN, S is happy and accepts it.
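
To make the exchange above concrete, here is a minimal sketch of the decision S takes, assuming it only compares the SYN's sequence number with what it last expected on the TIME_WAIT session. The names seq_after, time_wait_reply and handshake are hypothetical helpers for illustration, not taken from any real stack, and real implementations check more than this.

```python
MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def seq_after(a, b):
    """True if sequence number a is later than b in modulo-2^32 arithmetic."""
    return a != b and ((a - b) % MOD) < (1 << 31)


def time_wait_reply(syn_seq, rcv_nxt):
    """Hypothetical helper: what S answers to a SYN hitting a TIME_WAIT entry.

    rcv_nxt is the next sequence number S expected on the old session.
    """
    if seq_after(syn_seq, rcv_nxt):
        return "SYN/ACK"  # SYN is clearly newer: open a new session
    return "ACK"          # looks like an old segment: remind the peer where we were


def handshake(syn_seq, rcv_nxt):
    """Replay the trace and return the list of packets S sends back."""
    replies = []
    tw_entry = rcv_nxt  # becomes None once the TIME_WAIT entry is gone
    for _ in range(2):  # the first SYN, then at most one retry
        if tw_entry is None:
            replies.append("SYN/ACK")  # nothing in the table: normal accept
            break
        reply = time_wait_reply(syn_seq, tw_entry)
        replies.append(reply)
        if reply == "SYN/ACK":
            break
        # C answers the unexpected ACK with an RST, which kills the entry on S
        tw_entry = None
    return replies
```

With a randomized ISN that happens to fall below the old session's position, handshake(1000, 5000) goes through the ACK/RST detour before the retried SYN is accepted, while handshake(6000, 5000) is accepted at once.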

The two solutions I know to this problem are :

  1. enable PAWS (echo 1 > /proc/sys/net/ipv4/tcp_timestamps). This is the cleaner method, as it was invented exactly for that problem of ISNs rolling over in too short a time. It requires the client, the server and the firewall all to support it, though. But while the real problem would be on the firewall, we can note that those which are able to randomize ISNs generally support PAWS.
  2. disable randomization on the firewall. This is also a solution, but it more often hides a real problem than fixes it. In fact, while adding randomness breaks compliance with the RFC (which clearly states that ISNs must monotonically grow), it also highlights a real problem with the way the connections are handled in the whole chain.
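
As a rough illustration of why the first solution helps, here is a sketch of a timestamp-aware check, assuming the TIME_WAIT entry keeps the last timestamp value seen from the peer. The names accept_syn_in_time_wait and timestamps_enabled are hypothetical; only the /proc/sys/net/ipv4/tcp_timestamps path is the real Linux sysctl, and the function returns None where that file does not exist.

```python
MOD = 1 << 32  # sequence numbers and timestamp values are both 32 bits wide


def later(a, b):
    """True if a is later than b in modulo-2^32 arithmetic."""
    return a != b and ((a - b) % MOD) < (1 << 31)


def accept_syn_in_time_wait(syn_seq, rcv_nxt, tsval=None, ts_recent=None):
    """Hypothetical check: may a SYN recycle a TIME_WAIT entry?

    Without timestamps only the sequence number can prove the SYN is new.
    With timestamps, a newer TSval is enough even if the ISN went backwards.
    """
    if later(syn_seq, rcv_nxt):
        return True
    if tsval is not None and ts_recent is not None:
        return later(tsval, ts_recent)
    return False


def timestamps_enabled(path="/proc/sys/net/ipv4/tcp_timestamps"):
    """Return True/False from the Linux sysctl, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip()) != 0
    except (OSError, ValueError):
        return None
```

For example, accept_syn_in_time_wait(1000, 5000) is refused, but the same SYN carrying tsval=200 against a stored ts_recent=100 is accepted, which is exactly the case a randomizing firewall produces.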

In your case, you fixed the firewall, which was the first one to block. But I'm surprised that the server accepts your SYNs. Maybe it's because the TCP stack is different (Windows). As Patrick said in the discussion, it would be better to add PAWS to netfilter (and 8 more bytes aren't that much of a problem, considering the current size of the session table).

I'll see if the patches are also relevant to my 2.4-based kernels (since I still get noticeably higher performance with 2.4 than with 2.6).

Thanks,
Willy