Re: haproxy & linux firewall (netfilter)

From: Krzysztof Oledzki <ole#ans.pl>
Date: Sat, 20 Oct 2007 16:03:38 +0200 (CEST)

On Sat, 20 Oct 2007, Willy Tarreau wrote:

> On Sat, Oct 20, 2007 at 02:54:00PM +0200, Krzysztof Oledzki wrote:
>>> Quite interesting, it reminds me of the old days when I put netfilter-based
>>> firewalls in production for the first time.... I got 10% drops because
>>> at that time it would not accept a SYN during TIME_WAIT.
>>
>> This is exactly what I get, but I managed to work around it temporarily by
>> allowing haproxy to set up a pool of addresses used in round-robin mode:
>>
>> backend some-name
>> mode http
>> balance roundrobin
>> cookie SERVERID insert indirect nocache
>>
>> retries 4
>> redispatch
>>
>> source 192.168.150.11
>> source 192.168.150.12
>> source 192.168.150.13
>> source 192.168.150.14
>> source 192.168.150.15
>> source 192.168.150.16
>> source 192.168.150.17
>> source 192.168.150.18
>> source 192.168.150.19
>>
>> server (...)
>
> I assume that you put *one* source address per server entry.

No, with each new connection haproxy takes the next source address from the list above. No source address is defined per server.

<CUT>

>> Did your work get pushed into the kernel?
>
> Yes, and in fact you're using it ;-)
>
> $ grep -iA3 willy /usr/src/linux-2.6.20/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
> * Willy Tarreau:
> * - State table bugfixes
> * - More robust state changes
> * - Tuning timer parameters

OK, thank you.

> I had to add several states to the FSM and to fix several transitions too.
> We've been working hard with Jozsef, because it's very tempting to reject
> non-conforming traffic, but we must refrain from it. I used to grab logs and
> captures on the production system, try to analyze, reproduce, and propose
> fixes. Jozsef is a very nice person to work with BTW.

Indeed, I can confirm this. :)

>>> In fact, what is strange is that the TCP stack on the peer accepts the
>>> SYN. I'm very used to encountering this problem when testing firewalls
>>> for instance. You simply chain an HTTP client, a firewall which randomizes
>>> ISN (PIX or OpenBSD) and an HTTP server.
>>
>> No, there is no firewall which randomizes ISN, only Linux & Windows. Both
>> ISN and port randomization are performed by my Linux server (IP stack
>> feature).
>
> What is very strange is that linux uses random increments, so your ISNs
> should not wrap in a matter of a few seconds.

Good point. I need to investigate this.

>>> The common problem is that once you have rolled over the range of source
>>> ports, the traffic falls down to a very low rate, and you observe this :
>>
>> With the newest kernels (src port randomization code is there) this problem
>> may appear _much_ faster as there is no need to roll over to hit a
>> previously used port. This is why the "source pool" only made this
>> less likely to happen.
>
> I think there has been another change to randomize ISNs, otherwise I cannot
> explain what you get!

I am not sure exactly when (2.6.21 or 2.6.22) but there is new code for source port randomization. Dunno about ISNs. :(
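
The effect of random port selection, and why a pool of source addresses only reduces the problem, can be estimated with a rough calculation. The sketch below is an illustration, not a measurement: it assumes source ports are picked uniformly at random, that every closed connection lingers in TIME_WAIT on the peer for a fixed time, and that the ephemeral range holds about 28232 ports. The function name and all inputs are hypothetical.

```c
#include <assert.h>

/* Rough chance that a uniformly random (address, port) pick lands on a
 * tuple the peer still holds in TIME_WAIT. All inputs are illustrative
 * assumptions, not values measured on any real system. */
double reuse_probability(double conn_per_sec, double time_wait_sec,
                         unsigned ports, unsigned addrs)
{
    assert(ports > 0 && addrs > 0);

    /* Upper bound on the number of tuples still lingering in TIME_WAIT. */
    double lingering = conn_per_sec * time_wait_sec;
    /* Number of (source address, source port) tuples we can pick from. */
    double space = (double)ports * (double)addrs;
    double p = lingering / space;

    return p > 1.0 ? 1.0 : p;
}
```

Under these assumptions, 50 connections per second with a 60 second TIME_WAIT give roughly a 10% chance of reuse with one source address, and about a ninth of that with nine addresses. The pool shrinks the probability but never brings it to zero.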

>>> 1. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
>>> 2. C <--[ACK(ACK=Z)]---------- FW <--[ACK(ACK=Z)]---------- S
>>> 3. C ---[RST(SEQ=Z+1)]-------> FW ---[RST(SEQ=Z+1)]-------> S
>>> ( 3 seconds delay )
>>> 4. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
>>> 5. C <--[SYN/ACK(ACK=X+1)]---- FW <--[SYN/ACK(ACK=Y+1)]---- S
>>> 6. C ---[ACK(SEQ=X+1)]-------> FW ---[ACK(SEQ=Y+1)]-------> S
>>>
>>> The reason is S getting a SYN with a SEQ lower than what it has in its
>>> table for a TIME_WAIT session. Thus it naturally just sends an ACK to
>>> remind its peer where it was last time, but the peer obviously refuses
>>> a simple ACK in response to a SYN, then sends an RST which definitely
>>> terminates the session on S. When C retries its SYN, S is happy and
>>> accepts it.
>>>
>>> The two solutions I know to this problem are :
>>> 1) enable PAWS (echo 1 > tcp_timestamps)
>>> This is the cleaner method as it was invented exactly for that
>>> problem of ISNs rolling over in too short a time. It requires
>>> the client, the server and the firewall to all support it,
>>> though. But while the real problem would be on the firewall,
>>> we can note that those which are able to randomize ISNs generally
>>> support PAWS.
>>
>> Yes, I'm using timestamps, maybe this explains why my Windows server
>> accepts such connections.
>
> Maybe (I said *maybe*) linux completely randomizes the ISNs when timestamps
> are enabled ? You may want to retry with timestamps disabled. Anyway, I
> think it would be time to implement PAWS in netfilter :-/

I agree but I do not feel brave enough to do it myself. :|
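
For illustration, the decision discussed above (reopen a TIME_WAIT session on a new SYN, or just answer with an ACK) can be sketched as follows. This is a simplified model under stated assumptions, not netfilter's or any kernel's actual code: struct tw_state and both function names are hypothetical, and only the sequence number check and the PAWS-style timestamp check are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for 32-bit sequence numbers or
 * timestamps (serial number arithmetic). */
bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical, trimmed-down view of what is kept for a TIME_WAIT session. */
struct tw_state {
    uint32_t rcv_nxt;   /* next sequence number expected from the peer */
    uint32_t ts_recent; /* last timestamp value seen from the peer */
};

/* Returns true if a new SYN may reopen the session, false if it would
 * only draw a plain ACK, as in step 2 of the diagram. */
bool tw_accepts_syn(const struct tw_state *tw, uint32_t syn_seq,
                    bool has_ts, uint32_t ts_val)
{
    assert(tw != NULL);

    /* The new ISN is ahead of everything seen on the old session. */
    if (seq_before(tw->rcv_nxt, syn_seq))
        return true;

    /* PAWS-style check: a strictly newer timestamp marks a new
     * incarnation even when the ISN went backwards. */
    if (has_ts && seq_before(tw->ts_recent, ts_val))
        return true;

    return false;
}
```

In this model a SYN whose sequence number went backwards is refused without timestamps but accepted when it carries a newer timestamp, which is the behaviour PAWS is meant to provide.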

<CUT>

>> OK. BTW: what do you think about this "source pool" idea? Initially I
>> thought that it was only a workaround for a bug that existed outside
>> haproxy, but since I have already mentioned this patch, I have started
>> wondering whether such functionality may be useful. If so, I can clean up
>> this patch and push it to you.
>
> Since I've already implemented it in another program, I know that when
> you do this, you also need to manage the source ports yourself.

Only when you want to implement it this way. In my solution it was simply:

+struct source_addr_pool {
+       struct sockaddr_in addr;
+       struct source_addr_pool *next;
+};
+

(...)

struct proxy {
(...)

-       struct sockaddr_in source_addr;         /* the address to which we want to bind for connect() */
+       struct source_addr_pool *source_addr;   /* pool of addresses to which we want to bind for connect() */
+       struct source_addr_pool *curr_sa;       /* the address to which we want to bind for connect() */
};

(...)

-               if (bind(fd, (struct sockaddr *)&s->be->source_addr, sizeof(s->be->source_addr)) == -1) {
+               struct source_addr_pool *sap = s->be->curr_sa;
+
+               s->be->curr_sa = sap->next ? sap->next : s->be->source_addr;
+               if (bind(fd, (struct sockaddr *)&(sap->addr), sizeof(sap->addr)) == -1) {

Of course this is only a short example, there are more places requiring changes.
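
As a self-contained illustration of the same walk, the sketch below builds such a pool and hands out addresses in round-robin order. It is not the actual patch: pool_add() and pool_next() are hypothetical helpers, the cursor is kept by the caller instead of in struct proxy, and the source port is left at 0 so that the kernel picks it.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>

struct source_addr_pool {
    struct sockaddr_in addr;
    struct source_addr_pool *next;
};

/* Hypothetical helper: append one dotted-quad address to the pool.
 * Returns 0 on success, -1 on a bad address or allocation failure. */
int pool_add(struct source_addr_pool **head, const char *ip)
{
    struct source_addr_pool *node = calloc(1, sizeof(*node));

    if (node == NULL)
        return -1;
    node->addr.sin_family = AF_INET;
    node->addr.sin_port = 0; /* let the kernel choose the source port */
    if (inet_pton(AF_INET, ip, &node->addr.sin_addr) != 1) {
        free(node);
        return -1;
    }
    while (*head != NULL) /* walk to the tail to keep insertion order */
        head = &(*head)->next;
    *head = node;
    return 0;
}

/* Return the address to bind() for this connection and advance the
 * cursor, wrapping back to the head after the last entry. */
const struct sockaddr_in *pool_next(struct source_addr_pool *head,
                                    struct source_addr_pool **cursor)
{
    assert(head != NULL && cursor != NULL);

    struct source_addr_pool *sap = *cursor ? *cursor : head;

    *cursor = sap->next ? sap->next : head;
    return &sap->addr;
}
```

A caller would pass the returned address to bind() on the outgoing socket before calling connect(), much like the bind() call in the excerpt above.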

<CUT>
> I'm not really sure this is interesting to do. In your case, the bug is
> between linux and the firewall which runs on it (netfilter). It's not
> expected that if you enable timestamps exactly to fix this problem, it
> makes the problem worse !

OK. I am also not sure and that is why I have never pushed this.

Best regards,

                                 Krzysztof Olędzki