
From: Willy Tarreau <w#1wt.eu>
Date: Wed, 14 Oct 2009 21:38:10 +0200

Hi Jonah,

On Wed, Oct 14, 2009 at 12:31:07AM -0700, Jonah Horowitz wrote:
>
> driver: tg3
> version: 3.98
> firmware-version: 5721-v3.55a
> bus-info: 0000:03:00.0

OK this is fine.

> Not running bnx2. Looks like it's not a 65563 limit either, I've been
> graphing it and it's up to 80k sometimes, but it goes up and down.

OK.

> When it fails, it seems like it's either 3 seconds or 9 seconds. Would tcp
> retransmits cause that?

Yes, that's what I immediately observed on your graphs. Multiples of 3s are a typical consequence of TCP drops. Since the back-off algorithm is exponential, the successive retransmit intervals are 3s, 6s, 12s, 24s... So seeing 3s and 9s implies that you sometimes lose one packet (3s) and sometimes two (3s+6s). The fact that you don't observe 6s implies that all packets are lost in the same direction.
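To make the arithmetic concrete, here is a minimal Python sketch of the delays described above. It assumes a 3s initial retransmit timeout that simply doubles after each loss; the function name and its defaults are only illustrative, not taken from any kernel source.

```python
def connect_delays(initial_rto=3.0, max_losses=4):
    """Return the cumulative delay after 1, 2, ... consecutive lost packets.

    Assumes a plain doubling back-off starting at initial_rto seconds.
    """
    delays = []
    total = 0.0
    rto = initial_rto
    for _ in range(max_losses):
        total += rto          # wait one full timeout before retransmitting
        delays.append(total)
        rto *= 2              # exponential back-off
    return delays


print(connect_delays())  # [3.0, 9.0, 21.0, 45.0]
```

One lost packet gives 3s, two in a row give 9s, which matches the two values seen on the graphs.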

Also, such timers are generally only observable for initial packets (SYN, SYN-ACK, ACK), because as soon as there is traffic, a drop is detected more quickly: the other end does not ack it after several packets.

And retransmits on SYNs are most often caused by saturated session tables somewhere (local nf_conntrack module, or any firewall between you and the other place). Oh, something else can happen. If you reach your servers through a PIX or FWSM firewall, or at least one that randomizes sequence numbers, the other server will not always be able to accept a new connection for a source port that it has in TIME_WAIT, because the initial sequence number will not be greater than the previous one due to the randomization. Then the server will return a pure ACK instead of a SYN-ACK, to which your haproxy machine will respond with an RST, then a SYN later upon retransmit.

The only way to detect this is to put a sniffer on both ends and compare sequence numbers. They must match. If not, you have such a nasty thing in the middle that needs to be fixed (for PIX and FWSM, there is an option I don't remember for that).
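The comparison could be sketched like this in Python, assuming the SYN sequence numbers have already been extracted from both captures into dictionaries keyed by (source IP, source port). That input format, the function name and the sample values are all hypothetical. It also assumes no NAT in between and one SYN per flow in each capture.

```python
def find_rewritten_isns(client_syns, server_syns):
    """Return flows whose SYN sequence number differs between both ends.

    Both arguments map (src_ip, src_port) -> initial sequence number,
    one dictionary per capture point (hypothetical format).
    """
    mismatches = {}
    for flow, sent_isn in client_syns.items():
        seen_isn = server_syns.get(flow)
        # Flows missing on the server side are skipped: that is a drop,
        # not a rewrite.
        if seen_isn is not None and seen_isn != sent_isn:
            mismatches[flow] = (sent_isn, seen_isn)
    return mismatches


# Made-up sample values, only to show the shape of the data.
haproxy_side = {("192.0.2.10", 40001): 1000, ("192.0.2.10", 40002): 5000}
server_side = {("192.0.2.10", 40001): 1000, ("192.0.2.10", 40002): 77777}

print(find_rewritten_isns(haproxy_side, server_side))
```

Any entry in the result means something in the middle rewrote the sequence number for that flow.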

> I just compiled a kernel with a default retransmit
> of 1sec, but I haven't tested it yet.
>
> Here's the output of netstat -s:
> Tcp:
> 2059992268 active connections openings
> 1933849278 passive connection openings
> 4543998 failed connection attempts
> 2093186 connection resets received
> 142 connections established
> 3547584716 segments received
> 3643865881 segments send out
> 20003371 segments retransmited

This seems to be a lot. About 0.55% of the segments sent are retransmits!
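For reference, the ratio can be recomputed from the two counters quoted above. This is only a sketch: the regular expressions assume the line wording shown in this netstat -s output (with a tolerance for the "sent"/"send" spelling), and the function name is illustrative.

```python
import re


def retransmit_ratio(netstat_text):
    """Return retransmitted segments divided by segments sent out."""
    sent = int(re.search(r"(\d+) segments sen[dt] out", netstat_text).group(1))
    retrans = int(re.search(r"(\d+) segments retransmit", netstat_text).group(1))
    return retrans / sent


sample = """
3643865881 segments send out
20003371 segments retransmited
"""

print(f"{retransmit_ratio(sample):.2%}")
```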

> 0 bad segments received.
> 6179288 resets sent

And this one could confirm the sequence number randomization hypothesis.

> UdpLite:
> TcpExt:
> 4237091 resets received for embryonic SYN_RECV sockets
> 1915476798 TCP sockets finished time wait in fast timer
> 28901367 time wait sockets recycled by time stamp
> 119887 packets rejects in established connections because of timestamp
> 2171355337 delayed acks sent
> 292818 delayed acks further delayed because of locked socket
> Quick ack mode was activated 697528 times
> 15213 times the listen queue of a socket overflowed
> 15213 SYNs to LISTEN sockets dropped

That is not very good; you seem to have a slightly too small SYN backlog queue. Or maybe this only happens during manipulations?
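A quick way to see the current limits is to read the two relevant sysctls. The sketch below assumes a Linux box where they are exposed under /proc/sys; the function name is illustrative, and a value of None simply means the file could not be read.

```python
from pathlib import Path


def read_backlog_limits(proc_sys="/proc/sys"):
    """Return the SYN backlog and listen backlog limits, or None if unreadable."""
    names = {
        "tcp_max_syn_backlog": "net/ipv4/tcp_max_syn_backlog",
        "somaxconn": "net/core/somaxconn",
    }
    limits = {}
    for key, rel in names.items():
        try:
            limits[key] = int((Path(proc_sys) / rel).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            limits[key] = None  # not Linux, or not readable
    return limits


print(read_backlog_limits())
```

If the overflow counters keep growing under normal load, these are the first values to compare against the backlog the listening process asks for.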

Regards,
Willy

Received on 2009/10/14 21:38

Re: Problems with long connect times