Truncated health check response from real servers

From: Nick Chalk <nick#loadbalancer.org>
Date: Wed, 10 Feb 2010 16:10:46 +0000


Hello.

I wonder if anyone can assist with this problem, reported by one of our customers.

The load balancer is running HAProxy 1.4-rc1, with a modified version of the HTTP ECV patch applied. The customer is using ECV to check the status of a pair of IIS web servers:

listen web 10.3.4.150:80

	mode tcp
	option httpchk GET /gccheck.cfm HTTP/1.0
	http-check expect rstring gCokay
	balance source
	server realserver1 10.4.1.6:80 weight 1 check  inter 2000 rise 2 fall 3
	server realserver2 10.4.1.16:80 weight 1 check  inter 2000 rise 2 fall 3
	server backup 127.0.0.1:9081 backup
	option redispatch
	option abortonclose
	maxconn 40000
	log global

We are seeing both real servers repeatedly going on- and off-line with a period of tens of seconds. Packet tracing, stracing, and adding debug code to HAProxy itself has revealed that the real servers are always responding correctly, but HAProxy is sometimes receiving only part of the response.

It appears that the real servers are sending the test page as three separate packets. HAProxy receives the contents of one, two, or three packets, apparently randomly. Naturally, the health check only succeeds when all three packets' data are seen by HAProxy. If HAProxy and the real servers are modified to use a plain HTML page for the health check, the response is in the form of a single packet and the checks do not fail.

Attached is an example packet trace, firstly a failed check, then a successful one. 10.3.4.90 is the load balancer, 10.4.1.6 is the real server. Wireshark does not seem to recognise the responses as HTTP - they are shown as [TCP segment of a reassembled PDU].

In the first check, the load balancer appears to close the connection early, in frame 9, after receiving only one of the response packets. In the second check, all three packets are received before the connection is closed. strace and dumping data from within HAProxy shows that data is discarded after the connection is closed. The other attached file shows a fragment from stracing HAProxy - here, the first request is successful, the second is truncated. (I've elided customer details in the response.)

This appears not to be an HAProxy problem, as it is not receiving the full real server response. Any ideas for further testing I can perform?

Thanks,
Nick.

-- 
Nick Chalk.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/

Received on 2010/02/10 17:10

This archive was generated by hypermail 2.2.0 : 2010/02/10 17:15 CET