503 errors when services available and other connections succeed to same service during failure window

From: Jerry Champlin <jgc#absolute-performance.com>
Date: Sun, 1 Aug 2010 17:24:12 -0600


I just checked the archive and my original problem statement was not there - just the message with additional info.

Since sending this message, I have upgraded to 1.4.8 and tried disabling poll, epoll and sepoll. Non worked.

-Jerry

Hello haproxy list:

We are running haproxy on Ubuntu 10.04 LTS 32bit with the stock kernel and are experiencing an intermittent problem on the order of once per minute where a connection attempt to the backend servers is unable to be established and a 503 is sent to the end application or user browser. While the problem is happening with a given connection, other connections to the same application are executing successfully on the backend servers from both the perspective of haproxy and the backend applications. I am getting someone else to re-verify this, but we are not seeing the SYN packets without corresponding SYN/ACK to establish the connection to the backend servers and suspect that the packets are not being sent despite the system calls to the contrary or that we are looking for the wrong part of the conversation. We've been trying to track down the source of these problems for a few days, any assistance will be greatly appreciated.

-Jerry

Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60377
[31/Jul/2010:03:36:35.160]

ssbe-core internal_core/pn1 0/10067/-1/-1/10068 503 212 - - SC-- 16/16/15/7/+10 0/0 "POST /observations HTTP/1.1" Jul 31 03:36:45 localhost haproxy[24182]: 172.17.48.32:60379
[31/Jul/2010:03:36:35.252]

ssbe-core internal_core/pn2 0/10084/-1/-1/10085 503 212 - - SC-- 15/15/14/7/+10 0/0 "GET /clients/H206923/hosts HTTP/1.1" *Jul 31 03:36:55 localhost haproxy[24182]: 172.17.48.31:44757
[31/Jul/2010:03:36:45.498]

ssbe-core internal_core/pn1 0/10089/-1/-1/10090 503 212 - - SC-- 15/15/13/7/+10 0/0 "GET /hosts/45/metrics HTTP/1.1"*

Strace usage
lb1: ~ 16 -> strace -p 24182 -t -e trace=network -o haproxy.strace

Here are the relevant lines from the strace (we're only looking at the request for /hosts/45/metrics here, don't be fooled by the coincident 03:36:45 timestamps of the other two SC entries above).

03:36:45 accept(12, {sa_family=AF_INET, sin_port=htons(44757), sin_addr=inet_addr("172.17.48.31")}, [16]) = 34 03:36:45 setsockopt(34, SOL_TCP, TCP_NODELAY, [1], 4) = 0 03:36:45 recvfrom(34, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 8192, MSG_NOSIGNAL, NULL, NULL) = 428

03:36:45 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 35
03:36:45 setsockopt(35, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:45 connect(35, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr("172.17.48.32")}, 16) = -1 EINPROGRESS (Operation now in progress)
03:36:45 sendto(35, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
03:36:45 recvfrom(34, 0xb5f4b6, 15930, 16384, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
03:36:45 sendto(35, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)
03:36:46 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 40
03:36:46 setsockopt(40, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:46 connect(40, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr("172.17.48.32")}, 16) = -1 EINPROGRESS (Operation now in progress)
03:36:46 sendto(40, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
03:36:46 recvfrom(39, "GET /clients/H205604/hosts HTTP/"..., 8192, MSG_NOSIGNAL, NULL, NULL) = 893
03:36:46 sendto(40, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)
03:36:47 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 39
03:36:47 setsockopt(39, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:47 connect(39, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr("172.17.48.32")}, 16) = -1 EINPROGRESS (Operation now in progress)
03:36:47 sendto(39, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
03:36:47 sendto(39, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)
03:36:48 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 57
03:36:48 setsockopt(57, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:48 connect(57, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr("172.17.48.32")}, 16) = -1 EINPROGRESS (Operation now in progress)
03:36:48 sendto(57, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
03:36:48 sendto(57, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)

[ repeat connection attempt once per second for 10 total seconds ...]

03:36:55 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 23
03:36:55 setsockopt(23, SOL_TCP, TCP_NODELAY, [1], 4) = 0
03:36:55 connect(23, {sa_family=AF_INET, sin_port=htons(44757),
sin_addr=inet_addr("172.17.48.31")}, 16) = -1 EINPROGRESS (Operation now in progress)
03:36:55 sendto(23, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
03:36:55 sendto(23, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)

03:36:55 sendto(34, "HTTP/1.0 503 Service Unavailable"..., 212, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 212 03:36:55 shutdown(34, 1 /* send */) = 0 03:36:55 sendto(5, "<134>Jul 31 03:36:55 haproxy[241"..., 200, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(514), sin_addr=inet_addr("127.0.0.1")}, 16) = 200

Jerry Champlin
Absolute Performance Inc.

--
Enabling businesses to deliver critical applications at lower cost and
higher value to their customers.
Received on 2010/08/02 01:24

This archive was generated by hypermail 2.2.0 : 2010/08/02 01:30 CEST