Problems with HAProxy, down servers and 503 errors

From: John Marrett <JMarrett#mediagrif.com>
Date: Fri, 23 Jan 2009 17:38:29 -0500


We have been using HAProxy in a production environment without issue for a long time. Thanks for a wonderful product!

Unfortunately, we recently ran into problems while migrating one of our sites onto a new HAProxy-based load balancing solution. We've started to notice issues related to persistent cookies, client requests and down backend servers.

This new application requires that users remain on the same web server to avoid losing session information, which is not shared between backend servers. If we stop one of the backend servers (port 80 is no longer listening, so HAProxy receives a RST packet from the server when sending a health check or client request), the clients who have a persistent session on this web server will continue to be sent to that server until it is formally declared down (after two health checks fail, as controlled by the fall 2 parameter).

Here's where we run into our issue:

While HAProxy is receiving RST responses, it sends a 503 response to the client. We're not very eager to send this error response to the client. It appeared, from my reading of the documentation, that by setting "option redispatch" and "retries 3" (or greater than 1) we should get HAProxy to retry, and, in the event of explicit connection failure from the backend server, move on to the next functioning server on the final retry. This doesn't appear to be the case.
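For reference, the settings in question reduce to the sketch below, trimmed from the full excerpt at the end of this message. The comments describe the behaviour I expected from my reading of the documentation, not necessarily what HAProxy actually does.

```
backend qa_site_com_http
  mode http
  # expected: retry a failed connection attempt up to 3 times
  retries 3
  # expected: on the final retry, allow another server to be chosen
  # even though the cookie points at the failed one
  option redispatch
  server web_1 web1:80 cookie 3f06565277298ce80af6bbaab8c5b584 check inter 5100ms fall 2
  server web_2 web2:80 cookie bc62407de9ecf95bae662880b593a0d4 check inter 5100ms fall 2
```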

To make matters worse, when HAProxy throws a 503 response because of a RST it ignores the errorfile directive. If you have two servers, and stop one you will receive an extremely plain 503 error response. If no backend is available at all the errorfile directive functions properly, and the "pretty" error message is returned.

Ideally, in the event that a backend server is returning RSTs, we'd like to move to the next server. HAProxy could either do this immediately or buffer the request until it makes the final determination that the backend server is down and can send it to another server. If that isn't possible, we'd really like the 503 response returned to the client to be the one specified in the errorfile.

I'm going to investigate the possibility of creating a patch for this issue tonight, though if more experienced hands could help, either with the patch or with something obvious that I've missed in my configuration, I'd greatly appreciate it.

A few other notes:

Interestingly, even upon receiving a RST in response to a client request sent to the backend server, HAProxy doesn't consider the server as having failed a health check until it performs its next health check. So, if you have a health checking interval of 10 seconds and a customer makes a request 1 second after the first health check, with fall 2 set, it will take 29 seconds before the backend server is declared down and the client is moved on to the next server.

The documentation for the track command could also be made a bit clearer. It took me a while (and a colleague's examination of the source) to determine that the <proxy> is another backend, in the case where you are trying to reference a server from a different backend. (Perhaps, depending on your configuration, it's not always a backend, but could be something else?)
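To illustrate what we eventually worked out, here is a trimmed sketch based on our own configuration (cookie values omitted; the full excerpt is at the end of this message). The server in the second backend has no check of its own and follows the state of the named server in the first backend, written as <proxy>/<server>.

```
backend qa_site_com_http
  option httpchk GET /ut.asp
  # this server is health-checked directly
  server web_1 web1:80 check inter 5100ms fall 2

backend qa_site_com_ssl
  # no "check" here: the state follows web_1 in backend qa_site_com_http
  server web_1 web1:80 track qa_site_com_http/web_1
```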

Thank you for taking the time to read this novella :), configuration follows, thanks in advance for your help,

-JohnF

Configuration Details

We are running 1.3.15.7, with the following configuration (excerpt):

global
  stats socket /var/run/haproxy.stat

defaults
  balance roundrobin
  cookie SERVERID insert indirect
  option httpchk GET /index.html HTTP/1.0
  timeout client 10m
  timeout server 10m
  timeout connect 3s

frontend http_frontend *:80
  mode http
  reqirep ^Host:([^:]*) Host:\1
  #Traffic matching ACLs
[...]

  acl host_qa_site_com hdr(host) -i qa.site.com
  use_backend qa_site_com_http if host_qa_site_com

frontend ssl_frontend *:81
  mode http
  reqirep ^Host:([^:]*) Host:\1
  #Traffic matching ACLs
[...]

  acl host_qa_site_com hdr(host) -i qa.site.com
  use_backend qa2_site_com_ssl if host_qa_site_com
[...]

backend qa_site_com_http
  mode http
  errorfile 503 /etc/haproxy/errorfiles/503.http
  option redispatch
  retries 3
  option httpchk GET /ut.asp
  server web_1 web1:80 cookie 3f06565277298ce80af6bbaab8c5b584 check inter 5100ms fall 2
  server web_2 web2:80 cookie bc62407de9ecf95bae662880b593a0d4 check inter 5100ms fall 2

backend qa_site_com_ssl
  mode http
  errorfile 503 /etc/haproxy/errorfiles/503.http
  option redispatch
  retries 3
  option httpchk GET /ut.asp
  server web_1 web1:80 cookie 3f06565277298ce80af6bbaab8c5b584 track qa_site_com_http/web_1
  server web_2 web2:80 cookie bc62407de9ecf95bae662880b593a0d4 track qa_site_com_http/web_2

Received on 2009/01/23 23:38
