Added latency in cloud

From: <haproxy#serverphorums.com>
Date: Mon, 18 Oct 2010 23:24:18 +0200


I currently have Apache acting as a proxy/load balancer in front of several backend application servers. It works pretty well, but as our traffic ramps up I feel we need a more intelligent proxy solution, so I've come across haproxy as what seems to be the standard for extremely high-traffic load balancing. (Editor's note: this is all set up on Rackspace cloud servers.) So far I've been able to set up all the functionality I need: sticky sessions with a cookie, and an ACL that routes certain requests to a different set of application servers.

The only problem I'm having is that the round-trip time for a request has gone up dramatically. With Apache or nginx, for both HTTP and HTTPS connections, I see an average RTT per request of 50ms (which I consider good). With haproxy I'm getting an average HTTP RTT of a bit over 100ms, and adding stunnel4 in front to terminate the HTTPS connections adds another ~50ms, which brings the total upwards of 175-225ms per request.
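For reference, I'm measuring RTT with a quick Python script along these lines (the hostname and path are placeholders for my actual setup); each request gets a fresh connection so keepalive doesn't skew the numbers:

# rtt_probe.py -- rough per-request RTT measurement (a sketch;
# hostname/path below are placeholders for my actual frontend).
import http.client
import time

def avg_rtt(host, path="/", port=80, n=50):
    # Average wall-clock time for n sequential GETs, each on a
    # fresh connection (no keepalive), in milliseconds.
    total = 0.0
    for _ in range(n):
        start = time.time()
        conn = http.client.HTTPConnection(host, port, timeout=10)
        conn.request("GET", path)
        conn.getresponse().read()
        conn.close()
        total += time.time() - start
    return total / n * 1000

if __name__ == "__main__":
    print("avg RTT: %.1f ms" % avg_rtt("www.example.com"))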

This is strange, because when I prototyped this network layout at home on hardware (i.e. no virtual machines), haproxy had a higher minimum RTT than nginx, but it did not add anywhere near 50ms, even when using stunnel both in front of and again behind haproxy (to keep traffic to the backend servers fully encrypted), and there was no significant increase under very high loads either. Granted, that was all on a LAN, but when I see +50ms added to my requests in the cloud just for using haproxy instead of apache/nginx, that seems a little out of the ordinary. Another +50ms for terminating an HTTPS request with stunnel and routing it to a local static page? That's crazy.

My theory was that perhaps, in the virtualized environment, routing a request through the localhost interface before sending it out through the internal LAN IP (which is how haproxy/stunnel pass on their connections) somehow hits a virtualization buffer/queue that takes it out to the host OS network stack before it gets routed back in as a localhost request. That could explain the latency added every time I introduce another localhost-based hop. But when I ping localhost on a cloud server, I get 0.000ms RTT, and the same when I ping my eth0 or eth1 network interfaces. Which is the way it should be: no added network latency, in fact no measurable latency at all.
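Since ping only exercises ICMP, which the hypervisor may treat differently from TCP, a better sanity check of that theory might be to time actual TCP connects on loopback versus the eth0 address, with something (e.g. the haproxy frontend) listening on both. A sketch; the addresses below are placeholders:

# tcp_connect_probe.py -- compare TCP connect latency on loopback vs
# the LAN interface. Assumes a listener on both addresses; the IPs
# below are placeholders for my real interfaces.
import socket
import time

TARGETS = [("127.0.0.1", 80), ("10.0.0.5", 80)]
N = 100

for host, port in TARGETS:
    total = 0.0
    for _ in range(N):
        start = time.time()
        s = socket.create_connection((host, port), timeout=5)
        s.close()
        total += time.time() - start
    print("%s:%d  avg connect: %.3f ms" % (host, port, total / N * 1000))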

If a proxy configuration that accepts SSL at the proxy and opens another SSL tunnel to a backend server adds a 250-300ms latency penalty right off the top, that is just not acceptable; my site will look sluggish and unresponsive. (And I will need this end-to-end encryption to meet PCI compliance.)
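To see how much of that penalty is the handshake itself versus everything behind stunnel, it might also be worth timing just the TCP connect + TLS handshake against the stunnel listener. A sketch (host/port are placeholders, and certificate checks are disabled since my test box uses a self-signed cert):

# tls_handshake_probe.py -- time only the TCP connect + TLS handshake
# against the stunnel frontend, to separate handshake cost from
# haproxy/backend time. Host/port below are placeholders.
import socket
import ssl
import time

HOST, PORT = "www.example.com", 443
N = 20

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE  # self-signed cert on the test box

total = 0.0
for _ in range(N):
    start = time.time()
    raw = socket.create_connection((HOST, PORT), timeout=10)
    tls = ctx.wrap_socket(raw, server_hostname=HOST)
    tls.close()
    total += time.time() - start

print("avg connect+handshake: %.1f ms" % (total / N * 1000))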

I've tried every option related to keepalive, and no keepalive at all...

This is my current haproxy.cfg:

global

        log 127.0.0.1   local0
        maxconn 4096
        user haproxy
        group haproxy
        daemon
# for hatop
        stats socket /var/run/haproxy.sock mode 0600 level admin

defaults
        log     global
        mode    http
        option  httplog
        option  logasap
        option  redispatch
        retries 3
        maxconn 2000
        timeout connect 10s
        timeout client  60s
        timeout server  60s

        option  http-server-close
        option  http-pretend-keepalive
        option  forwardfor
        cookie BALANCEID

frontend http_proxy
        bind 0.0.0.0:80
        default_backend cluster1
        acl is_gfish path_beg /gfish_url
        use_backend cluster2 if is_gfish

backend cluster1
        server swww1 :80 cookie balancer.www1
        server swww2 :80 cookie balancer.www2

backend cluster2
        # replace the HTTP version in the request header (1.1 -> 1.0)
        # because glassfish will give a 0 byte response otherwise
        reqrep ^(.*)\ HTTP/[^\ ]+$ \1\ HTTP/1.0
        server gfish1 :8080 cookie balancer.www1
        server gfish2 :8080 cookie balancer.www2


I was also thinking that maybe the reqrep header mangling adds some latency, but I don't know how much, and I can't test it since I don't manage the glassfish servers on the back end and can't try them with different settings. But since stunnel+haproxy is adding latency to all requests, not just the glassfish ones, I don't really know. Any ideas for debugging this issue would be greatly appreciated.
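In the meantime, one comparison I can run without touching glassfish: point the avg_rtt() probe from earlier at a cluster1 path and at a /gfish_url path through haproxy, and see whether the reqrep route is measurably slower (hostname and paths are placeholders again):

# Reusing avg_rtt() from the probe sketch above (same file): compare
# a cluster1 path with a /gfish_url path that hits reqrep + cluster2.
print("cluster1: %.1f ms" % avg_rtt("www.example.com", "/"))
print("cluster2: %.1f ms" % avg_rtt("www.example.com", "/gfish_url/"))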

---
posted at http://www.serverphorums.com
http://www.serverphorums.com/read.php?10,216799,216799#msg-216799