Re: Performance Question

From: Willy Tarreau <w#1wt.eu>
Date: Fri, 8 Oct 2010 06:28:11 +0200


On Thu, Oct 07, 2010 at 05:56:49PM -0400, Les Stroud wrote: (...)
> So, it sounds like, in this case that haproxy itself is the limiting factor.

in your case, yes (or VMware is, depending on which of the two components you consider a given).

> Tomcat outperforms in some ways because it has removed some of the vm dependency.

No, it's not about removing one VM dependency, it's that it does something different. While the job looks very similar, when running inside a VM, the work it does not have to do (connecting to the next hop) saves it half of the effort. If your tomcat had to work as a proxy and forward the connection to a next server, you'd experience the same issues. In fact, while I'm surprised by your tomcat numbers, I suspect they would be a lot better when running native. You could probably make everything run on a single server.

> I would assume that some of the extreme optimization that you do
> specifically targets a hardware architecture and not the software
> architecture that vmware uses.

No, once again it's not a matter of optimization target, but of workload. We're precisely saying that haproxy has to support around 100k packets/s in your environment, and while that's a trivial rate for a physical machine, it's huge for a VM.

> So, am I right to conclude that this is about as good as I can get,
> from a throughput perspective in this environment?

The highest score I've seen under VMware was 6500 connections/s on a core2duo at 2.66 GHz. The frequency here is much more important than the number of cores BTW. Also, with the optimizations I suggested, you should get slightly higher performance.

> If I wanted to get the throughput advantage that haproxy can deliver I would
> need to move the haproxy install to a physical box, correct?

yes, and it would run a lot better. If you're about to do that, I really suggest that you take the opportunity to try the same with tomcat to get numbers, because most likely your 4 tomcat instances on a single host will outperform the numbers you're getting now. Given your numbers, right now a VM is probably unable to make two tomcats run at full speed.

(...)
> I think that I did observe the clock issue that you are referring to.
> Now that you mention it, I have noticed top refresh quicker than it should
> (skip some seconds) from time to time.

It's more obvious in "vmstat 1". But the loop below can be frightening:

   while sleep 1; do date; done

Sometimes you may see impressive jumps that much software doesn't like at all.
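
If you want to quantify those jumps, a rough sketch like the one below can help (assuming GNU date and a plain Bourne shell, with an arbitrary 2-second threshold); it prints a line whenever two consecutive ticks are further apart than expected:

   prev=$(date +%s)
   while sleep 1; do
      now=$(date +%s)
      # flag any gap larger than 2 seconds between two 1-second ticks
      if [ $((now - prev)) -gt 2 ]; then
         echo "clock jumped: $prev -> $now"
      fi
      prev=$now
   done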

> I also noticed a related but slightly different behavior. During some of the tests, I was refreshing the stats page constantly (manually). I had a separate port listener setup for the stats. On some of the test runs (but not consistently), the test would "pause", but the stats page would continue to return in the normal quick fashion. The counters on the stats listener were increasing, but they were not changing for the backend. This would last a few seconds and then it would return to normal. I suppose that it could be a clock synchronization happening on the backend servers, but it seems unlikely to be happening simultaneously. Any thoughts?

If they don't change for the backend, it can mean that the communication between haproxy and the backend servers was disturbed. Sometimes, it can be as simple as a TCP tuning problem. Please ensure that none of the haproxy or tomcat VMs have iptables loaded, and that a few important sysctls are set, at least in the haproxy VM:

   net.ipv4.ip_local_port_range = 1024 65535
   net.ipv4.tcp_tw_reuse = 1
   net.ipv4.tcp_tw_recycle = 0

Other ones are important for the frontend side, but since you don't have any issue there yet, let's only focus on these for now.
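
In case it helps, here is a rough sketch of how I would check and apply that on the haproxy VM (the module names and the use of sysctl -w are just one way to do it, and -w only lasts until reboot; put the values in /etc/sysctl.conf to make them permanent):

   # check that no conntrack/iptables modules are loaded
   lsmod | grep -E 'ip_tables|nf_conntrack'

   # apply the settings at runtime
   sysctl -w net.ipv4.ip_local_port_range="1024 65535"
   sysctl -w net.ipv4.tcp_tw_reuse=1
   sysctl -w net.ipv4.tcp_tw_recycle=0

   # or add them to /etc/sysctl.conf and reload with
   sysctl -p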

> > Last thing while I'm at it, you should really remove that "nbproc 4" line,
> > it makes the debugging even harder as you never know what process gets what
> > request, nor which one gets the stats.
> >
>
> Got it. Out of curiosity, is there an optimal setting (or formula) for how many processes you should have for nbproc?

The optimal is 1. This parameter was added when we did not know how to support more than 256 file descriptors per process on Solaris. Since it showed better performance on CPU-bound machines at that time, it appeared in some docs and even in some tests I've been running myself. But it has so many downsides given the low gain that it's really recommended never to set it.

Maybe one day we'll have a master listening process, centralized stats, stickiness tables, counters and checks, and then it will make sense to play with it. But we're not there yet.
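
For reference, a minimal single-process setup could look like the sketch below, with the stats on their own listener as you already have and simply no "nbproc" line (addresses, ports, names and timeouts are only placeholders to adapt to your setup):

   global
      maxconn 10000

   defaults
      mode http
      timeout connect 5s
      timeout client 30s
      timeout server 30s

   listen stats
      bind :8080
      stats enable
      stats uri /

   listen app
      bind :80
      balance roundrobin
      server tomcat1 10.0.0.11:8080 check
      server tomcat2 10.0.0.12:8080 check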

Regards,
Willy
