Re: high cpu utilization

From: Marc Breslow <marc#mbreslow.net>
Date: Sat, 16 Feb 2008 07:41:21 -0500


Thanks Willy. I wanted to wait until a slower time to run strace as it sounded like it could interrupt or slow down our services. HAProxy is running at 50% CPU now with roughly 275 HTTP sessions and 100 TCP sessions.

I generated the trace file. I searched for "refused" and found lines like this one:

07:26:30.697634 send(372, "HEAD /staging.online HTTP/1.0\r\n\r"..., 33, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 ECONNREFUSED (Connection refused) <0.000010>
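To see how common such failures are compared with everything else in the trace, the failed calls can be tallied per syscall and errno. The following is a minimal sketch, assuming the trace was written with `strace -tt -T` on a single process (so every line starts with a timestamp and carries no PID prefix). The function name `count_errors` is hypothetical, and the second and third sample lines are made up for illustration; only the first comes from the real trace.

```python
import re
from collections import Counter

# Matches failed calls in "strace -tt -T" output, e.g.
#   07:26:30.697634 send(372, "...", 33, ...) = -1 ECONNREFUSED (Connection refused) <0.000010>
# Group 1 is the syscall name, group 2 the errno name.
FAILED_CALL = re.compile(
    r"^\d{2}:\d{2}:\d{2}\.\d+\s+(\w+)\(.*\)\s+=\s+-1\s+([A-Z][A-Z0-9]*)\b"
)


def count_errors(lines):
    """Count failed syscalls, keyed by (syscall, errno name)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CALL.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# First line is from the trace quoted above; the other two are invented examples.
SAMPLE = [
    '07:26:30.697634 send(372, "HEAD /staging.online HTTP/1.0\\r\\n\\r"..., 33, '
    "MSG_DONTWAIT|MSG_NOSIGNAL) = -1 ECONNREFUSED (Connection refused) <0.000010>",
    "07:26:30.698000 recv(12, 0x8100000, 8192, 0) = -1 EAGAIN "
    "(Resource temporarily unavailable) <0.000008>",
    '07:26:30.698100 send(5, "x", 1, 0) = 1 <0.000009>',
]

for (syscall, errno_name), n in count_errors(SAMPLE).most_common():
    print(f"{n:6d}  {syscall:<10} {errno_name}")
```

To run it on the real capture, pass the open file instead of `SAMPLE`, for example `count_errors(open("haproxy.trace"))`. EAGAIN on non-blocking sockets is normally harmless, so a large EAGAIN count alone probably does not indicate a problem.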

Is that an example of something that takes a lot of CPU for haproxy? Maybe we're not using haproxy in the most effective way. We have a couple of spare web server instances in our cluster that are usually not online. The way that we bring them online is by creating the file that haproxy uses to see if a server is up or down. So every 2.5s it's checking those two servers and finding they're down.

We also have an entire duplicate haproxy configuration for our testing site, into which we add 1 or 2 servers at any given time. We add the servers by touching a different file on the web server that haproxy is constantly polling for. 5 or 6 out of 6 of these instances are usually unavailable. Is that more overhead for haproxy than if the servers are always available?

What else can I look for in the trace file?
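One thing that may help here is a per-syscall summary of the trace: how often each call appears and how much time it accounts for, using the `<seconds>` value that `strace -T` appends to each line. This is a rough sketch under the same assumptions about the trace format (`-tt -T`, single process); the name `summarize` is hypothetical, and the sample lines other than the `send` failure are invented.

```python
import re
from collections import defaultdict

# Captures the syscall name and the elapsed time that "strace -T" appends, e.g.
#   07:26:30.697634 send(372, ...) = -1 ECONNREFUSED (Connection refused) <0.000010>
CALL = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+\s+(\w+)\(.*<(\d+\.\d+)>\s*$")


def summarize(lines):
    """Return (syscall, call count, total seconds) rows, slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = CALL.match(line)
        if match:
            entry = totals[match.group(1)]
            entry[0] += 1
            entry[1] += float(match.group(2))
    rows = [(name, count, secs) for name, (count, secs) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Only the first line is real; the rest are made-up examples of the same format.
SAMPLE = [
    '07:26:30.697634 send(372, "HEAD /staging.online HTTP/1.0\\r\\n\\r"..., 33, '
    "MSG_DONTWAIT|MSG_NOSIGNAL) = -1 ECONNREFUSED (Connection refused) <0.000010>",
    '07:26:30.697700 send(373, "x", 1, 0) = 1 <0.000020>',
    "07:26:30.697800 close(373) = 0 <0.000005>",
]

for name, count, secs in summarize(SAMPLE):
    print(f"{name:<12} calls={count:<6d} total={secs:.6f}s")
```

If one syscall dominates the call count while each call takes almost no time, that could point to the process looping rather than waiting, but this is only a heuristic. `strace -c -p <pid>` prints a similar summary directly.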

Thanks,
---Marc

On Feb 14, 2008 1:28 AM, Willy Tarreau <w#1wt.eu> wrote:

> On Wed, Feb 13, 2008 at 09:14:24PM -0700, Dan Zubey wrote:
> > > From top it looks like it's almost all system - although I'm not sure
> > > how to get that breakdown at the process level.
> > >
> > > top - 16:58:29 up 17 days, 8:09, 1 user, load average: 1.03, 1.24, 1.31
> > > Tasks: 85 total, 2 running, 83 sleeping, 0 stopped, 0 zombie
> > > Cpu0 : 1.3% us, 36.2% sy, 0.0% ni, 60.8% id, 0.0% wa, 1.7% hi, 0.0% si
> > > Cpu1 : 2.7% us, 60.5% sy, 0.0% ni, 36.9% id, 0.0% wa, 0.0% hi, 0.0% si
> > > Mem: 2074716k total, 1307116k used, 767600k free, 122528k buffers
> > > Swap: 8385912k total, 0k used, 8385912k free, 617172k cached
> > >
> > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > > 18072 root 25 0 141m 138m 408 S 100 6.8 1269:44 haproxy
> > >
> >
> > Someone correct me if I'm wrong, but that looks like a healthy system to
> > me. Yes, HAProxy is pegged at 100% cpu, but the two processors are
> > idling at 50% on average. You're only running at 50% of full load right now.
>
> Yes, but the system/user ratio is pretty high. Usually, I see about 85/15,
> and here we have 95/5, which looks more like there is high bandwidth
> involved and the system is tired of forwarding.
>
> > You could try to optimize your config file so that haproxy does less
> > work with every incoming request, that'll save on the system call side.
>
> Marc, do you have ip_conntrack loaded on this machine ? I'm asking this
> because ip_conntrack is a disaster on a proxy since it has to create 2
> connections for each proxied connection. It often consumes about 50% of
> the power for itself in such conditions. Also, if it is loaded, it is
> possible that you have a low hashsize value, making the system work a
> lot for each packet transmitted.
>
> If that is not the case, you should run strace for a few seconds on the
> process in order to see what it's doing. Maybe the system is regularly
> refusing to establish connections or things like this which need a lot of
> retries ?
>
> # strace -tt -T -o haproxy.trace -p 18072
> [ wait 5 seconds ]
> Ctrl-C
>
> Be careful, the trace may be big, and it will slow down haproxy during the
> capture, reason for not letting it run more than a few seconds.
>
> Best regards,
> Willy
>
>
Received on 2008/02/16 13:41
