RE: Reducing I/O load of logging

From: John Lauro <john.lauro#covenanteyes.com>
Date: Fri, 13 Feb 2009 08:04:50 -0500


It wouldn't hurt to put RHEL 5 or CentOS 5 on the box instead of FC. FC is generally aimed at desktops rather than servers.

Your default ulimit -n is only 1024. Just make sure you raise it to match or exceed your HAProxy configuration before starting HAProxy. Even if that is a problem, though, it wouldn't explain why things break when you are looking at the logs.
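
As a rough sketch of that check, something like the following could go in whatever script starts HAProxy. The maxconn of 4096 and the headroom of 64 are made-up numbers for illustration, not values from your config; the factor of two assumes each proxied connection holds one descriptor on the client side and one on the server side.

```shell
# Hypothetical value: replace with the maxconn from your own haproxy.cfg.
MAXCONN=4096
# Assumption: two descriptors per proxied connection (client + server),
# plus some headroom for listeners, logging and health checks.
NEEDED=$(( MAXCONN * 2 + 64 ))

CURRENT=$(ulimit -n)
if [ "$CURRENT" = "unlimited" ] || [ "$CURRENT" -ge "$NEEDED" ]; then
    echo "ulimit -n is $CURRENT, enough for maxconn $MAXCONN"
else
    echo "ulimit -n is $CURRENT, need at least $NEEDED"
    # The soft limit can only be raised up to the hard limit (ulimit -Hn);
    # going beyond that needs root.
    ulimit -n "$NEEDED" 2>/dev/null || echo "could not raise the limit; try as root"
fi
```

The limit only applies to the shell that runs this and to what it starts afterwards, so it has to happen in the same script (or session) that launches HAProxy.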

The grep on /var/log/messages completed too quickly to really catch much. That said, your sys time is a little high, especially after it finished. On an 8-core box, 12.5% means one core fully dedicated to the task, and it rose from 4 to 16. Given that it was counted as sys and not user, and generated little I/O, that suggests it might be slow memory processing on the cache.
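
To spell out that arithmetic, here is a quick sketch. The 8 cores and the 4% and 16% sys figures are the ones from the paragraph above; the helper name cores_busy is just something I made up.

```shell
# cores_busy PCT NCORES: convert a system-wide CPU percentage into the
# equivalent number of fully busy cores.
cores_busy() {
    awk -v pct="$1" -v n="$2" 'BEGIN { printf "%.2f\n", pct * n / 100 }'
}

cores_busy 12.5 8   # one full core on an 8-core box
cores_busy 4 8      # the lower sys figure
cores_busy 16 8     # the higher sys figure
```

So 16% sys on this box is a bit more than one core's worth of kernel time.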

What's uname -a give?  

If you have i386 (32-bit) listed instead of x86_64, you have too much memory in your box for a 32-bit kernel to handle well (32-bit takes a big hit accessing more than 4 GB). Running "swapoff -a" to disable swap will help. A 32-bit kernel will waste too much time trying to decide what to keep in memory and what to swap, and swap really is pointless when its address space is only 4 GB and you have 8 GB of RAM. If you have a 64-bit kernel, it shouldn't be an issue.

If it is a 32-bit kernel, running "swapoff -a" should help a lot (it would help a little on 64-bit too, but not much), and/or reinstall with a 64-bit OS (assuming your CPUs are capable).
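
If it helps, the check can be sketched as below. This is only an illustration: kernel_bits is a hypothetical helper, it only knows the x86 machine names, and the 4 GB threshold is simply the figure mentioned above.

```shell
# kernel_bits MACHINE: classify the output of "uname -m".
kernel_bits() {
    case "$1" in
        x86_64|amd64) echo 64 ;;
        i?86)         echo 32 ;;
        *)            echo unknown ;;
    esac
}

bits=$(kernel_bits "$(uname -m)")
# MemTotal in /proc/meminfo is reported in kB; fall back to 0 if unreadable.
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null)
mem_kb=${mem_kb:-0}

echo "kernel: ${bits}-bit, RAM: $(( mem_kb / 1024 )) MB"
if [ "$bits" = 32 ] && [ "$mem_kb" -gt 4194304 ]; then
    echo "32-bit kernel with more than 4 GB of RAM: try 'swapoff -a' as root"
fi
```

Note that swapoff only lasts until the next reboot unless the swap entries are also taken out of /etc/fstab.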

If you don't have a 32-bit kernel, I am out of ideas that would explain the problem.    

From: Michael Fortson [mailto:mfortson#gmail.com]
Sent: Thursday, February 12, 2009 11:23 PM
To: John Lauro
Cc: haproxy#formilux.org
Subject: Re: Reducing I/O load of logging  

Sorry, forgot to answer the disk question. I *think* this has six 10k rpm drives in a RAID 10. It's a Dell running FC7.

On Thu, Feb 12, 2009 at 8:20 PM, Michael Fortson <mfortson#gmail.com> wrote:

Here's the result:

http://pastie.org/387928  

This box used to run everything (much of which has now been moved to other clusters). If I can't get it to behave it'll be doing nothing soon :)  

log/messages isn't large enough to trigger a misbehavior, but hopefully it'll show something... I can't really do it on the nginx log (which is massive) because I always have to kill that before enough backend tests flip over to cause a site outage.              

On Thu, Feb 12, 2009 at 6:44 PM, John Lauro <john.lauro#covenanteyes.com> wrote:

> I stopped logging so much in haproxy, but I get the same thing if I
> grep the nginx logs on this server: haproxy's mongrel backend checks
> start failing. I've noticed it only happens when using httpchk (or at
> least it happens much, much more quickly).
>
> Here's an iostat I ran -- the first two are during the grep on the
> nginx logs; the last one is after I finished:

The iostat looks ok.

Cut and paste the following (or run it from a script) so we can get a better idea of the box's general load and see if the commands turn up anything:

cat /proc/interrupts
free
netstat --inet -n | awk '{ print $6 }' | sort | uniq -c
ulimit -a
vmstat 1 10 & ( sleep 5 ; grep whatever /var/log/messages >/dev/null )
cat /proc/interrupts
echo lsof count `lsof | wc -l`
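
The two /proc/interrupts snapshots in that list are meant to be compared by hand, but if you would rather have the difference computed, a small awk sketch like this should do. The function name irq_delta is made up, and it assumes the usual layout of a CPU header line followed by one line per interrupt with per-CPU counters.

```shell
# irq_delta BEFORE AFTER: given two saved copies of /proc/interrupts,
# print how much each interrupt counter grew, largest first.
irq_delta() {
    awk 'FNR == 1 { nfile++; next }     # skip the "CPU0 CPU1 ..." header
    {
        # Sum the per-CPU counters that follow the "IRQ:" label.
        total = 0
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        if (nfile == 1) before[$1] = total
        else if (total > before[$1]) print total - before[$1], $1
    }' "$1" "$2" | sort -rn
}
```

To use it, save "cat /proc/interrupts" to one file before the grep and to another file after it, then run irq_delta on the two files. Anything disk-related near the top of the output with a very large count would be worth a closer look.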

What type of disk subsystem do you have? Given how it chokes when doing a grep, it almost sounds like you might have a faulty driver. You do realize 8 cores is overkill for this, unless you are running other stuff on the box. The two checks on the interrupts are there to see if something (especially disk I/O) is generating too many; what we need to look at is the difference between them.

Received on 2009/02/13 14:04
