From: Willy Tarreau <w#1wt.eu>
Date: Fri, 13 Feb 2009 22:31:45 +0100

Hi guys,

On Fri, Feb 13, 2009 at 08:04:50AM -0500, John Lauro wrote:
> It wouldn't hurt to put RHEL 5 or Centos 5 on the box instead of FC. FC is
> generally meant for desktops instead of servers.

A customer has encountered a similar issue a few times on RHEL3. We noticed there was swap on the affected machines. It would happen after about 6 months of production. Haproxy would not receive any request for long periods (several seconds), and we noticed this happened most frequently during network backups.

We had a few occurrences of the issue in the middle of the day while the admins were grepping errors in the logs. There was a lot of CPU usage, so at first we suspected scheduling issues. But when we noticed the swap usage, we figured that some of the process's structures might have been swapped out, causing long delays when accessing data. Interestingly, restarting the process was enough to make the issue go away, since the memory usage was much lower after a restart.

The reason for the swap usage was not a lack of RAM but heavy use of the disk cache, which pushed rarely used data into the swap.

And I agree with you John, a "swapoff -a" must absolutely be done. There's not even one valid reason to enable swap on a network server; all it can do is delay operations and kill performance.

> Your default ulimit -n is only 1024. Just make sure you raise that to match
> or exceed your Haproxy configuration prior to starting Haproxy. Even if
> that is a problem, it wouldn't explain why you have a problem when looking
> at the logs.

It is not a problem if haproxy is started as root, as it adjusts the ulimit-n itself. And you're right, it would not cause side effects while looking at the logs.
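For the case where haproxy is not started as root, a quick check of the current limit before launching it might look like the sketch below. The function name fd_limit_ok and the value 20000 in the usage line are only illustrative assumptions, not something taken from the configuration discussed here; use whatever your own setup needs.

```shell
#!/bin/sh
# Return success if the soft limit on open files is at least $1.
# fd_limit_ok is a hypothetical helper name used for illustration.
fd_limit_ok() {
    limit=$(ulimit -n)
    # Some shells report "unlimited" instead of a number.
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$1" ]
}
```

Usage, with an assumed requirement of 20000 descriptors: `fd_limit_ok 20000 || echo "raise ulimit -n before starting haproxy"`.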

> The grep on /var/messages completed too quick to really catch much. That
> said, your SYS time is a little high, especially after it finished. For an
> 8 core box, only 12.5% would mean one core dedicated to the task, and it
> rose from 4 to 16. Given that it was counted as sys and not user, and
> generated little I/O, indicates it might be slow memory processing on the
> cache.

Other I/O-intensive workloads such as "wc -l /var/log/*" might help show whether the swap usage suddenly grows.
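One way to watch for that is to compare the swap in use before and after the workload. The following is only a rough sketch: it reads SwapTotal and SwapFree from /proc/meminfo, and the helper names swap_used_kb and swap_delta are made up for this example.

```shell
#!/bin/sh
# Print the amount of swap in use, in kB (SwapTotal minus SwapFree).
# The optional file argument defaults to /proc/meminfo; it exists only
# so the function can be tried against a saved copy of that file.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}

# Run a command (its output is discarded) and report swap usage
# before and after it.
swap_delta() {
    before=$(swap_used_kb)
    "$@" >/dev/null 2>&1
    after=$(swap_used_kb)
    echo "swap used: ${before} kB -> ${after} kB"
}
```

For example: `swap_delta sh -c 'wc -l /var/log/*'`. A clear jump between the two values would suggest the cache is pushing process memory out to swap.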

Another test, which could be done once the problem becomes reproducible, is to flush the caches and disable all swap:

# echo 1 >/proc/sys/vm/drop_caches
# swapoff -a

Then redo the operation. If the problem does not happen anymore, it clearly indicates a poor tradeoff between swap and cache.

Regards,
Willy

Received on 2009/02/13 22:31

Re: Reducing I/O load of logging