Hypermail

From: Michael Fortson <mfortson#gmail.com>
Date: Thu, 12 Feb 2009 17:19:11 -0800

making the iostat hopefully more legible:

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 580.32 116.43 2094.73 11.76 359234.08 1049.90 171.04

6.56 3.12 0.48 101.44
dm-0 0.00 0.00 2675.25 127.99 359234.08 1023.94 128.51

35.64 12.82 0.36 101.44
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 242.11 407.49 776.72 245.34 148764.37 5222.67 150.66

2.88 2.85 0.48 48.83
dm-0 0.00 0.00 1018.22 652.63 148660.73 5221.05 92.10

10.99 6.59 0.29 48.83
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 3.46 211.79 7.32 83.94 102.44 2365.85 27.05   0.05 0.51 0.37 3.39
dm-0 0.00 0.00 10.77 295.73 102.44 2365.85 8.05   0.12 0.40 0.11 3.37
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00   0.00 0.00 0.00 0.00

On Thu, Feb 12, 2009 at 5:14 PM, Michael Fortson <mfortson#gmail.com> wrote:
> I'm confused :)
>
> I stopped logging so much in haproxy, but I get the same thing if I
> grep the nginx logs on this server: haproxy's mongrel backend checks
> start failing. I've noticed it only happens when using httpchk (or at
> least it happens much, much more quickly).
>
> Here's an iostat I ran -- the first two are during the grep on the
> nginx logs; the last one is after I finished:
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s
> avgrq-sz avgqu-sz await svctm %util
> sda 580.32 116.43 2094.73 11.76 359234.08 1049.90
> 171.04 6.56 3.12 0.48 101.44
> dm-0 0.00 0.00 2675.25 127.99 359234.08 1023.94
> 128.51 35.64 12.82 0.36 101.44
> dm-1 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s
> avgrq-sz avgqu-sz await svctm %util
> sda 242.11 407.49 776.72 245.34 148764.37 5222.67
> 150.66 2.88 2.85 0.48 48.83
> dm-0 0.00 0.00 1018.22 652.63 148660.73 5221.05
> 92.10 10.99 6.59 0.29 48.83
> dm-1 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s
> avgrq-sz avgqu-sz await svctm %util
> sda 3.46 211.79 7.32 83.94 102.44 2365.85
> 27.05 0.05 0.51 0.37 3.39
> dm-0 0.00 0.00 10.77 295.73 102.44 2365.85
> 8.05 0.12 0.40 0.11 3.37
> dm-1 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00
>
>
> Top showed cpu use around 16% for grep. There was still ram available.
> It's an 8 core machine.
>
>
> Here's a bit out of the haproxy config:
>
> global
> log 127.0.0.1 local0 notice #notice
> #log 127.0.0.1 local1 alert
>
> daemon
> nbproc 1
> maxconn 32000
> ulimit-n 65536
> # and uid, gid, daemon, maxconn...
>
> defaults
> # setup logging
> log global
> option tcplog
> option httplog
> option dontlognull
>
> # setup statistics pages
> stats enable
> stats uri /haproxy_stats
> stats refresh 10
> stats auth xxxxxxxxxxxxxx
>
>
> mode http
> retries 20
> option abortonclose
> option redispatch
>
>
> timeout connect 10 # time waiting for backend server to accept
> timeout client 80s # data from client (browser)
> timeout server 120s # data from server (mongrel)
>
> frontend rails *:7000
> # set connection limits
> maxconn 700
> default_backend mongrels
>
>
> backend mongrels
> fullconn 80
>
> option httpchk HEAD /videos/heartbeat
> option abortonclose
> balance leastconn
>
> reqrep ^([^\ ]*)\ //videos(.*) \1\ /videos\2
>
> server webapp01-101 10.2.0.3:8101 minconn 1 maxconn 5 check
> inter 1s fastinter 200 rise 1 fall 1
> server webapp01-102 10.2.0.3:8102 minconn 1 maxconn 5 check
> inter 1s fastinter 200 rise 1 fall 1
> ( etc, for 80 instances over 5 servers)
>
>
> I'm not really sure what to try next.
>
>
>
>
>
> On Wed, Feb 11, 2009 at 7:36 PM, John Lauro <john.lauro#covenanteyes.com>
wrote:
>> If you prefix the filename with a -, that tells syslog not to flush the
data
>> to disk immediately (which is only good in case of a kernel crash, as you
>> might not miss a log entry leading to it, ha!), and it is inefficient
>> forcing a sync every log entry when you have a log entry more at least
every
>> few seconds, and much more so when you have hundreds or thousands a
second.
>>
>> Note, this isn't exactly standard in all versions of syslog. Do a "man
>> syslog.conf" to make sure it applies to your OS, but generally is
standard
>> with Linux, and other operating systems have others ways to allow for
normal
>> buffering.
>>
>> So, instead of a line like:
>> *.debug;mail.none;authpriv.none;cron.none /var/log/messages
>> Use the following:
>> *.debug;mail.none;authpriv.none;cron.none
-/var/log/messages
>>
>>
>> How are you doing your log rotates? That seems strange that it would
cause
>> problems, as that should be a relatively light process, unless you are
doing
>> something like compressing the logs when you rotate. It should be just
be a
>> mv and then send a -HUP to syslogd. Of course, if a simple grep is
causing
>> issues, then it sounds like you definitely have some sort of resource
>> contention problem.
>>
>>
>>
>>> -----Original Message-----
>>> From: Michael Fortson [mailto:mfortson#gmail.com]
>>> Sent: Wednesday, February 11, 2009 8:38 PM
>>> To: John Lauro
>>> Cc: James Brady; haproxy#formilux.org
>>> Subject: Re: Reducing I/O load of logging
>>>
>>> Hi John,
>>>
>>> Can you explain what you mean by making sure there's a dash in the
>>> name? Is this for something like syslog via /var/log/messages?
>>>
>>> We've had some major issues when logrotate kicks in, knocking haproxy
>>> offline (it reports most of the back-end servers as down and starts
>>> throwing 503s). It's a busy site with 80 servers configured in the
>>> backend. Reducing logrotate to 3 hour intervals shortened the
>>> downtimes but it still happened. It does the same thing if I try to
>>> grep the log.
>>>
>>> I've had to turn off logrotate for now...
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Feb 11, 2009 at 10:01 AM, John Lauro
>>> <john.lauro#covenanteyes.com> wrote:
>>> > It shouldn't be too hard for a machine to keep up with logging. How
>>> are you
>>> > logging? standard syslog? Make sure you have - in front of the
>>> filename?
>>> > How connections per second are you logging?
>>> >
>>> >
>>> >
>>> > Haven't done it with Haproxy, but have with other things that
>>> generate tons
>>> > of logs.
>>> >
>>> >
>>> >
>>> > what you could do is dump the logs (don't forget the - as part of the
>>> file
>>> > name) to /dev/shm/ (assuming linux), and then rotate the logs once a
>>> minute
>>> > and process them. That way, you will not have any disk I/O from the
>>> logs,
>>> > but would increase memory requirements.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > From: James Brady [mailto:james.colin.brady#gmail.com]
>>> > Sent: Wednesday, February 11, 2009 12:14 PM
>>> > To: haproxy#formilux.org
>>> > Subject: Reducing I/O load of logging
>>> >
>>> >
>>> >
>>> > The machine we run HAProxy on is approaching its limits in terms of
>>> disk I/O
>>> > due to our debug-level logging.
>>> >
>>> >
>>> >
>>> > The CPU is barely used, memory is no problem at all - the bottleneck
>>> will
>>> > soon be the rate at which we can log to disk. Because the machine is
>>> more
>>> > than adequate in all other areas, we'd really like to reduce the I/O
>>> load of
>>> > logging.
>>> >
>>> >
>>> >
>>> > We are using the debug log output to tell us the response time and
>>> status
>>> > code for various URLs - this is an essential task we can't do
>>> without.
>>> >
>>> >
>>> >
>>> > Is there any way to get this information without logging and
>>> > post-processing every request? Can HAProxy dump out averaged
>>> statistics like
>>> > this? Can we somehow reduce the I/O load by just logging the exact
>>> fields
>>> > we're interested in?
>>> >
>>> >
>>> >
>>> > Many thanks!
>>> >
>>> > James
>>
>>
>>
>
Received on 2009/02/13 02:19