On Fri, Dec 18, 2009 at 05:00:38PM -0800, Joe Torsitano wrote:
> Hi Willy,
> What's strange is traffic still appears normal, and is, for probably at
> least 99% of the visitors. Logged traffic remains about normal (hundreds of
> thousands of visitors a day). I just get a few e-mails asking why the site
> has been down for days or when it will be back. But I cannot recreate the
> problem. And I know there are probably people who just don't e-mail and,
> unfortunately, don't come back.
Yes, that's very possible, unfortunately.
> Here is the config file with the IP addresses changed, pretty much the
> default that comes with it...
A few questions that come to mind :
- What version are you running, by the way (haproxy -vv) ? Several cases of truncated responses were observed between 1.3.16 and 1.3.18, and before 1.3.19 a 502 response could sometimes be sent if the server closed too fast. So please ensure you're on 1.3.22. More info here about the bugs in your version :
I'm also thinking about something else. You said that when you don't go through haproxy you don't get any complaint. Are your systems configured similarly ? I mean, the very low rate of problems could very well be caused by some TCP settings which are incompatible with a minority of users running behind a buggy router/firewall.
In order to check this, you could run the following command on each server (including the one with haproxy) :
$ sysctl -a | fgrep net.ipv4.tcp
Please verify if tcp_ecn and tcp_window_scaling are at the same values. If not, start by setting tcp_ecn to 0 on the haproxy server. Then later you can try to similarly disable tcp_window_scaling, though this one is far less likely because it's enabled almost everywhere.
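A rough sketch of that comparison, assuming the output of the sysctl command above has been saved to one file per machine (the function name and file names are only examples, not anything haproxy ships):

```shell
# Print the TCP sysctl settings whose values differ between two saved dumps.
# Each dump is assumed to come from:
#   sysctl -a | fgrep net.ipv4.tcp > FILE
# diff_tcp_sysctl is a hypothetical helper name.
diff_tcp_sysctl() {
    awk -F' *= *' '
        # First file: remember every "key = value" pair.
        NR == FNR { first[$1] = $2; next }
        # Second file: report keys present in both files with different values.
        ($1 in first) && first[$1] != $2 {
            print $1 ": " first[$1] " vs " $2
        }
    ' "$1" "$2"
}
```

For example, "diff_tcp_sysctl haproxy.txt web1.txt" lists one line per differing setting. If tcp_ecn shows up there, "sysctl -w net.ipv4.tcp_ecn=0" (run as root on the haproxy machine) changes it at runtime.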
Also check with "ip route" and "ip address" on all servers whether you see a different MTU value on the default route. It's possible that a small part of your clients are still running a misconfigured PPPoE ADSL line and can't send/receive full packets. There are still some large sites that deal with that by setting their MTU to 1492 or even 1452 on the external interface. But this is less likely.
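To make the MTU easier to compare across machines, here is a minimal sketch that reads "ip address" output on stdin and prints one "interface mtu" pair per interface. The helper name list_mtu is made up for illustration, and it only understands the interface header lines of "ip address" (or "ip link"), not "ip route" output:

```shell
# Read "ip address" output on stdin and print "<interface> <mtu>" per interface.
# list_mtu is a hypothetical helper name.
list_mtu() {
    awk '/ mtu [0-9]+/ {
        # Header lines look like "2: eth0: <FLAGS> mtu 1500 ...".
        name = $2
        sub(/:$/, "", name)
        for (i = 3; i < NF; i++)
            if ($i == "mtu") print name, $(i + 1)
    }'
}
```

Usage would be "ip address | list_mtu" on each server. If lowering the MTU turns out to be worth a try, "ip link set dev eth0 mtu 1492" does it at runtime, assuming eth0 is the external interface.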
Willy
Received on 2009/12/19 08:04
This archive was generated by hypermail 2.2.0 : 2009/12/19 08:15 CET