Re: Core dump when trying to reload haproxy

From: Willy Tarreau <w#1wt.eu>
Date: Tue, 11 Nov 2008 09:45:01 +0100


Hi Anders,

On Wed, Nov 05, 2008 at 12:58:43PM +0100, Anders Nordby wrote:
> Hi,
>
> When trying to reload haproxy like this:
>
> /usr/local/sbin/haproxy -p /var/run/haproxy.pid -f /usr/local/etc/haproxy.conf -sf `cat /var/run/haproxy.pid`
>
> Haproxy core-dumps on signal 6 (SIGABRT):
>
> pid 13458 (haproxy), uid 0: exited on signal 6 (core dumped)
>
> Backtrace of dump:
>
> root#lb:~# gdb -c haproxy.core /usr/local/sbin/haproxy
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd"...
> Core was generated by `haproxy'.
> Program terminated with signal 6, Aborted.
> Reading symbols from /lib/libc.so.6...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /libexec/ld-elf.so.1...done.
> Loaded symbols for /libexec/ld-elf.so.1
> #0 0x28152ecb in kill () from /lib/libc.so.6
> (gdb) bt
> #0 0x28152ecb in kill () from /lib/libc.so.6
> #1 0x28152e68 in raise () from /lib/libc.so.6
> #2 0x28151b78 in abort () from /lib/libc.so.6
> #3 0x280eefdb in _UTF8_init () from /lib/libc.so.6
> #4 0xbfbfeda4 in ?? ()
> #5 0x28158dd3 in sys_nsig () from /lib/libc.so.6
> #6 0x28158cd3 in sys_nsig () from /lib/libc.so.6
> #7 0x28158d30 in sys_nsig () from /lib/libc.so.6
> #8 0x00000000 in ?? ()
> #9 0x28163d80 in ?? () from /lib/libc.so.6
> #10 0xbfbfeb08 in ?? ()
> #11 0x280ef009 in _UTF8_init () from /lib/libc.so.6
> #12 0x28163d80 in ?? () from /lib/libc.so.6
> #13 0x28179a24 in _nsyyin () from /lib/libc.so.6
> #14 0xbfbfebb8 in ?? ()
> #15 0x280efd69 in _UTF8_init () from /lib/libc.so.6
> #16 0x00000000 in ?? ()
> #17 0x00000007 in ?? ()
> #18 0xbfbfecd4 in ?? ()
> #19 0x0804ad5b in fd_delete (fd=672546176) at haproxy.c:1798
> Previous frame inner to this frame (corrupt stack?)
> (gdb) frame 19
> #19 0x0804ad5b in fd_delete (fd=672546176) at haproxy.c:1798
> 1798 close(fd);
> (gdb) print fd
> $1 = 672546176
>
> Any ideas?

No idea right now; this is pretty unexpected. Which version is this? Judging from the line number, it looks like version 1.2.

If so, it seems that when an error occurs while shutting down a listener's FD, the proxy enters the PR_STERROR state, but nothing then prevents the FD from being closed. I think we should explicitly set the fd to -1 on error in pause_proxy(), and skip the call to fd_delete() in maintain_proxies() when the fd is -1.
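
A minimal sketch of that guard, not the actual haproxy 1.2 code. Only the names pause_proxy(), maintain_proxies(), fd_delete() and PR_STERROR come from this message; the `proxy_sketch` structure, its fields, the numeric state values, the `_sketch` function bodies and the `fd_delete_calls` counter are all assumptions made for illustration. Closing the descriptor in the error path is also an assumption of the sketch, added so that the fd is not leaked once it has been set to -1.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical state values; the real haproxy constants may differ. */
enum { PR_STRUN_SKETCH = 0, PR_STPAUSED_SKETCH = 1, PR_STERROR_SKETCH = 2 };

/* Hypothetical, stripped-down proxy: one listening fd and a state. */
struct proxy_sketch {
    int listen_fd;   /* -1 means "no valid fd, do not touch" */
    int state;
};

int fd_delete_calls; /* counts calls, for illustration only */

/* Stand-in for fd_delete(): must only ever see a valid descriptor. */
void fd_delete_sketch(int fd)
{
    assert(fd >= 0);
    fd_delete_calls++;
    close(fd);
}

/* Stand-in for pause_proxy(): if shutdown() fails, invalidate the fd. */
int pause_proxy_sketch(struct proxy_sketch *p)
{
    if (shutdown(p->listen_fd, SHUT_WR) != 0) {
        close(p->listen_fd);      /* assumption: release it here */
        p->listen_fd = -1;        /* mark as "nothing left to delete" */
        p->state = PR_STERROR_SKETCH;
        return -1;
    }
    p->state = PR_STPAUSED_SKETCH;
    return 0;
}

/* Stand-in for the relevant part of maintain_proxies(). */
void maintain_proxies_sketch(struct proxy_sketch *p)
{
    if (p->listen_fd < 0)
        return;                   /* fd already invalidated: skip fd_delete() */
    fd_delete_sketch(p->listen_fd);
    p->listen_fd = -1;
}
```

With this shape, a proxy whose shutdown() failed reaches the maintenance step with an fd of -1, so fd_delete() is never handed a stale or garbage descriptor.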

If you can confirm your version, I can work on a patch and ask you to try it.

> This is in FreeBSD.

I'm not surprised, because when doing a shutdown() on a listening socket, we get three different behaviours: Linux, BSD and Solaris. So it is not surprising that one of them triggers a bug not seen on the others.
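
One way to observe the difference on a given system is a small probe like the one below. It is a hypothetical helper written for illustration, not part of haproxy: the function name `probe_listener_shutdown` and its return convention are assumptions. It opens a listening TCP socket on the loopback address, calls shutdown() on it, and reports what the kernel answered. No expected result is given here, since that is precisely what varies between platforms.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical probe. Returns 0 if shutdown() succeeded on a listening
 * socket, the errno value if it failed, or -1 if the listening socket
 * could not be set up at all.
 */
int probe_listener_shutdown(int how)
{
    struct sockaddr_in addr;
    int fd, result;

    assert(how == SHUT_RD || how == SHUT_WR || how == SHUT_RDWR);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;            /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(fd, 1) != 0) {
        close(fd);
        return -1;
    }

    /* The call whose behaviour differs between systems. */
    result = (shutdown(fd, how) == 0) ? 0 : errno;
    close(fd);
    return result;
}
```

Calling it with SHUT_WR and SHUT_RD on each target system and comparing the returned values shows whether that system accepts or rejects a shutdown() on a listener.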

Regards,
Willy

Received on 2008/11/11 09:45