Re: [PATCH] : Count retries and redispatches also for servers + extend logs + %d->%u cleanup

From: Willy Tarreau <w#1wt.eu>
Date: Mon, 3 Dec 2007 22:35:33 +0100


On Mon, Dec 03, 2007 at 10:05:33PM +0100, Krzysztof Oledzki wrote:
> >>I also extended log to add how many retries are still possible and fixed
> >
> >Don't you think it would be more useful to count the number of retries
> >performed before getting the connection processed ? I think that putting
> >the remaining retries in the logs does not bring much information as soon
> >as you update your configuration. So most probably, you should log
> >(be->retries - sv->conn_retries).
>
> Indeed. I'll fix this. I am also thinking about marking it somehow in the
> logs when a redispatch forced by retries==0 has occurred. What is your opinion?

Logs are becoming hard to read. In the 1.2 days, I could always tell people
what was in the logs. Right now I have to think a lot before speaking. The
human brain cannot discern more than 5 patterns at once, and I'd like the
numbers to stay grouped by 5 or less for this reason.

Maybe you could simply log -1 for retries when a redispatch occurred (BTW, I
believe that is how it happens in the code), because the redispatch may only
work on the last retry. Hmmm, now that I'm thinking about it, -1 is not a
very good solution because people are used to looking for -1 to detect
serious problems.

Ah, there was a trick I used to log the total time with "logasap": it prints
"+xxx" instead of "xxx". Maybe writing out "+2" for "2 retries + redispatch"
would be an acceptable solution to differentiate it from "2", which means
"2 retries"? The advantage of prefixing a number with the "+" sign is that
most tools still accept it, and that it's easy to catch with the eye.
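
To make the "+2" idea concrete, here is a rough sketch, not the actual
HAProxy code. The helper names format_retries() and parse_retries() are made
up for illustration; max_retries and remaining stand in for be->retries and
sv->conn_retries. It logs the retries performed rather than the retries
left, prefixes the count with "+" when a redispatch occurred, and shows that
a consumer using strtol() still reads the number correctly.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write the retries field of a log line into buf.
 * max_retries  = configured retries (be->retries in this thread)
 * remaining    = retries still available (sv->conn_retries in this thread)
 * redispatched = non-zero if a redispatch occurred
 * Returns what snprintf() returns.
 */
static int format_retries(char *buf, size_t size, unsigned int max_retries,
                          unsigned int remaining, int redispatched)
{
    unsigned int performed;

    assert(buf != NULL && size > 0);

    /* Retries performed, guarded against unsigned underflow. */
    performed = (remaining > max_retries) ? 0 : max_retries - remaining;

    /* "+N" means "N retries + redispatch", plain "N" means "N retries". */
    return snprintf(buf, size, "%s%u", redispatched ? "+" : "", performed);
}

/* Hypothetical consumer: strtol() accepts an optional leading '+',
 * so the numeric value is unchanged by the prefix.
 * Returns 0 on success, -1 if the field is not a valid count.
 */
static int parse_retries(const char *field, unsigned int *retries,
                         int *redispatched)
{
    char *end;
    long value;

    *redispatched = (field[0] == '+');
    value = strtol(field, &end, 10);
    if (end == field || *end != '\0' || value < 0)
        return -1;
    *retries = (unsigned int)value;
    return 0;
}
```

With be->retries = 3 and one retry left, this would log "2" without a
redispatch and "+2" with one, and both parse back to the value 2.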

> >Also, is it really useful in your situation to see the number of retries ?
> >I'm not against merging the patch, I'm just wondering if that uncovers a
> >lot of problems or just one once in a while.
>
> Rather one once in a while: from the global [http] statistics you can obtain
> information about the final situation, but you don't know what the error
> distribution looks like: was it a single event caused by massive
> traffic (so you may think about tuning knobs like minconn/maxconn), or
> is it rather a constant tendency independent of things like the number
> of connections, etc.

OK.

> >>Finally, I changed %d -> %u for retries/redispatches as those variables
> >>are declared as unsigned. Similar cleanup also applies to other variables,
> >>will send additional patch if that is OK?
> >
> >I remember that a long time ago I stopped using one of the printf formats
> >which was not found on all unixes, and I think it was %u, but I'm not sure,
> >maybe I'm confusing with %llu and relatives. At least from the man pages,
> >%u is OK on solaris and openbsd. Should be fine anywhere else. If so, yes
> >you can update the fields you identify.
>
> OK.
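
For reference, a small sketch of why the format should match the declared
type. The helper name format_counter() is hypothetical, and the test value
assumes a 32-bit unsigned int. With %u the full unsigned range prints as
expected, whereas passing an unsigned value to %d is a type mismatch and
large values would typically show up as negative numbers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print an unsigned counter with the matching
 * conversion. %u expects an unsigned int argument, %d expects an int.
 */
static int format_counter(char *buf, size_t size, unsigned int counter)
{
    return snprintf(buf, size, "%u", counter);
}
```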

Best regards,
Willy
