RE: NOSRV and retries

From: asim s <asim_s3000#hotmail.com>
Date: Mon, 30 Jun 2008 22:39:21 +0100

Thank you for taking the time to respond. I have read your post carefully and tried to apply some new settings to my haproxy config:

I set the check interval to 3s and maxconn to 2, and got this log entry:

127.0.0.1:40082 [30/Jun/2008:22:04:32.446] accounts accounts/acc1 5742/1/6221 30485 -- 68/68/68/2/+3 0/72

I assume +3 means 3 attempts.
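To read that line field by field, here is a minimal Python sketch. `parse_tcp_log` is a hypothetical helper. The field order is an assumption based on the TCP log format described in the HAProxy configuration manual: client, date, frontend, backend/server, Tw/Tc/Tt timers, bytes read, termination state, actconn/feconn/beconn/srv_conn/retries, srv_queue/backend_queue. If I read the manual correctly, a '+' in front of the retry count means the session was redispatched after the retries on the first server ran out. In that case the logged server (acc1 here) may not be the one the retries were made against.

```python
def parse_tcp_log(line):
    """Split one HAProxy TCP-mode log line into named fields (hypothetical helper;
    field order assumed from the TCP log format in the HAProxy manual)."""
    (client, date, frontend, backend_server,
     timers, nbytes, state, counters, queues) = line.split()[:9]
    backend, _, server = backend_server.partition("/")
    # Tw = time in queue, Tc = time to connect, Tt = total time (ms); -1 = never reached
    tw, tc, tt = (int(t) for t in timers.split("/"))
    actconn, feconn, beconn, srv_conn, retries = counters.split("/")
    srv_queue, backend_queue = (int(q) for q in queues.split("/"))
    return {
        "client": client,
        "date": date.strip("[]"),
        "frontend": frontend,
        "backend": backend,
        "server": server,  # empty string when no server was assigned
        "tw": tw, "tc": tc, "tt": tt,
        "bytes": int(nbytes),
        "state": state,
        "actconn": int(actconn), "feconn": int(feconn),
        "beconn": int(beconn), "srv_conn": int(srv_conn),
        # assumption: a leading '+' marks a redispatch after retries were exhausted
        "redispatched": retries.startswith("+"),
        "retries": int(retries.lstrip("+")),
        "srv_queue": srv_queue, "backend_queue": backend_queue,
    }


entry = parse_tcp_log("127.0.0.1:40082 [30/Jun/2008:22:04:32.446] accounts "
                      "accounts/acc1 5742/1/6221 30485 -- 68/68/68/2/+3 0/72")
print(entry["retries"], entry["redispatched"])  # 3 True
```

Read this way, the request spent about 5.7s in the queue (Tw) before it connected, and 72 requests were still waiting in the backend queue.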

After experimenting further, I've realised that the redispatch is very difficult to reproduce with Rails servers; I've only managed to trigger it a few times in my testing. The log doesn't show the servers dropping out due to failed checks. However, with maxconn 1 the checks may prevent a client from connecting, or, as you said, a client connection may block the checks.
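As a rough back-of-the-envelope sketch of that point: if a maxconn 1 Rails server cannot answer a check while it is busy, it is marked down after roughly fall × inter milliseconds. The helper names are hypothetical. I'm assuming fall 2 stays unchanged with the 3s interval and that the check timeout equals inter, and I'm ignoring jitter, so treat the numbers as estimates only.

```python
def approx_time_to_mark_down(inter_ms, fall):
    """Rough estimate (assumption): one failed check per interval, so a server
    that cannot answer checks is marked down after about fall * inter ms."""
    return inter_ms * fall


def busy_request_trips_checks(request_ms, inter_ms, fall):
    """True if one request keeping a maxconn 1 server busy lasts long enough
    for that server to be marked down, under the estimate above."""
    return request_ms >= approx_time_to_mark_down(inter_ms, fall)


print(approx_time_to_mark_down(500, 2))            # 1000 (inter 500, fall 2)
print(approx_time_to_mark_down(3000, 2))           # 6000 (3s interval, fall 2 assumed)
print(busy_request_trips_checks(70_000, 3000, 2))  # True (a 70s request)
```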

It looks as though a request in the global queue is not considered for redispatch. Is this the case?

On another note, I've noticed 503s with sQ and NOSRV, followed by some long-running requests being logged with cD (the client timeout is 50s and the requests take 70-80s). I assume that's because those requests, whose clients have already disconnected, are still occupying all the available servers. Would 'option forceclose' or 'option httpclose' resolve the problem?
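For reference, here is a small Python sketch that decodes the first two characters of the termination-state field. The meanings are paraphrased from my reading of the log section of the HAProxy manual and cover only a subset of the codes; `describe_termination` is a hypothetical helper. Read this way, sQ pairs a server-side timeout with the queue state, so the request timed out before it ever reached a server. cD means the client-side timeout expired during the data phase, which fits a 50s client timeout against 70-80s requests.

```python
# Subset of the two-character termination-state codes, paraphrased from the
# HAProxy manual's log section (assumption: not an exhaustive list).
CAUSE = {
    "C": "client aborted the session",
    "S": "server aborted or refused the session",
    "P": "proxy aborted the session",
    "R": "a proxy resource was exhausted",
    "I": "internal error detected by the proxy",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal completion",
}
STATE = {
    "R": "waiting for a complete request from the client",
    "Q": "waiting in the queue for a connection slot",
    "C": "waiting for the connection to the server",
    "H": "waiting for complete response headers from the server",
    "D": "in the data phase",
    "L": "transmitting the last data to the client",
    "T": "request was tarpitted",
    "-": "normal completion after the data transfer",
}


def describe_termination(code):
    """Decode the first two characters of a termination-state field
    ("sQ" in TCP mode, "sQ--" in HTTP mode) into (cause, state)."""
    return CAUSE.get(code[0], "unknown"), STATE.get(code[1], "unknown")


print(describe_termination("sQ"))
print(describe_termination("cD"))
```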

Thanks again
Asim



> Date: Mon, 30 Jun 2008 21:58:56 +0200
> From: w#1wt.eu
> To: asim_s3000#hotmail.com
> CC: haproxy#formilux.org
> Subject: Re: NOSRV and retries
>
> On Mon, Jun 30, 2008 at 11:15:00AM +0100, asim s wrote:
>> 
>> http mode:
>> 
>> 127.0.0.1:58270 [30/Jun/2008:11:10:20.515] accounts accounts/ 0/5002/-1/-1/5002 503 212 - - sQ-- 73/73/73/0/0 0/72 "GET /current_accounts/all HTTP/1.0"

>
> Ah OK it's a 503, not a 502. I was worried!
> Here it means that it found no server for this request.
>
>> tcp mode:
>> 127.0.0.1:51308 [30/Jun/2008:11:09:17.072] accounts accounts/ 5001/-1/5001 0 sQ 73/73/73/0/0 0/70

>
> OK same here.
>
>> Here is my listen section:
>> 
>> listen  accounts 127.0.0.1:9001
>>         balance roundrobin
>>         server acc0 127.0.0.1:4300 check inter 500 rise 1 fall 2 maxconn 1
>>         server acc1 127.0.0.1:4301 check inter 500 rise 1 fall 2 maxconn 1
>> 
>> I'm using ab with 80 concurrent connections and a connect timeout of 5 seconds. Basically, I'm trying to push to the edge of the connect timeout and see whether it redispatches, which does not seem to happen.

>
> Given the somewhat small check interval and fall counter, are you sure that you're
> not simply dropping packets? If a SYN is lost during the connect stage, the system
> only retries it 3 seconds later. So maybe your servers are sometimes marked as failed
> and there's no server left at all? Then it would make sense to return a 503 to
> pending requests, because there's nobody left to reply.
>
> You should enable the stats for this. Check them while this happens, and check
> the servers' status and uptimes.
>
>> I'm not using persistence cookies. 

>
> OK. So your requests are always in the global queue, which really means that
> *none* of your 2 servers are available.
>
> BTW, I'm thinking about something: you set "maxconn 1". Is it a Rails server?
> If so, it cannot process more than one request at once, so while it is processing
> a request, it cannot process a health-check! If this is your situation, you need
> to increase the check interval and/or the fall counter so that the server gets
> a chance to respond to a check from time to time.
>
> Regards,
> Willy
>


Received on 2008/06/30 23:39

This archive was generated by hypermail 2.2.0 : 2008/06/30 23:46 CEST