Re: How does http_find_header() work?

From: Roy Smith <roy#s7labs.com>
Date: Thu, 31 Mar 2011 08:10:04 -0400


My intent was just to have a unique string that could be searched for in the logs. Building it by smashing together the hostid, pid, timestamp, etc., was just a fast hack to get something unique. I made one attempt to compact the string by running it through md5, but then I realized that the more bells and whistles I hung on this, the less portable it would be (i.e. not everybody might have the same md5 API I was using).

For my purpose, all I need is something that's unique. If anything, rather than making it human readable, I think a better way to approach this would be to make it more compact, by doing some kind of message digest, and perhaps even printing it in some encoding more compact than hex (say, base64).

I didn't really write a specification, but I think a critical part of the spec would be that the only guarantee about the id string is that it's unique, and that the specific format is subject to change without warning. That would discourage people from trying to use it to embed whatever information seems useful at the moment. The right way to recover additional information about the request is to use the id to correlate across logs.

For example, when we first discussed this in January, it was suggested (IIRC) that we might want to embed the IP address where the request came from. While I can see how that might be useful, that information is already available. If you see something in a downstream log that interests you and you want to know what IP it came from, use the unique id to find the corresponding entry in the front-end haproxy log, and the IP address will be there.

On Mar 31, 2011, at 4:43 AM, Bart van der Schans wrote:

> Hi,
> 
> Thx Roy, this would be very useful to have. I'm just wondering about
> the id format. If all the "fields" correspond to something meaningful,
> like host_id, pid, timestamp, etcetera, would it make sense to have
> them in a more human readable format?
> 
> Regards,
> Bart
> 
> On Thu, Mar 31, 2011 at 4:30 AM, Roy Smith <roy#s7labs.com> wrote:

>> Willy,
>>
>> This turned out to be surprisingly straightforward. Patch attached (against the 1.4.11 sources).
>>
>> To enable generation of the X-Unique-Id headers, you add "unique-id" to a listen stanza in the config file. This doesn't make any sense unless you're in http mode (although my code doesn't check for that, which could reasonably be considered a bug). What this does is add a header that looks like:
>>
>> X-Unique-Id: CB0A6819.4B7D.4D93DFDB.C69B.10
>>
>> to each incoming request. This gets done before the header capture processing happens, so you can use the existing "capture request header" to log the newly added header. There's nothing magic about the format of the id. In the current version, it's just a mashup of the hostid, haproxy pid, a timestamp, and a sequence number. The sequence numbers count up to 1000, and then the leading part is regenerated. I'm sure there are better schemes that could be used.
>>
>> Here's a sample config stanza:
>>
>> listen test-nodes 0.0.0.0:19199
>>     mode http
>>     option httplog
>>     balance leastconn
>>     capture request header X-Unique-Id len 64
>>     unique-id
>>     server localhost localhost:9199 maxconn 8 weight 10 check inter 60s fastinter 60s rise 2
>>
>> If there is already an X-Unique-Id header on the incoming request, it is left untouched.
>>
>> A little documentation:
>>
>> We've got a (probably very typical) web application which consists of many moving parts mashed together. In our case, it's an haproxy front end, an nginx layer (which does SSL termination and some static file serving), Apache/PHP for the main application logic, and a number of ancillary processes which the PHP code talks to over HTTP (possibly with more haproxies in the middle). Plus mongodb. Each of these moving parts generates a log file, but it's nearly impossible to correlate entries across the various logs.
>>
>> To fix the problem, we're going to use haproxy to assign every incoming request a unique id. All the various bits and pieces will log that id in their own log files, and pass it along in the HTTP requests they make to other services, which in turn will log it. We're not yet sure how to deal with mongodb, but even if we can't get it to log our ids, we'll still have a very powerful tool for looking at overall performance through the entire application suite.
>>
>> Thanks so much for the assistance you provided, not to mention making haproxy available in the first place. Is there any possibility you could pick this up and integrate it into a future version of haproxy? Right now, we're maintaining this in a private fork, but I'd prefer not to have to do that. I suspect this may also be useful for other people. If there are any modifications I could make which would help you, please let me know.
>>
>>
>>
--
Roy Smith
roy.smith#s7labs.com
Received on 2011/03/31 14:10

This archive was generated by hypermail 2.2.0 : 2011/03/31 14:15 CEST