On Don 27.12.2007 18:12, Willy Tarreau wrote:
>On Thu, Dec 27, 2007 at 05:25:34PM +0100, Aleksandar Lazic wrote:
>
>> A new version is available at http://none.at/test_hash_dis_v3.tgz
>>
>> TABLESIZE is 2048
>> Starting wt_hash
>> time=1485 ms
>> Table size is 130074
>> tot=130074, used=2048, min=37, max=109, avg=63.512695, sd=8.797574
>> Starting SuperFastHash2
>> time=1544 ms
>> Table size is 130074
>> tot=130074, used=2048, min=40, max=91, avg=63.512695, sd=7.914943
>> Starting SuperFastHash
>> time=1549 ms
>> Table size is 130074
>> tot=130074, used=2048, min=40, max=91, avg=63.512695, sd=7.914943
>> Starting haproxy_uri_hash
>> time=1542 ms
>> Table size is 130074
>> tot=130074, used=2048, min=30, max=1096, avg=63.512695, sd=46.208013
>> Starting haproxy_server_hash
>> time=2309 ms
>> Table size is 130074
>> tot=130074, used=1024, min=0, max=313, avg=63.512695, sd=65.571264
>> Starting hashpjw
>> time=2866 ms
>> Table size is 130074
>> tot=130074, used=2025, min=0, max=621, avg=63.512695, sd=96.917314
>> Starting hash_djbx33
>> time=1560 ms
>> Table size is 130074
>> tot=130074, used=2048, min=36, max=87, avg=63.512695, sd=7.846105
>> Starting bernstein
>> time=1562 ms
>> Table size is 130074
>> tot=130074, used=2048, min=39, max=91, avg=63.512695, sd=8.020898
>> Starting fnv_32a_str
>> time=1589 ms
>> Table size is 130074
>> tot=130074, used=2048, min=41, max=92, avg=63.512695, sd=7.707788
>> Starting hashword
>> time=1629 ms
>> Table size is 130074
>> tot=130074, used=2048, min=36, max=93, avg=63.512695, sd=7.933490
>> Starting kr_hash
>> time=1570 ms
>> Table size is 130074
>> tot=130074, used=2048, min=33, max=96, avg=63.512695, sd=8.592775
>> Starting sax_hash
>> time=1587 ms
>> Table size is 130074
>> tot=130074, used=2048, min=40, max=90, avg=63.512695, sd=7.955678
>> Starting fnv_hash
>> time=1623 ms
>> Table size is 130074
>> tot=130074, used=2048, min=39, max=93, avg=63.512695, sd=8.200957
>> Starting oat_hash
>> time=1617 ms
>> Table size is 130074
>> tot=130074, used=2048, min=39, max=95, avg=63.512695, sd=7.757672
>>
>> And wow there are some very bad results from my point of view.
>> Please say that I have overseen something ;-(
>
>No, there's no reason for anything to be wrong. There is simply no
>universal hash algorithm. It depends on the data set, the number of
>buckets, etc...
ACK.
>For instance, when I designed wt_hash, it was better distributed than
>SuperFastHash, slightly less than djbx33. For the speed, it was also
>between both. In this test, we see something different: it is faster
>than both, but shows a worse distribution than both. This is simply
>because we have used very different data sets.
Ok.
>What is important is that the spreadth of the number of entries is
>relatively low around the average. Except for those obviously bad
>above, all are between avg/2 and avg*2.
Hm, I'am not the expert in hash-function but could this be better? For example that a hashfunction generate a more or less equal min=max, could this be the 'perfect hashing' issue?
>What worries me a lot is haproxy_uri_hash. It reported quite correct
>results when the code was contributed (as the comment in backend.h
>tells it), but shows very poor here. Maybe you have a lot of identical
>URIs with just the query string different ?
Yep and it's a real world example.
Some application-server, -developer go this way.
e.g.:
Bea Weblogic:
http://www.bea.com/framework.jsp?CNT=homepage_main.jsp&FP=/content
http://www.bea.com/framework.jsp?CNT=index.htm&FP=/content/services&WT.ac=topnav_services
IBM Websphere
http://www-306.ibm.com/software/websphere/ http://www-306.ibm.com/software/info1/websphere/index.jsp?tab=products/apptransaction http://www-306.ibm.com/software/info1/websphere/index.jsp?tab=landings/appintegration
I think we should reduce the length of the used URL not to stop before the query string, imho.
>BTW, you should ensure that your inputs are unique (sort -u).
Of course ;-)
>Oh, I'm thinking about something else. I've not looked in detail at the
>code to see how the hash result is used, but is it possible that upper
>bits are simply lost in the reduction to 11 bits ? For instance, if a
>hash function returns 32 bits, is the value reduce like (val % 2048) or
>are the bits mixed, like this :
>
> (val ^ (val >> 11) ^ (val >> 21)) & 2047
>
>I ask this because when a hash is good on its output range, it is not
>necessarily the case when the output is truncated. As much as possible,
>all bits should be used equally, just like you would do with a
>pseudo-random generator. Of course, this is not very easy when
>converting 32 bits to 11 bits, but you get the idea :-)
Ok, this I don't know :-(
Do you mean the haproxy_uri_hash fuction with the 11 bits?!
>What made me wonder about this is that if you look, wt_hash is probably
>distributed like a gaussian bell perfectly centered around the average,
>since max/avg = avg/min = 1.716, while tests had shown a very low
>collision rate (<0.01%). Gaussian distributions are generally
>encountered when combining some data with themselves, or when
>truncating them.
Hm, ok what this mean, sorry I'am not so fit with statistiks :-(
Greetings
Aleks Received on 2007/12/28 11:16
This archive was generated by hypermail 2.2.0 : 2007/12/28 11:30 CET