Re: hash functions distribution test

From: Aleksandar Lazic <al-haproxy#none.at>
Date: Thu, 27 Dec 2007 08:30:41 +0100


Hi Willy,

On Don 27.12.2007 07:53, Willy Tarreau wrote:
>Hi Aleks,
>
>On Thu, Dec 27, 2007 at 02:24:43AM +0100, Aleksandar Lazic wrote:
>> Hi all ;-),
>>
>> today I have finished the first attempt to see which hash function
>> distribute well in a hashtable ;-)
>
>OK that's interesting. I've modified your code to report statistics
>instead of the entries values. The number of buckets used, the min and
>max entries per bucket, the avg entries/bucket and the standard
>deviation on the number of entries per bucket are reported. BTW, I had
>to remove 3 functions from the test because they did never complete
>after several minutes :
> - haproxy_uri_hash,
> - haproxy_server_hash,
> - kr_hash
>
>I suspect that there are some collisions which upset the chtbl_insert
>function.
>
>Here's what it reports for an old squid access_log :

Thanks ;-).
Please can you bzip -9 it and tell me, offlist, where I can get it, so that I can investigate to debug this issues.

>willy#pcw:hash_div$ ./test_hash_dis

[snipp]

>We also notice that the time is almost always the same (4 seconds),
>except for the only bad function (hash_pjw) which has not used all of
>its buckets. I really suspect that chtbl_insert() is suboptimal and
>that all the time is lost in it instead of in the hash functions
>themselves, because on the same machine, the hash functions all run
>between 1 and 10 millions hashes per second, while here we're more
>around 50k hashes/s.

I will look into it.

>FNV shows a slightly better distribution and deviation than others, and
>according to my memory, it was about as fast as wt_hash on x86 but
>slower on some achitectures where gcc could not optimize well the
>multiply. However, I think it was easily attackable due to the fact
>that it relies on a simple multiply followed by a xor ; basically if
>you pass to it values in this form,

What this mean, don't use FNV or don't use it with more then N values?!

>I attach the patch to report the stats.

As always thanks I will add it and run the tests again ;-) Thank you very much for the *speedlight* response.

Greetings

Aleks Received on 2007/12/27 08:30

This archive was generated by hypermail 2.2.0 : 2007/12/27 08:45 CET