Hi Willy,
On Son 30.12.2007 08:50, Willy Tarreau wrote:
>Hi Aleks,
>
>On Fri, Dec 28, 2007 at 11:16:13AM +0100, Aleksandar Lazic wrote:
>> >What is important is that the spreadth of the number of entries is
>> >relatively low around the average. Except for those obviously bad
>> >above, all are between avg/2 and avg*2.
>>
>> Hm, I'am not the expert in hash-function but could this be better?
>
>yes, the smallest the distance between min and max, the smoothest the
>distribution. The most important of course is to have the lowest MAX,
>but monitoring both values generally tell you what *may* happen with
>other data sets.
Ok I have know added 3 output values:
I called it "Gaussian bell" ;-)
gb_max=
gb_min=
and the diff between max/avg, avg/min
diff=
The new version is know avalilabe at http://none.at/test_hash_dis_v4.tgz
From the datas below I have the following ranking;-)
SuperFastHash2
APHash
For urls ;-)
>> For example that a hashfunction generate a more or less equal
>> min=max, could this be the 'perfect hashing' issue?
>
>yes, could be. But once again, this will change with other data :-)
Of course ;-)
That's the reason why anybody can download an test it ;-)
>> I think we should reduce the length of the used URL not to stop
>> before the query string, imho.
>
>Not necessarily, it depends on what you want to achieve. For some
>algorithms, it's better to use the full string, for others it's better
>to truncate.
Ok an optin will be nice like:
use [no] query_string
or so for haproxy ;-)
>> Do you mean the haproxy_uri_hash fuction with the 11 bits?!
>
>No I mean that most hash functions report 32-bit values, which means
>that the result is distributed between 0 and 2^32-1. But as the
>functions rely on bit manipulation, there is generally little relation
>between high and low bits, so if you take them separately, they will
>most likely not be much randomized. It's just the global set of bits
>which constitutes a good random value.
>
>If you only use 2048 values by applying a modulo on the value, you end
>up only selecting the lowest 11 bits of the result, completely ignoring
>the higher bits. There are various methods to mix all the bits before
>returning, but all of them consume a lot of CPU cycles.
>
>The alternative well-known solution consists in using a prime integer
>for the hash size. Instead of using 2048, you can either use the
>immediate lower prime integer (2039) or the next one (2053). You can
>set this in TABLESIZE, and carefully remove TABLEMASK and TABLESHIFT
>which are not used anywhere else.
Done, thanks for hint ;-)
>BTW, I've just checked and I can confirm that chtbl_insert() does a
>simple modulo :
>
> bucket = htbl->h(data) % htbl->buckets;
Ok.
>> Hm, ok what this mean, sorry I'am not so fit with statistiks :-(
>
>This just indicates that there is a relation between all returned
>values, and that some of them are returned more often than others. This
>is most likely accentuated by the modulo which eliminates high bits.
Well, that's sounds bad. The most 'test suits' for hashtables use the modulo to find the bucket, what is a better way from your point of view?!
>For your convenience, I've tried to apply %2048 and %2039 to the logs
>of the requests coming to my site (25k requests, all in the same dirs
>because there aren't many things on my site). Here's the diff between
>modulo 2048 and 2039. You will notice that except for sax_hash, djb33x,
>bernstein and fnv_hash, the distribution has narrowed for others (sd
>reduced). Also, note how the poor haproxy_server_hash and hashpjw are
>reduced by this simple operation!
Thanks I have also run the tests now with 2039 and got the follwoing resuslts, with some new hash functions ;-)
I have sorted it by time.
--- TABLESIZE is 2039 Starting haproxy_server_hash time=5620 ms Table size is 244359 tot=244359, used=2039, min=83, max=166, avg=119.843, sd=10.993, gb_max=1.385, gb_min=1.444 diff=0.059 Starting JSHash time=5681 ms Table size is 244359 tot=244359, used=2039, min=84, max=156, avg=119.843, sd=11.221, gb_max=1.302, gb_min=1.427 diff=0.125 Starting SuperFastHash2 time=5694 ms Table size is 244359 tot=244359, used=2039, min=86, max=158, avg=119.843, sd=10.475, gb_max=1.318, gb_min=1.394 diff=0.075 Starting APHash time=5705 ms Table size is 244359 tot=244359, used=2039, min=85, max=156, avg=119.843, sd=10.741, gb_max=1.302, gb_min=1.410 diff=0.108 Starting SuperFastHash time=5707 ms Table size is 244359 tot=244359, used=2039, min=86, max=158, avg=119.843, sd=10.475, gb_max=1.318, gb_min=1.394 diff=0.075 Starting kr_hash time=5716 ms Table size is 244359 tot=244359, used=2039, min=82, max=158, avg=119.843, sd=11.298, gb_max=1.318, gb_min=1.461 diff=0.143 Starting hash_djbx33 time=5725 ms Table size is 244359 tot=244359, used=2039, min=76, max=156, avg=119.843, sd=10.753, gb_max=1.302, gb_min=1.577 diff=0.275 Starting bernstein time=5727 ms Table size is 244359 tot=244359, used=2039, min=87, max=158, avg=119.843, sd=10.880, gb_max=1.318, gb_min=1.378 diff=0.059 Starting sax_hash time=5744 ms Table size is 244359 tot=244359, used=2039, min=84, max=155, avg=119.843, sd=10.792, gb_max=1.293, gb_min=1.427 diff=0.133 Starting haproxy_uri_hash time=5755 ms Table size is 244359 tot=244359, used=2039, min=84, max=159, avg=119.843, sd=10.810, gb_max=1.327, gb_min=1.427 diff=0.100 Starting RSHash time=5779 ms Table size is 244359 tot=244359, used=2039, min=85, max=158, avg=119.843, sd=11.003, gb_max=1.318, gb_min=1.410 diff=0.092 Starting oat_hash time=5764 ms Table size is 244359 tot=244359, used=2039, min=87, max=159, avg=119.843, sd=10.931, gb_max=1.327, gb_min=1.378 diff=0.051 Starting fnv_32a_str time=5784 ms Table size is 244359 tot=244359, used=2039, min=85, max=163, avg=119.843, sd=10.884, gb_max=1.360, gb_min=1.410 diff=0.050 Starting PJWHash time=5789 ms Table size is 244359 tot=244359, used=2039, min=86, max=159, avg=119.843, sd=11.074, gb_max=1.327, gb_min=1.394 diff=0.067 Starting ELFHash time=5789 ms Table size is 244359 tot=244359, used=2039, min=87, max=164, avg=119.843, sd=10.673, gb_max=1.368, gb_min=1.378 diff=0.009 Starting wt_hash time=5796 ms Table size is 244359 tot=244359, used=2039, min=82, max=162, avg=119.843, sd=11.267, gb_max=1.352, gb_min=1.461 diff=0.110 Starting hashword time=5815 ms Table size is 244359 tot=244359, used=2039, min=85, max=153, avg=119.843, sd=10.797, gb_max=1.277, gb_min=1.410 diff=0.133 Starting fnv_hash time=5820 ms Table size is 244359 tot=244359, used=2039, min=79, max=159, avg=119.843, sd=10.846, gb_max=1.327, gb_min=1.517 diff=0.190 Starting hashpjw time=5837 ms Table size is 244359 tot=244359, used=2039, min=90, max=157, avg=119.843, sd=10.828, gb_max=1.310, gb_min=1.332 diff=0.022 --- Now sorted it by sd. --- TABLESIZE is 2039 Starting SuperFastHash2 time=5694 ms Table size is 244359 tot=244359, used=2039, min=86, max=158, avg=119.843, sd=10.475, gb_max=1.318, gb_min=1.394 diff=0.075 Starting SuperFastHash time=5707 ms Table size is 244359 tot=244359, used=2039, min=86, max=158, avg=119.843, sd=10.475, gb_max=1.318, gb_min=1.394 diff=0.075 Starting ELFHash time=5789 ms Table size is 244359 tot=244359, used=2039, min=87, max=164, avg=119.843, sd=10.673, gb_max=1.368, gb_min=1.378 diff=0.009 Starting APHash time=5705 ms Table size is 244359 tot=244359, used=2039, min=85, max=156, avg=119.843, sd=10.741, gb_max=1.302, gb_min=1.410 diff=0.108 Starting hash_djbx33 time=5725 ms Table size is 244359 tot=244359, used=2039, min=76, max=156, avg=119.843, sd=10.753, gb_max=1.302, gb_min=1.577 diff=0.275 Starting sax_hash time=5744 ms Table size is 244359 tot=244359, used=2039, min=84, max=155, avg=119.843, sd=10.792, gb_max=1.293, gb_min=1.427 diff=0.133 Starting hashword time=5815 ms Table size is 244359 tot=244359, used=2039, min=85, max=153, avg=119.843, sd=10.797, gb_max=1.277, gb_min=1.410 diff=0.133 Starting haproxy_uri_hash time=5755 ms Table size is 244359 tot=244359, used=2039, min=84, max=159, avg=119.843, sd=10.810, gb_max=1.327, gb_min=1.427 diff=0.100 Starting hashpjw time=5837 ms Table size is 244359 tot=244359, used=2039, min=90, max=157, avg=119.843, sd=10.828, gb_max=1.310, gb_min=1.332 diff=0.022 Starting fnv_hash time=5820 ms Table size is 244359 tot=244359, used=2039, min=79, max=159, avg=119.843, sd=10.846, gb_max=1.327, gb_min=1.517 diff=0.190 Starting bernstein time=5727 ms Table size is 244359 tot=244359, used=2039, min=87, max=158, avg=119.843, sd=10.880, gb_max=1.318, gb_min=1.378 diff=0.059 Starting fnv_32a_str time=5784 ms Table size is 244359 tot=244359, used=2039, min=85, max=163, avg=119.843, sd=10.884, gb_max=1.360, gb_min=1.410 diff=0.050 Starting oat_hash time=5764 ms Table size is 244359 tot=244359, used=2039, min=87, max=159, avg=119.843, sd=10.931, gb_max=1.327, gb_min=1.378 diff=0.051 Starting haproxy_server_hash time=5620 ms Table size is 244359 tot=244359, used=2039, min=83, max=166, avg=119.843, sd=10.993, gb_max=1.385, gb_min=1.444 diff=0.059 Starting JSHash time=5681 ms Table size is 244359 tot=244359, used=2039, min=84, max=156, avg=119.843, sd=11.221, gb_max=1.302, gb_min=1.427 diff=0.125 Starting RSHash time=5779 ms Table size is 244359 tot=244359, used=2039, min=85, max=158, avg=119.843, sd=11.003, gb_max=1.318, gb_min=1.410 diff=0.092 Starting PJWHash time=5789 ms Table size is 244359 tot=244359, used=2039, min=86, max=159, avg=119.843, sd=11.074, gb_max=1.327, gb_min=1.394 diff=0.067 Starting wt_hash time=5796 ms Table size is 244359 tot=244359, used=2039, min=82, max=162, avg=119.843, sd=11.267, gb_max=1.352, gb_min=1.461 diff=0.110 Starting kr_hash time=5716 ms Table size is 244359 tot=244359, used=2039, min=82, max=158, avg=119.843, sd=11.298, gb_max=1.318, gb_min=1.461 diff=0.143 --- Now sorted it by diff. --- TABLESIZE is 2039 Starting ELFHash time=5789 ms Table size is 244359 tot=244359, used=2039, min=87, max=164, avg=119.843, sd=10.673, gb_max=1.368, gb_min=1.378 diff=0.009 Starting hashpjw time=5837 ms Table size is 244359 tot=244359, used=2039, min=90, max=157, avg=119.843, sd=10.828, gb_max=1.310, gb_min=1.332 diff=0.022 Starting fnv_32a_str time=5784 ms Table size is 244359 tot=244359, used=2039, min=85, max=163, avg=119.843, sd=10.884, gb_max=1.360, gb_min=1.410 diff=0.050 Starting oat_hash time=5764 ms Table size is 244359 tot=244359, used=2039, min=87, max=159, avg=119.843, sd=10.931, gb_max=1.327, gb_min=1.378 diff=0.051 Starting haproxy_server_hash time=5620 ms Table size is 244359 tot=244359, used=2039, min=83, max=166, avg=119.843, sd=10.993, gb_max=1.385, gb_min=1.444 diff=0.059 Starting bernstein time=5727 ms Table size is 244359 tot=244359, used=2039, min=87, max=158, avg=119.843, sd=10.880, gb_max=1.318, gb_min=1.378 diff=0.059 Starting PJWHash time=5789 ms Table size is 244359 tot=244359, used=2039, min=86, max=159, avg=119.843, sd=11.074, gb_max=1.327, gb_min=1.394 diff=0.067 Starting SuperFastHash2 time=5694 ms Table size is 244359 tot=244359, used=2039, min=86, max=158, avg=119.843, sd=10.475, gb_max=1.318, gb_min=1.394 diff=0.075 Starting SuperFastHash time=5707 ms Table size is 244359 tot=244359, used=2039, min=86, max=158, avg=119.843, sd=10.475, gb_max=1.318, gb_min=1.394 diff=0.075 Starting RSHash time=5779 ms Table size is 244359 tot=244359, used=2039, min=85, max=158, avg=119.843, sd=11.003, gb_max=1.318, gb_min=1.410 diff=0.092 Starting haproxy_uri_hash time=5755 ms Table size is 244359 tot=244359, used=2039, min=84, max=159, avg=119.843, sd=10.810, gb_max=1.327, gb_min=1.427 diff=0.100 Starting APHash time=5705 ms Table size is 244359 tot=244359, used=2039, min=85, max=156, avg=119.843, sd=10.741, gb_max=1.302, gb_min=1.410 diff=0.108 Starting wt_hash time=5796 ms Table size is 244359 tot=244359, used=2039, min=82, max=162, avg=119.843, sd=11.267, gb_max=1.352, gb_min=1.461 diff=0.110 Starting JSHash time=5681 ms Table size is 244359 tot=244359, used=2039, min=84, max=156, avg=119.843, sd=11.221, gb_max=1.302, gb_min=1.427 diff=0.125 Starting sax_hash time=5744 ms Table size is 244359 tot=244359, used=2039, min=84, max=155, avg=119.843, sd=10.792, gb_max=1.293, gb_min=1.427 diff=0.133 Starting hashword time=5815 ms Table size is 244359 tot=244359, used=2039, min=85, max=153, avg=119.843, sd=10.797, gb_max=1.277, gb_min=1.410 diff=0.133 Starting kr_hash time=5716 ms Table size is 244359 tot=244359, used=2039, min=82, max=158, avg=119.843, sd=11.298, gb_max=1.318, gb_min=1.461 diff=0.143 Starting fnv_hash time=5820 ms Table size is 244359 tot=244359, used=2039, min=79, max=159, avg=119.843, sd=10.846, gb_max=1.327, gb_min=1.517 diff=0.190 Starting hash_djbx33 time=5725 ms Table size is 244359 tot=244359, used=2039, min=76, max=156, avg=119.843, sd=10.753, gb_max=1.302, gb_min=1.577 diff=0.275 --- Cheers AleksReceived on 2007/12/30 14:18
This archive was generated by hypermail 2.2.0 : 2007/12/30 14:30 CET