2010/2/15 Lars Hellström <Lars...@residenset.net>:
> Tom Jackson skrev:>>>> Hey,>>>> As far as I can tell this entire operation and the logic used to>> justify it is totally messed up.>>>> Unless I'm missing some additional code, one of the basic>> justifications for this change is completely wrong.>>>> That justification is the wiki entry which shows FNV somehow creates a>> hash key distribution that outsmarts the test program "collide". From>> my reading of the collide program it was designed to find collisions.>> Unfortunately the code seems to find X candidates which have the same>> hash value and then create an array.>>>> Guess what! This trick works for any hash function. As far as I can>> tell it works for the new FNV hash and one which I tried the djb hash.>> Yes, but it is extremely simple for the classic Tcl hash, because:>> 1. It has the property that if $x and $y are two equal length strings which> hash to the same 32-bit value, then $a$x$b and $a$y$b will be a pair of> strings which hash to the same value too, for any strings $a and $b.>> 2. There are very many pairs of strings as short as 2 characters which hash> to the same 32-bit value.>> FNV escapes 1 by using ^ to add in the characters, and 2 by using a> multiplier larger than 255. It's probably still possible to find strings $x> and $y such that FNV($x) = FNV($y), but I suspect they'd need to be of> length at least 5, and chances are pretty good that FNV($a$x) != FNV($a$y)> (naively I'd say 255/256, but there could be strings $x and $y which are> less sensitive to the prefix $a).>> With the FNV hash function, an HTTP header parser can to some extent> randomise the results of an attacker by picking a $salt prefix and using> $salt$header as hash entry for data that has to do with the the $header> prefix. With the classic hash function, that has no effect whatsoever; an> attack works just as well no matter what $salt is chosen.
Here is a short testing script which compares the hash function in Tcl
(internal) as well as three external hash functions.
I use the collide proc to generate the array keys and display the
stats for the internal bucket distribution.
Then the keys are fed into the other procs and the resultant hash
values are saved.
The hash values are reduced modulo some prime number close to the
number of buckets used by the internal hash array.
Here's what I noticed:
The original code (current Tcl hash code) is slow to compute [collide
15], almost three minutes. Just changing the internal hash function
sped this up two orders of magnitude: just a few seconds. I guess this
is the result of the collisions?
The internal bucket assignment is far worse of a problem than the hash
function itself. With the exception of hash collisions, which
obviously go into the same bucket no matter what, the bucket
assignment code causes even more collisions.
The external hash functions perform similarly when buckets are
determined by their modulus some prime number. The current Tcl hash
function also does okay with powers of two, but FNV and the DJB hash
do not. When using the prime method, the distributions are all much
better than the internal bucket assignment.
The FNV hash, when used as the internal hash function somehow resists
most of the bucket assignment problems. The results are still worse
than the external FNV using a prime modulus.
Maybe there is a reason to go to all the trouble of finding a good
hash function only to throw away most of the randomness in the higher
order bits. A prime modulus would maintain this randomness as much as
possible.
# Note aj => ajr and ba => bas in collide:
proc collide {level {accum ""}} {
global a
if {$level} {
incr level -1
collide $level "ajr$accum"
collide $level "bas$accum"
} else {
set a($accum) 1
}
}
puts [time {collide 10}]
puts [array statistics a]
set names [array names a]
# Change modulus for different size arrays:
# Examples:
# 19 251 521 1021 2063 4889 10007 16567
namespace eval ::hash {
variable modulus 1021
}
proc ::hash::hashString { string } {
variable modulus
set hash 5381
foreach char [split $string ""] {
scan $char %c char
set hash [expr {(($hash << 5) + $hash) ^ $char}]
set hash [string trimleft [string range $hash end-12 end] 0]
}
return [list $string [expr {$hash % $modulus}] $hash]
}
proc ::hash::hashString2 { string } {
set hash 0
variable modulus
foreach char [split $string ""] {
scan $char %c char
set hash [expr {($hash + ($hash << 3 ) + $char)}]
set hash [string trimleft [string range $hash end-12 end] 0]
}
return [list $string [expr {$hash % $modulus}] $hash]
}
proc ::hash::hashString3 { string } {
variable modulus
set hashPrime 1000193
set hash [expr 0x811c9dc5]
foreach char [split $string ""] {
scan $char %c char
set hash [expr {($hash * $hashPrime ) ^ $char}]
set hash [string trimleft [string range $hash end-12 end] 0]
}
return [list $string [expr {$hash % $modulus}] $hash]
}
foreach word $names {
set hash1 [::hash::hashString $word]
set hash2 [::hash::hashString2 $word]
set hash3 [::hash::hashString3 $word]
incr hashmod1([lindex $hash1 1])
incr hashmod2([lindex $hash2 1])
incr hashmod3([lindex $hash3 1])
}
set desc1 "DJB hash from cdb"
set desc2 "Current Tcl hash"
set desc3 "FNV Hash"
foreach array {1 2 3} {
foreach {mod count} [array get hashmod$array] {
incr bucket${array}($count)
}
}
if {0} {
foreach array {1 2 3} {
puts "hashmod$array [set desc$array]"
foreach name [lsort -integer [array names hashmod$array]] {
puts "$name => [set hashmod${array}($name)]"
}
}
}
puts "Modulus $::hash::modulus (hash%modulus = bucket)"
foreach array {1 2 3} {
puts "Bucket Fill hashmod$array [set desc$array]"
set total 0
set usedBuckets 0
foreach name [lsort -integer [array names bucket$array]] {
set buckets [expr {[set bucket${array}($name)] * $name}]
incr total $buckets
incr usedBuckets [set bucket${array}($name)]
puts "buckets with $name = [set bucket${array}($name)] total $total"
}
puts "buckets with 0 = [expr $::hash::modulus - $usedBuckets]\n\n"
}
------------------------------------------------------------------------------
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev
_______________________________________________
Tcl-Core mailing list
Tcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tcl-core