John Ousterhout wrote:
>> One very glaring way of looking at these flaws is that the old>> hash function only distinguishes between 2551 different two-byte >> sequences, despite there being 65536 of them; this is less than>> 4%. Conversely, for *any* pair of neighbouring bytes of a key,>> there are on average 24 other pairs of bytes that could be>> substituted without changing the value of the old hash function.>> Yet another way of looking at it (though probably hard to>> formalise): the old hash function only keeps about 3.3 bits>> of "randomness" for every 8 bits of key fed in to it!>> >> This is true. I knew this at the time, and it was an intentional> (and actually correct!) choice. It turns out that typical strings> don't actually have 8 bits of information per byte: for example,> most strings that get hashed either consist of purely alphabetic> stuff or sequences of digits. Even in the case of alphabetic> strings, some characters are much more likely than others, so> the space of strings doesn't even grow by a factor of 26 with> each additional character that's added. For example, how many> 2-character words and numbers are there in the English language?> I'll bet it's a lot less than 2551.>
If there is less entropy in a set of english words than in the set of
extended-ascii characters necessary to encode those words, then one
would expect less entropy in the output than in the output of a random
bunch of binary. That's not a virtue, it's the law.
If, however, there is exploitable entropy in the domain being hashed,
it's no virtue of the hash function that it discards it.
I would also like to point out that the set of identifiers (being that
they are not english words, but often contractions and concatenations
and mungings of them) is not the same as the set of english words ... is
it? Someone should do the old SHRLDU ETAOIN comparison over them.
Colin.
------------------------------------------------------------------------------
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev
_______________________________________________
Tcl-Core mailing list
Tcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tcl-core