| Store | Cart

Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm

From: Colin McCormack <col...@chinix.com>
Wed, 17 Feb 2010 00:16:08 +1100
John Ousterhout wrote:
>> One very glaring way of looking at these flaws is that the old>> hash  function only distinguishes between 2551 different two-byte >> sequences, despite there being 65536 of them; this is less than>> 4%. Conversely, for *any* pair of neighbouring bytes of a key,>> there are on average 24 other pairs of bytes that could be>> substituted without changing the value of the old hash function.>> Yet another way of looking at it (though probably hard to>> formalise): the old hash function only keeps about 3.3 bits>> of "randomness" for every 8 bits of key fed in to it!>>     >> This is true. I knew this at the time, and it was an intentional> (and actually correct!) choice.  It turns out that typical strings> don't actually have 8 bits of information per byte: for example,> most strings that get hashed either consist of purely alphabetic> stuff or sequences of digits.  Even in the case of alphabetic> strings, some characters are much more likely than others, so> the space of strings doesn't even grow by a factor of 26 with> each additional character that's added.  For example, how many> 2-character words and numbers are there in the English language?> I'll bet it's a lot less than 2551.>   

If there is less entropy in a set of english words than in the set of 
extended-ascii characters necessary to encode those words, then one 
would expect less entropy in the output than in the output of a random 
bunch of binary.  That's not a virtue, it's the law.

If, however, there is exploitable entropy in the domain being hashed, 
it's no virtue of the hash function that it discards it.

I would also like to point out that the set of identifiers (being that 
they are not english words, but often contractions and concatenations 
and mungings of them) is not the same as the set of english words ... is 
it?  Someone should do the old SHRLDU ETAOIN comparison over them.

Colin.


------------------------------------------------------------------------------
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev
_______________________________________________
Tcl-Core mailing list
Tcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tcl-core

Recent Messages in this Thread
Donal K. Fellows Feb 07, 2010 09:36 am
Eric Melski Feb 08, 2010 06:40 am
Donal K. Fellows Feb 08, 2010 11:21 am
Eric Melski Feb 08, 2010 06:45 pm
Donal K. Fellows Feb 09, 2010 01:47 am
Larry McVoy Feb 09, 2010 02:42 am
Colin McCormack Feb 09, 2010 01:05 pm
Larry McVoy Feb 09, 2010 03:48 pm
Colin McCormack Feb 09, 2010 11:14 pm
Gustaf Neumann Feb 09, 2010 04:15 pm
Donal K. Fellows Feb 10, 2010 12:38 am
Lars Hellström Feb 14, 2010 10:56 pm
Larry McVoy Feb 15, 2010 03:20 am
Donal K. Fellows Feb 15, 2010 10:02 pm
Tom Jackson Feb 15, 2010 08:55 pm
Lars Hellström Feb 15, 2010 11:53 pm
Donal K. Fellows Feb 16, 2010 11:56 am
Colin McCormack Feb 16, 2010 01:01 pm
Eric Melski Feb 09, 2010 11:35 pm
Alexandre Ferrieux Feb 11, 2010 12:08 am
Alexandre Ferrieux Feb 12, 2010 04:57 pm
Gustaf Neumann Feb 14, 2010 04:58 pm
Donal K. Fellows Feb 14, 2010 11:54 pm
Gustaf Neumann Feb 16, 2010 03:25 pm
Donal K. Fellows Feb 16, 2010 04:45 pm
Gustaf Neumann Feb 17, 2010 11:04 am
Kristoffer Lawson Feb 17, 2010 11:18 am
Tom Jackson Feb 16, 2010 06:30 pm
Donal K. Fellows Feb 16, 2010 10:44 pm
Gustaf Neumann Feb 17, 2010 11:49 am
Donal K. Fellows Feb 16, 2010 10:36 pm
John Ousterhout Feb 15, 2010 04:56 am
Donal K. Fellows Feb 15, 2010 03:14 pm
Colin McCormack Feb 16, 2010 01:16 pm
Larry McVoy Feb 15, 2010 04:16 pm
Tom Jackson Feb 18, 2010 09:31 pm
Tom Jackson Feb 17, 2010 06:41 pm
Donal K. Fellows Feb 18, 2010 09:23 am
Gustaf Neumann Feb 18, 2010 12:23 pm
Tom Jackson Feb 17, 2010 01:36 am
Donal K. Fellows Feb 17, 2010 11:56 am
Tom Jackson Feb 16, 2010 09:30 am
Joe English Feb 08, 2010 07:56 pm
Donal K. Fellows Feb 08, 2010 09:56 pm
Messages in this thread