Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm

From: Donal K. Fellows <dona...@manchester.ac.uk>

Mon, 15 Feb 2010 15:14:07 +0000

On 15/02/2010 04:56, John Ousterhout wrote:
> It's possible that the proposed new algorithm is better than the> original Tcl one; it looks like it might use a more complex> approach that tends to spread the information from each character> more quickly across all of the bits of the hash value, whereas> the multiply-by-9 approach only gradually does this.

The original Tcl algorithm is not bad for most keys. It both keeps bits
low down and spreads them up, and it uses operations that don't lose too
much information (i.e., multiplication and addition). The FNV algorithm
is remarkably similar in structure (so advantageous in terms of ease of
transition) but uses a different multiplier that spreads the information
more thoroughly and uses xor rather than addition (though we all know
those are actually closely related operations).

The main issue with the old hash algorithm was that it was too easy to
attack it and turn hash table accesses from O(1) to O(n**2). A
literature search (not very easy; this is not a heavily studied area)
indicates that the FNV hash is quite a bit more resistant; attacks would
require using much longer keys and would be more likely to hit some
other limit.

The other advantage of the FNV algorithm is that it is well-known,
reasonably well-studied, and does a reasonably good job with producing
keys for the space of "short words"; I'm not really the guy saying that
it is quite good, a number of others are (URLs elsewhere in this
thread). I don't know of anyone other than Tcl using the old hash
algorithm (though related variants have been studied and found lacking).

I'm tempted by Bob Jenkins's lookup3 hash, but it's big, complex, and
really wants to know the length of the string first so implementing it
efficiently for string keys is awkward[*]. We could use it for Tcl_Obj
keys, but I'm currently keener on using the same algorithm for both
types of user-visible keys. (I'm also a bit hesitant because of the
complexity of lookup3; it's non-trivial to code.)

[Good advice elided]
> So, if you decide to change, make sure you measure very carefully> on representative datasets, and make sure you've studied the> properties of hashing functions enough to understand exactly> how yours is working and why it is better.

On non-malicious datasets, Tcl's original algorithm works pretty well.
It's the case when someone's deliberately trying to provoke bad
behaviour that concerned me. FNV seems to do better (and has reasonable
distribution behaviour on the non-malicious datasets I've tried it with,
such as the contents of /usr/share/dict/words and the first million
cardinal numbers).

As a side note, changing the hash has flushed out a small number of
places in Tcl where we had code that actually did depend on the order of
hashing. These places were bugs waiting to happen...

Donal.
[* Maintaining binary and source compatibility was a cast-iron
    requirement for this change, so storing the length of a string key
    was not possible; there's nowhere in a Tcl_HashEntry for it to go
    and offsets of fields are baked into user code because of macros. ]

begin:vcard
fn:Donal K. Fellows
n:Fellows;Donal
org:The University of Manchester;Research Computing Services
adr:;;Oxford Road;Manchester;;M13 9PL;United Kingdom
email;internet:dona...@manchester.ac.uk
version:2.1
end:vcard

------------------------------------------------------------------------------
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev

_______________________________________________
Tcl-Core mailing list
Tcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tcl-core

Recent Messages in this Thread
[TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 07, 2010 09:36 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Eric Melski	Feb 08, 2010 06:40 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 08, 2010 11:21 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Eric Melski	Feb 08, 2010 06:45 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 09, 2010 01:47 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Larry McVoy	Feb 09, 2010 02:42 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Colin McCormack	Feb 09, 2010 01:05 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Larry McVoy	Feb 09, 2010 03:48 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Colin McCormack	Feb 09, 2010 11:14 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Gustaf Neumann	Feb 09, 2010 04:15 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 10, 2010 12:38 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Lars Hellström	Feb 14, 2010 10:56 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Larry McVoy	Feb 15, 2010 03:20 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 15, 2010 10:02 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Tom Jackson	Feb 15, 2010 08:55 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Lars Hellstr�m	Feb 15, 2010 11:53 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 16, 2010 11:56 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Colin McCormack	Feb 16, 2010 01:01 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Eric Melski	Feb 09, 2010 11:35 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Alexandre Ferrieux	Feb 11, 2010 12:08 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Alexandre Ferrieux	Feb 12, 2010 04:57 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Gustaf Neumann	Feb 14, 2010 04:58 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 14, 2010 11:54 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Gustaf Neumann	Feb 16, 2010 03:25 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 16, 2010 04:45 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Gustaf Neumann	Feb 17, 2010 11:04 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Kristoffer Lawson	Feb 17, 2010 11:18 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Tom Jackson	Feb 16, 2010 06:30 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 16, 2010 10:44 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Gustaf Neumann	Feb 17, 2010 11:49 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 16, 2010 10:36 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	John Ousterhout	Feb 15, 2010 04:56 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 15, 2010 03:14 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Colin McCormack	Feb 16, 2010 01:16 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Larry McVoy	Feb 15, 2010 04:16 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Tom Jackson	Feb 18, 2010 09:31 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Tom Jackson	Feb 17, 2010 06:41 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 18, 2010 09:23 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Gustaf Neumann	Feb 18, 2010 12:23 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Tom Jackson	Feb 17, 2010 01:36 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 17, 2010 11:56 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Tom Jackson	Feb 16, 2010 09:30 am
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Joe English	Feb 08, 2010 07:56 pm
Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm	Donal K. Fellows	Feb 08, 2010 09:56 pm

◄ Messages in this thread ►

Previous post: Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm

Next post: Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm

Subscribe to the tcl-core RSS feed

Accounts

List Archives

Feedback & Information

ActiveState

© 2019 ActiveState Software Inc. All rights reserved. ActiveState®, Komodo®, ActiveState Perl Dev Kit®, ActiveState Tcl Dev Kit®, ActivePerl®, ActivePython®, and ActiveTcl® are registered trademarks of ActiveState. All other marks are property of their respective owners.