| Store | Cart

Re: [TCLCORE] Upgrade of Tcl's String Hash Algorithm

From: Gustaf Neumann <neum...@wu-wien.ac.at>
Sun, 14 Feb 2010 17:58:25 +0100
Am 11.02.10 01:08, schrieb Alexandre Ferrieux:
> Since it appears that timings are tricky,
Apparently. It turns out that in the both mini-benchmarks
(the two "foo" functions) the hash function for string keys
(HashStringKey()) is not called at all (!) in the timed loop
(e.g. Tcl variables are compiled locals). Funny enough,
just by changing the string hash function one can measure
different performance values (maybe due to processor
level caching or similar external effects).

Two more observations:

- At least on x86 Intel, gcc compiles the classical Tcl string hash
    function pretty well. The loop body if the hash function
          "result += (result<<3) + c;"
    results in just two machine instructions (one lea + one add).
    This is hard to beat by assembly coding (if someone is interested, i
    have an apparently slighter faster inline assembly version for x86
    64bit). So, the classical Tcl hash function appears well suited for
    modern compilers, and gcc does very well here (i used gcc 4.2.1).
    FNV needs the slower integer multiply, that's why the FNV
    is slower (measured by Donals hashes.c test).

- For small hash tables, only a few bits of the computed  hash
    values are used. For the initial size, these are only  the last two
    bits. By every hash table resize (RebuildTable) two  more bits are
    added. It is not clear how the properties of hash functions
    propagate, when only a small fraction (e.g. 2, 4, 6 etc. bits) of
    the computed hash value are actually used.

It looks to me as if the original Tcl hash function has quite
good properties for the task it is used.

In order to make hashing faster in Tcl, it is most likely more
promising to look at the Tcl-hash machinery altogether.
The granularity of the functions does not look perfectly
suited for the needs of current Tcl. So the current definitions

==================================
Tcl_HashKeyType {
   ....
   HashKeyProc
   CompareHashKeysProc
   AllocHashEntryProc
   FreeHashEntryProc
}

Tcl_HashEntry * Tcl_CreateHashEntry(...)  {
     ...
     if (tablePtr->keyType == TCL_STRING_KEYS) {
         typePtr = &tclStringHashKeyType;
     } else if (tablePtr->keyType == TCL_ONE_WORD_KEYS) {
         typePtr = &tclOneWordHashKeyType;
     } else if (tablePtr->keyType == TCL_CUSTOM_TYPE_KEYS
         || tablePtr->keyType == TCL_CUSTOM_PTR_KEYS) {
         typePtr = tablePtr->typePtr;
     } else {
         typePtr = &tclArrayHashKeyType;
     }
     ...
==================================

could become something like

==================================
Tcl_HashKeyType {
   ....
   CreateHashEntryProc
   FreeHashEntryProc
}

#define Tcl_CreateHashEntry(tablePtr,..)  
(...)tablePtr->type->CreateHashEntryProc(...)
==================================

to provide tailored create functions for every hashKeyType. Tcl does
not call HashKeyProc, CompareHashKeysProc, or AllocHashEntryProc
outside CreateHashEntry. The common "keyType == ..." sermon used in
several functions of the hash API does not look perfect either and
could be replaced by a macro like the one sketched....

best regards
-gustaf neumann


------------------------------------------------------------------------------
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev
_______________________________________________
Tcl-Core mailing list
Tcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tcl-core

Recent Messages in this Thread
Donal K. Fellows Feb 07, 2010 09:36 am
Eric Melski Feb 08, 2010 06:40 am
Donal K. Fellows Feb 08, 2010 11:21 am
Eric Melski Feb 08, 2010 06:45 pm
Donal K. Fellows Feb 09, 2010 01:47 am
Larry McVoy Feb 09, 2010 02:42 am
Colin McCormack Feb 09, 2010 01:05 pm
Larry McVoy Feb 09, 2010 03:48 pm
Colin McCormack Feb 09, 2010 11:14 pm
Gustaf Neumann Feb 09, 2010 04:15 pm
Donal K. Fellows Feb 10, 2010 12:38 am
Lars Hellström Feb 14, 2010 10:56 pm
Larry McVoy Feb 15, 2010 03:20 am
Donal K. Fellows Feb 15, 2010 10:02 pm
Tom Jackson Feb 15, 2010 08:55 pm
Lars Hellström Feb 15, 2010 11:53 pm
Donal K. Fellows Feb 16, 2010 11:56 am
Colin McCormack Feb 16, 2010 01:01 pm
Eric Melski Feb 09, 2010 11:35 pm
Alexandre Ferrieux Feb 11, 2010 12:08 am
Alexandre Ferrieux Feb 12, 2010 04:57 pm
Gustaf Neumann Feb 14, 2010 04:58 pm
Donal K. Fellows Feb 14, 2010 11:54 pm
Gustaf Neumann Feb 16, 2010 03:25 pm
Donal K. Fellows Feb 16, 2010 04:45 pm
Gustaf Neumann Feb 17, 2010 11:04 am
Kristoffer Lawson Feb 17, 2010 11:18 am
Tom Jackson Feb 16, 2010 06:30 pm
Donal K. Fellows Feb 16, 2010 10:44 pm
Gustaf Neumann Feb 17, 2010 11:49 am
Donal K. Fellows Feb 16, 2010 10:36 pm
John Ousterhout Feb 15, 2010 04:56 am
Donal K. Fellows Feb 15, 2010 03:14 pm
Colin McCormack Feb 16, 2010 01:16 pm
Larry McVoy Feb 15, 2010 04:16 pm
Tom Jackson Feb 18, 2010 09:31 pm
Tom Jackson Feb 17, 2010 06:41 pm
Donal K. Fellows Feb 18, 2010 09:23 am
Gustaf Neumann Feb 18, 2010 12:23 pm
Tom Jackson Feb 17, 2010 01:36 am
Donal K. Fellows Feb 17, 2010 11:56 am
Tom Jackson Feb 16, 2010 09:30 am
Joe English Feb 08, 2010 07:56 pm
Donal K. Fellows Feb 08, 2010 09:56 pm
Messages in this thread