demerphq wrote:
> On 5 December 2012 23:57, bulk88 <bul...@hotmail.com> wrote:
>> Craig A. Berry wrote:
>>>
>>> It doesn't look to me like anything special will be required to build
>>> in this situation. If the compiler supplies uint64_t or equivalent,
>>> that's what SipHash will use. If the compiler has to generate two or
>>> three hardware instructions to perform an operation on a 64-bit
>>> integer because its target doesn't have 64-bit integer instructions,
>>> so be it.
>>>
>>> It looks like the operations involved in SipHash are all bit shifting
>>> and bit flipping (no exponentiation, no converting back and forth to
>>> double, etc.), so it shouldn't be too much of a strain if those
>>> operations have to be done by combining operations on two, 32-bit
>>> pieces of string instead of being a single operation on one, 64-bit
>>> piece of string, though of course benchmarks would be welcome.
>>
>> I wrote a benchmark (hack and compiler warnings galore, Win32 Visual C only,
>> SipHash also does a buffer overrun read on the 4 byte PL_hash_seed in my
>
> Huh?! PL_hash_seed should be 16 bytes under SipHash.
>
>> 8c6c6997cf2a8cd5e947a61f94ca02dd8b963334 Perl 5.17 (murmur3), after
>> PL_hash_seed in the image are 28 bytes null padding bytes to align a static
>> critical section lock that is after PL_hash_seed).
>
> I dont understand this. Can you explain?

My Perl, at the commit level I mentioned, uses murmur3, so
PL_hash_seed is 4 bytes long. Using SipHash in a benchmark, without any
HV code and WITHOUT recompiling the interp, results in SipHash
overrunning PL_hash_seed's 4 byte allocation and using the 12 garbage
bytes after PL_hash_seed as part of its seed. That remark was a
disclaimer about my benchmark. I will post the code in another post in
this thread. I forgot to attach it, even though I wrote a sentence
assuming I had attached it ("I wrote a benchmark (hack and compiler
warnings galore, Win32 Visual C only...").
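
To make the overrun concrete, here is a minimal sketch, not the benchmark code itself. It assumes SipHash loads its 16 byte key as two little-endian 64-bit words, as the reference implementation does. The struct image_layout and the function read_key_past_seed are hypothetical stand-ins for the layout described above (a 4 byte seed followed by 28 null padding bytes), not anything in the Perl source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the image layout: a 4 byte seed followed by
 * 28 bytes of null padding (the real layout is whatever the linker made). */
struct image_layout {
    unsigned char seed[4];
    unsigned char padding[28];
};

/* Load 8 bytes as a little-endian 64-bit word, independent of host order. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Read a 16 byte SipHash key starting at the seed. Only the first 4 bytes
 * are real seed; the other 12 come from whatever follows it in memory.
 * Reading through the enclosing struct keeps this sketch well defined. */
static void read_key_past_seed(const struct image_layout *img,
                               uint64_t *k0, uint64_t *k1)
{
    const unsigned char *p = (const unsigned char *)img;
    *k0 = load_le64(p);      /* 4 seed bytes + 4 bytes past the seed */
    *k1 = load_le64(p + 8);  /* 8 more bytes past the seed */
}
```

With null padding after the seed, k1 comes out as zero and only the low 32 bits of k0 carry seed data, so the hash sums from such a run are not comparable to a properly seeded SipHash build.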

>
>> Anyway, there is a huge
>> difference between SipHash on 32 bit x86 and one at a time, SipHash is 7
>> times slower.
>
> I did benchmarks of various hash operations and length and it was
> nothing close to 7 times slower.
>
>> _________________________________________________________________
>> SIPHASH time=23.227197, opt= -O1 -G7 -GL -Oi -Og hashsum=847454976
>> ONE_AT_A_TIME time=3.284225, opt= -O1 -G7 -GL -Oi -Og hashsum=1202428160
>> _________________________________________________________________
>>
>> Counting instruction bytes for SipHash between the 1st
>> QueryPerformanceCounter and the 2nd QueryPerformanceCounter, resulted in
>> 2291 bytes of machine instructions and 793 instructions (semi hand counted)
>> (this includes the overhead of the 2 benchmark loops, and selecting strings
>> and their lengths from the arrays). Counting the same way for one at a time,
>> gave me 109 instruction bytes and 35 instructions (semi hand counted). It is
>> not 2 or 3 more machine instructions per C level operation, 793 / 35 = 22.6.
>> They are different algorithms so its apples to oranges comparison, but 21
>> times longer in machine code, 22 times more machine instructions, and 7
>> times longer by clock time.
>
> So you are saying this is because of 64 bit simulation?
>

Probably, due to 22 times more instructions being executed/not in L1
cache/etc. It's a bit interesting that it's 22 times more instructions,
yet only 7 times longer by time. Gotta love modern CPUs that do
superscalar execution.
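
For a rough idea of where the extra instructions come from, here is a sketch of the first three operations of a SipRound (v0 += v1; v1 = ROTL(v1, 13); v1 ^= v0) written the way a 32-bit target has to do them, on pairs of 32-bit halves. The type u64pair and all function names are made up for illustration; this is not what Visual C emits, just the same work spelled out in C.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves (hypothetical helper type). */
typedef struct { uint32_t lo, hi; } u64pair;

static u64pair split(uint64_t v)
{
    u64pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join(u64pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add: two 32-bit adds plus the carry out of the low half. */
static u64pair add64(u64pair a, u64pair b)
{
    u64pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}

/* 64-bit rotate left by n, valid for 0 < n < 32: four shifts, two ORs. */
static u64pair rotl64_small(u64pair x, unsigned n)
{
    u64pair r;
    r.lo = (x.lo << n) | (x.hi >> (32 - n));
    r.hi = (x.hi << n) | (x.lo >> (32 - n));
    return r;
}

/* 64-bit XOR: one XOR per half. */
static u64pair xor64(u64pair a, u64pair b)
{
    u64pair r = { a.lo ^ b.lo, a.hi ^ b.hi };
    return r;
}

/* First three operations of a SipRound, on 32-bit halves only. */
static void sip_step(u64pair *v0, u64pair *v1)
{
    *v0 = add64(*v0, *v1);
    *v1 = rotl64_small(*v1, 13);
    *v1 = xor64(*v1, *v0);
}
```

Three C level operations on uint64_t turn into roughly a dozen 32-bit operations here, before counting the extra register pressure on x86, which only has a handful of general purpose registers to hold four 64-bit state words.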