Author Topic: dynamic hash tables (paper)  (Read 498 times)


Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4129
  • Country: gb
dynamic hash tables (paper)
« on: July 24, 2023, 02:23:36 pm »
let me know what you think about the attached paper  :D

* dynamic-hash-table.pdf (1238.57 kB - downloaded 49 times.)
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3828
  • Country: us
Re: dynamic hash tables (paper)
« Reply #1 on: July 24, 2023, 04:21:41 pm »
It's considered polite to provide a text summary and a brief explanation of your particular interest rather than just link and run, especially for a paper over 30 years old that isn't exactly a new result.

Anyway, my thoughts so far: in-memory hash tables have had tremendous development in the last 30+ years, and the applications for this particular approach seem specialized at best.  It describes a linearly growing hash table that appends new buckets on the end one at a time to avoid a full rebuild.  That's interesting, but how often can you linearly expand a data structure in memory yet couldn't just as well have made it larger to start with?  So they add an extent allocator underneath, adding an extra level of indirection.  That's fine, but if you want to store really huge tables it will run out of space unless you add more layers of indirection, at which point you actually have a B-tree.
 

Offline AndyBeez

  • Frequent Contributor
  • **
  • Posts: 856
  • Country: nu
Re: dynamic hash tables (paper)
« Reply #2 on: July 24, 2023, 04:22:49 pm »
TLDR :(

What is your use case? This is either how Google works or it has been gathering dust on a shelf since 1988.
 

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4129
  • Country: gb
Re: dynamic hash tables (paper)
« Reply #3 on: July 24, 2023, 04:44:40 pm »
Quote
What is your use case? This is either how Google works or it has been gathering dust on a shelf since 1988.

There is no "use-case", at the moment :D

I don't know exactly why, but none of my algorithms books ever mentions "dynamic hashing"; I only came across it today, and I was simply pleased to learn about it. Until yesterday I automatically preferred to implement everything with a B+tree; starting tomorrow I can reconsider hash tables without the usual bias ("not worth it") and evaluate whether they offer any real advantages.

I still have to digest the paper, but there's always something new to learn, right? Anyway, I wanted to share this.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4310
  • Country: nz
Re: dynamic hash tables (paper)
« Reply #4 on: July 24, 2023, 07:24:43 pm »
Quote
Anyway my thoughts so far are basically that in memory hash tables have had tremendous development in the last 30+ years

Very true.

Chiefly, the quality and speed of hash functions have vastly improved, with near-cryptographic quality available cheaply, meaning every bit of the hash is equally random, allowing simple power-of-2 table sizes instead of prime-number table sizes.

Quote
It describes a linearly growing hash table that works by appending new buckets on the end one at a time to avoid rebuilding.  That seems interesting but how often can you linearly expand a data structure in memory but you couldn't just as well make it larger to start?  So they add an extent allocator underneath, adding an extra level of indirection.  That's fine but if you want to store really huge tables it's going to run out of space unless you make more layers of indirection at which point you actually have a B tree.

On a 32 bit machine, two levels of pointer arrays and 4K blocks for both pointer arrays and data gives 1024 pointers in the root block, 1,048,576 pointers in the 2nd layer, and 4 GB of data (hash table in this case) in the final layer. Ok, 4 GB minus epsilon, since 1025 of the 1,048,576 4K pages are used up by the pointer array blocks.

This is a pretty feasible solution on 32 bit machines, and one I've used myself to implement all potentially "large" data types on small machines: arrays, stretchy arrays, strings. It vastly simplifies your memory allocator / GC to only have to deal with blocks of size 16, 32, 64, 128, 256, 512, 1024, 2048, 4096 (9 sizes). The last (or only) block at each pointer layer and the data layer can be smaller than the full 4K.

It's not even slow, because the top level index block and whichever 2nd level index blocks you're using at any given time will be in L1 cache. And with always a known fixed 2 levels (NOT the case with a B-tree), the lookup code is branch-free. It's a few instructions, but trivial compared to calculating the hash function.
 
The following users thanked this post: SiliconWizard, DiTBho

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4129
  • Country: gb
Re: dynamic hash tables (paper)
« Reply #5 on: July 25, 2023, 12:37:26 pm »
Holy Paper somewhat theoretically digested.
Today I am going to implement my first dynamic hashtable on a MIPS32 board.
Let's see how it goes in practice.
 

