Author Topic: dynamic hash tables (paper)  (Read 498 times)


Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4129
  • Country: gb
dynamic hash tables (paper)
« on: July 24, 2023, 02:23:36 pm »
let me know what you think about the attached paper  :D

* dynamic-hash-table.pdf (1238.57 kB - downloaded 49 times.)
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3828
  • Country: us
Re: dynamic hash tables (paper)
« Reply #1 on: July 24, 2023, 04:21:41 pm »
It's considered polite to provide a text summary and a brief explanation of your particular interest rather than just link and run, especially for a paper over 30 years old that isn't exactly a new result.

Anyway, my thoughts so far: in-memory hash tables have had tremendous development in the last 30+ years, and the applications for this particular approach seem specialized at best.  It describes a linearly growing hash table that appends new buckets on the end one at a time to avoid a full rebuild.  That's interesting, but how often can you linearly expand a data structure in memory yet couldn't just as well have made it larger to start with?  So they add an extent allocator underneath, adding an extra level of indirection.  That's fine, but if you want to store really huge tables it will run out of space unless you add more layers of indirection, at which point you actually have a B-tree.
 

Offline AndyBeez

  • Frequent Contributor
  • **
  • Posts: 856
  • Country: nu
Re: dynamic hash tables (paper)
« Reply #2 on: July 24, 2023, 04:22:49 pm »
TLDR :(

What is your use case? This is either how Google works or it has been gathering dust on a shelf since 1988.
 

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4129
  • Country: gb
Re: dynamic hash tables (paper)
« Reply #3 on: July 24, 2023, 04:44:40 pm »
Quote
What is your use case? This is either how Google works or it has been gathering dust on a shelf since 1988.

There is no "use-case", at the moment :D

I don't know exactly why, but none of my algorithms books ever mentions "dynamic hashing"; I only came across it today, and I was simply pleased to learn about it. Until yesterday I automatically preferred to implement everything with a B+tree; starting tomorrow I can reconsider hash tables without the usual bias ("not worth it") and evaluate whether they offer any real advantages.

I still have to digest the paper, but there's always something new to learn, right? Anyway, I wanted to share this.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4310
  • Country: nz
Re: dynamic hash tables (paper)
« Reply #4 on: July 24, 2023, 07:24:43 pm »
Quote
Anyway my thoughts so far are basically that in memory hash tables have had tremendous development in the last 30+ years

Very true.

Chiefly, the quality and speed of hash functions have vastly improved, with near-cryptographic quality available cheaply, meaning every bit of the hash is equally random, allowing simple power-of-2 table sizes instead of prime-number table sizes.

Quote
It describes a linearly growing hash table that works by appending new buckets on the end one at a time to avoid rebuilding.  That seems interesting but how often can you linearly expand a data structure in memory but you couldn't just as well make it larger to start?  So they add an extent allocator underneath, adding an extra level of indirection.  That's fine but if you want to store really huge tables it's going to run out of space unless you make more layers of indirection at which point you actually have a B tree.

On a 32 bit machine, two levels of pointer arrays and 4K blocks for both pointer arrays and data gives 1024 pointers in the root block, 1,048,576 pointers in the 2nd layer, and 4 GB of data (hash table in this case) in the final layer. Ok, 4 GB minus epsilon, since 1025 of the 1,048,576 4K pages are used up by the pointer array blocks.

This is a pretty feasible solution on 32 bit machines, and one I've used myself to implement all potentially "large" data types on small machines: arrays, stretchy arrays, strings. It vastly simplifies your memory allocator / GC to only have to deal with blocks of size 16, 32, 64, 128, 256, 512, 1024, 2048, 4096 (9 sizes). The last (or only) block at each pointer layer and the data layer can be smaller than the full 4K.

It's not even slow, because the top level index block and whichever 2nd level index blocks you're using at any given time will be in L1 cache. And with always a known fixed 2 levels (NOT the case with a B-tree), the lookup code is branch-free. It's a few instructions, but trivial compared to calculating the hash function.
 
The following users thanked this post: SiliconWizard, DiTBho

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4129
  • Country: gb
Re: dynamic hash tables (paper)
« Reply #5 on: July 25, 2023, 12:37:26 pm »
Holy Paper somewhat theoretically digested.
Today I am going to implement my first dynamic hashtable on a MIPS32 board.
Let's see how it goes in practice.
 

