Git Product home page Git Product logo

vebtrees's Introduction

Van Emde Boas Tree / Priority Queue

About

This project is about implementing the very efficient priority queue data structure proposed by Peter van Emde Boas, supporting following operations:

Operation Time
IsEmpty O(1)
Min O(1)
Max O(1)
Member O(log log u)
Successor O(log log u)
Predecessor O(log log u)
Insert O(log log u)
Delete O(log log u)

Note: the predecessor operation is not implemented yet.

As you can see, the special thing about van Emde Boas trees is that they scale with the universe size (upper bound of the largest number to possibly insert), rather than with the number of items inserted. Moreover O(log log u) makes those operations almost as efficient as constant time (e.g. log_2(log_2(2^64)) = log_2(64) = 6).

Usage

If you haven't done already, install dotnet to your dev machine. Following commands show how to install dotnet onto Ubuntu 20.04.

wget https://packages.microsoft.com/config/ubuntu/20.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
rm -rf packages-microsoft-prod.deb

sudo apt-get update; \
  sudo apt-get install -y apt-transport-https && \
  sudo apt-get update && \
  sudo apt-get install -y dotnet-sdk-5.0

# official tutorial: https://docs.microsoft.com/de-de/dotnet/core/install/linux-ubuntu#2004-

Download the source code by cloning this git repository.

git clone https://github.com/Bonifatius94/VebTrees
cd VebTrees

Now you can run the benchmark tests using following command:

# run the unit tests
dotnet test --runtime linux-x64 --configuration Release

# run the benchmarks
dotnet run --project VebTrees.Benchmark --configuration Release --runtime win-x64

License

This software is available under the terms of the MIT license.

vebtrees's People

Contributors

bonifatius94 avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar

vebtrees's Issues

Add CI/CD Pipeline (Build + Test + Release)

Issue:

  • there's no server-side unified build pipeline
  • for releasing it would be nice to have a fully automated release mechanism

Suggested Solution:

  • create a GitHub Workflow configuration
    • use the official .NET Docker build env image
    • run the 'dotnet publish' command to build, test, package
    • if everthing passes, create a GitHub release and attach the binaries to it

Global Is No VEB Tree

Issue:

  • the global structure is just a bit array of sqrt(u) size
  • this cannot provide the performance of O(log log u) -> make the global structure a vEB tree as well

Suggested Solution:

  • like the local structure, use vEB children as well for global
  • check if the performance is (measurably) better after this fix

Reduce Locals Alloc Time Complexity

Issue:

  • allocating the locals array uses O(sqrt(u)) time because it sets an array of O(sqrt(u)) local structure entries to null for each pointer
  • but this exceeds the nice time complexity of the vEB tree of O(log log(u)) by far

Suggested Solution:

  • allocate the array, but don't set every entry to null pointers, etc.
  • in each vEB operation, only one local per tree layer is touched, so locals can be pointed to on-the-fly
  • moreover every insert/delete only creates/deletes max. one local / global node
  • problem: this kind of allocation requires unsafe memory operations, so it's questionable whether it's even possible in C#
  • a possible compromise could be only pre-allocating the topmost layers on vEB tree initialization, such that the array size of locals can be seen as a constant (e.g. several hundred items), making the lazy-alloc's time complexity feasible

Add Unit + Benchmarking Tests

Issue:

  • tests are rather informal, it's just a little simulation to play around with
  • improving performance requires proper automated testing

Suggested Solution:

  • add unit tests to ensure each operation works as expected (cover edge cases as well)
  • therefore create a solution, move the lib into an extra folder and create an xunit project for testing
  • transform the Program.cs script into performance tests to measure allocated RAM and computation time

Definition of Done:

  • there are unit tests covering each operation
  • there are performance tests measuring space and time complexity for a real-world use case

Make Max Storage More Efficient

Issue:

  • the vEB tree uses O(u log(u)) bits, but O(u) bits is possible with the same time complexity
  • idea: use the vEB tree only as adjacency and attach binary heaps at the bottom (using a composed local/global approach)
  • therefore only manage an universe of u/log(u) by the vEB tree and manage the rest inside binary search trees, each of size log(u), supporting all operations in O(log log u)
  • overall, all operations remain within their nice time complexity of O(log log u) or even O(1) while only using O(u) bits

Suggested Solution:

  • combine an in-order, bitwise binary search tree with the vEB tree data structure (note: binary search trees support all operations in a time complexity of O(log n) and use O(n) bits storage for managing up to n items having ids within [1, n])
  • therefore create a new local/global structure on top of the vEB tree, using the vEB as global and binary search trees as locals
    • the combined structure has u/log(u) children, each responsible for max. log(u) elements which requires only O(log u) bits and supports all operations in O(log log u) time
    • this results in a data structure with O(u/log(u) * log(u)) = O(u) bits that still supports all operations in O(log log u) time
  • unfortunately there's no out-of-the-box bitwise binary search tree in C#, so let's implement our own as follows:
    • keep track of inserted items by stroring a single bit for each item, telling whether an item is inserted or not
    • use the item's numeric value as index to address its bit in a bit vector (of length n) -> only O(n) bits
    • all items are at the position as if the binary search tree was fully-allocated (regardless of parent node existence) to simplify addressing by indexes
    • for tree navigation, use an additional adjacency bitvector indicating whether a subtree has any children
    • with the help of this additional invariant, the tree can be traversed as a usual binary search tree because it is guaranteed that there's at least one child smaller/larger when the adjacency bit is set
    • also, updating the adjacency bitvector doesn't cause any additional cost exceeding the general O-notation of O(log n) because inserting sets max. O(log n) bits and deleting vanishes max. O(log n) bits (and the bits changed are always located on the deterministic path between tree root and node to be inserted/deleted)
    • therefore all tree operations can be implemented in O(log n) time and the bitvectors only use O(n) bits which actually provides the structure properties that make the storage-efficient vEB tree structure possible
    • keep in mind that the binary search tree can only store 2^x - 1 items, given a universe of x bits, so the 0 needs to be handled separately using an extra bit

Add Bitwise Tree Leafs

Issue:

  • the vEB base case cascading down to universe size 2 is really unefficient
  • universes with size <= 64 can be handled using a bitboard

Suggested Solution:

  • implement a tree leaf data structure as ulong bitboard
  • the existence of an id can be checked by bitboard & (1ul << id) which is O(1)
  • the successor can be implemented using bit masks and leading zeros like
    least_significant_bit(bitboard & (0xFFFFFFFFFFFFFFFF << (id+1))) which is O(1)
  • the predecessor is also easy using the bitmask (1 << id) - 1 and most_significant_bit() which is O(1)
  • insert / delete can be done by bitboard ^= (1 << id) which is O(1)
  • min / max can be retrieved using least_significant_bit(bitboard) / most_significant_bit(bitboard) which is O(1)

Add Predecessor Function

Issue:

  • the current implementation does not contain a predecessor function
  • it could be useful to implement one though

Suggested Solution:

  • recall the lecture's exercise and implement the logic discovered there
  • make sure it's properly working by implementing a sorting test (like for successor)

Stablilize Insert/Delete Operations

Issue:

  • undefined behavior when inserting items twice or deleting non-existent items
  • desired behavior:
    • hard fail mode: catch those cases and throw an exception
    • soft fail mode: ignore those cases and just do nothing (pref.)

Suggested Solution:

  • implement a wrapper function that calls member() once to check if the item is present
  • WARNING: don't put this into the recursive function as this would cause performance of O(log u) instead of O(log log u)

Support Succ/Pred in Const Time

Issue:

  • successor and predecessor functions require O(log log u) time
  • could be O(1) when using an additional linked list for that

Suggested Solution:

  • add an array with succ/pred tuples pointing to each other
  • make the vEB tree additionally update those pointers properly when inserting/deleting
  • when looking something up, use the more efficient succ/pred pointers
  • otherwise when updating the pointers, use the old implementation internally to search the vEB tree

Add Lazy Tree Node Allocation

Issue:

  • when creating a vEB tree, all children nodes are created as well, cascading down to the base case
  • this is not even necessary because vEB could allocate memory in constant time only when a specific child node is needed
  • the insert() operation would still remain within the nice O(log log u) time complexity

Suggested Solution:

  • implement an allocation approach initializing an array in constant time (see computer scienece script)
    • requires something like malloc(): hard to do in C#, use unsafe blocks and Marshal.AllocHGlobal()
  • only create the local / global arrays when the structure contains more than one element
  • make sure to free this memory when deleting as the unsafe allocation yields unmanaged storage

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.