Git Product home page Git Product logo

bloom_filter's Introduction

Bloom filter

A bloom filter is a data-structure that can be used to check if a set contains an element. It uses way less memory than a conventional set data-structure by sacrificing accuracy.

Say we are building a log-structured merge-tree, we can use a bloom filter to find out if the LSM-tree contains a particular key in O(1) time in most cases, the downside is that sometimes the bloom filter would say that the LSM-tree contains a key, but it actually does not and we would go searching for the value that's mapped to the key and never actually find it.

How it works

A bloom filter is just a bit-set that uses n deterministic hash functions to add elements to it.

empty bit-set

Adding elements to the set

To add the key bob to the set, we run the key through each of the n hash functions and map the hash function output to one of the positions in the bit-set and for each position, we flip the bit to 1.

bit-set after bob was added to the bloom filter

Finding out if the set contains an element

To find out if the set contains the key bob, we run the key through each of the n hash functions again -- since the hash functions must be deterministic they will always map to the same position in the bit-set -- and check if the bit is set to 1 for each of the bit-set positions we reached after running the key through the hash functions. If every hash function maps to a bit set to 1, it means the key is in the set.

bob is in the set because every hash function mapped it to a bit set to 1

alice is not in the set because not every hash function mapped to a bit set to 1

False positives

Since collisions can happen some keys will be mapped to bits that were set to 1 when other keys were added to the set. In this case, the bloom filter will say that it contains the key even though it does not.

the bit-set after bob was added to it

since john maps to the same bits as bob and the bits were set to 1 after bob was added to the set, we got a false positive

References

Network Applications of Bloom Filters: A Survey - https://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf

bloom_filter's People

Contributors

poorlydefinedbehaviour avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.