Bloom filter

A bloom filter is a data-structure that can be used to check if a set contains an element. It uses way less memory than a conventional set data-structure by sacrificing accuracy.

Say we are building a log-structured merge-tree, we can use a bloom filter to find out if the LSM-tree contains a particular key in O(1) time in most cases, the downside is that sometimes the bloom filter would say that the LSM-tree contains a key, but it actually does not and we would go searching for the value that's mapped to the key and never actually find it.

How it works

A bloom filter is just a bit-set that uses n deterministic hash functions to add elements to it.

empty bit-set

Adding elements to the set

To add the key bob to the set, we run the key through each of the n hash functions and map the hash function output to one of the positions in the bit-set and for each position, we flip the bit to 1.

bit-set after bob was added to the bloom filter

Finding out if the set contains an element

To find out if the set contains the key bob, we run the key through each of the n hash functions again -- since the hash functions must be deterministic they will always map to the same position in the bit-set -- and check if the bit is set to 1 for each of the bit-set positions we reached after running the key through the hash functions. If every hash function maps to a bit set to 1, it means the key is in the set.

bob is in the set because every hash function mapped it to a bit set to 1

alice is not in the set because not every hash function mapped to a bit set to 1

False positives

Since collisions can happen some keys will be mapped to bits that were set to 1 when other keys were added to the set. In this case, the bloom filter will say that it contains the key even though it does not.

the bit-set after bob was added to it

since john maps to the same bits as bob and the bits were set to 1 after bob was added to the set, we got a false positive

References

Network Applications of Bloom Filters: A Survey - https://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf

poorlydefinedbehaviour / bloom_filter Goto Github PK

bloom_filter's Introduction

Bloom filter

How it works

Adding elements to the set

Finding out if the set contains an element

False positives

References

bloom_filter's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent