A bloom filter is a probabilistic data structure that is useful for collecting a massive set of data and later determining whether a specific piece of data is a member of the set.
A Java Set (java.util.Set) is the usual and obvious choice for this type of data structure. So why would a bloom filter be needed? Bloom filters are extremely memory efficient. Where a java.util.Set can take 200MB or more to store, say, 1 million UUIDs in memory, a bloom filter may use only 1MB to store the same elements.
The trade-off is that a bloom filter produces false positives: querying it for membership may return "true" for an item that is not in the set. This is mainly a side effect of hash function collisions. False positives sound really bad at first, but are tolerable when absolute precision is not needed. For instance, in a web analytics application you may need to quickly determine whether a URL has been visited in between MapReduce jobs. You can use a bloom filter to keep a running tally of recently visited URLs in memory, adding each URL to the in-memory filter at the moment it is visited and interrogating the filter for membership later. This is possible because the bloom filter requires very little memory to store millions and even billions of items.
There are lots of great libraries out there that provide bloom filters (Google's Guava being one of them). The implementation here is very naive, and will be improved over time.
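A naive implementation along these lines might look like the following sketch. The class name, sizes, and double-hashing scheme are illustrative assumptions, not the document's actual code:

```java
import java.util.BitSet;

// A naive bloom filter sketch: a fixed-size bit array plus k hash
// positions per element, derived from two base hashes (double hashing).
public class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    public BloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit position for an element from two cheap hashes.
    private int position(Object element, int i) {
        int h1 = element.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9; // second hash
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(Object element) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(element, i));
        }
    }

    // May return true for an element never added (a false positive),
    // but never returns false for an element that was added.
    public boolean mightContain(Object element) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(element, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Note that there is no remove operation: clearing a bit could erase evidence of other elements that hashed to the same position.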
The probability of a false positive depends on the density of 1's in the bit array and the number of hash functions. The probability of a collision (false positive) is approximately (fraction of 1's)^(number of hash functions). As the number of 1's increases, the chances of a collision obviously increase.
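As a quick worked example of that formula (the class and method names here are hypothetical):

```java
public class FalsePositiveEstimate {
    // P(false positive) ≈ (fraction of bits set to 1) ^ (number of hash functions)
    static double falsePositiveRate(double fractionOfOnes, int numHashes) {
        return Math.pow(fractionOfOnes, numHashes);
    }

    public static void main(String[] args) {
        // With half the bits set and 3 hash functions:
        // 0.5^3 = 0.125, i.e. roughly a 12.5% false positive rate.
        System.out.println(falsePositiveRate(0.5, 3));
    }
}
```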
The number of 1's is approximately the number of elements inserted multiplied by the number of hash functions; collisions lower this number slightly.
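The collision effect can be quantified with the standard occupancy estimate for an m-bit array after n*k random bit-sets; the parameter values below are illustrative assumptions:

```java
public class OnesEstimate {
    // Naive estimate: every insertion sets k distinct bits.
    static double naiveOnes(int n, int k) {
        return (double) n * k;
    }

    // Expected number of set bits in an m-bit filter after n insertions
    // with k hash functions: m * (1 - (1 - 1/m)^(n*k)).
    static double expectedOnes(int m, int n, int k) {
        return m * (1 - Math.pow(1 - 1.0 / m, (double) n * k));
    }

    public static void main(String[] args) {
        int m = 1 << 20; // 1,048,576 bits
        int n = 1000;    // elements inserted
        int k = 3;       // hash functions
        // The expected count comes out slightly below n*k = 3000
        // because some hash positions collide.
        System.out.println(naiveOnes(n, k));
        System.out.println(expectedOnes(m, n, k));
    }
}
```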
For a great introduction to bloom filters: https://www.youtube.com/watch?v=qBTdukbzc78