Git Product home page Git Product logo

bloomfilter's Introduction

What is a bloom filter?

A bloom filter is a probabilistic data structure that is useful for collecting a massive set of data and then later determining if a specific piece of data is a member of the set.

A Java Set (java.util.Set) is the usual and obvious choice for this type of data structure. So why would a bloom filter be needed? Bloom filters are extremely memory efficient. Where a java.util.Set can take up 200MB and more to store say 1 million UUID's in memory, a bloom filter may use only 1MB to store the elements.

The trade-off is that a bloom filter will produce false positives when querying a bloom filter for membership of a specific piece of data. In other words, a bloom filter may return "true" for an item that is not in the set. This is mainly a side effect of hash function collisions. Getting false positives sounds really bad at first, but is tolerable when absolute precision is not needed. For instance in a web analytics application, you may need to quickly determine if a url has been visited in between Map Reduce jobs. You can use a bloom filter to keep a running tally of recently visited URLs in memory, and interrogate an in-memory bloom filter for membership of the URL exactly at the time it was visited. This is possible because the bloom filter requires very little memory to store millions and even billions of items.

There are lots of great libraries out there that provide bloom filters (Google's guava being one of them). The implementation here is very naive, and will be improved over time.

Important properties of a bloom filter:

  • Probability of a false positive depends on the density of the 1's in the array and the 3 of hash functions. The probability of a collision (false positive) is (Fraction of 1's)^(# of hash functions). As the number of 1's increases, obviously the chances of a collision increase.

  • The number of 1's is approximately the number of elements inserted * number of hash functions Collisions lower this number slightly

For a great introduction to bloom filters: https://www.youtube.com/watch?v=qBTdukbzc78

bloomfilter's People

Contributors

jnkhunter avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.