Git Product home page Git Product logo

libbf's Introduction

libbf is a C++11 library which implements various Bloom filters, including:

  • Basic
  • Counting
  • Spectral MI
  • Spectral RM
  • Bitwise
  • A^2
  • Stable

Synopsis

#include <iostream>
#include <bf.h>

int main()
{
  bf::basic_bloom_filter b(0.8, 100);

  // Add two elements.
  b.add("foo");
  b.add(42);

  // Test set membership
  std::cout << b.lookup("foo") << std::endl;  // 1
  std::cout << b.lookup("bar") << std::endl;  // 0
  std::cout << b.lookup(42) << std::endl;     // 1

  // Remove all elements.
  b.clear();
  std::cout << b.lookup("foo") << std::endl;  // 0
  std::cout << b.lookup(42) << std::endl;     // 0

  return 0;
}

Requirements

  • A C++11 compiler (GCC >= 4.7 or Clang >= 3.2)
  • CMake (>= 2.8)
  • Boost (>= 1.46) (optional, for unit testing)

Installation

The build process uses CMake, wrapped in autotools-like scripts. The configure script honors the CXX environment variable to select a specific C++compiler. For example, to compile libbf with Clang, install it under PREFIX, and use a Boost installation in the custom prefix PREFIX, use the following commands:

export CXX=clang++
./configure --prefix=PREFIX --with-boost=PREFIX
make
make test
make install

Documentation

The most recent version of the Doxygen API documentation exists at http://mavam.github.io/libbf/api. Alternatively, you can build the documentation locally via make doc and then browse to doc/gh-pages/api/index.html.

Usage

After having installed libbf, you can use it in your application by including the header file bf.h and linking against the library. All data structures reside in the namespace bf and the following examples assume:

using namespace bf;

Each Bloom filter inherits from the abstract base class bloom_filter, which provides addition and lookup via the virtual functions add and lookup. These functions take an object as argument, which serves a light-weight view over sequential data for hashing.

For example, if you can create a basic Bloom filter with a desired false-positive probability and capacity as follows:

// Construction.
bloom_filter* bf = new basic_bloom_filter(0.8, 100);

// Addition.
bf->add("foo");
bf->add(42);

// Lookup.
assert(bf->lookup("foo") == 1);
assert(bf->lookup(42) == 1);

// Remove all elements from the Bloom filter.
bf->clear();

In this case, libbf computes the optimal number of hash functions needed to achieve the desired false-positive rate which holds until the capacity has been reached (80% and 100 distinct elements, in the above example). Alternatively, you can construct a basic Bloom filter by specifying the number of hash functions and the number of cells in the underlying bit vector:

bloom_filter* bf = new basic_bloom_filter(make_hasher(3), 1024);

Since not all Bloom filter implementations come with closed-form solutions based on false-positive probabilities, most constructors use this latter form of explicit resource provisioning.

In the above example, the free function make_hasher constructs a hasher-an abstraction for hashing objects k times. There exist currently two different hasher, a default_hasher and a double_hasher. The latter uses a linear combination of two pairwise-independent, universal hash functions to produce the k digests, whereas the former merely hashes the object k times.

Evaluation

libbf also ships with a small Bloom filter tool bf in the test directory. This program supports evaluation the accuracy of the different Bloom filter flavors with respect to their false-positive and false-negative rates. Have a look at the console help (-h or --help) for detailed usage instructions.

The tool operates in two phases:

  1. Read input from a file and insert it into a Bloom filter
  2. Query the Bloom filter and compare the result to the ground truth

For example, consider the following input file:

foo
bar
baz
baz
foo

From this input file, you can generate the real ground truth file as follows:

sort input.txt | uniq -c | tee query.txt
   1 bar
   2 baz
   2 foo

The tool bf will compute false-positive and false-negative counts for each element, based on the ground truth given. In the case of a simple counting Bloom filter, an invocation may look like this:

bf -t counting -m 2 -k 3 -i input.txt -q query.txt | column -t

Yielding the following output:

TN  TP  FP  FN  G  C  E
0   1   0   0   1  1  bar
0   1   0   1   2  1  baz
0   1   0   2   2  1  foo

The column headings denote true negatives (TN), true positives (TP), false positives (FP), false negatives (FN), ground truth count (G), actual count (C), and the queried element. The counts are cumulative to support incremental evaluation.

License

libbf comes with a BSD-style license (see COPYING for details).

libbf's People

Contributors

mavam avatar

Watchers

James Cloos avatar Chen Qin avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.