Git Product home page Git Product logo

sandpiper's Introduction

Loopy Belief Propagation

The project contains an implementation of Loopy Belief Propagation, a popular message passing algorithm for performing inference in probabilistic graphical models. It provides exact inference for graphical models without loops. While inference for graphical models with loops is approximate, in practice it is shown to work well. Our implementation is generic and operates on factor graph representation of graphical models. It handles factors of any order, and variable domains of any size. In addition, we provide specialized implementation for pairwise factors. The algorithm is implemented with Apache Spark GraphX, and thus can scale to large scale models. Further, it supports computations in log scale for numerical stability.

Building

Clone and build with Maven:

git clone https://github.com/HewlettPackard/sandpiper.git
mvn package

The library will appear in the target folder. By default it is compiled against Scala 2.11.8 and Spark 1.6.1. If you need a different Scala/Spark versions, please modify pom.xml.

BP for Factor Graph

We use factor graph format derived from libDAI with small modification (example):

# number of factors in the file
7

# factor id preceded with 3 hashes (unique, must not intersect with variable name/id)
### 5
# number of vars
1
# name of vars
1
# number of values of vars
2
# number of non-zero entries in factor table
2
# non-zero factor table entries
0  1
1  1

Data parallel loading requires the factors to be partitioned across multiple files.

Code

import sparkle.graph._
// Load graph from the input path with factor files
val graph = Utils.loadLibDAIToFactorGraph(sc, inputPath)
// Run Belief Propagation on the graph with maximum iterations "maxIterations" and epsilon "epsilon"
val beliefs = BP(graph, maxIterations, epsilon)

Command line

./spark-submit --master spark://MASTER:PORT --jars belief-propagation.jar --class sparkle.graph.BP INPUT_PATH OUTPUT_PATH N_ITERATIONS EPSILON

BP for pairwise factors

We use two file format. Each line of the first file contains variable id and its prior delimited with space (example):

1 1.0 1.0
2 1.0 0.0
...

The second file contains pairwise factors with first variable id, second variable id, and the factor in column-major format delimited with space (example):

1 2 1.0 0.9 0.9 1.0 
2 3 0.1 1.0 1.0 0.1 
...

No additional effort is required for enabling data parallel loading since each variable and factor are stored on separate lines and can automatically be processed by Spark in parallel.

Code

import sparkle.graph._
// Load graph from two input files
val graph = PairwiseBP.loadPairwiseGraph(sc, variableFile, factorFile)
// Run Belief Propagation on the graph with maximum iterations "maxIterations" and epsilon "epsilon"
val beliefs = PairwiseBP(graph, maxIterations, epsilon)

Command line

./spark-submit --master spark://MASTER:PORT --jars belief-propagation.jar --class sparkle.graph.PairwiseBP VARIABLES_FILE FACTORS_FILE OUTPUT_PATH N_ITERATIONS EPSILON

References

  1. Belief propagation algorithm
  2. Yedidia, J.S.; Freeman, W.T.; Weiss, Y., "Understanding Belief Propagation and Its Generalizations" in Exploring Artificial Intelligence in the New Millennium, Lakemeyer, G. and Nebel, B., Eds., ISBN: 1-55860-811-7, chapter 8, pp. 239-236, Morgan Kaufmann Publishers, January 2003. (pdf) [An excellent description of BP]
  3. LibDAI file format is used with explicit factor ids: example.
  4. Talk by the project authors: Alexander Ulanov, Manish Marwah "Malicious site detection with large scale belief propagation", Strata+Hadoop, March 2017 (slides)
  5. [BigData 2017] Alexander Ulanov, Manish Marwah, Mijung Kim, Roshan Dathathri, Carlos Zubieta, and Jun Li. Sandpiper: Scaling Probabilistic Inferencing to Large Scale Graphical Models, In IEEE Big Data , Boston, MA, Dec. 2017.

How to contribute

Contributors are required to sign the Corporate Contributor License Agreement

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.