Git Product home page Git Product logo

sandpiper's Introduction

Loopy Belief Propagation

The project contains an implementation of Loopy Belief Propagation, a popular message passing algorithm for performing inference in probabilistic graphical models. It provides exact inference for graphical models without loops. While inference for graphical models with loops is approximate, in practice it is shown to work well. Our implementation is generic and operates on factor graph representation of graphical models. It handles factors of any order, and variable domains of any size. In addition, we provide specialized implementation for pairwise factors. The algorithm is implemented with Apache Spark GraphX, and thus can scale to large scale models. Further, it supports computations in log scale for numerical stability.

Building

Clone and build with Maven:

git clone https://github.com/HewlettPackard/sandpiper.git
mvn package

The library will appear in the target folder. By default it is compiled against Scala 2.11.8 and Spark 1.6.1. If you need a different Scala/Spark versions, please modify pom.xml.

BP for Factor Graph

We use factor graph format derived from libDAI with small modification (example):

# number of factors in the file
7

# factor id preceded with 3 hashes (unique, must not intersect with variable name/id)
### 5
# number of vars
1
# name of vars
1
# number of values of vars
2
# number of non-zero entries in factor table
2
# non-zero factor table entries
0  1
1  1

Data parallel loading requires the factors to be partitioned across multiple files.

Code

import sparkle.graph._
// Load graph from the input path with factor files
val graph = Utils.loadLibDAIToFactorGraph(sc, inputPath)
// Run Belief Propagation on the graph with maximum iterations "maxIterations" and epsilon "epsilon"
val beliefs = BP(graph, maxIterations, epsilon)

Command line

./spark-submit --master spark://MASTER:PORT --jars belief-propagation.jar --class sparkle.graph.BP INPUT_PATH OUTPUT_PATH N_ITERATIONS EPSILON

BP for pairwise factors

We use two file format. Each line of the first file contains variable id and its prior delimited with space (example):

1 1.0 1.0
2 1.0 0.0
...

The second file contains pairwise factors with first variable id, second variable id, and the factor in column-major format delimited with space (example):

1 2 1.0 0.9 0.9 1.0 
2 3 0.1 1.0 1.0 0.1 
...

No additional effort is required for enabling data parallel loading since each variable and factor are stored on separate lines and can automatically be processed by Spark in parallel.

Code

import sparkle.graph._
// Load graph from two input files
val graph = PairwiseBP.loadPairwiseGraph(sc, variableFile, factorFile)
// Run Belief Propagation on the graph with maximum iterations "maxIterations" and epsilon "epsilon"
val beliefs = PairwiseBP(graph, maxIterations, epsilon)

Command line

./spark-submit --master spark://MASTER:PORT --jars belief-propagation.jar --class sparkle.graph.PairwiseBP VARIABLES_FILE FACTORS_FILE OUTPUT_PATH N_ITERATIONS EPSILON

References

  1. Belief propagation algorithm
  2. Yedidia, J.S.; Freeman, W.T.; Weiss, Y., "Understanding Belief Propagation and Its Generalizations" in Exploring Artificial Intelligence in the New Millennium, Lakemeyer, G. and Nebel, B., Eds., ISBN: 1-55860-811-7, chapter 8, pp. 239-236, Morgan Kaufmann Publishers, January 2003. (pdf) [An excellent description of BP]
  3. LibDAI file format is used with explicit factor ids: example.
  4. Talk by the project authors: Alexander Ulanov, Manish Marwah "Malicious site detection with large scale belief propagation", Strata+Hadoop, March 2017 (slides)
  5. [BigData 2017] Alexander Ulanov, Manish Marwah, Mijung Kim, Roshan Dathathri, Carlos Zubieta, and Jun Li. Sandpiper: Scaling Probabilistic Inferencing to Large Scale Graphical Models, In IEEE Big Data , Boston, MA, Dec. 2017.

How to contribute

Contributors are required to sign the Corporate Contributor License Agreement

sandpiper's People

Contributors

avulanov avatar magg avatar manish-marwah avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sandpiper's Issues

Where are the message destination Factor or Variable excluded in message aggregations?

sorry this is not an issue at all, i just didn't see any other way to communicate.

i am looking at the source code and it looks really great! it is a pleasure to read. but i am struggling to understand one thing: when a variable sends a message to a factor its some sort of aggregation over all messages the variable received from factors excluding the factor the message is about to be send to. similarly when a factor sends a message to a variable it is also an aggregation over messages received from variables excluding the variable the message is about to be send to. where in the code do i see this exclusion behavior? i have a hard time finding it.

thanks for you help! best, koert

Pairwise factors?

Hi,

This is not exactly an issue. I am using the PairwiseBP algorithm for HTTP log files. But unable to understand the factor values in edges. And in sample data also its not very much clear. Can you please provide a little clarity on how are we getting edge factors?

Thanks

java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;

Hi,

I tried running the BP algorithm with the provided example. I am using spark 1.6.1. Also, build the code using steps provided but getting following error:

java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
	at sparkle.graph.PairwiseBP$.apply(PairwiseBP.scala:37)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
	at $iwC$$iwC$$iwC.<init>(<console>:64)
	at $iwC$$iwC.<init>(<console>:66)
	at $iwC.<init>(<console>:68)
	at <init>(<console>:70)
	at .<init>(<console>:74)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Following is my code:

import sparkle.graph._

val variableFile = "/home/user/sandpiper/data/vertex4.txt"
val factorFile = "/home/user/sandpiper/data/edge4.txt"

val graph = PairwiseBP.loadPairwiseGraph(sc, variableFile, factorFile)

val maxIterations = 50
val epsilon = 1e-3
val beliefs = PairwiseBP(graph, maxIterations, epsilon)

Please help me regarding the same.

Are there any limitations with the input FG?

Thanks for your effort on this.

I am wondering if there are any limitations on the input FG in terms of size or shape? I modified your input FG to have few factors, it fails with null pointers at the point of loading the graph.

Also, would be able to tell me how you decide the factor values in the input FG for BP?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.