hewlettpackard / sandpiper Goto Github PK

Implementation of the Loopy Belief Propagation algorithm for Apache Spark

License: Apache License 2.0

Scala 100.00%

sandpiper's Introduction

Loopy Belief Propagation

The project contains an implementation of Loopy Belief Propagation, a popular message passing algorithm for performing inference in probabilistic graphical models. It provides exact inference for graphical models without loops. While inference for graphical models with loops is approximate, in practice it is shown to work well. Our implementation is generic and operates on factor graph representation of graphical models. It handles factors of any order, and variable domains of any size. In addition, we provide specialized implementation for pairwise factors. The algorithm is implemented with Apache Spark GraphX, and thus can scale to large scale models. Further, it supports computations in log scale for numerical stability.

Building

Clone and build with Maven:

git clone https://github.com/HewlettPackard/sandpiper.git
mvn package

The library will appear in the target folder. By default it is compiled against Scala 2.11.8 and Spark 1.6.1. If you need a different Scala/Spark versions, please modify pom.xml.

BP for Factor Graph

We use factor graph format derived from libDAI with small modification (example):

# number of factors in the file
7

# factor id preceded with 3 hashes (unique, must not intersect with variable name/id)
### 5
# number of vars
1
# name of vars
1
# number of values of vars
2
# number of non-zero entries in factor table
2
# non-zero factor table entries
0  1
1  1

Data parallel loading requires the factors to be partitioned across multiple files.

Code

import sparkle.graph._
// Load graph from the input path with factor files
val graph = Utils.loadLibDAIToFactorGraph(sc, inputPath)
// Run Belief Propagation on the graph with maximum iterations "maxIterations" and epsilon "epsilon"
val beliefs = BP(graph, maxIterations, epsilon)

Command line

./spark-submit --master spark://MASTER:PORT --jars belief-propagation.jar --class sparkle.graph.BP INPUT_PATH OUTPUT_PATH N_ITERATIONS EPSILON

BP for pairwise factors

We use two file format. Each line of the first file contains variable id and its prior delimited with space (example):

1 1.0 1.0
2 1.0 0.0
...

The second file contains pairwise factors with first variable id, second variable id, and the factor in column-major format delimited with space (example):

1 2 1.0 0.9 0.9 1.0 
2 3 0.1 1.0 1.0 0.1 
...

No additional effort is required for enabling data parallel loading since each variable and factor are stored on separate lines and can automatically be processed by Spark in parallel.

Code

import sparkle.graph._
// Load graph from two input files
val graph = PairwiseBP.loadPairwiseGraph(sc, variableFile, factorFile)
// Run Belief Propagation on the graph with maximum iterations "maxIterations" and epsilon "epsilon"
val beliefs = PairwiseBP(graph, maxIterations, epsilon)

Command line

./spark-submit --master spark://MASTER:PORT --jars belief-propagation.jar --class sparkle.graph.PairwiseBP VARIABLES_FILE FACTORS_FILE OUTPUT_PATH N_ITERATIONS EPSILON

References

Belief propagation algorithm
Yedidia, J.S.; Freeman, W.T.; Weiss, Y., "Understanding Belief Propagation and Its Generalizations" in Exploring Artificial Intelligence in the New Millennium, Lakemeyer, G. and Nebel, B., Eds., ISBN: 1-55860-811-7, chapter 8, pp. 239-236, Morgan Kaufmann Publishers, January 2003. (pdf) [An excellent description of BP]
LibDAI file format is used with explicit factor ids: example.
Talk by the project authors: Alexander Ulanov, Manish Marwah "Malicious site detection with large scale belief propagation", Strata+Hadoop, March 2017 (slides)
[BigData 2017] Alexander Ulanov, Manish Marwah, Mijung Kim, Roshan Dathathri, Carlos Zubieta, and Jun Li. Sandpiper: Scaling Probabilistic Inferencing to Large Scale Graphical Models, In IEEE Big Data , Boston, MA, Dec. 2017.

How to contribute

Contributors are required to sign the Corporate Contributor License Agreement

sandpiper's People

Contributors

Stargazers

Watchers

Forkers

huiwenhan magg avulanov wmastersonv manish-marwah semanticbeeng flyingwing hibiscuses clarkclk imilesmile aixuedegege aprilxiaoyanliu andrewrwilliams hazemsoliman115 titsuki frankuc peakcrosser7

sandpiper's Issues

Where are the message destination Factor or Variable excluded in message aggregations?

sorry this is not an issue at all, i just didn't see any other way to communicate.

i am looking at the source code and it looks really great! it is a pleasure to read. but i am struggling to understand one thing: when a variable sends a message to a factor its some sort of aggregation over all messages the variable received from factors excluding the factor the message is about to be send to. similarly when a factor sends a message to a variable it is also an aggregation over messages received from variables excluding the variable the message is about to be send to. where in the code do i see this exclusion behavior? i have a hard time finding it.

thanks for you help! best, koert

Pairwise factors?

Hi,

This is not exactly an issue. I am using the PairwiseBP algorithm for HTTP log files. But unable to understand the factor values in edges. And in sample data also its not very much clear. Can you please provide a little clarity on how are we getting edge factors?

Thanks

java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;

Hi,

I tried running the BP algorithm with the provided example. I am using spark 1.6.1. Also, build the code using steps provided but getting following error:

java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
	at sparkle.graph.PairwiseBP$.apply(PairwiseBP.scala:37)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
	at $iwC$$iwC$$iwC.<init>(<console>:64)
	at $iwC$$iwC.<init>(<console>:66)
	at $iwC.<init>(<console>:68)
	at <init>(<console>:70)
	at .<init>(<console>:74)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Following is my code:

import sparkle.graph._

val variableFile = "/home/user/sandpiper/data/vertex4.txt"
val factorFile = "/home/user/sandpiper/data/edge4.txt"

val graph = PairwiseBP.loadPairwiseGraph(sc, variableFile, factorFile)

val maxIterations = 50
val epsilon = 1e-3
val beliefs = PairwiseBP(graph, maxIterations, epsilon)

Please help me regarding the same.

Are there any limitations with the input FG?

Thanks for your effort on this.

I am wondering if there are any limitations on the input FG in terms of size or shape? I modified your input FG to have few factors, it fails with null pointers at the point of loading the graph.

Also, would be able to tell me how you decide the factor values in the input FG for BP?

hewlettpackard / sandpiper Goto Github PK

sandpiper's Introduction

Loopy Belief Propagation

Building

BP for Factor Graph

BP for pairwise factors

References

How to contribute

sandpiper's People

Contributors

Stargazers

Watchers

Forkers

sandpiper's Issues

Where are the message destination Factor or Variable excluded in message aggregations?

Pairwise factors?

java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;

Are there any limitations with the input FG?

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent