
spark-drools-example's Introduction

# Spark Drools Example #

(Work In Progress)

This project shows a simple example of how to integrate Drools into an Apache Spark job. The steps are pretty simple:

  1. Load a knowledge base using any source supported by Drools
  2. Broadcast the knowledge base to all workers
  3. Use the broadcast rules when processing records

Broadcasting the rules ensures that they are only loaded and compiled once, sent to workers in their compiled form, and then reused throughout the job.
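
The three steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual source: the class name `SparkDroolsSketch` and the `Applicant` fact class are hypothetical, and it assumes a `META-INF/kmodule.xml` plus a DRL file on the classpath whose rules call `setApproved(true)` on matching `Applicant` facts. It also assumes the Spark and Drools (KIE) jars are available at build time.

```java
import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.kie.api.KieBase;
import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.StatelessKieSession;
import org.kie.internal.command.CommandFactory;

public class SparkDroolsSketch {

    // Hypothetical fact class; the DRL on the classpath is assumed to match on it.
    public static class Applicant implements Serializable {
        private final int id;
        private final int creditScore;
        private boolean approved;

        public Applicant(int id, int creditScore) {
            this.id = id;
            this.creditScore = creditScore;
        }

        public int getId() { return id; }
        public int getCreditScore() { return creditScore; }
        public boolean isApproved() { return approved; }
        public void setApproved(boolean approved) { this.approved = approved; }
    }

    // Step 1: load (and compile) the default knowledge base from the classpath.
    static KieBase loadRules() {
        KieServices kieServices = KieServices.Factory.get();
        KieContainer kieContainer = kieServices.getKieClasspathContainer();
        return kieContainer.getKieBase();
    }

    // Step 3: evaluate one fact against the rules in a fresh stateless session.
    static Applicant applyRules(KieBase base, Applicant a) {
        StatelessKieSession session = base.newStatelessKieSession();
        session.execute(CommandFactory.newInsert(a));
        return a;
    }

    public static void main(String[] args) {
        // The master URL is expected to come from spark-submit (--master ...).
        SparkConf conf = new SparkConf().setAppName("SparkDroolsSketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Step 2: compile once on the driver, then ship the compiled form to workers.
        KieBase rules = loadRules();
        Broadcast<KieBase> broadcastRules = sc.broadcast(rules);

        // Made-up sample data, for illustration only.
        JavaRDD<Applicant> applicants = sc.parallelize(Arrays.asList(
                new Applicant(1, 568),
                new Applicant(2, 654),
                new Applicant(3, 788)));

        long approved = applicants
                .map(a -> applyRules(broadcastRules.value(), a))
                .filter(Applicant::isApproved)
                .count();

        System.out.println("Approved applicants: " + approved);
        sc.stop();
    }
}
```

A stateless session is created per record here because it keeps each evaluation independent; the broadcast `KieBase` is the expensive, shared part. Whether the `KieBase` serializes cleanly depends on the serializer Spark is configured with.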

## Prerequisites ##

## Setup ##

The included Vagrantfile (www.vagrantup.com) will spin up a VM with Spark and a Java 1.8 build environment.

In the project's top-level directory, run the following commands:

    vagrant up
    vagrant ssh

If you are unable to SSH into the VM, make sure the SSH server is running.

Once you are logged in to the Vagrant VM, build and run the example as follows:

    cd /vagrant
    mvn package
    /opt/spark-1.4.1-bin-hadoop2.6/bin/spark-submit --class "com.awesome.App" --master local[4] target/SparkDroolsExample-1.0-SNAPSHOT.jar

spark-drools-example's People

Contributors

cpitman, justincohler, mautematico

spark-drools-example's Issues

Next steps

Hello,

I am investigating a similar approach.
Can you share some of your findings or experience regarding the integration of Spark and Drools?

Thank you

ConcurrentModificationException

When running on Apache Spark 1.6.1, I'm getting a java.util.ConcurrentModificationException.

I'll try to provide detailed steps to reproduce, but I think the problem occurs at this line:

 Broadcast<KieBase> broadcastRules = sc.broadcast(rules);

The ConcurrentModificationException is thrown while running on a Spark cluster:

Caused by: java.util.ConcurrentModificationException
    at java.util.Vector$Itr.checkForComodification(Vector.java:1184)
    at java.util.Vector$Itr.next(Vector.java:1137)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:92)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
    ... 44 more

Drools integration with Spark in Scala

I am new to Drools. Can you please advise how to run Drools from Scala? I am unable to run your example.

I put kmodule.xml in resources/META-INF:

<?xml version="1.0" encoding="UTF-8"?>
<kmodule xmlns="http://jboss.org/kie/6.0.0/kmodule">
    <kbase name="AuditKBase" default="true" packages="com.srinu.study.etl.framework.drools">
        <ksession name="AuditKSession" type="stateless" default="true" />
    </kbase>
</kmodule>

The DRL file is in resources:

package com.srinu.study.etl.framework.drools;

import com.srinu.study.etl.framework.Transaction

rule "Approve Good Credit"
  when
    a: Transaction(creditScore > 600)
  then
    a.setApproved(true);
end

Scala code:

package com.srinu.study.etl.framework.drools

import com.srinu.study.etl.framework.logging.Logging
import com.typesafe.config.Config
import org.apache.spark.sql._
import org.kie.api.{KieBase, KieServices}
import org.kie.api.runtime.KieContainer
import org.kie.api.runtime.StatelessKieSession
import org.kie.internal.command.CommandFactory


case class Transaction(sno:Int,first_name:String,last_name:String,requestAmount:Int,creditScore:Int)


class DroolsRules(spark: SparkSession, config: Config, inputDate: String, drlFilePath: String) extends Logging
  with Serializable {
  private var approved = false
  def applyDrools(): Unit = {

    import spark.implicits._

    val inputData = Seq((1, "John", "Doe", 10000, 568),
      (2, "John", "Greg", 12000, 654),
      (3, "Mary", "Sue", 100, 568),
      (4, "Greg", "Darcy", 1000000, 788),
      (5, "Jane", "Stuart", 10, 788))
    val applicants = inputData.toDF("sno","first_name","last_name","requestAmount","creditScore").as[Transaction]
    applicants.show()
    val rules: KieBase = loadRules()
  }
/*
  public static KieBase loadRules() {
    KieServices kieServices = KieServices.Factory.get();
    KieContainer kieContainer = kieServices.getKieClasspathContainer();

    return kieContainer.getKieBase();
  }
 */
  def loadRules(): KieBase = {
    val kieServices:KieServices = KieServices.Factory.get()
    val kieContainer:KieContainer = kieServices.getKieClasspathContainer()
    kieContainer.getKieBase()
  }
  /*
   public static Applicant applyRules(KieBase base, Applicant a) {
    StatelessKieSession session = base.newStatelessKieSession();
    session.execute(CommandFactory.newInsert(a));
    return a;
  }
   */
  def setApproved(_approved: Boolean): Unit = {
    approved = _approved
  }

  def applyRules( base:KieBase, trans:Transaction):Transaction={
    val session:StatelessKieSession = base.newStatelessKieSession()
    session.execute(CommandFactory.newInsert(trans))
    trans
  }
}

The job fails at this line: val kieContainer:KieContainer = kieServices.getKieClasspathContainer()
