JPMML-Cascading

[Cascading application framework](http://www.cascading.org) library for scoring PMML models on Apache Hadoop.

Features

JPMML-Cascading is a thin wrapper around the [JPMML-Model](https://github.com/jpmml/jpmml-model) and [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) libraries.

Installation

Library

The JPMML-Cascading library JAR file is released via the [Maven Central Repository](http://repo1.maven.org/maven2/org/jpmml/). Please join the [JPMML mailing list](https://groups.google.com/forum/#!forum/jpmml) for release announcements.

The current version is 1.1.1 (16 March, 2014).

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-cascading</artifactId>
	<version>1.1.1</version>
</dependency>

Hadoop job

Enter the project root directory and build using [Apache Maven](http://maven.apache.org/):

mvn clean install

The build produces two JAR files:

  • pmml-cascading/target/pmml-cascading-1.1-SNAPSHOT.jar - Library JAR file.
  • pmml-cascading-example/target/example-1.1-SNAPSHOT-job.jar - Hadoop job JAR file.

Usage

Library

The [JPMML-Model](https://github.com/jpmml/jpmml-model) library provides facilities for loading PMML schema version 3.X and 4.X documents into an instance of org.dmg.pmml.PMML:

// Use SAX filtering to transform PMML schema version 3.X and 4.X documents into a PMML schema version 4.2 document
Source source = ImportFilter.apply(...);

PMML pmml = JAXBUtil.unmarshalPMML(source);

// Transform default SAX Locator information to java.io.Serializable form
pmml.accept(new SourceLocationTransformer());

The [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) library provides facilities for obtaining a proper instance of org.jpmml.evaluator.ModelEvaluator:

PMML pmml = ...;

PMMLManager pmmlManager = new PMMLManager(pmml);

ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());

The JPMML-Cascading library itself provides the Cascading assembly planner class org.jpmml.cascading.PMMLPlanner, which integrates the specified org.jpmml.evaluator.ModelEvaluator instance into the specified Cascading flow instance. Internally, the heavy lifting is handled by the Cascading function class org.jpmml.cascading.PMMLFunction. The argument fields of the function match the active fields in the [MiningSchema element](http://www.dmg.org/v4-2/MiningSchema.html). The output fields of the function match the target fields in the [MiningSchema element](http://www.dmg.org/v4-2/MiningSchema.html), plus all the output fields in the [Output element](http://www.dmg.org/v4-2/Output.html).

ModelEvaluator<?> modelEvaluator = ...;

FlowDef flowDef = ...;

PMMLPlanner pmmlPlanner = new PMMLPlanner(modelEvaluator);
pmmlPlanner.setRetainOnlyActiveFields();

flowDef = flowDef.addAssemblyPlanner(pmmlPlanner);
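
To make the field mapping described above concrete, the following sketch lists the argument and output field names for a small PMML fragment using only the JDK's XML parser. It is an illustration, not the way org.jpmml.cascading.PMMLFunction derives its fields. The class name FieldLayoutSketch, the sample model and its field names are all made up for this example, and the sketch assumes a single, non-nested model in which a missing usageType attribute means "active" and "predicted" is treated as a synonym for "target".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FieldLayoutSketch {

    // Hypothetical model fragment; not a complete, schema-valid PMML document.
    static final String SAMPLE_PMML =
          "<PMML xmlns=\"http://www.dmg.org/PMML-4_2\" version=\"4.2\">"
        + "<TreeModel functionName=\"classification\">"
        + "<MiningSchema>"
        + "<MiningField name=\"sepal_length\"/>"
        + "<MiningField name=\"petal_width\" usageType=\"active\"/>"
        + "<MiningField name=\"species\" usageType=\"target\"/>"
        + "</MiningSchema>"
        + "<Output>"
        + "<OutputField name=\"probability_setosa\" feature=\"probability\" value=\"setosa\"/>"
        + "</Output>"
        + "</TreeModel>"
        + "</PMML>";

    // Parses an XML string into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse PMML", e);
        }
    }

    // Collects MiningField names whose usage type is one of the given values.
    private static List<String> miningFields(Document doc, List<String> usageTypes) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagNameNS("*", "MiningField");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element field = (Element) nodes.item(i);
            String usageType = field.getAttribute("usageType");
            if (usageType.isEmpty()) {
                usageType = "active"; // assumption: absent attribute means "active"
            }
            if (usageTypes.contains(usageType)) {
                result.add(field.getAttribute("name"));
            }
        }
        return result;
    }

    // Argument fields: the active fields of the MiningSchema element.
    static List<String> argumentFields(Document doc) {
        return miningFields(doc, Arrays.asList("active"));
    }

    // Output fields: the target fields, followed by all Output element fields.
    static List<String> outputFields(Document doc) {
        List<String> result = miningFields(doc, Arrays.asList("target", "predicted"));
        NodeList nodes = doc.getElementsByTagNameNS("*", "OutputField");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE_PMML);
        System.out.println("Argument fields: " + argumentFields(doc));
        System.out.println("Output fields: " + outputFields(doc));
    }
}
```

For the sample fragment, the argument fields are sepal_length and petal_width, and the output fields are species followed by probability_setosa. A Cascading source feeding the planner would therefore need to supply the first two fields.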

Please see [the example application](https://github.com/jpmml/jpmml-cascading/blob/master/pmml-cascading-example/src/main/java/org/jpmml/cascading/Main.java) for the full picture.

Hadoop job

The Hadoop job JAR file contains a single executable class, org.jpmml.cascading.Main. It expects three arguments: 1) the name of the PMML file in the local filesystem, 2) the Cascading Hfs specification of the source resource and 3) the Cascading Hfs specification of the sink resource.

For example, the following command evaluates the PMML model P:/cascading/model.pmml against the input file P:/cascading/input.tsv (TSV data format), reading the model's arguments from it and writing the results to the output directory P:/cascading/output:

hadoop jar example-1.1-SNAPSHOT-job.jar P:/cascading/model.pmml file:///P:/cascading/input.tsv file:///P:/cascading/output

License

JPMML-Cascading is dual-licensed under the [GNU Affero General Public License (AGPL) version 3.0](http://www.gnu.org/licenses/agpl-3.0.html) and a commercial license.

Additional information

Please contact [[email protected]] (mailto:[email protected])
