JPMML-SkLearn

Java library and command-line application for converting [Scikit-Learn](http://scikit-learn.org/) models to PMML.

Features

Prerequisites

The Python side of operations

Python installation can be validated as follows:

import sklearn, pandas, sklearn_pandas, joblib, numpy

print(sklearn.__version__)
print(pandas.__version__)
print(sklearn_pandas.__version__)
print(joblib.__version__)
print(numpy.__version__)
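The import statement above stops at the first missing package. A minimal sketch of a more forgiving check, using only the standard library's importlib, reports every package in one pass. The helper name check_packages is hypothetical and not part of any of these libraries.

```python
import importlib

REQUIRED = ["sklearn", "pandas", "sklearn_pandas", "joblib", "numpy"]

def check_packages(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module defines __version__; fall back to a placeholder.
            report[name] = getattr(module, "__version__", "unknown")
    return report

if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(name, version if version is not None else "NOT INSTALLED")
```

Any entry reported as NOT INSTALLED needs to be installed before the Python side of the workflow can run.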

The JPMML-SkLearn side of operations

  • Java 1.7 or newer.
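The Java requirement can be checked from Python as well. The sketch below is an illustration, not part of JPMML-SkLearn: the helper names are hypothetical, and it assumes the usual `java -version` output format, in which releases up to Java 8 report themselves as 1.x and later releases drop the leading 1.

```python
import re
import subprocess

def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Releases up to Java 8 report themselves as 1.x (for example 1.7.0_80).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first

def installed_java_major():
    """Run `java -version` and parse whatever it printed; None if java is not on PATH."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # The version banner normally goes to stderr; stdout is included as a fallback.
    return parse_java_major(result.stderr + result.stdout)
```

A result of 7 or higher from installed_java_major() satisfies the requirement above.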

Installation

Enter the project root directory and build using [Apache Maven](http://maven.apache.org/):

mvn clean install

The build produces an executable uber-JAR file target/converter-executable-1.2-SNAPSHOT.jar.

Usage

A typical workflow can be summarized as follows:

  1. Use Python to train a model.
  2. Serialize the model in pickle data format to a file on the local filesystem.
  3. Use the JPMML-SkLearn command-line converter application to convert the pickle file into a PMML file.

The Python side of operations

Load data into a pandas.DataFrame object:

import pandas

iris_df = pandas.read_csv("Iris.csv")

Describe data and data pre-processing actions by creating an appropriate sklearn_pandas.DataFrameMapper object:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.decoration import ContinuousDomain

iris_mapper = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler(), PCA(n_components = 3)]),
    ("Species", None)
])

iris = iris_mapper.fit_transform(iris_df)

Train an appropriate estimator object:

from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

iris_X = iris[:, 0:3]
iris_y = iris[:, 3]

iris_estimator = Pipeline([
    ("selector", SelectKBest(k = 2)),
    ("estimator", RandomForestClassifier(min_samples_leaf = 5))
])
iris_estimator.fit(iris_X, iris_y)
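After fitting, it can be useful to confirm which of the three PCA components the selector kept. The sketch below is self-contained and makes a few substitutions that are assumptions rather than part of the workflow above: it uses scikit-learn's bundled Iris data (load_iris) in place of Iris.csv, applies StandardScaler and PCA directly instead of through a DataFrameMapper (the ContinuousDomain step is omitted), and sets random_state for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: the bundled Iris dataset instead of Iris.csv (assumption).
X, y = load_iris(return_X_y=True)

# Reproduce the mapper's numeric steps: scale, then project to 3 components.
X_pca = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))

pipeline = Pipeline([
    ("selector", SelectKBest(k=2)),
    ("estimator", RandomForestClassifier(min_samples_leaf=5, random_state=0)),
])
pipeline.fit(X_pca, y)

# Boolean mask over the 3 input columns; exactly k = 2 entries are True.
support = pipeline.named_steps["selector"].get_support()
print("kept components:", support)

# Training accuracy only; a sanity check, not a generalization estimate.
print("training accuracy:", pipeline.score(X_pca, y))
```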

Serialize the sklearn_pandas.DataFrameMapper object and estimator object in pickle data format:

import joblib

joblib.dump(iris_mapper, "mapper.pkl", compress = 9)
joblib.dump(iris_estimator, "estimator.pkl", compress = 9)
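Before handing the files to the converter, it may be worth confirming that they load back cleanly. A minimal round-trip sketch, assuming the standalone joblib package and using a small dictionary as a stand-in for the fitted mapper or estimator:

```python
import os
import tempfile

import joblib

# Stand-in for a fitted object (assumption: any picklable object illustrates the round trip).
stand_in = {"n_components": 3, "k": 2, "min_samples_leaf": 5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "estimator.pkl")
    # Same call shape as above; 9 is the highest compression level joblib accepts.
    joblib.dump(stand_in, path, compress=9)
    restored = joblib.load(path)

print(restored == stand_in)
```

With real fitted objects, comparing predictions from the original and the reloaded estimator on a few rows is a more meaningful check than equality.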

Please see the test script file [main.py](https://github.com/jpmml/jpmml-sklearn/blob/master/src/test/resources/main.py) for more classification (binary and multi-class) and regression workflows.

The JPMML-SkLearn side of operations

Converting the estimator pickle file estimator.pkl to a PMML file estimator.pmml:

java -jar target/converter-executable-1.2-SNAPSHOT.jar --pkl-input estimator.pkl --pmml-output estimator.pmml

Converting the sklearn_pandas.DataFrameMapper pickle file mapper.pkl and the estimator pickle file estimator.pkl to a PMML file mapper-estimator.pmml:

java -jar target/converter-executable-1.2-SNAPSHOT.jar --pkl-mapper-input mapper.pkl --pkl-estimator-input estimator.pkl --pmml-output mapper-estimator.pmml
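When the conversion is driven from a Python script, the two invocations above can be assembled with subprocess. This is a sketch under stated assumptions: the function names are hypothetical, only the flags shown above are used, and whether the converter reports every failure through a non-zero exit code is not confirmed here.

```python
import subprocess

JAR = "target/converter-executable-1.2-SNAPSHOT.jar"

def build_converter_command(pmml_output, estimator_pkl, mapper_pkl=None, jar=JAR):
    """Return the argument list for the converter, using only the flags shown above."""
    command = ["java", "-jar", jar]
    if mapper_pkl is None:
        # Estimator only.
        command += ["--pkl-input", estimator_pkl]
    else:
        # DataFrameMapper plus estimator.
        command += ["--pkl-mapper-input", mapper_pkl,
                    "--pkl-estimator-input", estimator_pkl]
    command += ["--pmml-output", pmml_output]
    return command

def run_converter(*args, **kwargs):
    """Run the converter; check=True raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_converter_command(*args, **kwargs), check=True)

print(" ".join(build_converter_command("estimator.pmml", "estimator.pkl")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.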

Getting help:

java -jar target/converter-executable-1.2-SNAPSHOT.jar --help
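Once a PMML file has been produced, a quick way to inspect it is to list the fields declared in its DataDictionary element. The sketch below uses the standard library's XML parser on a hand-written sample; the sample is an illustration of the PMML layout, not actual converter output, and the namespace version in a real file may differ, which is why the helper ignores namespaces.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; not actual converter output.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header/>
  <DataDictionary>
    <DataField name="Species" optype="categorical" dataType="string"/>
    <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""

def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

def list_data_fields(root):
    """Return (name, optype, dataType) for every DataField element under root."""
    return [
        (el.get("name"), el.get("optype"), el.get("dataType"))
        for el in root.iter()
        if local_name(el.tag) == "DataField"
    ]

root = ET.fromstring(SAMPLE_PMML)
for field in list_data_fields(root):
    print(field)
```

For a file on disk, ET.parse("estimator.pmml").getroot() yields the root element to pass to list_data_fields.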

License

JPMML-SkLearn is licensed under the [GNU Affero General Public License (AGPL) version 3.0](http://www.gnu.org/licenses/agpl-3.0.html). Other licenses are available on request.

Additional information

Please contact [[email protected]] (mailto:[email protected])

Contributors

  • vruusmann
