R2PMML

R package for converting R models to PMML

Features

This package complements the standard pmml package:

  • It supports several model types (e.g. gbm, iForest, ranger, xgb.Booster) that are not supported by the standard pmml package.
  • It is extremely fast and memory efficient. For example, it can convert a typical randomForest model to a PMML file in a few seconds, whereas the standard pmml package requires several hours to do the same.

Prerequisites

  • Java 1.7 or newer. The Java executable must be available on the system path.
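
This prerequisite can be checked from within R before installing the package. The sketch below uses only base R; the helper name `java_on_path` is made up for illustration and is not part of r2pmml.

```r
# Hypothetical helper (not part of r2pmml): returns TRUE if a `java`
# executable can be found on the system path
java_on_path = function() {
  nzchar(Sys.which("java")[[1]])
}

if (java_on_path()) {
  # Print the Java version banner
  system2("java", "-version")
} else {
  message("No Java executable found on the system path")
}
```

If the check fails, the likely fix is to install Java or to add its `bin` directory to the `PATH` environment variable before starting R.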

Installation

Installing the package from its GitHub repository using the devtools package:

library("devtools")

# GitHub no longer serves the unauthenticated git:// protocol; use HTTPS
install_git("https://github.com/jpmml/r2pmml.git")

Usage

Base functionality

Loading the package:

library("r2pmml")

Training and exporting a simple randomForest model:

library("randomForest")
library("r2pmml")

data(iris)

# Train a model using raw Iris data
iris.rf = randomForest(Species ~ ., data = iris, ntree = 7)
print(iris.rf)

# Export the model to PMML
r2pmml(iris.rf, "iris_rf.pmml")
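
After the export, a quick sanity check is to confirm that the output file exists and contains a `PMML` element near the top. The sketch below uses only base R; the helper name `looks_like_pmml` is hypothetical and does not validate the file against the PMML schema.

```r
# Hypothetical helper (not part of r2pmml): TRUE if one of the first
# `n` lines of the file contains the opening of a <PMML> element
looks_like_pmml = function(path, n = 5) {
  first_lines = readLines(path, n = n, warn = FALSE)
  any(grepl("<PMML", first_lines, fixed = TRUE))
}

# Check the file written by the example above, if it is present
if (file.exists("iris_rf.pmml")) {
  print(looks_like_pmml("iris_rf.pmml"))
}
```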

Data pre-processing

The r2pmml function takes an optional argument preProcess, which associates the model with data pre-processing transformations.

Training and exporting a more sophisticated randomForest model:

library("caret")
library("randomForest")
library("r2pmml")

data(iris)

# Create a preprocessor
iris.preProcess = preProcess(iris, method = c("range"))

# Use the preprocessor to transform raw Iris data to pre-processed Iris data
iris.transformed = predict(iris.preProcess, newdata = iris)

# Train a model using pre-processed Iris data
iris.rf = randomForest(Species ~ ., data = iris.transformed, ntree = 7)
print(iris.rf)

# Export the model to PMML.
# Pass the preprocessor as the `preProcess` argument
r2pmml(iris.rf, "iris_rf.pmml", preProcess = iris.preProcess)
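
To make concrete what the `range` method contributes to the exported pipeline: with caret's default settings it rescales each numeric column to the [0, 1] interval. The base R sketch below reproduces that arithmetic by hand, for illustration only. The helper name `rescale_range` is hypothetical, and caret's own implementation may differ in details such as missing value handling.

```r
data(iris)

# Hypothetical helper: min-max scaling of a numeric vector to [0, 1]
rescale_range = function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Apply the scaling to the four numeric columns; keep Species as-is
iris.scaled = iris
iris.scaled[, 1:4] = lapply(iris[, 1:4], rescale_range)

# Show the minimum and maximum of every scaled column
print(sapply(iris.scaled[, 1:4], range))
```

Because the pre-processor is stored in the PMML file alongside the model, a scoring application can be fed raw (unscaled) measurements.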

Model formulae

Alternatively, it is possible to associate lm, glm and randomForest models with data pre-processing transformations via model formulae.

Supported model formula features:

  • Interaction terms.
  • base::I(..) function terms:
    • Logical operators &, | and !.
    • Relational operators ==, !=, <, <=, >= and >.
    • Arithmetic operators +, -, / and *.
    • Exponentiation operators ^ and **.
    • The is.na function.
    • Arithmetic functions abs, ceiling, exp, floor, log, log10, round and sqrt.
  • base::cut() and base::ifelse() function terms.
  • plyr::revalue() and plyr::mapvalues() function terms.

Training and exporting a glm model:

library("plyr")
library("r2pmml")

# Load and prepare the Auto-MPG dataset
auto = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data", quote = "\"", header = FALSE, na.strings = "?", row.names = NULL, col.names = c("mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model_year", "origin", "car_name"))
auto$origin = as.factor(auto$origin)
auto$car_name = NULL
auto = na.omit(auto)

# Train a model
auto.glm = glm(mpg ~ (. - horsepower - weight - origin) ^ 2 + I(displacement / cylinders) + cut(horsepower, breaks = c(0, 50, 100, 150, 200, 250)) + I(log(weight)) + revalue(origin, replace = c("1" = "US", "2" = "Europe", "3" = "Japan")), data = auto)

# Export the model to PMML
r2pmml(auto.glm, "auto_glm.pmml")
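
The example above downloads its data over the network. A smaller variant that works offline might look like the sketch below, which fits an lm model on the built-in mtcars dataset using an interaction term, `I(..)` terms and `cut()`. The choice of dataset, formula and output file name is an assumption made for illustration; the export step is guarded so that it only runs when r2pmml and Java appear to be available.

```r
data(mtcars)

# Fit an lm model whose formula mixes several of the listed features:
# an interaction term, I(..) with arithmetic and log, and cut()
mtcars.lm = lm(mpg ~ hp:wt + I(disp / cyl) + I(log(wt)) + cut(hp, breaks = c(0, 100, 200, 400)), data = mtcars)
print(coef(mtcars.lm))

# Export only if the r2pmml package and a Java executable are found
if (requireNamespace("r2pmml", quietly = TRUE) && nzchar(Sys.which("java")[[1]])) {
  r2pmml::r2pmml(mtcars.lm, "mtcars_lm.pmml")
}
```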

Package ranger

Training and exporting a ranger model:

library("ranger")
library("r2pmml")

data(iris)

# Train a model.
# Keep the forest data structure by specifying `write.forest = TRUE`
iris.ranger = ranger(Species ~ ., data = iris, num.trees = 7, write.forest = TRUE)
print(iris.ranger)

# Export the model to PMML.
# Pass the training dataset as the `dataset` argument
r2pmml(iris.ranger, "iris_ranger.pmml", dataset = iris)

Package xgboost

Training and exporting an xgb.Booster model:

library("xgboost")
library("r2pmml")

data(iris)

iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1

# Generate XGBoost feature map
iris.fmap = genFMap(iris_X)

# Generate XGBoost DMatrix
iris.DMatrix = genDMatrix(iris_y, iris_X)

# Train a model
iris.xgb = xgboost(data = iris.DMatrix, missing = NULL, objective = "multi:softmax", num_class = 3, nrounds = 13)

# Export the model to PMML.
# Pass the feature map as the `fmap` argument.
# Pass the name and category levels of the target field as `response_name` and `response_levels` arguments, respectively.
# Pass the value that represents missing values as the `missing` argument.
# Pass the optimal number of trees as the `ntreelimit` argument (analogous to the `ntreelimit` argument of the `predict.xgb.Booster` method of the xgboost package)
r2pmml(iris.xgb, "iris_xgb.pmml", fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NULL, ntreelimit = 7, compact = TRUE)
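
The order of `response_levels` has to match the zero-based integer labels computed for `iris_y`. The base R sketch below shows how the two relate, and suggests deriving the levels from the factor rather than typing them by hand. The variable names `iris_levels` and `decoded` are introduced here for illustration.

```r
data(iris)

# Zero-based integer labels, computed the same way as `iris_y` above
iris_y = as.integer(iris[, 5]) - 1

# The factor levels in order; label k corresponds to iris_levels[k + 1]
iris_levels = levels(iris[, 5])
print(iris_levels)

# Decode the integer labels back to class names and check the round trip
decoded = iris_levels[iris_y + 1]
stopifnot(all(decoded == as.character(iris$Species)))
```

Under this mapping, `response_levels = iris_levels` would be equivalent to the hard-coded vector in the export call above.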

Advanced functionality

Tweaking JVM configuration:

Sys.setenv(JAVA_TOOL_OPTIONS = "-Xms4G -Xmx8G")

r2pmml(iris.rf, "iris_rf.pmml")
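
Setting `JAVA_TOOL_OPTIONS` with `Sys.setenv` affects every Java process started later in the same R session. If that is not wanted, the setting can be scoped to a single call and restored afterwards. The sketch below is one way to do that in base R; the helper name `with_jvm_options` is hypothetical and not part of r2pmml.

```r
# Hypothetical helper (not part of r2pmml): evaluate `expr` with
# JAVA_TOOL_OPTIONS set to `options`, then restore the previous state
with_jvm_options = function(options, expr) {
  old = Sys.getenv("JAVA_TOOL_OPTIONS", unset = NA)
  Sys.setenv(JAVA_TOOL_OPTIONS = options)
  on.exit({
    if (is.na(old)) {
      Sys.unsetenv("JAVA_TOOL_OPTIONS")
    } else {
      Sys.setenv(JAVA_TOOL_OPTIONS = old)
    }
  })
  # `expr` is evaluated lazily, so it sees the variable set above
  force(expr)
}

# Demonstration: the variable is visible only inside the call
print(with_jvm_options("-Xms4G -Xmx8G", Sys.getenv("JAVA_TOOL_OPTIONS")))
```

With this helper, the export above could be written as `with_jvm_options("-Xms4G -Xmx8G", r2pmml(iris.rf, "iris_rf.pmml"))`.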

Employing a custom converter class:

r2pmml(iris.rf, "iris_rf.pmml", converter = "com.mycompany.MyRandomForestConverter", converter_classpath = "/path/to/myconverter-1.0-SNAPSHOT.jar")

De-installation

Removing the package:

remove.packages("r2pmml")

License

R2PMML is licensed under the GNU Affero General Public License (AGPL) version 3.0. Other licenses are available on request.

Additional information

Please contact [email protected]
