Git Product home page Git Product logo

dlhub_sdk's Introduction

DLHub SDK

CI Coverage Status PyPI version Documentation Status

DLHub SDK contains a Python interface to the Data and Learning Hub for Science (DLHub). These interfaces include functions for quickly describing a model in the correct schema for DLHub, and discovering or using models that other scientists have published.

Installation

DLHub SDK is on PyPi, and can be installed using pip

pip install dlhub-sdk

Documentation

The full documentation for dlhub_sdk is avilable on Read the Docs

Example Usage

The following sections are short introductions to using the DLHub SDK.

Discovering and Running Models

Users interact with DLHub by submitting HTTP requests to a REST API. In an effort to make using this API simple, the DLHub SDK contains a client that provides a Python API to these requests and hides the tedious operations involved in making an HTTP call from Python.

To create the client, call

from dlhub_sdk.client import DLHubClient

client = DLHubClient()

The client makes it simple to find interesting machine learning models. For example, you can get all of the models on DLHub by

d = client.get_servables()

That command will return a Pandas DataFrame of models, which looks something like:

name description id
0 mnist_tiny_example MNIST Digit Classifier with a small NN 123
1 mnist CNN acheiving 99.25% on the MNIST test data 111
2 formation_energy Predict the formation enthalpy of a material given its composition 112

Once you get the name of a model, it can be run thorugh the client as well:

client.run('ryan_globusid/noop', inputs='my data')

Publishing a Model

As a simple example, we will show how to submit a machine learning model created based on the Iris Dataset. Full scripts for this example model are in /examples/iris.

Describe the Model

For brevity, we will upload much less metadata about a model created using Scikit-Learn.

We simply load in a Scikit-Learn model from a pickle file, and then provide a minimal amount of information about it.

from dlhub_sdk.models.servables.sklearn import ScikitLearnModel

model_info = ScikitLearnModel.create_model('model.pkl', n_input_columns=len(data.columns) - 1,
                                           classes=data['species'].unique())

#    Describe the model
model_info.set_title("Example Scikit-Learn Model")
model_info.set_name("iris_svm")
model_info.set_domains(["biology"])

The SDK will inspect the pickle file to determine the type of the model and the version of scikit-learn that was used to create it.

{
  "datacite": {
    "creators": [],
    "titles": [
      {
        "title": "Example Scikit-Learn Model"
      }
    ],
    "publisher": "DLHub",
    "publicationYear": "2018",
    "identifier": {
      "identifier": "10.YET/UNASSIGNED",
      "identifierType": "DOI"
    },
    "resourceType": {
      "resourceTypeGeneral": "InteractiveResource"
    }
  },
  "dlhub": {
    "version": "0.1",
    "domains": ["biology"],
    "visible_to": [
      "public"
    ],
    "id": null,
    "name": "iris_svm",
    "files": {
        "model": "model.pkl",
        "other": []
    },
    "dependencies": {
      "python": {
        "scikit-learn": "0.19.1"
      }
    }
  },
  "servable": {
    "methods": {
      "run": {
        "input": {
          "type": "ndarray",
          "description": "List of records to evaluate with model. Each record is a list of 4 variables.",
          "shape": [
            null,
            4
          ],
          "item_type": {
            "type": "float"
          }
        },
        "output": {
          "type": "ndarray",
          "description": "Probabilities for membership in each of 3 classes",
          "shape": [
            null,
            3
          ],
          "item_type": {
            "type": "float"
          }
        },
        "parameters": {},
        "method_details": {
          "method_name": "_predict_proba"
        }
      }
    },
    "shim": "sklearn.ScikitLearnServable",
    "type": "Scikit-learn estimator",
    "model_type": "SVC",
    "model_summary": "SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,\n  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',\n  max_iter=-1, probability=True, random_state=None, shrinking=True,\n  tol=0.001, verbose=False)",
    "options": {
      "serialization_method": "pickle",
      "is_classifier": true,
      "classes": [
        "setosa",
        "versicolor",
        "virginica"
      ]
    }
  }
}

At this point, we are ready to publish both the model and dataset on DLHub.

Publishing to DLHub

You can publish a model to DLHub by first reading in the metadata from file and then calling the client:

from dlhub_sdk.models import BaseMetadataModel
from dlhub_sdk.client import DLHubClient
import json

# Read the model description
with open('model.json') as fp:
    model = BaseMetadataModel.from_dict(json.load(fp)) 

# Publish the model to DLHub
client = DLHubClient()
client.publish_servable(model)
print('Model published to DLHub. ID:', model.dlhub_id)

When you call this script, the DLHub client will assign your model a unique identifier and the model will soon be available for you to use via DLHub.

Project Support

This material is based upon work supported by Laboratory Directed Research and Development (LDRD) funding from Argonne National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357.

dlhub_sdk's People

Contributors

ascourtas avatar blaiszik avatar ericblau avatar ethantruelove avatar isaac-darling avatar kjschmidt913 avatar maxtuecke avatar nathanpruyne avatar ribhavb avatar ryanchard avatar wardlt avatar zhuozhaoli avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.