
Cortex

Deploy machine learning models in production

Cortex is an open source platform that takes machine learning models—trained with nearly any framework—and turns them into production web APIs in one command.

install · docs · examples · we're hiring · email us · chat with us

Demo

[demo GIF]
Quickstart

Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to install Cortex on your AWS account before getting started.


Step 1: Configure your deployment

Define a deployment and an api resource. A deployment specifies a set of APIs that are deployed together. An api makes a model available as a web service that can serve real-time predictions. The configuration below downloads the model from the cortex-examples S3 bucket; the code that generated the model is included in the repository's examples.

# cortex.yaml

- kind: deployment  # groups the APIs below into one deployment
  name: text

- kind: api  # exposes the model as a web service
  name: generator
  model: s3://cortex-examples/text-generator/gpt-2/124M  # pre-trained GPT-2 (124M parameters)
  request_handler: handler.py  # pre/post-processing (see step 2)

Step 2: Add request handling

The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output. This can be implemented in a request handler file using the pre_inference and post_inference functions:

# handler.py

from encoder import get_encoder  # GPT-2's BPE tokenizer, provided alongside this example
encoder = get_encoder()


def pre_inference(sample, metadata):
    # Encode the raw input text into GPT-2 token ids before inference
    context = encoder.encode(sample["text"])
    return {"context": [context]}


def post_inference(prediction, metadata):
    # Decode the predicted token ids back into natural-language text
    response = prediction["sample"]
    return encoder.decode(response)
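
To see how these hooks fit together, here is a minimal sketch of the request path. It assumes handler.py and the encoder module are importable; fake_model and the empty metadata dict are hypothetical stand-ins, since in a real deployment Cortex runs GPT-2 via TensorFlow Serving between the two hooks:

# sketch.py (illustration only; fake_model stands in for the real model)

from handler import pre_inference, post_inference


def fake_model(model_input):
    # A real deployment would run GPT-2 here; this stub echoes the tokens back
    return {"sample": model_input["context"][0]}


metadata = {}  # Cortex passes model metadata here; unused in this sketch
payload = {"text": "machine learning"}           # parsed JSON request body
model_input = pre_inference(payload, metadata)   # text -> token ids
prediction = fake_model(model_input)             # inference step (stubbed)
response = post_inference(prediction, metadata)  # token ids -> text
print(response)  # prints "machine learning" after the encode/decode round trip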

Step 3: Deploy to AWS

Deploying to AWS is as simple as running cortex deploy from your CLI. cortex deploy takes the declarative configuration from cortex.yaml and creates it on the cluster. Behind the scenes, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.

$ cortex deploy

deployment started

You can track the status of a deployment using cortex get. The output below indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.

$ cortex get generator --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            123ms

url: http://***.amazonaws.com/text/generator
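
Autoscaling works with no extra configuration, but replica bounds can also be set in cortex.yaml. The snippet below is a sketch; the exact key names (compute, min_replicas, max_replicas) vary across Cortex versions, so check the docs for your release:

# cortex.yaml (sketch; key names vary by Cortex version)

- kind: api
  name: generator
  model: s3://cortex-examples/text-generator/gpt-2/124M
  request_handler: handler.py
  compute:
    min_replicas: 1   # keep at least one replica warm
    max_replicas: 10  # cap scale-out under heavy load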

Step 4: Serve real-time predictions

Once you have your endpoint, you can make requests:

$ curl http://***.amazonaws.com/text/generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "machine learning"}'

Machine learning, with more than one thousand researchers around the world today, are looking to create computer-driven machine learning algorithms that can also be applied to human and social problems, such as education, health care, employment, medicine, politics, or the environment...
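
The same request can be made from Python. This sketch assumes the third-party requests package; substitute the endpoint URL printed by cortex get:

# client.py (sketch; replace the URL with your endpoint from `cortex get`)

import requests

resp = requests.post(
    "http://***.amazonaws.com/text/generator",  # endpoint from `cortex get`
    json={"text": "machine learning"},
)
print(resp.text)  # the generated text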

Any questions? Chat with us.


More examples

Additional examples can be found in the examples directory of the Cortex repository.


Key features

  • Autoscaling: Cortex automatically scales APIs to handle production workloads.

  • Multi-framework: Cortex supports TensorFlow, Keras, PyTorch, scikit-learn, XGBoost, and more.

  • CPU / GPU support: Cortex can run inference on CPU or GPU infrastructure.

  • Rolling updates: Cortex updates deployed APIs without any downtime.

  • Log streaming: Cortex streams logs from deployed models to your CLI (see the example after this list).

  • Prediction monitoring: Cortex monitors network metrics and tracks predictions.

  • Minimal declarative configuration: Deployments are defined in a single cortex.yaml file.
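
For example, to stream logs for the API deployed above (assuming the CLI's logs subcommand, whose exact form may vary by version):

$ cortex logs generator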
