gitbook's Introduction

Open-source prediction infrastructure for data scientists

Welcome to Aqueduct

Aqueduct is open-source prediction infrastructure built for data scientists, by data scientists. With Aqueduct, data scientists can instantaneously deploy machine learning models to the cloud, connect those models to data and business systems, and gain visibility into the performance of their prediction pipelines -- all from the comfort of a Python notebook.

For more on why we're building prediction infrastructure for data scientists, see the-aqueduct-philosophy.md.

The core abstraction in Aqueduct is a Workflow, which is a sequence of Artifacts (data) that are transformed by Operators (compute). The input Artifact(s) for a Workflow are typically loaded from a database, and the output Artifact(s) are typically persisted back to a database. Each Workflow can either be run on a fixed schedule or triggered on-demand.
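
To make the abstraction concrete, here is a minimal, framework-free sketch in plain Python. It is not Aqueduct's implementation or API: the `Artifact`, `Operator`, and `Workflow` classes and the `run` method are hypothetical names, used only to illustrate how a chain of operators transforms one artifact into the next.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


# Hypothetical illustration only; these classes are NOT part of the Aqueduct SDK.
@dataclass
class Artifact:
    """A named piece of data flowing through a workflow."""
    name: str
    data: Any


@dataclass
class Operator:
    """A named unit of compute that turns one artifact's data into another's."""
    name: str
    fn: Callable[[Any], Any]

    def apply(self, artifact: Artifact) -> Artifact:
        return Artifact(name=f"{self.name}_output", data=self.fn(artifact.data))


@dataclass
class Workflow:
    """An ordered chain of operators applied to an input artifact."""
    name: str
    operators: List[Operator] = field(default_factory=list)

    def run(self, source: Artifact) -> List[Artifact]:
        # Keep every intermediate artifact so each step can be inspected.
        artifacts = [source]
        for operator in self.operators:
            artifacts.append(operator.apply(artifacts[-1]))
        return artifacts


reviews = Artifact("reviews", ["great stay", "awful service"])
flow = Workflow(
    "toy_flow",
    [
        Operator("uppercase", lambda rows: [r.upper() for r in rows]),
        Operator("count", len),
    ],
)
results = flow.run(reviews)
print([a.name for a in results])
print(results[-1].data)
```

In this toy version the input artifact is an in-memory list; in Aqueduct the input would typically come from a database query and the final artifact would be saved back to a database.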

The 25-line code snippet below is all you need to create your first prediction pipeline:

import aqueduct as aq
from aqueduct import op, metric
import pandas as pd
# Need to install torch and transformers
#!pip install torch transformers
from transformers import pipeline
import torch

client = aq.Client("YOUR_API_KEY", "localhost:8080")

# This function takes in a DataFrame with the text of user reviews of
# hotels and returns a DataFrame that has the sentiment of each review.
# This function uses the `pipeline` interface from HuggingFace's
# Transformers package.
@op()
def sentiment_prediction(reviews):
    model = pipeline("sentiment-analysis")
    predicted_sentiment = model(list(reviews["review"]))
    return reviews.join(pd.DataFrame(predicted_sentiment))

# Load a connection to a database -- here, we use the `aqueduct_demo`
# database, for which you can find the documentation here:
# https://docs.aqueducthq.com/example-workflows/demo-data-warehouse
demo_db = client.integration("aqueduct_demo")

# Once we have a connection to a database, we can run a SQL query against it.
reviews_table = demo_db.sql("select * from hotel_reviews;")

# Next, we apply our annotated function to our data -- this tells Aqueduct
# to create a workflow spec that applies `sentiment_prediction` to `reviews_table`.
sentiment_table = sentiment_prediction(reviews_table)

# When we call `.save()`, Aqueduct will take the data in `sentiment_table` and 
# write the results back to any database you specify -- in this case, back to the 
# `aqueduct_demo` DB.
sentiment_table.save(demo_db.config(table="sentiment_pred", update_mode="replace"))

# In Aqueduct, a metric is a numerical measurement of some predictions. Here,
# we calculate the fraction of reviews that our machine learning model labels
# as positive, which is something we can track over time.
@metric
def average_sentiment(reviews_with_sent):
    return (reviews_with_sent["label"] == "POSITIVE").mean()

avg_sent = average_sentiment(sentiment_table)

# Once we compute a metric, we can set upper and lower bounds on it -- if
# the metric falls outside those bounds, an error will be raised.
avg_sent.bound(lower=0.5)

# We can also request system-level metrics such as runtime. These can be
# instantiated from a table artifact and represent the runtime of the
# previous @op that ran on it.
sentiment_runtime_metric = sentiment_table.system_metric("runtime")

# Now we can request the runtime.
# We can also apply bounds on this metric, just as with any other.
sentiment_runtime_metric.get()


# And we're done! With a call to `publish_flow`, we've created a full workflow
# that calculates the sentiment of hotel reviews, creates a metric over those
# predictions, and sets a bound on that metric.
client.publish_flow(name="hotel_sentiment", artifacts=[sentiment_table, avg_sent])

For more on this pipeline, check out our Quickstart Guide.
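
The metric-and-bound step can be tried without a running Aqueduct server. The sketch below is plain Python, not the Aqueduct API: `positive_fraction` and `check_lower_bound` are hypothetical helpers, and the sample predictions are made-up values shaped like the list of `label`/`score` dictionaries that the Transformers sentiment pipeline returns.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def positive_fraction(predictions: List[Prediction]) -> float:
    """Return the share of predictions labeled POSITIVE (0.0 for empty input)."""
    if not predictions:
        return 0.0
    positives = sum(1 for p in predictions if p["label"] == "POSITIVE")
    return positives / len(predictions)


def check_lower_bound(value: float, lower: float) -> None:
    """Raise ValueError if the metric falls below the lower bound."""
    if value < lower:
        raise ValueError(f"metric {value:.2f} is below lower bound {lower:.2f}")


# Made-up sample output, for illustration only.
sample_predictions = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.87},
    {"label": "POSITIVE", "score": 0.75},
    {"label": "POSITIVE", "score": 0.91},
]

fraction = positive_fraction(sample_predictions)
check_lower_bound(fraction, lower=0.5)  # passes: 3 of 4 reviews are positive
print(fraction)
```

In the pipeline itself, `avg_sent.bound(lower=0.5)` plays the role that `check_lower_bound` plays here.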

Core Concepts

Example Workflows

Guides

API Reference

Contributors

kenxu95, vsreekanti, eunice-chan, jegonzal, hsubbaraj-spiral, gitbook-bot
