ROAPI

ROAPI automatically spins up read-only APIs for static datasets without requiring you to write a single line of code. It builds on top of Apache Arrow and DataFusion. The core of its design boils down to the following:

  • Query frontends to translate SQL, GraphQL and REST API queries into DataFusion plans.
  • DataFusion for query plan execution.
  • Data layer to load datasets from a variety of sources and formats with automatic schema inference.
  • Response encoding layer to serialize intermediate Arrow record batches into the format requested by the client.

See below for a high-level diagram:

[Figure: roapi-design-diagram]

Installation

cargo install --git https://github.com/roapi/roapi --branch main --bin roapi-http

Pre-built docker images are also available at ghcr.io/roapi/roapi-http.

Usage

Quick start

Spin up APIs for test_data/uk_cities_with_headers.csv and test_data/spacex-launches.json:

roapi-http \
    --table 'uk_cities:test_data/uk_cities_with_headers.csv' \
    --table 'spacex_launches:test_data/spacex-launches.json'

Or using docker:

docker run -t --rm -p 8080:8080 ghcr.io/roapi/roapi-http:latest --addr 0.0.0.0:8080 \
    --table 'uk_cities:test_data/uk_cities_with_headers.csv' \
    --table 'spacex_launches:test_data/spacex-launches.json'

Query tables using SQL, GraphQL or REST:

curl -X POST -d "SELECT city, lat, lng FROM uk_cities LIMIT 2" localhost:8080/api/sql
curl -X POST -d "query { uk_cities(limit: 2) {city, lat, lng} }" localhost:8080/api/graphql
curl "localhost:8080/api/tables/uk_cities?columns=city,lat,lng&limit=2"

Get inferred schema for all tables:

curl 'localhost:8080/api/schema'

Config file

You can also configure multiple table sources using a YAML config file, which supports more advanced format-specific table options:

addr: 0.0.0.0:8084
tables:
  - name: "blogs"
    uri: "test_data/blogs.parquet"

  - name: "ubuntu_ami"
    uri: "test_data/ubuntu-ami.json"
    option:
      format: "json"
      pointer: "/aaData"
      array_encoded: true
    schema:
      columns:
        - name: "zone"
          data_type: "Utf8"
        - name: "name"
          data_type: "Utf8"
        - name: "version"
          data_type: "Utf8"
        - name: "arch"
          data_type: "Utf8"
        - name: "instance_type"
          data_type: "Utf8"
        - name: "release"
          data_type: "Utf8"
        - name: "ami_id"
          data_type: "Utf8"
        - name: "aki_id"
          data_type: "Utf8"

  - name: "spacex_launches"
    uri: "https://api.spacexdata.com/v4/launches"
    option:
      format: "json"

  - name: "github_jobs"
    uri: "https://jobs.github.com/positions.json"

To serve tables using a config file:

roapi-http -c ./roapi.yml

See the config documentation for more options, including using a Google spreadsheet as a table source.

Response serialization

By default, ROAPI encodes responses in JSON format, but you can request different encodings by specifying the ACCEPT header:

curl -X POST \
    -H 'ACCEPT: application/vnd.apache.arrow.stream' \
    -d "SELECT launch_library_id FROM spacex_launches WHERE launch_library_id IS NOT NULL" \
    localhost:8080/api/sql
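
The same content negotiation can be sketched from Python using only the standard library. This is a minimal sketch that assumes a ROAPI server on localhost:8080 as in the quick start; the helper names `sql_request` and `fetch` are hypothetical and not part of ROAPI. Decoding the returned Arrow stream would need an Arrow library, which is not shown here.

```python
import urllib.request

# Media types listed in this README.
JSON = "application/json"
ARROW_STREAM = "application/vnd.apache.arrow.stream"


def sql_request(query, accept=JSON, base_url="http://localhost:8080"):
    """Build a POST request for /api/sql that asks for a specific encoding.

    `base_url` is an assumption: it matches the quick-start address.
    """
    return urllib.request.Request(
        f"{base_url}/api/sql",
        data=query.encode("utf-8"),   # raw SQL text as the request body
        headers={"Accept": accept},   # encoding requested from the server
        method="POST",
    )


def fetch(request):
    """Send the request and return (content type, raw body bytes)."""
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type"), response.read()


request = sql_request(
    "SELECT launch_library_id FROM spacex_launches "
    "WHERE launch_library_id IS NOT NULL",
    accept=ARROW_STREAM,
)
```

Calling `fetch(request)` against a running server returns the raw bytes in the requested encoding; the request object itself can be inspected without any network access.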

REST API query interface

You can query tables through the REST API by sending GET requests to /api/tables/{table_name}. Query operators are specified as query params.

The REST query frontend currently supports the following query operators:

  • columns
  • sort
  • limit
  • filter

To sort column col1 in ascending order and col2 in descending order, set query param to: sort=col1,-col2.

To find all rows with col1 equal to string 'foo', set query param to: filter[col1]='foo'. You can also do basic comparisons with filters, for example predicate 0 <= col2 < 5 can be expressed as filter[col2]gte=0&filter[col2]lt=5.
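
The operators above can be combined into a single request URL. The following is a minimal sketch of a URL builder in Python; the function name `table_url`, its parameters, and the default base URL are assumptions made for illustration and are not part of ROAPI. Brackets, commas and single quotes are left unescaped so the result reads like the examples above.

```python
from urllib.parse import quote, urlencode


def table_url(table, columns=None, sort=None, limit=None, filters=None,
              base_url="http://localhost:8080"):
    """Build a /api/tables/{table_name} URL from the REST query operators.

    columns: list of column names to return
    sort:    list of column names, prefixed with "-" for descending order
    limit:   maximum number of rows
    filters: list of (column, operator, value) tuples; use "" as the
             operator for equality, or e.g. "gte" / "lt" for comparisons
    """
    params = []
    if columns:
        params.append(("columns", ",".join(columns)))
    if sort:
        params.append(("sort", ",".join(sort)))
    if limit is not None:
        params.append(("limit", limit))
    for column, operator, value in filters or []:
        params.append((f"filter[{column}]{operator}", value))

    url = f"{base_url}/api/tables/{quote(table, safe='')}"
    if params:
        # Keep [ ] , ' readable instead of percent-encoding them.
        url += "?" + urlencode(params, safe="[],'")
    return url


# Sort col1 ascending and col2 descending, keep rows where col1 equals
# 'foo' and 0 <= col2 < 5.
example = table_url(
    "table_name",
    sort=["col1", "-col2"],
    filters=[("col1", "", "'foo'"), ("col2", "gte", 0), ("col2", "lt", 5)],
)
```

The resulting URL can be passed to curl or any HTTP client as a GET request.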

GraphQL query interface

To query tables using GraphQL, send the query in a POST request to the /api/graphql endpoint.

The GraphQL query frontend supports the same set of operators as the REST query frontend. Here is how you can apply various operators in a query:

{
    table_name(
        filter: {
            col1: false
            col2: { gteq: 4, lt: 1000 }
        }
        sort: [
            { field: "col2", order: "desc" }
            { field: "col3" }
        ]
        limit: 100
    ) {
        col1
        col2
        col3
    }
}
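
A GraphQL query like this is sent as the raw body of a POST request, as in the quick-start curl command. Below is a minimal Python sketch under the assumption that the server listens on localhost:8080 and that the response uses the default JSON encoding; the helper names `graphql_request` and `run_graphql` are hypothetical.

```python
import json
import urllib.request

# Same query as the quick-start curl example.
QUERY = "query { uk_cities(limit: 2) {city, lat, lng} }"


def graphql_request(query, base_url="http://localhost:8080"):
    """Build a POST request carrying the GraphQL query text as its body."""
    return urllib.request.Request(
        f"{base_url}/api/graphql",
        data=query.encode("utf-8"),
        method="POST",
    )


def run_graphql(query, base_url="http://localhost:8080"):
    """Send the query and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(graphql_request(query, base_url)) as response:
        return json.load(response)


request = graphql_request(QUERY)
```

`run_graphql(QUERY)` needs a running server; `graphql_request` only builds the request and can be checked offline.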

SQL query interface

To query tables using a subset of standard SQL, send the query in a POST request to the /api/sql endpoint. This is the only query interface that supports table joins.
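
As an illustration of a join sent through this endpoint, here is a minimal Python sketch. The tables `table_a` and `table_b` and their columns are hypothetical placeholders, as are the helper names and the base URL; substitute tables that are actually registered with your server.

```python
import json
import urllib.request

# Hypothetical tables and columns, used only to illustrate a join.
JOIN_QUERY = """
SELECT a.id, a.name, b.score
FROM table_a AS a
JOIN table_b AS b ON a.id = b.id
LIMIT 10
"""


def build_sql_request(query, base_url="http://localhost:8080"):
    """Build a POST request for /api/sql with the SQL text as the body."""
    return urllib.request.Request(
        f"{base_url}/api/sql",
        data=query.strip().encode("utf-8"),
        method="POST",
    )


def run_sql(query, base_url="http://localhost:8080"):
    """Send the query and decode the default JSON-encoded response."""
    with urllib.request.urlopen(build_sql_request(query, base_url)) as response:
        return json.load(response)


request = build_sql_request(JOIN_QUERY)
```

`run_sql(JOIN_QUERY)` requires a running server with both tables loaded.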

Features

Query layer:

  • REST API GET
  • GraphQL
  • SQL
  • join between tables
  • support filter on nested struct fields
  • index
  • protocol
    • gRPC
    • MySQL
    • Postgres

Response serialization:

  • JSON application/json
  • Arrow application/vnd.apache.arrow.stream
  • msgpack

Data layer:

Misc:

  • auto gen OpenAPI doc for rest layer
  • query input type conversion based on table schema
  • stream arrow encoding response
  • authentication layer

Development

The core of ROAPI, including query frontends and data layer, lives in the self-contained columnq crate. It takes queries and outputs Arrow record batches. Data sources will also be loaded and stored in memory as Arrow record batches.

The roapi-http crate wraps columnq with an HTTP-based API layer. It serializes Arrow record batches produced by columnq into different formats based on the client's request.

Building ROAPI with SIMD optimization requires the nightly Rust toolchain.

Build Docker image

docker build --build-arg RELEASE=main --rm -t ghcr.io/roapi/roapi-http:latest .

You can set the RELEASE build argument to any git reference to build a specific version.
