GCS Tools

Raison d'être:

A lightweight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-cli, proto-tools (for Scio's Protobuf-in-Avro files), and magnolify-tools (for Magnolify code generation), so that they can be used from a regular workstation or laptop, outside of a Google Compute Engine (GCE) instance.

It uses your existing OAuth2 credentials and allows authentication via a browser.

Usage:

You can install the tools via our Homebrew tap on macOS.

brew tap spotify/public
brew install gcs-avro-tools gcs-parquet-cli gcs-proto-tools gcs-magnolify-tools
avro-tools tojson <GCS_PATH>
parquet-cli cat <GCS_PATH>
proto-tools tojson <GCS_PATH>
magnolify-tools <avro|parquet> <GCS_PATH>

Or build them yourself.

sbt assembly
java -jar avro-tools/target/scala-2.13/avro-tools-*.jar tojson <GCS_PATH>
java -jar parquet-cli/target/scala-2.13/parquet-cli-*.jar cat <GCS_PATH>
java -jar proto-tools/target/scala-2.13/proto-tools-*.jar cat <GCS_PATH>
java -jar magnolify-tools/target/scala-2.13/magnolify-tools-*.jar <avro|parquet> <GCS_PATH>

How it works:

To make avro-tools and parquet-cli work with GCS, the GCS connector has to be configured explicitly: it won't pick up your local gcloud configuration, and instead expects its settings in core-site.xml.
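
As a rough illustration of the kind of settings core-site.xml can carry, the fragment below registers the GCS connector for the gs:// scheme. It is a minimal sketch, not the file shipped with gcs-tools: the two properties shown are the connector's standard filesystem bindings, and the authentication and project settings are deliberately left out because their property names differ between connector versions.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only; not the core-site.xml bundled with gcs-tools. -->
<configuration>
  <!-- Map the gs:// scheme to the GCS connector's FileSystem implementation. -->
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
  <!-- Same mapping for Hadoop's newer AbstractFileSystem (FileContext) API. -->
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  </property>
  <!-- Authentication and project settings would also go here; they are
       omitted because their names vary by connector version. -->
</configuration>
```

With a file like this on the classpath, Hadoop-based tools can resolve gs:// paths through the connector instead of failing on an unknown filesystem scheme.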

Contributors:

nevillelyh, ajitgogul, regadas, syodage, clairemcginty
