CSV2Avro

Convert CSV files to Avro like a boss.

Installation

$ gem install csv2avro

Or, if you prefer to live on the edge, clone this repository and build the gem from source.

Usage

Basic

$ csv2avro --schema ./spec/support/schema.avsc ./spec/support/data.csv

This processes the data.csv file and creates a data.avro file, along with a data.bad file that reports the bad rows.

You can override the bad rows report file location with the --bad-rows [BAD_ROWS] option.
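
The schema and CSV files referenced above are not shown in this README. As a minimal sketch, the snippet below creates a hypothetical pair of input files and runs the basic conversion on them. The `./example` directory, the record name `Person`, and the fields `id` and `name` are all made up for illustration. It also assumes that CSV header names line up with the schema's field names, which this README does not confirm.

```shell
# Hypothetical inputs for illustration; the field names (id, name) are
# assumptions and are not taken from the repository's spec/support files.
mkdir -p ./example

# An Avro schema is a JSON document describing a record and its fields.
cat > ./example/schema.avsc <<'EOF'
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "id",   "type": "int"},
    {"name": "name", "type": "string"}
  ]
}
EOF

# A CSV file with a header row followed by two data rows.
cat > ./example/data.csv <<'EOF'
id,name
1,Alice
2,Bob
EOF

# Run the conversion only if csv2avro is installed.
if command -v csv2avro >/dev/null 2>&1; then
  csv2avro --schema ./example/schema.avsc ./example/data.csv
fi
```

Going by the description above, this should yield a data.avro file and a data.bad report for the sample input.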

Streaming

$ cat ./spec/support/data.csv | csv2avro --schema ./spec/support/schema.avsc --bad-rows ./spec/support/data.bad > ./spec/support/data.avro

This processes the input stream and writes the Avro data to the output stream. When working with streams, you must specify the --bad-rows location.

Advanced features

AWS S3 storage

$ aws s3 cp s3://csv-bucket/transactions.csv - | csv2avro --schema ./transactions.avsc --bad-rows ./transactions.bad | aws s3 cp - s3://avro-bucket/transactions.avro

This streams a file stored in AWS S3, converts the data, and pushes the result back to S3. For more information, please check the AWS CLI documentation.

Convert compressed files

$ gunzip -c ./spec/support/data.csv.gz | csv2avro --schema ./spec/support/schema.avsc --bad-rows ./spec/support/data.bad > ./spec/support/data.avro

This uncompresses the file and converts it to Avro, leaving the original compressed file intact.
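
The same pipeline can be repeated over a directory of compressed files. The following is a sketch only: `strip_csv_gz` and `convert_all` are hypothetical helper names introduced here, not part of csv2avro, and the sketch assumes every input file ends in `.csv.gz` and shares one schema.

```shell
# Strip the .csv.gz suffix from a path:
# ./dir/data.csv.gz -> ./dir/data (the caller appends .avro or .bad).
strip_csv_gz() {
  printf '%s\n' "${1%.csv.gz}"
}

# Convert every .csv.gz file in a directory using a single schema.
# Arguments: $1 = schema file, $2 = input directory.
convert_all() {
  schema=$1
  input_dir=$2
  for f in "$input_dir"/*.csv.gz; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    base=$(strip_csv_gz "$f")
    # Same streaming form as above: --bad-rows is given explicitly.
    gunzip -c "$f" \
      | csv2avro --schema "$schema" --bad-rows "$base.bad" \
      > "$base.avro"
  done
}

# Run only if csv2avro is installed.
if command -v csv2avro >/dev/null 2>&1; then
  convert_all ./spec/support/schema.avsc ./spec/support
fi
```

Each input keeps its own bad rows report, so a failure in one file can be traced without rerunning the others.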

More

For a full list of available options, run csv2avro --help:

$ csv2avro --help
Version 1.3.0 of CSV2Avro
Usage: csv2avro [options] [file]
    -s, --schema SCHEMA              A file containing the Avro schema. This value is required.
    -b, --bad-rows [BAD_ROWS]        The output location of the bad rows report file.
    -d, --delimiter [DELIMITER]      Field delimiter. If none specified, then comma is used as the delimiter.
    -l, --line-ending [LINE_ENDING]  Line ending character used as row separator in CSV parsing
    -a [ARRAY_DELIMITER],            Array field delimiter. If none specified, then comma is used as the delimiter.
        --array-delimiter
    -D, --write-defaults             Write default values.
    -c, --stdout                     Output will go to the standard output stream, leaving files intact.
    -h, --help                       Prints help
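
As an illustration of combining these options, the sketch below converts a semicolon-separated file and sends the Avro output to standard output. The file names, the `Person` schema, and its fields are hypothetical. How csv2avro treats this exact combination of flags is an assumption based on the help text above, not something verified here.

```shell
# Hypothetical semicolon-separated input and a matching schema.
mkdir -p ./example
printf 'id;name\n1;Alice\n2;Bob\n' > ./example/semicolon.csv
printf '%s\n' \
  '{"type": "record", "name": "Person", "fields": [' \
  '  {"name": "id", "type": "int"},' \
  '  {"name": "name", "type": "string"}]}' \
  > ./example/semicolon.avsc

# --delimiter sets the field separator, --stdout sends the Avro data to
# standard output, and --bad-rows names the report file explicitly.
if command -v csv2avro >/dev/null 2>&1; then
  csv2avro --schema ./example/semicolon.avsc \
           --delimiter ';' \
           --bad-rows ./example/semicolon.bad \
           --stdout ./example/semicolon.csv > ./example/semicolon.avro
fi
```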

Contributing

  1. Fork it ( https://github.com/sspinc/csv2avro/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

csv2avro's People

Contributors

peterableda, rgabo, tmichel
