Data Pipeline Avro Util

What is it?

The Data Pipeline Avro utility package provides a Pythonic interface for reading and writing Avro schemas. It also provides an enum class for metadata that we've found useful to include in our schemas.

Download and Install

git clone git@github.com:Yelp/data_pipeline_avro_util.git
pip install data_pipeline_avro_util

Tests

Running unit tests

make test

Usage

Using Avro Schema Builder::

    from data_pipeline_avro_util.avro_builder import AvroSchemaBuilder
    from data_pipeline_avro_util.data_pipeline.avro_meta_data import AvroMetaDataKeys

    # Start a record schema, add two string fields, then finish the record.
    avro_builder = AvroSchemaBuilder()
    avro_builder.begin_record(
        name="test_name",
        namespace="test_namespace",
        doc="test_doc"
    )
    avro_builder.add_field(
        name="key1",
        typ="string",     # datatype of this field is string
        doc="test_doc1",
        metadata={
            AvroMetaDataKeys.PRIMARY_KEY: 1     # first primary key
        }
    )
    avro_builder.add_field(
        name="key2",
        typ="string",
        doc="test_doc2"
    )
    record_json = avro_builder.end()
    print(record_json)

    {
        "type": "record",
        "namespace": "test_namespace",
        "name": "test_name",
        "doc": "test_doc",
        "fields": [
            {"type": "string", "doc": "test_doc1", "name": "key1", "pkey": True},
            {"type": "string", "doc": "test_doc2", "name": "key2"}
        ]
    }
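The printed result above contains ``True``, which suggests the builder returns a Python dict rather than JSON text. The following is a minimal sketch, under that assumption, of serializing such a dict to JSON and listing its primary-key fields. The dict is copied from the example output instead of being produced by the package, and ``primary_key_fields`` is a hypothetical helper, not part of ``data_pipeline_avro_util``.

```python
import json

# Schema dict copied from the example output above. In real use this would be
# the value returned by avro_builder.end() (assumed here to be a plain dict).
record_schema = {
    "type": "record",
    "namespace": "test_namespace",
    "name": "test_name",
    "doc": "test_doc",
    "fields": [
        {"type": "string", "doc": "test_doc1", "name": "key1", "pkey": True},
        {"type": "string", "doc": "test_doc2", "name": "key2"},
    ],
}


def primary_key_fields(schema):
    """Hypothetical helper: names of fields with a truthy "pkey" attribute."""
    return [field["name"] for field in schema["fields"] if field.get("pkey")]


# json.dumps renders Python's True as JSON's true, giving valid JSON text.
schema_json = json.dumps(record_schema, indent=4, sort_keys=True)
print(schema_json)
print(primary_key_fields(record_schema))
```

The JSON text can then be written to an ``.avsc`` file or passed to whichever Avro library the application uses.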

Disclaimer

We're still in the process of setting this package up as a stand-alone project. Additional work may be required to run the code and to integrate it with other applications.

License

Data Pipeline Avro Util is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Contributing

Everyone is encouraged to contribute to Data Pipeline Avro Util by forking the GitHub repository and making a pull request or opening an issue.
