hadoop-connectors

Apache Hadoop connectors for Pravega.

Description

An implementation of a Hadoop input format for Pravega (with wordcount examples). It uses the Pravega batch client to read all existing events of a stream in parallel.

Build

The build script handles Pravega as a source dependency, meaning that the connector is linked to a specific commit of Pravega (as opposed to a specific release version) in order to facilitate co-development. This is accomplished with a combination of a git submodule and Gradle's composite build feature.

Cloning the repository

When cloning the connector repository, be sure to instruct git to check out submodules recursively, e.g.:

git clone --recurse-submodules https://github.com/pravega/hadoop-connectors.git

To initialize or update the submodules in an existing clone:

git submodule update --init --recursive

Building Pravega

Pravega is built automatically by the connector build script.

Building Hadoop Connector

Build the connector:

./gradlew build        # without dependencies
./gradlew shadowJar    # with dependencies

Test

./gradlew test

Usage

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;
        // PravegaInputFormat is provided by this connector.

        Configuration conf = new Configuration();

        // Setting start and end positions is optional.
        // Typically, the start positions are the end positions of the previous job, so that
        // only newly written events are processed. If no start positions are set, reading
        // starts from the very beginning of the stream.
        // startPos is assumed to hold the end positions saved from the previous job.
        conf.setStrings(PravegaInputFormat.START_POSITIONS, startPos);
        // Fetch the current end positions of the stream.
        String endPos = PravegaInputFormat.fetchLatestPositionsJson("tcp://127.0.0.1:9090", "myScope", "myStream");
        conf.setStrings(PravegaInputFormat.END_POSITIONS, endPos);

        conf.setStrings(PravegaInputFormat.SCOPE_NAME, "myScope");
        conf.setStrings(PravegaInputFormat.STREAM_NAME, "myStream");
        conf.setStrings(PravegaInputFormat.URI_STRING, "tcp://127.0.0.1:9090");
        conf.setStrings(PravegaInputFormat.DESERIALIZER, io.pravega.client.stream.impl.JavaSerializer.class.getName());

        // Job.getInstance replaces the deprecated Job(Configuration) constructor.
        Job job = Job.getInstance(conf);
        job.setInputFormatClass(PravegaInputFormat.class);

        // The key class is 'EventKey', but most jobs will not need it.
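
The comments in the snippet above note that start positions are usually the end positions of the previous job. One way to carry them from one run to the next is to keep them in a small file. The following is a minimal sketch under that assumption, using only the Java standard library; the class name PositionCheckpoint and the choice of a local file are hypothetical and not part of the connector.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/**
 * Hypothetical helper (not part of the connector): keeps the end positions
 * of one job so that the next job can use them as its start positions.
 */
final class PositionCheckpoint {
    private final Path file;

    PositionCheckpoint(Path file) {
        this.file = file;
    }

    /** Returns the saved positions, or empty if nothing has been saved yet. */
    Optional<String> load() throws IOException {
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        String saved = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return saved.isEmpty() ? Optional.empty() : Optional.of(saved);
    }

    /** Overwrites the saved positions; intended to be called after a job succeeds. */
    void save(String positionsJson) throws IOException {
        Files.write(file, positionsJson.getBytes(StandardCharsets.UTF_8));
    }
}
```

With such a helper, a driver could set PravegaInputFormat.START_POSITIONS only when load() returns a value, and call save(endPos) once the job has completed successfully, so that a failed run is retried from the same start positions.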
