Kafka Connect for Rockset

Kafka Connect for Rockset is a Kafka Connect sink connector. It loads data from Kafka topics into Rockset collections and runs in both standalone and distributed mode. Only valid JSON and Avro documents can be read from Kafka and written to Rockset collections by this connector.

API Version

This connector is a sink connector that makes use of the 2.0.0-cp1 version of the Kafka Connect API.

Requirements

  1. Kafka version 1.0.0+.

  2. Java 8+.

  3. An active Rockset account.

Build

  1. Clone the repo from https://github.com/rockset/kafka-connect-rockset

  2. Verify that a Java 8 (or later) JDK is installed. Maven needs a JDK, not just a JRE, to compile the project.

  3. Run mvn package. This builds the jar in the target/ directory of the repository. The jar is named kafka-connect-rockset-[VERSION]-SNAPSHOT-jar-with-dependencies.jar.

Usage

  1. Start your Kafka cluster and confirm it is running.

  2. Kafka Connect can run in standalone or distributed mode. In both modes, one configuration file controls the Kafka Connect worker and a separate set of configuration holds the Rockset-specific parameters. Depending on whether you are running locally (standalone) or distributed, edit the appropriate worker configuration file: $KAFKA_HOME/config/connect-standalone.properties or $KAFKA_HOME/config/connect-distributed.properties, respectively.

  3. In the config file mentioned above, adjust the values as shown below. For more information on installing Kafka Connect plugins please refer to the Confluent Documentation.

| Name | Value |
|------|-------|
| bootstrap.servers | <list-of-kafka-brokers> |
| plugin.path | /path/to/rockset/sink/connector.jar |

  4. In addition, if your stream already contains JSON documents, you can turn off the schema enforcement and conversion that Kafka Connect provides by setting the following properties in the same config file.

| Name | Value |
|------|-------|
| key.converter | org.apache.kafka.connect.storage.StringConverter |
| value.converter | org.apache.kafka.connect.storage.StringConverter |
| key.converter.schemas.enable | false |
| value.converter.schemas.enable | false |

If your stream contains Avro documents, you can use Confluent's AvroConverter (io.confluent.connect.avro.AvroConverter). You must also set the converter's Schema Registry URL to a comma-separated list of URLs for Schema Registry instances, which typically listen on port 8081. For more information, see the Confluent Documentation.

| Name | Value |
|------|-------|
| key.converter | io.confluent.connect.avro.AvroConverter |
| key.converter.schema.registry.url | <list-of-schema-registry-instances> |
| value.converter | io.confluent.connect.avro.AvroConverter |
| value.converter.schema.registry.url | <list-of-schema-registry-instances> |

The config/ directory in this project contains sample configuration files that you can use directly. Verify that your settings match the ones provided there.
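
The converter settings described above depend only on the data format, so they can be generated instead of edited by hand. The following is a minimal sketch in Python and is not part of this project. The function names `converter_properties` and `to_properties_text` and the example Schema Registry address are hypothetical; the property names and converter classes are the ones listed in the tables above.

```python
def converter_properties(data_format, schema_registry_urls=None):
    """Return worker converter properties for the given format ("json" or "avro")."""
    if data_format == "json":
        converter = "org.apache.kafka.connect.storage.StringConverter"
        return {
            "key.converter": converter,
            "value.converter": converter,
            "key.converter.schemas.enable": "false",
            "value.converter.schemas.enable": "false",
        }
    if data_format == "avro":
        if not schema_registry_urls:
            raise ValueError("avro requires at least one Schema Registry URL")
        converter = "io.confluent.connect.avro.AvroConverter"
        urls = ",".join(schema_registry_urls)  # comma-separated list of instances
        return {
            "key.converter": converter,
            "key.converter.schema.registry.url": urls,
            "value.converter": converter,
            "value.converter.schema.registry.url": urls,
        }
    raise ValueError(f"unsupported format: {data_format!r}")


def to_properties_text(props):
    """Render a dict as name=value lines for a .properties file."""
    return "\n".join(f"{name}={value}" for name, value in props.items())


if __name__ == "__main__":
    # Hypothetical Schema Registry address, for illustration only.
    print(to_properties_text(converter_properties("avro", ["http://localhost:8081"])))
```

The printed lines can be pasted into connect-standalone.properties or connect-distributed.properties alongside bootstrap.servers and plugin.path.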

  5. Place the jar file created by mvn package (kafka-connect-rockset-[VERSION]-SNAPSHOT-jar-with-dependencies.jar) in or under the location specified in plugin.path.
  6. If you are running in standalone mode, modify the configuration file in the config/ directory in this repository and set the required parameters (see Configuration below). Run $KAFKA_HOME/bin/connect-standalone.sh ./config/connect-standalone.properties ./config/connect-rockset-sink.properties to start Kafka Connect with Rockset configured. This is sufficient for testing: it runs a local Kafka Connect worker that uses the configuration in ./config/connect-rockset-sink.properties in this repository to write JSON documents from Kafka to Rockset.
  7. Alternatively, if you are running in distributed mode, run $KAFKA_HOME/bin/connect-distributed.sh ./config/connect-distributed.properties to start Kafka Connect. You can then configure the Rockset parameters through Kafka Connect's REST API:
curl -i http://localhost:8083/connectors -H "Content-Type: application/json" -X POST -d '{
    "name": "rockset-sink",
    "config":{
      "connector.class": "rockset.RocksetSinkConnector",
      "tasks.max": "20",
      "rockset.task.threads": "5",
      "topics": "<your-kafka-topics separated by commas>",
      "rockset.integration.key": "<rockset-kafka-integration-key>",
      "rockset.apiserver.url": "https://api.rs2.usw2.rockset.com",
      "format": "json"
    }
}'
  8. In distributed mode, use the following commands to inspect and manage connectors and tasks:

# List active connectors
curl http://localhost:8083/connectors

# Get rockset-sink connector info
curl http://localhost:8083/connectors/rockset-sink

# Get rockset-sink connector config info
curl http://localhost:8083/connectors/rockset-sink/config

# Delete rockset-sink connector
curl http://localhost:8083/connectors/rockset-sink -X DELETE

# Get rockset-sink connector task info
curl http://localhost:8083/connectors/rockset-sink/tasks

See the Confluent documentation for more REST examples.
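
The connector-creation request from the distributed-mode step can also be built programmatically, which avoids hand-written JSON mistakes such as a missing comma. The following is a minimal sketch in Python using only the standard library; it is not part of this project. The helper names `build_connector_request` and `send` are hypothetical, and the worker address is assumed to be the http://localhost:8083 used in the curl examples.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # worker REST address from the curl examples


def build_connector_request(name, config, base_url=CONNECT_URL):
    """Build (but do not send) the POST request that creates a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a request to the worker and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Same placeholder values as the curl example; replace them before use.
    config = {
        "connector.class": "rockset.RocksetSinkConnector",
        "tasks.max": "20",
        "rockset.task.threads": "5",
        "topics": "<your-kafka-topics separated by commas>",
        "rockset.integration.key": "<rockset-kafka-integration-key>",
        "rockset.apiserver.url": "https://api.rs2.usw2.rockset.com",
        "format": "json",
    }
    request = build_connector_request("rockset-sink", config)
    print(request.get_method(), request.full_url)
    # Uncomment once a Kafka Connect worker is running at CONNECT_URL:
    # print(send(request))
```

Because `json.dumps` serializes the dictionary, the request body is always well-formed JSON.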

Configuration

Parameters

Required Parameters

| Name | Description | Default Value |
|------|-------------|---------------|
| name | Connector name. A consumer group with this name will be created with tasks to be distributed evenly across the connector cluster nodes. | |
| connector.class | The Java class used for executing the connector logic. | rockset.RocksetSinkConnector |
| tasks.max | The number of tasks generated to handle data collection jobs in parallel. The tasks will be spread evenly across all Rockset Kafka Connector nodes. | |
| topics | List of comma-separated Kafka topics that should be watched by this Rockset Kafka Connector. | |
| rockset.apiserver.url | URL of the Rockset API Server to connect to. | https://api.rs2.usw2.rockset.com |
| rockset.integration.key | Integration Key that authenticates the connector to write into Rockset collections. | |
| format | Format of your data. Currently json and avro are supported. | json |

Optional Parameters

| Name | Description | Default Value |
|------|-------------|---------------|
| rockset.task.threads | Number of threads that each task should spawn when writing to Rockset. | 5 |
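
A connector configuration can be checked against the parameter tables above before it is submitted. The following is a minimal sketch in Python and is not part of this project or how the connector itself validates its settings. The function name `validate_sink_config` and the specific checks are assumptions for illustration; the parameter names, supported formats, and the default of 5 threads come from the tables.

```python
REQUIRED_PARAMETERS = (
    "name",
    "connector.class",
    "tasks.max",
    "topics",
    "rockset.apiserver.url",
    "rockset.integration.key",
    "format",
)
OPTIONAL_DEFAULTS = {"rockset.task.threads": "5"}
SUPPORTED_FORMATS = ("json", "avro")


def validate_sink_config(config):
    """Return a copy of config with optional defaults filled in, or raise ValueError."""
    missing = [p for p in REQUIRED_PARAMETERS if not config.get(p)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    if config["format"] not in SUPPORTED_FORMATS:
        raise ValueError("format must be one of: " + ", ".join(SUPPORTED_FORMATS))
    tasks_max = str(config["tasks.max"])
    if not tasks_max.isdecimal() or int(tasks_max) < 1:
        raise ValueError("tasks.max must be a positive integer")
    # Explicit settings in config override the optional defaults.
    return {**OPTIONAL_DEFAULTS, **config}


if __name__ == "__main__":
    # Placeholder values, as in the examples above.
    checked = validate_sink_config({
        "name": "rockset-sink",
        "connector.class": "rockset.RocksetSinkConnector",
        "tasks.max": "20",
        "topics": "<your-kafka-topics separated by commas>",
        "rockset.apiserver.url": "https://api.rs2.usw2.rockset.com",
        "rockset.integration.key": "<rockset-kafka-integration-key>",
        "format": "json",
    })
    print(checked["rockset.task.threads"])
```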

License

Kafka Connect for Rockset is licensed under the Apache License 2.0.
