Git Product home page Git Product logo

conduit-connector-vitess's Introduction

Conduit Connector Vitess

General

The Vitess connector is one of Conduit plugins. It provides both, a source and a destination Vitess connector.

Prerequisites

How to build it

Run make build.

Testing

Run make test to run all the unit and integration tests, which require Docker and Compose V2 to be installed and running. The command will handle starting and stopping docker containers for you.

Source

The Vitess Source Connector connects to a VTGate with the provided address, keyspace and tabletType and starts creating records for each change detected in a table.

Upon starting, the source takes a snapshot of a given table in the database, then switches into CDC mode. In CDC mode, the plugin reads events from a VStream. In order for this to work correctly, binlog must be enabled on MySQL instances you use with Vitess.

Snapshot Capture

When the connector first starts, snapshot mode is enabled. The connector reads all rows of a table in batches via SELECT with keyset pagination by orderingColumn, limiting the rows by batchSize and ordering them by the orderingColumn. Once all rows in that initial snapshot are read the connector switches into CDC mode.

This behavior is enabled by default, but can be turned off by adding "snapshot": false to the Source configuration.

Change Data Capture

This connector implements CDC features for Vitess by connecting to a VStream that listens to changes in the configured table. Every detected change is converted into a record and returned in the call to Read.

The connector in the CDC mode retrieves all the available shards from Vitess and tracks changes from all of them. If a reshard occurs, the connector will see the change and will listen for events from the new shards.

Warning Changes that happen during a snapshot are not captured.

Configuration Options

name description required default
address The address pointed to a VTGate instance.
Format: hostname:port
true
table The name of the table that the connector should read from. true
keyspace The keyspace specifies a VTGate keyspace. true
orderingColumn The name of a column that the connector will use for ordering rows. Its values must be unique and suitable for sorting, otherwise, the snapshot won't work correctly. true
keyColumn Column name that records should use for their Key fields. false primary key of a table or value of the orderingColumn
username Username of a VTGate user.
Required if your VTGate instance has a static authentication enabled.
false
password Password of a VTGate user.
Required if your VTGate instance has a static authentication enabled.
false
tabletType Specifies a VTGate tablet type. false primary
columns Comma separated list of column names that should be included in each Record's payload.
If the field is not empty it must contain a value of the orderingColumn field.
false all columns
batchSize Size of rows batch. Min is 1 and max is 100000. false 1000
maxRetries The number of reconnect retries the connector will make before giving up if a connection goes down. false 3
retryTimeout The time period that will be waited between retries. false 1
snapshot The field determines whether or not the connector will take a snapshot of the entire table before starting CDC mode. false true

Key handling

If the keyColumn field is not provided, then the connector will attempt to look up the primary key column of the table. If that can't be determined it will use a value of the orderingColumn field.

Columns

If no column names are provided in the config, then the connector will assume that all columns in the table should be returned. If columns are changed (renamed, etc.), the connector will see those changes and use the updated columns.

Known limitations

Changing a table name during a connector update can cause quite unexpected results. Therefore, it's highly not recommended to do this.

Destination

The Vitess Destination takes a record.Record and parses it into a valid SQL query. The Destination is designed to handle different payloads and keys. Because of this, each record is individually parsed and upserted.

Table name

If a record contains a table property in its metadata it will be inserted in that table, otherwise it will fall back to use the table configured in the connector. Thus, a Destination can support multiple tables in a the same connector, as long as the user has proper access to those tables.

Upsert Behavior

If the target table already contains a record with the same key, the Destination will upsert with its current received values. Since the keys must be unique, this can lead to overwriting and potential data loss, so the keys must be correctly assigned from the Source.

In case if there is no key, the record will be simply appended.

Configuration Options

name description required default
address The address pointed to a VTGate instance.
Format: hostname:port
true
table The name of the table that the connector should write to. true
keyColumn Column name used to detect if the target table already contains the record. true
keyspace Specifies a VTGate keyspace. true
username Username of a VTGate user.
Required if your VTGate instance has a static authentication enabled.
false
password Password of a VTGate user.
Required if your VTGate instance has a static authentication enabled.
false
tabletType Specifies a VTGate tablet type. false primary
maxRetries The number of reconnect retries the connector will make before giving up if a connection goes down. false 3
retryTimeout The time period that will be waited between retries. false 1

conduit-connector-vitess's People

Contributors

adamhaffar avatar alyakimenko avatar dependabot[bot] avatar hariso avatar lovromazgon avatar serhiispivak avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

conduit-connector-vitess's Issues

Feature: improve tests to check for error values

Feature description

Currently, most tests check if an error was returned but not for the error itself. Changing this behavior and checking error values wherever the wantErr bool is used would be good.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.