CD-Stream is a cross-database replication tool driven by change data capture (CDC). It currently supports replication from MySQL to Postgres.
- Timed data extraction (straightforward ETL) using SELECT queries on a production database can be costly and resource-intensive.
- Cron jobs have to be scheduled, and what if they fail too?
The current version supports replicating from MySQL and loading the data into Postgres. The load jobs are queued in Redis and processed automatically by rq workers.
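The queue-and-worker pattern described above can be sketched in a few lines. This is only an in-process stand-in that uses Python's standard library instead of Redis and rq, and it is not CD-Stream's own code; run_load_jobs, apply_job and the shape of the event dictionaries are hypothetical names chosen for illustration.

```python
import queue
import threading


def run_load_jobs(jobs, apply_job, workers=1):
    """Queue every job, let worker threads drain the queue, return (results, failures)."""
    pending = queue.Queue()
    results, failures = [], []

    def worker():
        while True:
            job = pending.get()
            try:
                if job is None:  # sentinel: no more work for this worker
                    return
                results.append(apply_job(job))
            except Exception as exc:  # keep draining the queue after a failed job
                failures.append((job, exc))
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        pending.put(job)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    pending.join()
    for thread in threads:
        thread.join()
    return results, failures


# Hypothetical change events; the real job payload is not documented here.
events = [
    {"table": "users", "action": "insert", "row": {"id": 1}},
    {"table": "users", "action": "delete", "row": {"id": 1}},
]
applied, failed = run_load_jobs(events, lambda e: f'{e["action"]} on {e["table"]}')
print(applied, failed)
```

With a single worker the jobs are applied in the order they were queued, which matters when later changes depend on earlier ones; more workers trade that ordering for throughput.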
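Check whether binary logging is enabled on your source database by running the following query there. On MySQL 8.0, where information_schema.global_variables no longer exists, SHOW VARIABLES LIKE 'log_bin'; reports the same setting.
MySQL: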
select variable_value as "BINARY LOGGING STATUS (log-bin) :: " from information_schema.global_variables where variable_name='log_bin';
If the above command returns "OFF", add the following lines under the [mysqld] section of the MySQL configuration in /etc/mysql/mysql.conf.d, then restart the MySQL service:
log_bin = mysql-bin
expire_logs_days = 10
max_binlog_size = 100M
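To double-check the edit before restarting the server, the configuration text can be parsed with Python's standard library. The following is a minimal sketch, assuming the settings sit under a [mysqld] section; binlog_settings is a hypothetical helper and the sample text simply mirrors the three lines above.

```python
import configparser


def binlog_settings(cnf_text):
    """Return the binlog-related options found in the [mysqld] section."""
    # Drop "!include" / "!includedir" directives, which configparser cannot parse.
    lines = [ln for ln in cnf_text.splitlines() if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string("\n".join(lines))
    if not parser.has_section("mysqld"):
        return {}
    # MySQL accepts both log-bin and log_bin, so normalize dashes to underscores.
    options = {key.replace("-", "_"): value for key, value in parser.items("mysqld")}
    wanted = ("log_bin", "expire_logs_days", "max_binlog_size")
    return {key: options[key] for key in wanted if key in options}


sample = """
[mysqld]
log_bin = mysql-bin
expire_logs_days = 10
max_binlog_size = 100M
"""
settings = binlog_settings(sample)
print(settings)
```

An empty result means none of the three options were found, so the server would keep running without the binary log settings listed above.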
Safety first - put your hard hats on!
- Clone the project and initialize a virtual environment.
$ git clone https://github.com/datawrangl3r/cd-stream.git
$ cd cd-stream
$ python3 -m venv .
$ source bin/activate
$ pip install -r requirements.txt
- Configure streamsql.yml, tailoring it to your needs:
EXTRACTION:
  ENGINE: mysql
  HOST: localhost
  PORT: 3306
  USER: root
  PASS: password
  DB: SOURCEDB
COMMIT:
  ENGINE: postgres
  HOST: localhost
  PORT: 5432
  USER: postgres
  PASS: password
  DB: TARGETDB
QUEUE:
  ENGINE: REDIS
  HOST: localhost
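A quick sanity check of the configuration can catch a missing key before replication starts. The sketch below assumes the three sections EXTRACTION, COMMIT and QUEUE each hold the keys listed under them above; missing_keys and load_config are hypothetical helpers, not part of CD-Stream, and load_config assumes PyYAML is installed.

```python
# Keys expected in each section, taken from the sample streamsql.yml above.
REQUIRED = {
    "EXTRACTION": ("ENGINE", "HOST", "PORT", "USER", "PASS", "DB"),
    "COMMIT": ("ENGINE", "HOST", "PORT", "USER", "PASS", "DB"),
    "QUEUE": ("ENGINE", "HOST"),
}


def missing_keys(config):
    """Return the sections or SECTION.KEY entries absent from a parsed config."""
    problems = []
    for section, keys in REQUIRED.items():
        block = config.get(section)
        if not isinstance(block, dict):
            problems.append(section)  # whole section missing or not a mapping
            continue
        problems.extend(f"{section}.{key}" for key in keys if key not in block)
    return problems


def load_config(path="streamsql.yml"):
    """Parse the YAML file into a dict (assumes PyYAML is available)."""
    import yaml  # imported lazily so missing_keys works without PyYAML

    with open(path) as handle:
        return yaml.safe_load(handle)


# Example: COMMIT lacks PASS and the QUEUE section is absent.
example = {
    "EXTRACTION": {"ENGINE": "mysql", "HOST": "localhost", "PORT": 3306,
                   "USER": "root", "PASS": "password", "DB": "SOURCEDB"},
    "COMMIT": {"ENGINE": "postgres", "HOST": "localhost", "PORT": 5432,
               "USER": "postgres", "DB": "TARGETDB"},
}
print(missing_keys(example))
```

Calling missing_keys(load_config()) from the project directory would run the same check against the real file; an empty list means every expected key is present.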
- Initialize rq workers in the background:
$ rq worker &
- Start replication and data load (use Supervisor if needed):
$ python main.py &
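If replication does not seem to start, it can help to confirm that the three services are accepting connections. The following is a rough sketch using only the standard library; the hosts and the MySQL and Postgres ports mirror the sample streamsql.yml, while 6379 is an assumption (the Redis default), since the sample file lists no port for the queue. is_reachable and unreachable are hypothetical helper names.

```python
import socket

# Assumed endpoints; adjust them to match your streamsql.yml.
ENDPOINTS = {
    "mysql": ("localhost", 3306),
    "postgres": ("localhost", 5432),
    "redis": ("localhost", 6379),  # assumption: Redis default port
}


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(endpoints=ENDPOINTS):
    """Return the names of the endpoints that refuse or time out."""
    return [
        name
        for name, (host, port) in endpoints.items()
        if not is_reachable(host, port)
    ]
```

Calling unreachable() returns an empty list when all three services answer. A successful TCP connection only shows that something is listening on the port; it does not verify credentials or that binary logging is enabled.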