Git Product home page Git Product logo

pglogical-poc's Introduction

pglogical-poc

Replicate from PostgreSQL 11.5 to 11.10 using pglogical 2.2.2

Dependencies

# install timeout command on macOS
brew install coreutils

Setup

In this PoC we logically replicate from a PostgreSQL 11.5 to a PostgreSQL 11.10. Both instances running in Docker containers and communicating with each other. Both have pglogical 2.2.2 installed.

# start containers
docker-compose up -d

# in case we need to rebuilt the images use
docker-compose up -d --build

# running services
docker-compose ps

# stop containers
docker-compose down --rmi all

For a more realistic setup there are three tables created: users, posts, and comments, where comments has a foreign key for posts and posts has a foreign key for users. The goal of this PoC is to move everything related to a specific user: 1 row from users, x rows from posts, and y rows from comments.

pglogical currently doesn’t support sub-queries in the row_filter. So we need an alternative approach.

invalid row_filter expression "post_id = IN (SELECT id FROM posts WHERE user_id = 1)"

For simplicity we choose de-normalizing the foreign key relation from comments to posts to users by adding a user_id column to the comments table directly, that will be populated with the randomly chosen user_id values set in posts table.

Now run replication queries:

# first for the provider:
#   - pglogical.create_node
#   - pglogical.create_replication_set
#   - pglogical.replication_set_add_table
docker exec -it pglogical-poc-pgprovider-1 \
  psql -U postgres -d pg_logical_replication -f /replication.sql

# second for the subscriber:
#   - pglogical.create_node
#   - pglogical.create_subscription
docker exec -it pglogical-poc-pgsubscriber-1 \
  psql -U postgres -d pg_logical_replication_results -f /replication.sql

And finally, check if the correct number of posts was replicated based on the arbitrary row filter user_id = 1:

# get number of posts having `user_id = 1`
docker exec -it pglogical-poc-pgprovider-1 \
  psql -U postgres -d pg_logical_replication \
    -c 'SELECT COUNT(*) FROM posts WHERE user_id = 1;'
 count
-------
    19
(1 row)

# get number of replicated posts
docker exec -it pglogical-poc-pgsubscriber-1 \
  psql -U postgres -d pg_logical_replication_results \
    -c 'SELECT COUNT(*) FROM posts;'
 count
-------
    19
(1 row)

The actual number of posts can differ between runs, as the initial data is generated randomly. The important thing is that the two numbers are indeed equal.

Try to add more posts and comments to the provider instance and check if the replication worked.

docker exec -it pglogical-poc-pgprovider-1 \
  psql -U postgres -d pg_logical_replication \
    -c 'INSERT INTO posts (SELECT generate_series(1001, 2000), FLOOR(random() * 50) + 1);'
INSERT 0 1000

docker exec -it pglogical-poc-pgprovider-1 \
  psql -U postgres -d pg_logical_replication \
    -c 'INSERT INTO comments (SELECT generate_series(201, 400), FLOOR(random()* 1000) + 1, 1, (ROUND(random())::int)::boolean);'
INSERT 0 200

docker exec -it pglogical-poc-pgprovider-1 \
  psql -U postgres -d pg_logical_replication \
    -c 'UPDATE comments
SET user_id = subquery.user_id
FROM (
  SELECT posts.user_id, comments.id
  FROM posts
  INNER JOIN comments ON posts.id = comments.post_id
) AS subquery
WHERE comments.id = subquery.id;'
UPDATE 400

Some useful SQL queries:

-- show subscription status
SELECT * FROM pglogical.show_subscription_status(
  subscription_name := 'pglogical_subscription'
);

-- show subscription table
SELECT * FROM pglogical.show_subscription_table(
  subscription_name := 'pglogical_subscription',
  relation := 'example'
);

-- show `pglogical` relations
\dt pglogical.

-- describe `pglogical.local_sync_status`
\d+ pglogical.local_sync_status

-- show local sync status
SELECT sync_status
  FROM pglogical.local_sync_status
  WHERE sync_nspname = 'public' AND sync_relname = 'example';

The sync states are defined here and mean the following:

  • \0: SYNC_STATUS_NONE (No sync)
  • i: SYNC_STATUS_INIT (Ask for sync)
  • s: SYNC_STATUS_STRUCTURE (Sync structure)
  • d: SYNC_STATUS_DATA (Sync data)
  • c: SYNC_STATUS_CONSTAINTS (Sync constraints)
  • w: SYNC_STATUS_SYNCWAIT (Table sync is waiting to get OK from main thread)
  • u: SYNC_STATUS_CATCHUP (Catching up)
  • y: SYNC_STATUS_SYNCDONE (Sync finished at LSN)
  • r: SYNC_STATUS_READY (Sync done)

Determine replication status

# check replication slots on provider
docker exec -it pglogical-poc-pgprovider-1 \
  psql -U postgres -d pg_logical_replication
psql (11.5 (Debian 11.5-3.pgdg90+1))
Type "help" for help.

pg_logical_replication=# \x
Expanded display is on.
pg_logical_replication=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+----------------------------------------------------
slot_name           | pgl_pg_logic194f0de_provider_pglogical_subscription
plugin              | pglogical_output
slot_type           | logical
datoid              | 16384
database            | pg_logical_replication
temporary           | f
active              | t
active_pid          | 103
xmin                |
catalog_xmin        | 577
restart_lsn         | 0/1826A30
confirmed_flush_lsn | 0/1826A68

pg_logical_replication=# exit

Find column descriptions here.

# check current WAL insert LSN
docker exec -it pglogical-poc-pgprovider-1 \
  psql -U postgres -d pg_logical_replication
psql (11.5 (Debian 11.5-3.pgdg90+1))
Type "help" for help.

pg_logical_replication=# \x
Expanded display is on.
pg_logical_replication=# SELECT pg_current_wal_insert_lsn();
-[ RECORD 1 ]-------------+----------
pg_current_wal_insert_lsn | 0/18264A8

pg_logical_replication=# exit
# check replication status on provider
docker exec -it pglogical-poc-pgprovider-1 \
  psql -U postgres -d pg_logical_replication
psql (11.5 (Debian 11.5-3.pgdg90+1))
Type "help" for help.

pg_logical_replication=# \x
Expanded display is on.
pg_logical_replication=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 101
usesysid         | 10
usename          | postgres
application_name | pglogical_subscription
client_addr      | 192.168.128.3
client_hostname  |
client_port      | 58410
backend_start    | 2021-03-17 16:48:24.83939+00
backend_xmin     |
state            | streaming
sent_lsn         | 0/18264A8
write_lsn        | 0/18264A8
flush_lsn        | 0/18264A8
replay_lsn       | 0/18264A8
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

pg_logical_replication=# SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn);
-[ RECORD 1 ]--+------
pg_size_pretty | 24 MB

pg_logical_replication=# SELECT
pg_logical_replication-#   pg_current_wal_insert_lsn(),
pg_logical_replication-#   replay_lsn,
pg_logical_replication-#   pg_size_pretty(pg_current_wal_insert_lsn() - replay_lsn::pg_lsn) AS diff
pg_logical_replication-# FROM pg_stat_replication;
-[ RECORD 1 ]-------------+----------
pg_current_wal_insert_lsn | 0/1826588
replay_lsn                | 0/1826588
diff                      | 0 bytes

pg_logical_replication=# exit

Find column descriptions here.

# check local sync status on subscriber
docker exec -it pglogical-poc-pgsubscriber-1 \
  psql -U postgres -d pg_logical_replication_results
psql (11.10 (Debian 11.10-1.pgdg90+1))
Type "help" for help.

pg_logical_replication_results=# \x
Expanded display is on.
pg_logical_replication_results=# SELECT * FROM pglogical.local_sync_status;
-[ RECORD 1 ]--+-----------
sync_kind      | f
sync_subid     | 2875150205
sync_nspname   | public
sync_relname   | users
sync_status    | r
sync_statuslsn | 0/183A6D8
-[ RECORD 2 ]--+-----------
sync_kind      | f
sync_subid     | 2875150205
sync_nspname   | public
sync_relname   | posts
sync_status    | r
sync_statuslsn | 0/183A6D8
-[ RECORD 3 ]--+-----------
sync_kind      | f
sync_subid     | 2875150205
sync_nspname   | public
sync_relname   | comments
sync_status    | r
sync_statuslsn | 0/183A6D8
-[ RECORD 4 ]--+-----------
sync_kind      | d
sync_subid     | 2875150205
sync_nspname   |
sync_relname   |
sync_status    | r
sync_statuslsn | 0/0

pg_logical_replication_results=# exit

Metrics & Alerting

To monitor the overall performance and the replications in particular PostgreSQL Server Exporter is used to export metrics in the Prometheus format. A valid example configuration file for Prometheus can be found here. The effective configuration file is this one.

The following endpoints provide metrics respectively:

After having setup the initial dataset, we can see some values for how many tuples have been inserted (pg_stat_database_tup_inserted) in this graph as an example.

The mappings into the Prometheus format provided by PostgreSQL Server Exporter can be found here and the default custom queries added can be found here. If you don’t want to see the default metrics, set the environment variable PG_EXPORTER_DISABLE_DEFAULT_METRICS to true.

Prometheus will be configured with some example alerts:

  • PostgreSQLMaxConnectionsReached
  • PostgreSQLHighConnections
  • PostgreSQLDown
  • PostgreSQLSlowQueries
  • PostgreSQLQPS
  • PostgreSQLCacheHitRatio

Their definition can be found here.

Monitoring

In conjunction with Prometheus, Grafana can be used to monitor a whole bunch of different metrics provided by a variety of data sources. The custom credentials for Grafana are admin:s3cr3t. Provisioning capabilities are used to configure Prometheus as data source and also already create a useful dashboard.

Benchmarking

pgbench can be used to perform a benchmark. This is a 2-step process. First you need to initialize the database and then you can run the benchmark itself.

$ # init pgbench by creating the necessary tables
$ docker exec -it pglogical-poc-pgprovider-1 \
    pgbench -U postgres -d pg_logical_replication -i

$ # run pgbench
$ docker exec -it pglogical-poc-pgprovider-1 \
    pgbench -U postgres -d pg_logical_replication -c 10 -T 300
# ...
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 300 s
number of transactions actually processed: 148141
latency average = 20.252 ms
tps = 493.772765 (including connections establishing)
tps = 493.777011 (excluding connections establishing)

Replication

We are aiming to utilize pgbench for our replication example. A good candidate is the pgbench_history table, that is holding almost ~2.4k records per teller (after running pgbench) and there have been 10 different tellers created. One caveat exist: This table doesn’t have a primary key, so we can only replicate INSERT statements. For this example it’s sufficient, though.

pgbench creates the following data:

  • pgbench_accounts: 100,000 accounts all related to branch 1 with a balance of 0
  • pgbench_branches: 1 branch with a total balance of 0
  • pgbench_history: a kind of ledger that is empty after initialization
  • pgbench_tellers: 10 tellers all related to branch 1 with a balance of 0

After having leveraged make start init replicate (or for short make run), we can check if the pgbench_history table in pgsubscriber is filled.

# count on provider side
docker exec -it pglogical-poc-pgprovider-1 \
  psql -U postgres -d pg_logical_replication \
    -c 'SELECT COUNT(*) FROM pgbench_history WHERE tid = 1;'
 count
-------
  2631
(1 row)

# count on subscriber side
docker exec -it pglogical-poc-pgsubscriber-1 \
  psql -U postgres -d pg_logical_replication_results \
    -c 'SELECT COUNT(*) FROM pgbench_history WHERE tid = 1;'
 count
-------
  2631
(1 row)

Convenience

You can use the following make targets to simplify processes:

  • build: Build containers
  • start: Start containers
  • wait: Wait for databases to be ready
  • init: Init databases with pgbench
  • reset: Reset databases
  • replicate: Run replication
  • run: Runs start, init and replicate
  • list: List running containers
  • stop: Stop containers
  • clean: Remove containers

Debugging

For debugging purposes we can run a Docker image that only provides the psql client as follows:

$ # run `psql` within the same docker network and access `pgprovider`
$ docker run -it --net pglogical-poc_default --rm jbergknoff/postgresql-client \
    "postgresql://postgres:s3cr3t@pgprovider:5432?sslmode=disable"

Resources

Postgres

Grafana

pglogical-poc's People

Contributors

m99coder avatar reediculous456 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

jwhiles sverch

pglogical-poc's Issues

Still not working

Hi Wesley @reediculous456 ,

thanks for your super quick reaction. After having merged your PR it does still not work on my machine. I don’t think the ports are the problem as the two Postgres instances have distinct hosts.

pgsubscriber_1  | 2021-03-16 18:56:32.386 GMT [78] ERROR:  could not connect to the postgresql server: could not connect to server: Connection refused
pgsubscriber_1  | 		Is the server running on host "pgsubscriber" (192.168.192.3) and accepting
pgsubscriber_1  | 		TCP/IP connections on port 5433?
pgsubscriber_1  |
pgsubscriber_1  | 2021-03-16 18:56:32.386 GMT [78] DETAIL:  dsn was:  host=pgsubscriber port=5433 dbname=pg_logical_replication_results user=replicate password=qwertz
pgsubscriber_1  | 2021-03-16 18:56:32.386 GMT [78] STATEMENT:  SELECT pglogical.create_subscription(
pgsubscriber_1  | 	  subscription_name := 'subscription',
pgsubscriber_1  | 	  replication_sets := array['hashes'],
pgsubscriber_1  | 	  provider_dsn := 'host=pgprovider port=5432 dbname=pg_logical_replication user=replicate password=qwertz'
pgsubscriber_1  | 	);
pgsubscriber_1  | psql:/docker-entrypoint-initdb.d/init.sql:29: ERROR:  could not connect to the postgresql server: could not connect to server: Connection refused
pgsubscriber_1  | 	Is the server running on host "pgsubscriber" (192.168.192.3) and accepting
pgsubscriber_1  | 	TCP/IP connections on port 5433?
pgsubscriber_1  |
pgsubscriber_1  | DETAIL:  dsn was:  host=pgsubscriber port=5433 dbname=pg_logical_replication_results user=replicate password=qwertz

My Docker setup is the following:

docker version
Client: Docker Engine - Community
 Cloud integration: 1.0.9
 Version:           20.10.5
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        55c4c88
 Built:             Tue Mar  2 20:13:00 2021
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.5
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       363e9a8
  Built:            Tue Mar  2 20:15:47 2021
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Does it work for you?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.