pg-mv-fast-refresh's Introduction

Postgres Materialized View Fast Refresh module

This project adds a fast refresh capability to Postgres, using materialized view logs to track changes and offering an alternative to a complete refresh.

The fast refresh process is designed to be installed into its own schema, which contains the functions needed to run the MV process, along with three data dictionary tables and three roles.

The workflow for the MV log creation is shown in the diagram below:

The workflow for the MV creation is shown in the diagram below:

Installing the module

The fast refresh functions are designed to live in their own schema in the database, which is specified via the MODULEOWNER parameter.

To install the MV code, navigate to the folder where the repo has been downloaded and edit the module_set_variables.sh file. This file holds all the variables that control where the fast refresh functions are installed.

The SOURCEUSERNAME/SOURCEPASSWORD and MVUSERNAME/MVPASSWORD parameters are not needed to install the fast refresh functions; they are only used for the test harness set-up.

cd pg-mv-fast-refresh
vi module_set_variables.sh

MODULEOWNER=<MODULEOWNER>         - The module owner username
MODULE_HOME=<MODULE_HOME>         - The module home path
MODULEOWNERPASS=<MODULEOWNERPASS> - Password for the module owner (e.g. PGRS_MVIEW)
INSTALL_TYPE=<INSTALL_TYPE>       - The install type: FULL (new install) or UPDATE (update an existing install); defaults to FULL
HOSTNAME=<HOSTNAME>               - Hostname for the database
PORT=<PORT>                       - Port for the database
DBNAME=<DBNAME>                   - Database name
PGUSERNAME=<PGUSERNAME>           - DB username for the module installation run
PGPASSWORD=<PGPASSWORD>           - DB password for the module installation user
SOURCEUSERNAME=<SOURCEUSERNAME>   - DB username for the source tables for the MV
SOURCEPASSWORD=<SOURCEPASSWORD>   - DB password for the source tables user
MVUSERNAME=<MVUSERNAME>           - DB username for the MV owner
MVPASSWORD=<MVPASSWORD>           - DB password for the MV owner
LOG_FILE=<LOG_PATH>               - Path to the logfile output location

Here is an example of the parameter settings used in the test case: we have an RDS instance pg-tonytest.test.com with a database testpoc and a master username dbadmin.

The fast refresh functions will be installed under the schema testpoc by the install package.

We then have a source schema, testpocsource, where the source data tables for the test harness will go, and a schema testpocmv, where the MV will be built.

export MODULEOWNER=testpoc
export MODULE_HOME=/var/lib/pgsql/pg-mv-fast-refresh
export MODULEOWNERPASS=xxxxxxx
export INSTALL_TYPE=FULL
export HOSTNAME=pg-tonytest.test.com
export PORT=5432
export DBNAME=postgres
export PGUSERNAME=dbadmin
export PGPASSWORD=xxxxxxx
export SOURCEUSERNAME=testpocsource
export SOURCEPASSWORD=xxxxxxx
export MVUSERNAME=testpocmv
export MVPASSWORD=xxxxxxx
export LOG_FILE=/tmp/fast_refresh_module_install_`date +%Y%m%d-%H%M`.log

Now change the permissions on the script runCreateFastRefreshModule.sh so it is executable, then run it.

$ chmod 700 runCreateFastRefreshModule.sh
$ ./runCreateFastRefreshModule.sh
Check log file - /tmp/fast_refresh_module_install_20191119-1423.log
$

This should take just seconds to run. When it’s complete, check the log file in the location you set; in my example, it’s in /tmp. The status is shown at the bottom; below is an example from the run I performed.

cat /tmp/fast_refresh_module_install_20191119-1423.log
INFO: Set variables
INFO: LOG_FILE parameter set to /tmp/fast_refresh_module_install_20191119-1423.log
INFO: MODULEOWNER parameter set to testpoc
INFO: PGUSERNAME parameter set to dbadmin
INFO: HOSTNAME parameter set to pg-tonytest.test.com
INFO: PORT parameter set to 5432
INFO: DBNAME parameter set to postgres
INFO: MODULE_HOME parameter set to /var/lib/pgsql/pg-mv-fast-refresh
INFO: INSTALL_TYPE parameter set to FULL
INFO: Run testpoc schema build script
INFO: Connect to postgres database postgres via PSQL session
…….. cut lines………..
GRANT
INFO: Running Module Deployment Error Checks
INFO: All Objects compiled successfully
INFO: No Errors Found
INFO: Completed Module Deployment Error Checks

After the install, the functions will be available under the MODULEOWNER schema.
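Once the functions are installed, MV logs and MVs are created by calling the module's functions from a psql session. The sketch below is illustrative only: the mv$createMaterializedView call mirrors the one shown in the issue examples later in this document, while the mv$createMaterializedViewLog signature and the argument order of mv$refreshMaterializedViewFast are assumptions, so check the function documentation in the repo for the exact signatures. The schema, table, and column names are placeholders for this example.

DO
$do$
DECLARE
    cResult CHAR(1) := NULL;
BEGIN
    -- 1. Create an MV log on the base table so that changes are tracked
    --    (assumed signature: table name, owning schema).
    cResult := mv$createMaterializedViewLog( 'test1', 'testpocsource' );

    -- 2. Create a fast-refreshable MV over the base table; the parameters follow
    --    the mv$createMaterializedView call shown later in this document.
    --    The SELECT assumes the source schema is on the search_path.
    cResult := mv$createMaterializedView
    (
        pViewName        => 'mv_test1',
        pSelectStatement => 'SELECT id, code FROM test1',
        pOwner           => 'testpocmv',
        pFastRefresh     => TRUE
    );

    -- 3. Apply any outstanding log changes to the MV (argument order assumed).
    cResult := mv$refreshMaterializedViewFast( 'mv_test1', 'testpocmv' );
END
$do$;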

Removing the module

To uninstall the module, execute the dropFastRefreshModule.sh script; it will prompt you to confirm whether you want to remove the module schema.

$ ./dropFastRefreshModule.sh
Are you sure you want to remove the module schema - testpoc (y/n)?y
yes selected the schemas - testpoc will be dropped
INFO: Drop Module Schema complete check logfile for status - /tmp/dropFastRefreshModule_20191119-1430.log

Test Harness Install

There is a test harness script, create_test_harness.sh, that will create six tables, insert some data into them, and then create a complex MV. The script is executed as below.

The SOURCEUSERNAME/SOURCEPASSWORD and MVUSERNAME/MVPASSWORD parameters must be set in module_set_variables.sh for this step. SOURCEUSERNAME is the schema where the base tables will be created and MVUSERNAME is the schema where the materialized view will be created.

$ pwd
/var/lib/pgsql/pg-mv-fast-refresh/test_harness
$ ./create_test_harness.sh
INFO: Build Complete check logfile for status - /tmp/test_harness_install_20191119-1425.log

If you check the log file output, you will see the objects and the MV being created.

Test Harness Removal

To remove the test harness, execute the drop_test_harness.sh script; this will remove the test objects.

$ ./drop_test_harness.sh
INFO: Drop Complete check logfile for status - /tmp/test_harness_drop_20191119-1428.log 

Pipeline Checks

There is a pipeline check script, run_pipeline_checks.sh, that will install the module, create some test data, build 91 materialized views, and then drop all the objects, schemas, and users. Running it is mandatory if you want to contribute to the code, as it confirms that the module deploys correctly and the MVs are created without errors.

$ ./run_pipeline_checks.sh all
Starting pipeline script with option all
Starting time - Mon Dec 16 12:54:08 UTC 2019
Stage 1: Creating the fast refresh module objects in schemas testpoc
Stage 2: Creating the schemas testpocdata and testpocview
Stage 3: Creating the test objects and data in schema testpocdata
Stage 4: Creating the MV logs in testpocdata
Stage 5: Creating 90 test MV's in schema testpocview
Stage 6: Test phase
Stage 6.1: Update 1 row and refresh all MV's including running 1 materialized view in parallel 4
Stage 7: Dropping the test harness objects
Stage 8: Check for problems
Pipeline run type all ran with no issues
Run completion time - Mon Dec 16 12:55:02 UTC 2019


pg-mv-fast-refresh's Issues

ERROR: duplicate key value violates unique constraint "pk_test_17"

During the fast refresh process I am seeing the following error.

The example below is based on a simplified version of the scenario I'm seeing on my database.

CONTEXT: PL/pgSQL function inline_code_block line 12 at RAISE
ERROR: duplicate key value violates unique constraint "pk_test_17"
DETAIL: Key (top_id)=(1234567) already exists.

After adding additional debugging to the fast refresh module, I found that the mv$executeMVFastRefresh function, which is used to apply the log changes to the mview table, is failing when executing the following routine.

  • Type of log change: DML TYPE = INSERT for outer join rows

  • Calls function mv$insertouterjoinrows

DELETE FROM dd_test.mv_test WHERE c_m_row$ IN ( SELECT c.m_row$ FROM a_tab a
LEFT JOIN b_tab b ON a.top_id = b.top_id
LEFT JOIN c_tab c ON a.medium_id = c.medium_id
LEFT JOIN d_tab d ON a.low_id= d.low_id
WHERE
d.m_row$ IN ( SELECT UNNEST($1)))

This considers the previous join INSERT and removes the rows based on the previous c_m_row$ column.

  • Then, function mv$insertouterjoinrows runs the INSERT:

INSERT INTO dd_test.mv_test ( cols..... )
SELECT cols ...
FROM a_tab a
LEFT JOIN b_tab b ON a.top_id = b.top_id
LEFT JOIN c_tab c ON a.medium_id = c.medium_id
LEFT JOIN d_tab d ON a.low_id= d.low_id
WHERE
d.m_row$ IN ( SELECT UNNEST($1))

When the INSERT runs we get this error:-

CONTEXT: PL/pgSQL function inline_code_block line 12 at RAISE
ERROR: duplicate key value violates unique constraint "pk_mv_test_17"
DETAIL: Key (top_id)=(1234567) already exists.

Conclusion:

The function description states:

When inserting data into a complex materialized view, it is possible that a previous insert has already inserted the row that we are about to insert if that row is the subject of an outer join or is a parent of multiple new rows. When applying updates to the materialized view it is possible that the row being updated has subsequently been deleted, so before we can apply an update we have to ensure that the base row still exists. So to remove the possibility of duplicate rows we have to look to see if this situation has occurred.

So, at the moment we only consider the previous join condition's m_row$ UUIDs and do not consider that other inserts may have already inserted the same row, or multiple rows, during this process.
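For illustration only, the kind of check that description calls for could be expressed as an existence test against the mview's primary key before inserting. This is a hedged sketch built on the INSERT from this issue, not the module's actual fix; the column lists stay elided as in the original and top_id stands in for the primary key.

INSERT INTO dd_test.mv_test ( cols..... )
SELECT cols ...
FROM a_tab a
LEFT JOIN b_tab b ON a.top_id = b.top_id
LEFT JOIN c_tab c ON a.medium_id = c.medium_id
LEFT JOIN d_tab d ON a.low_id = d.low_id
WHERE
d.m_row$ IN ( SELECT UNNEST($1))
-- assumed guard: skip any row whose key an earlier insert has already added
AND NOT EXISTS ( SELECT 1 FROM dd_test.mv_test m WHERE m.top_id = a.top_id )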

Composite primary keys

If your materialized view is linked to a composite primary key, it could potentially hit some refresh issues. Logic specific to our business needs has been added to handle this scenario for certain materialized views; it is labelled CDL Specific in the comments if you'd like to review what has been done.

This will need reviewing in full for a permanent solution; 95% of our materialized views don't use composite primary keys.

Exception in function mv$addrow$tomv$table when trying to add two MVs sharing the same base table

Hello,
I'm trying to create several MVs which use a common base table.

DO
INFO: Exception in function mv$addRow$ToMv$Table
INFO: Error 42701:- column "on_m_row$" of relation "mv_merchants_by_bd" already exists:
INFO: Error Context:
ALTER TABLE spcdsview.mv_merchants_by_bd
ADD COLUMN on_m_row$ UUID
ERROR: 42701
CONTEXT: PL/pgSQL function "mv$addrow$tomv$table"("mv$allconstants",text,text,text[],text[],text,text) line 66 at RAISE
SQL statement "SELECT
pViewColumns,
pSelectColumns
FROM
mv$addRow$ToMv$Table
(
rConst,
pOwner,
pViewName,
tAliasArray,
tRowidArray,
tViewColumns,
tSelectColumns
)"
PL/pgSQL function "mv$creatematerializedview"(text,text,text,text,text,boolean) line 111 at SQL statement
PL/pgSQL function inline_code_block line 25 at assignment

Is it possible to have more than one view referencing the same table in the current implementation?

Thanks

INFO: Exception in function mv$insertOuterJoinRows INFO: Error 22004:- query string argument of EXECUTE is null:

INFO: Exception in function mv$insertOuterJoinRows
INFO: Error 22004:- query string argument of EXECUTE is null:
INFO: Error Context:

INFO: Exception in function mv$executeMVFastRefresh
INFO: Error P0001:- 22004:
INFO: Exception in function mv$refreshMaterializedViewFast
INFO: Error P0001:- P0001:
INFO: Error Context:
SELECT m_row$,sequence$,dmltype$ FROM comfort0_live.log$_test_table WHERE bitmap$[1] & 32768 = 32768 ORDER BY sequence$
INFO: Exception in function mv$refreshMaterializedViewFull
INFO: Error P0001:- P0001:
ERROR: P0001

Root Cause:-

The defect is down to the function mv$checkIfOuterJoinedTable, which checks whether the table array value exists in the outer join array. It does not consider that a table can be part of both an inner join condition and an outer join condition, so when a query contains both, the inner join table is treated as an outer join.

Solution:
The function was changed to pass the outer join array value rather than the full array, and to confirm that the table value matches the outer join position value. This should always hold, because the outer join array entry is only populated when the table is an outer join table; otherwise it is NULL, which marks the table as an inner join table.
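As an illustration of the query shape that triggers the defect (table and column names are hypothetical), see the sketch below: b_tab is joined with an inner join but also appears in the ON clause of an outer join, which is the combination the original check misclassified.

-- b_tab is inner joined to a_tab, yet also referenced in the LEFT JOIN to c_tab.
-- The old check saw b_tab in the join arrays and treated it as an outer joined
-- table; the corrected check looks at b_tab's own outer join array position,
-- which is NULL, so it is handled as an inner join table.
SELECT a.id, b.code, c.description
FROM a_tab a
INNER JOIN b_tab b ON a.id = b.id
LEFT JOIN c_tab c ON b.code = c.code;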

Problems with dropping MV logs

We've noticed some problems with dropping the MVs using the functions, and also with dropping the MV logs. Did you test this out? We're making these all test points for our pipeline, so in future all of this will be tested for any code change.

The MVs themselves drop OK, but the numbers in the array field of the pg$mview_logs table look very wrong. Check the output below: I create 66 MVs and then drop them; see the resulting output from pg$mview_logs.

-bash-4.1$ psql --host=$HOSTNAME --port=$PORT --username=$MVUSERNAME --dbname=$DBNAME
psql (9.5.12, server 11.5)
WARNING: psql major version 9.5, server major version 11.
Some psql features might not work.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> select * from testpoc."pg$mview_logs"
;
owner | pglog$_name | table_name | trigger_name | pg_mview_bitmap
-------------+-------------+------------+--------------+-----------------
testpocdata | log$_test1 | test1 | trig$_test1 | {0}
testpocdata | log$_test2 | test2 | trig$_test2 | {0}
testpocdata | log$_test3 | test3 | trig$_test3 | {0}
testpocdata | log$_test4 | test4 | trig$_test4 | {0}
testpocdata | log$_test5 | test5 | trig$_test5 | {0}
testpocdata | log$_test6 | test6 | trig$_test6 | {0}
(6 rows)

postgres=> DO
postgres-> $do$
postgres$> DECLARE
postgres$> tStartTime TIMESTAMP := clock_timestamp();
postgres$> cResult CHAR(1) := NULL;
postgres$> iTableCounter SMALLINT := 10;
postgres$> pSqlStatement TEXT;
postgres$> BEGIN
postgres$> pSqlStatement := 'SELECT test1.id test1_id,
postgres$> test1.lookup_id test1_lookup_id,
postgres$> test1.code test1_code,
postgres$> test2.id test2_id,
postgres$> test2.description test2_desc,
postgres$> test2.metavals_id test2_metavals_id,
postgres$> test2.age test2_age,
postgres$> test3.lookup_id test3_lookup_id,
postgres$> test3.lookup_code test3_lookup_code,
postgres$> test3.lookup_description test3_lookup_desc,
postgres$> test4.metavals_id test4_metavals_id,
postgres$> test4.code test4_code,
postgres$> test4.description test4_desc,
postgres$> test5.id test5_id,
postgres$> test5.rep_ind test5_rep_ind,
postgres$> test5.trans_id test5_trans_id,
postgres$> test6.trans_id test6_trans_id,
postgres$> test6.payment_reference test6_payment_ref
postgres$> FROM
postgres$> test1
postgres$> INNER JOIN test2 ON test1.id = test2.id
postgres$> LEFT JOIN test3 ON test1.lookup_id = test3.lookup_id
postgres$> LEFT JOIN test4 ON test2.metavals_id = test4.metavals_id
postgres$> INNER JOIN test5 ON test1.id = test5.id
postgres$> LEFT JOIN test6 ON test5.trans_id = test6.trans_id';
postgres$>
postgres$> FOR iTableCounter IN 10 .. 75
postgres$> LOOP
postgres$> cResult := mv$createMaterializedView
postgres$> (
postgres$> pViewName => 'mvtesting' || iTableCounter,
postgres$> pSelectStatement => pSqlStatement,
postgres$> pOwner => 'testpocview',
postgres$> pFastRefresh => TRUE
postgres$> );
postgres$> END LOOP;
postgres$>
postgres$> RAISE NOTICE 'Simple Snapshot Creation took %', clock_timestamp() - tStartTime;
postgres$> END
postgres$> $do$;
NOTICE: Simple Snapshot Creation took 00:00:04.714461
CONTEXT: PL/pgSQL function inline_code_block line 45 at RAISE
DO
postgres=> select * from testpoc."pg$mview_logs"
;
owner | pglog$_name | table_name | trigger_name | pg_mview_bitmap
-------------+-------------+------------+--------------+-------------------------
testpocdata | log$_test1 | test1 | trig$_test1 | {9223372036854775807,7}
testpocdata | log$_test2 | test2 | trig$_test2 | {9223372036854775807,7}
testpocdata | log$_test3 | test3 | trig$_test3 | {9223372036854775807,7}
testpocdata | log$_test4 | test4 | trig$_test4 | {9223372036854775807,7}
testpocdata | log$_test5 | test5 | trig$_test5 | {9223372036854775807,7}
testpocdata | log$_test6 | test6 | trig$_test6 | {9223372036854775807,7}
(6 rows)

postgres=> DO
postgres-> $do$
postgres$> DECLARE
postgres$> cResult CHAR(1) := NULL;
postgres$> iTableCounter SMALLINT := 10;
postgres$> BEGIN
postgres$> FOR iTableCounter IN 10 .. 75
postgres$> LOOP
postgres$> cResult := mv$removeMaterializedView( 'mvtesting' || iTableCounter, 'testpocview' );
postgres$> END LOOP;
postgres$> END
postgres$> $do$;
DO
postgres=> select * from testpoc."pg$mview_logs"
;
owner | pglog$_name | table_name | trigger_name | pg_mview_bitmap
-------------+-------------+------------+--------------+---------------------------
testpocdata | log$_test1 | test1 | trig$_test1 | {6917529027641081856,-10}
testpocdata | log$_test2 | test2 | trig$_test2 | {6917529027641081856,-10}
testpocdata | log$_test3 | test3 | trig$_test3 | {6917529027641081856,-10}
testpocdata | log$_test4 | test4 | trig$_test4 | {6917529027641081856,-10}
testpocdata | log$_test5 | test5 | trig$_test5 | {6917529027641081856,-10}
testpocdata | log$_test6 | test6 | trig$_test6 | {6917529027641081856,-10}
(6 rows)

postgres=>
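For reference, the values above line up with simple bit arithmetic on the bigint array elements, assuming each registered MV sets one bit in pg_mview_bitmap (an assumption based on the bitmap$[1] & 2 = 2 style predicates seen elsewhere in this document):

-- Illustrative arithmetic only; can be run in any psql session.
SELECT 2::numeric ^ 63 - 1;                -- 9223372036854775807: bits 0-62 set, i.e. 63 MVs in the first element
SELECT 7::bit(3);                          -- 111: bits 0-2 set, i.e. the remaining 3 of the 66 MVs in the second element
SELECT 2::numeric ^ 62 + 2::numeric ^ 61;  -- 6917529027641081856: only bits 61 and 62 remain set after the drop
-- The -10 left in the second element is negative, so it is not a valid bitmap at all.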

Now for dropping the logs: I would not expect them to drop when we have these wild numbers, but I have also tried creating the logs with no MVs and then dropping the logs, and we get errors, see below.

postgres=> select * from testpoc."pg$mview_logs"
postgres-> ;
owner | pglog$_name | table_name | trigger_name | pg_mview_bitmap
-------------+-------------+------------+--------------+-----------------
testpocdata | log$_test1 | test1 | trig$_test1 | {0}
testpocdata | log$_test2 | test2 | trig$_test2 | {0}
testpocdata | log$_test3 | test3 | trig$_test3 | {0}
testpocdata | log$_test4 | test4 | trig$_test4 | {0}
testpocdata | log$_test5 | test5 | trig$_test5 | {0}
testpocdata | log$_test6 | test6 | trig$_test6 | {0}
(6 rows)

postgres=> DO
$do$
DECLARE
cResult CHAR(1) := NULL;
BEGIN
cResult := mv$removeMaterializedViewLog( 'test1', 'testpocdata' );
cResult := mv$removeMaterializedViewLog( 'test2', 'testpocdata' );
cResult := mv$removeMaterializedViewLog( 'test3', 'testpocdata' );
cResult := mv$removeMaterializedViewLog( 'test4', 'testpocdata' );
cResult := mv$removeMaterializedViewLog( 'test5', 'testpocdata' );
cResult := mv$removeMaterializedViewLog( 'test6', 'testpocdata' );
END
$do$;
INFO: Exception in function mv$removeMaterializedViewLog
CONTEXT: PL/pgSQL function "mv$removematerializedviewlog"(text,text) line 60 at RAISE
PL/pgSQL function inline_code_block line 5 at assignment
INFO: Error 42883:- operator does not exist: bigint[] = smallint:
CONTEXT: PL/pgSQL function "mv$removematerializedviewlog"(text,text) line 61 at RAISE
PL/pgSQL function inline_code_block line 5 at assignment
ERROR: 42883
CONTEXT: PL/pgSQL function "mv$removematerializedviewlog"(text,text) line 62 at RAISE
PL/pgSQL function inline_code_block line 5 at assignment

Exception in function mv$clearPgMvLogTableBits Error 40P01:- deadlock detected

A deadlock error occurs when multiple mview refreshes against the same log are trying to clear the same mview log table bits. When there is a larger backlog of data to clear, we have seen this deadlock issue on occasion.

This is the error:-

INFO: Exception in function mv$clearPgMvLogTableBits
INFO: Error 40P01:- deadlock detected:
INFO: Error Context:
UPDATE test_schema.log$_test SET bitmap$[1] = bitmap$[1] - 2 WHERE sequence$ IN ( SELECT sequence$ FROM test_schema.log$_test WHERE bitmap$[1] & 2 = 2 AND sequence$ <= 2293)
INFO: Exception in function mv$refreshMaterializedViewFast
INFO: Error P0001:- 40P01:
INFO: Error Context:
SELECT m_row$,sequence$,dmltype$ FROM comfort0_live.log$_prtyinst WHERE bitmap$[1] & 2 = 2 ORDER BY sequence$
INFO: Exception in function mv$refreshMaterializedViewFull
INFO: Error P0001:- P0001:
ERROR: P0001
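For illustration only, here is a minimal sketch of how two concurrent refreshes can deadlock on the same log table. The second session and the bit value 4 are assumptions for the example, not taken from the error output above.

-- Session A: refresh of the MV registered with bit value 2 clears its bit.
UPDATE test_schema.log$_test SET bitmap$[1] = bitmap$[1] - 2 WHERE bitmap$[1] & 2 = 2;

-- Session B (concurrent): refresh of a second MV, assumed bit value 4.
UPDATE test_schema.log$_test SET bitmap$[1] = bitmap$[1] - 4 WHERE bitmap$[1] & 4 = 4;

-- Both updates touch an overlapping set of log rows. If each session has locked
-- rows the other still needs, neither can finish, and Postgres aborts one of
-- them with error 40P01 (deadlock detected).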
