
pg_shard's Issues

Permit use by non-superusers

Currently pg_shard requires superuser access for most activities, and all shards are created as owned by the superuser with superuser-only permissions. To make pg_shard work with DBaaS offerings, we'll need to decouple these superuser requirements: creating the extension should still require superuser access, but creating shards should not, provided the user has permission to create tables in each shard database.

Queries with CTEs produce misleading error

This is a pretty simple bug: queries with a CTE generate an error message about JOINs (Joins are currently unsupported) rather than one about CTEs. Check out this test output to see what I mean.

I'd expect the error message to say something about CTEs being unsupported.
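
For illustration, a query of roughly this shape against a distributed table (the table here is hypothetical) currently reports the JOIN error instead of a CTE-specific one:

WITH recent_orders AS (
    SELECT id FROM orders WHERE placed_at > '2014-11-01'
)
SELECT count(*) FROM recent_orders;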

VOLATILE functions accepted in non-partition key qual. clauses

This is pretty problematic. pg_shard only checks for constant expressions in clauses involving the partition key, meaning anything goes in other clauses. Take this table:

CREATE TABLE limit_orders (
  id bigint PRIMARY KEY,
  symbol text NOT NULL,
  bidder_id bigint NOT NULL,
  placed_at timestamp NOT NULL,
  kind order_side NOT NULL,
  limit_price decimal NOT NULL DEFAULT 0.00 CHECK (limit_price >= 0.00)
);

SELECT master_create_distributed_table('limit_orders', 'id');

Trying to INSERT into this table using non-constant expressions will fail:

INSERT INTO limit_orders VALUES (2036, 'GOOG', 5634, now(), 'buy', random());
# ERROR:  cannot plan sharded modification containing values which are not constants or constant expressions

But pg_shard happily accepts a random() appearing in a DELETE command's clauses:

DELETE FROM limit_orders WHERE id = 246 AND bidder_id = (random() * 1000);

I believe the root cause here is the fact that UPDATE and DELETE share code with SELECT. It's probably OK that SELECT accepts random() in its WHERE clause: for SELECT queries, we need only execute on a single replica, so the fact that calls to random() on two replicas will return different values is irrelevant.

But for DELETE and UPDATE this is an issue: replicas will diverge. We should either always require constant expressions or work towards evaluating function calls on the master before executing remote portions of the plan.

Detect unsupported partition key types during dist. table creation

If a type lacks a hash function, we should not allow tables to be distributed using it as the type of their partition key. Right now, doing so results in a cryptic "cache lookup failed for function 0" error during INSERT commands.

master_create_distributed_table should detect these types, refuse to create a distributed table, and print a helpful message.
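
For example, the point type has no default hash operator class, so a call like the following (hypothetical table) should be rejected up front rather than failing later at INSERT time:

CREATE TABLE sites (
    location point,
    name text
);

SELECT master_create_distributed_table('sites', 'location');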

Avoid copy-pasting ruleutils files

Because ruleutils.c's get_query_def does not have external linkage, pg_shard copy-pastes ruleutils.c from the appropriate PostgreSQL release—with minimal modifications—to enable deparsing a given query back into one with shard identifiers.

These modifications are not likely to be accepted upstream as-is—they reference a shardId field we've added to deparse_context—but PostgreSQL may be open to exposing get_query_def itself for external use. We'd need to:

  • Devise a method that permits us to use get_query_def without further modifications
  • Produce a patch for PostgreSQL to expose get_query_def

Citus treats pg_shard tables as local when missing metadata

From @sumedhpathak on November 20, 2014 0:27

If someone sets use_citusdb_select_logic to true without copying the metadata from pg_shard over to CitusDB, CitusDB treats all pg_shard tables as local tables and reads the empty table on the master node, returning 0 results.

This is minor, but can be a usability issue with CitusDB/pg_shard integration.

Consider using binary protocol

We have the option of using the binary protocol between the master and the worker nodes via PQexecParams. We currently use PQexec, which always returns results in text format, so we pay the cost of converting results from text back into PostgreSQL's internal format, which has performance implications.

We should develop a prototype to test performance of the binary protocol and switch to it if that is worthwhile.

Expose remote error fields to client

When I attempt an INSERT that violates a primary key, the message seen by the client is a generic WARNING about a bad result from the worker node rather than the more meaningful message originally produced by the remote error:

INSERT INTO members VALUES (1, 2, 3);  -- first insert succeeds

INSERT INTO members VALUES (1, 2, 3);  -- second insert violates the unique constraint

# WARNING:  Bad result from shard1.demo:5432
# DETAIL:  Remote message: duplicate key value violates unique constraint "members_id_key_10007"
# WARNING:  Bad result from shard8.demo:5432
# DETAIL:  Remote message: duplicate key value violates unique constraint "members_id_key_10007"
# ERROR:  could not modify any active placements

This is a problem because:

  • many client libraries surface only the error message, not the DETAIL field
  • the error message itself is cryptic in the extreme

Generalize COPY-to-INSERT trigger

I've written a COPY trigger used by some of our customers which turns a COPY command into many INSERT commands using a temporary table with the same schema as a sharded table.

Unfortunately it's hardcoded to a specific schema. The first step towards users being able to COPY to a sharded table is to generalize this script for any table.
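
A minimal sketch of the idea, with hypothetical names and file path (the real script differs): COPY into a temporary staging table whose trigger re-routes each row as a single-row INSERT that pg_shard can plan.

CREATE TEMPORARY TABLE limit_orders_staging (LIKE limit_orders);

CREATE FUNCTION route_staged_rows() RETURNS trigger AS $$
BEGIN
    -- each staged row becomes a single-row INSERT against the sharded table
    INSERT INTO limit_orders SELECT NEW.*;
    RETURN NULL;  -- discard the row so the staging table stays empty
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER route_staged_rows BEFORE INSERT ON limit_orders_staging
    FOR EACH ROW EXECUTE PROCEDURE route_staged_rows();

\copy limit_orders_staging FROM 'orders.csv' WITH (FORMAT csv)

Generalizing this means deriving the staging table and trigger from any given distributed table's name and schema.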

Prevent deletion of metadata when still in use

Running DROP EXTENSION will remove pg_shard's metadata tables, including information about the placement and health of shards. This behavior should only be possible with the CASCADE modifier: otherwise the extension should refuse to be dropped until all distributed tables have been removed.

Perhaps we can look into the dependency tracking available in PostgreSQL's catalog tables to satisfy this need.

Unable to restore from backup made with pg_dump

Though pg_shard correctly marks its metadata tables so they are included in the output of pg_dump, these dumps are unusable because the relation_id column of pg_shard's metadata tables is of type oid. Because there is no guarantee a given relation will receive the same oid after a restore, the oids stored in the metadata will no longer match any relations.

I found an old thread that insinuates pg_dump will produce a textual representation of tables that works across a restore boundary if the column is of type regclass instead of oid. As far as lower-level code is concerned, these two types are identical (so we will not need to change any implementation). By switching to regclass we can give users the ability to rebuild a master in an identical fashion to restoring a normal PostgreSQL instance from backup.
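
The difference is easy to demonstrate with the limit_orders table from an earlier issue: an oid dumps as a bare number that is meaningless in a new cluster, while a regclass value dumps as the relation's name and resolves correctly after a restore.

SELECT 'limit_orders'::regclass::oid;   -- a number specific to this cluster
SELECT 'limit_orders'::regclass;        -- prints (and dumps) as the table name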

Support in-tree builds

Since the PostgreSQL codebase has such a large library of functions (handling everything from file access to memory allocation) many static analyzers would require substantial "model files" to be able to meaningfully analyze pg_shard and other PostgreSQL plugins.

I've discovered that by building extensions "in-tree" the analyzer can do its thing without any intervention from us: since it has all the source it can reason fully about most paths.

My plan is to create 9.3- and 9.4-based branches in our PostgreSQL fork and add pg_shard and cstore_fdw to the contrib directory as git submodules. By performing a giant build of PostgreSQL and our extensions, we can help analyzers (such as Coverity Scan) do their best.

Run an outreach campaign

Identify individuals or groups for whom we believe pg_shard could be a win and reach out to find whether they are interested in collaboration.

Improve error handling during failed modification

pg_shard's modification logic assumes that any total failure is due to something transient that a retry might overcome. In many cases, an INSERT or UPDATE could fail due to a constraint check, which is not something that a simple retry will overcome without something else changing.

See #31 for an example of what I mean. In that example, the client sees:

# WARNING:  Bad result from shard1.demo:5432
# DETAIL:  Remote message: duplicate key value violates unique constraint "members_id_key_10007"
# WARNING:  Bad result from shard8.demo:5432
# DETAIL:  Remote message: duplicate key value violates unique constraint "members_id_key_10007"
# ERROR:  could not modify any active placements

A well-written application might want to handle the uniqueness violation in a special fashion, but all pg_shard gives it is a generic error about not being able to modify any placements.

We probably want to try a modification on a placement, then:

  • If the error is in the class of things we think a user cares about (constraints, etc.), we fail-fast and throw them the error
  • If the error is network related or otherwise "transient", we continue with the remaining shards. If any modification completes, we mark the transient-failure shard as bad

At a higher level, we need to handle modification outcomes in a ternary fashion:

  • Total Success — the modification completed successfully on the placement
  • Application Failure — the command reached the remote database, but it raised an error (a constraint violation, for instance)
  • Infrastructure Failure — the command never completed, or failed with a connection or network error

Only the third case deserves a "could not modify placement" error. In the second case we can fail fast and tell the user exactly what happened.

Creation should error when provided with bad table name

From @samay-sharma on November 20, 2014 15:52

Currently, master_create_distributed_table() works successfully if I pass an index name as the table name and the column on which the index is as the partition key. It also puts an entry into the partition metadata table. master_create_worker_shards() errors out saying that the relation is not a table when an index name is passed to it, which is how I feel master_create_distributed_table() should behave as well.

Again, not something we'd expect users to do, but IMO it's better if the UDF guards itself against invalid arguments, especially since it is directly exposed to end users.
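
A rough reproduction sketch (hypothetical table and index) of the call that should raise an error:

CREATE TABLE events (id bigint, payload text);
CREATE INDEX events_id_idx ON events (id);

-- succeeds today and records partition metadata, even though the relation is an index
SELECT master_create_distributed_table('events_id_idx', 'id');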

Add package to Linux package managers

Though #67 will take care of serious PostgreSQL users, many will prefer to install pg_shard using OS package management tools (apt, etc.). Many PostgreSQL add-ons are included in Ubuntu's built-in repositories (Slony-I shows up as postgresql-9.3-slony1-2, for instance). If we could get pg_shard into some of these repositories it could make building cloud instances easier (with CloudInit scripts or similar).

At any rate, users might appreciate an install that doesn't require a compilation step (even if it is hidden within PGXN client or pex).

Misleading errors in queries with subselects

From @samay-sharma on November 21, 2014 14:56

I am reporting this mainly because of the query it sends to the worker node. I don't think this is a high priority but we should show a clearer error message on the master.

Steps to reproduce:

SELECT count(*) FROM lineitem WHERE l_orderkey = (SELECT min(l_orderkey) FROM lineitem);
# WARNING:  Bad result from ip-10-102-169-241.ec2.internal:5432
# DETAIL:  Remote message: there is no parameter $0
# WARNING:  Bad result from ip-10-181-241-37.ec2.internal:5432
# DETAIL:  Remote message: there is no parameter $0
# ERROR:  could not receive query results

Worker node log:

# LOG:  statement: SELECT l_orderkey FROM ONLY lineitem_32 WHERE (l_orderkey = $0)
# ERROR:  there is no parameter $0 at character 61
# STATEMENT:  SELECT l_orderkey FROM ONLY lineitem_32 WHERE (l_orderkey = $0)

The error message for an UPDATE statement with a similar WHERE condition is also misleading.

# Showing min value
SELECT min(l_orderkey) FROM lineitem;
#  min 
# -----
#   1
# (1 row)

# This works as expected
UPDATE lineitem SET l_linenumber = 2 WHERE l_orderkey = 1;
# UPDATE 1

# This statement should touch only one shard, so the error message is a bit misleading.
UPDATE lineitem SET l_linenumber = 2 WHERE l_orderkey = (SELECT min(l_orderkey) FROM lineitem);
# ERROR:  cannot modify multiple shards during a single query

Removing existing shards should be easier

We need a function for dropping all shards of a particular table. Otherwise the process of dismantling a sharded table in order to recreate it can be quite complex.

Fail fast for certain DDL commands

#1 and #6 discuss issues around dropping shards (or attempting to use DROP TABLE), but there are larger questions around other DDL commands.

  • Discuss whether DDL commands against distributed tables should be allowed at all
  • List any advanced use cases for permitting certain commands
  • Determine feasibility of intercepting certain commands
  • Choose mechanism of interception (utility hook or event trigger; a rough event-trigger sketch follows this list)
  • Add code to warn or error when users issue DDL commands against distributed tables
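
As a starting point for the interception question above, here is a rough sketch of the event-trigger option; names are hypothetical. On 9.3/9.4 only the command tag is visible at ddl_command_start, so warning only for distributed tables would require the utility hook (or 9.5's pg_event_trigger_ddl_commands()).

CREATE FUNCTION warn_unsupported_ddl() RETURNS event_trigger AS $$
BEGIN
    RAISE WARNING 'pg_shard does not propagate DDL commands to shards';
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER warn_unsupported_ddl ON ddl_command_start
    WHEN TAG IN ('ALTER TABLE', 'DROP TABLE', 'CREATE INDEX')
    EXECUTE PROCEDURE warn_unsupported_ddl();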

Worker commands omit target relation's schema

From @samay-sharma on November 21, 2014 15:23

Steps to reproduce:

# On master and worker nodes:
CREATE SCHEMA test_schema;
# CREATE SCHEMA

# On master node:
CREATE TABLE test_schema.customer_reviews
(
    customer_id INTEGER,
    review_date DATE,
    review_rating INTEGER,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10),
    product_title TEXT,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
);
# CREATE TABLE

SELECT master_create_distributed_table('test_schema.customer_reviews', 'customer_id');
#  master_create_distributed_table
# ---------------------------------
#  
# (1 row)

INSERT INTO test_schema.customer_reviews (customer_id, review_rating) VALUES (4687, 5);
# WARNING:  Bad result from ip-10-181-241-37.ec2.internal:5432
# DETAIL:  Remote message: relation "customer_reviews_86" does not exist
# WARNING:  Bad result from ip-10-102-169-241.ec2.internal:5432
# DETAIL:  Remote message: relation "customer_reviews_86" does not exist
# ERROR:  could not modify any active placements

Worker node log:

# LOG:  statement: INSERT INTO customer_reviews_86 (customer_id, review_rating) VALUES (4687, 5)
# ERROR:  relation "customer_reviews_86" does not exist at character 13
# STATEMENT:  INSERT INTO customer_reviews_86 (customer_id, review_rating) VALUES (4687, 5)

First-class COPY support

This ticket is to track full support for the COPY command. Unlike the trigger implementation in #61, this would mean supporting a bulk method for data ingestion. Issues like consistency and isolation will show up, as well as failure modes.

Add more unit tests

Implementing a true multi-node regression test will require more than pg_regress can provide, but we can get most of the way there with carefully isolated unit tests.

Warn when distributing unenforceable constraints

Certain constraints are unenforceable in a distributed setting: UNIQUE indexes foremost among them. pg_shard will happily propagate a UNIQUE index with nary a peep.

The extension should either warn when propagating these constraints or refuse to distribute tables altogether which contain such constraints.
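
For example (hypothetical table), the UNIQUE constraint below is propagated to every shard when the shards are created, but each shard enforces it only against its own rows, so duplicate emails can still appear across shards:

CREATE TABLE customers (
    id bigint,
    email text UNIQUE
);

SELECT master_create_distributed_table('customers', 'id');
SELECT master_create_worker_shards('customers', 16, 2);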

Setting use_citusdb_select_logic crashes without Citus

From @samay-sharma on November 24, 2014 15:11

While using pg_shard with postgres, if I run SET pg_shard.use_citusdb_select_logic TO true and then run a select query, pg_shard crashes.

I think it is possible that a user assumes they can use the Citus select logic just by turning this config on (without installing Citus). I think we have an assert to ensure this situation does not happen, but should we replace that assert with an error which explicitly says "you need to install CitusDB in order to use this feature" in order to handle such cases?

Create databases/schemas during shard creation

Currently the user has to manually create databases on each worker to match the master. Since we require that the database names match anyway, and there are no real options here, we should just have master_create_worker_shards create the databases on workers if they don't already exist. Probably we should do this by connecting to the "postgres" database first, which would then require some extra error handling.

Audit error codes and messages

PostgreSQL has a pretty strict style guide for these things and someone needs to do a quick sweep to make sure we're compliant.

Multi-shard SELECT queries pull unnecessary columns

pg_shard is pulling all data from any column named in the WHERE clause across the wire, even if that column is not in the target list. If those columns are large, this makes queries which aren't using the partition key very slow due to data transfer time, and results in huge temporary tables on the master.

select count(*) from members where profile @> '{ "groups" : [{ "name" : "SFPUG" } ] }';
#  count
# --------
#  183891
# (1 row)
#
# Time: 20229.489 ms

select id from members where profile @> '{ "groups" : [ {"name" : "SFPUG" } ] }';
# Time: 20315.859 ms

# select count(*) from members_10008 where profile @> '{"groups" : [ { "name" : "SFPUG" } ] }';
#  count
# -------
#   5689
# (1 row)
#
# Time: 30.092 ms

Even if we did all 32 shards serially, that's still only 950ms. So the 20s response time suggests that pg_shard is pulling all of the data from all shards even though we're not looking at the tuples. In fact:

select count(*) from members;
#   count
# ---------
#  7879817
# (1 row)
#
# Time: 9181.476 ms

But this appears to be because qual pushdown is working strangely:

select id from members where email like 'j%' limit 1000;
# Time: 8607.005 ms

select id from members where email like 'jst%' limit 1000;
# Time: 2651.042 ms

So I did some digging, and it turns out that the problem is that pg_shard returns too much data, regardless of what's in the SELECT list. When the records are JSON blobs or very wide rows, this is a major issue:

select sum(1) from members where profile @> '{ "groups" : [ { "name" : "SFPUG" } ] }';
#   sum
# --------
#  183891
# (1 row)

Notice that there are no columns in the projection list. What should get returned to the master is just a column of 183,000 integers. But:

select pg_size_pretty(pg_total_relation_size('pg_shard_temp_table_7220_1'));
#  pg_size_pretty
# ----------------
#  276 MB
# (1 row)

select * from pg_shard_temp_table_7220_1 limit 1;
# id      |
# email   |
# profile | {
#         |     "address": null,
#         |     "country": "Indonesia",
#         |     "created": "2012-03-11T20:00:32Z",
#         |     "email": "[email protected]",
#         |     "first_name": "Amanda",
#         |     (omitted for brevity)
#         |     "quote": null
#         | }

select sum(1) from members where email like 'js%';
# -[ RECORD 1 ]
# sum | 76993

select * from pg_shard_temp_table_7220_2 limit 5;
# -[ RECORD 1 ]-------------------
# id      |
# email   | [email protected]
# profile |
# -[ RECORD 2 ]-------------------
# id      |
# email   | [email protected]
# profile |

In short: if any column is used in the WHERE clause, pg_shard pulls that column across the wire even if it's not in the projection list at all.

Misleading error for incompletely created tables

From @samay-sharma on November 19, 2014 15:38

I get this error when I run a select query after running master_create_distributed_table() but before running master_create_worker_shards(). The same query works after running master_create_worker_shards().

Steps to reproduce:

CREATE TABLE customer_reviews
(
    customer_id INTEGER,
    review_date DATE,
    review_rating INTEGER,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10),
    product_title TEXT,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
);
# CREATE TABLE

SELECT master_create_distributed_table('customer_reviews', 'customer_id');
#  master_create_distributed_table
# ---------------------------------
#  
# (1 row)

select count(*) from customer_reviews ;
# ERROR:  cannot execute select over multiple shards

I had expected to get 0 as the result. Even if we error out, the error message does not seem very apt.

Temporary tables used by multi-shard SELECTs not dropped

I was a bit startled to see this on the master:

ERROR:  could not extend file "base/16385/t3_17723": No space left on device
HINT:  Check free disk space.

When I checked, there were literally hundreds of temp tables on the master, using up 100% of disk space. I was doing a \watch 10 on an aggregate query across shards, and apparently the temp tables pg_shard needs to resolve this query do not get dropped automatically:

select count(*) from members;
# count
# ---------
#  1929821
# (1 row)

\dt
#                     List of relations
# Schema   |            Name            | Type  |  Owner
# -----------+----------------------------+-------+---------
# pg_temp_2 | pg_shard_temp_table_3648_0 | table | psocial
# public    | members                    | table | psocial
# public    | members_local              | table | psocial
# (3 rows)

Temp tables used by pg_shard should be ON COMMIT DROP at the least; ideally they should be dropped as soon as the query is over. I'll see about documenting the master's use of temp tables somewhat.
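
A sketch of the least-invasive fix, using the table name from the session above (the real column list would come from the query):

CREATE TEMPORARY TABLE pg_shard_temp_table_3648_0 (LIKE members) ON COMMIT DROP;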

Write blog series

I think a series of blog posts to get readers' imaginations going could help pg_shard adoption greatly. I'm talking about a real-life app with no big hacks (i.e. possibly using ActiveRecord or similar). It'd have a set of traditional tables and one big table that has NoSQL access patterns. We'd show how pg_shard can be used to transparently access the big table within an otherwise traditional application. Any caveats specific to the environment could be addressed.

I think three to five posts would be best, and the code itself should be in a GitHub repository for perusal by interested parties.

Support having multiple head nodes

The missing feature for pg_shard as an OLTP big-data tool is the ability to have multiple head nodes so that it can support thousands of connections. This shouldn't be too hard to solve, and may actually be solvable with existing PostgreSQL 9.4 tools.

  1. Replicate only the metadata tables: these don't get updated that often, so BDR ought to be adequate for it.
  2. Streaming replication. This option is hampered by (a) needing to figure out which node is the master, and (b) pg_shard's use of temporary tables.
  3. FDW option: have the metadata be foreign tables pointing back to the master node. Likely to add latency and complexity, and also fails to provide easy failover from the master node.

Queries with HAVING can return incorrect results

Assume a simple table distributed along an id column:

CREATE TABLE employees (
    id integer,
    name text
);

SELECT master_create_distributed_table('employees', 'id');
SELECT master_create_worker_shards('employees', 2, 1);

Using HAVING clauses in SELECT queries works correctly so long as the columns referenced in the HAVING clause also appear in the target list or in a WHERE, ORDER BY, or GROUP BY clause:

SELECT substr(name, 1, 1) AS init, 
       sum(id)            AS id_total 
FROM   employees 
GROUP  BY init 
HAVING SUM(id) > 200000;
# ┌──────┬──────────┐
# │ init │ id_total │
# ├──────┼──────────┤
# │ H    │   319464 │
# │ M    │  1253983 │
# │ J    │   844768 │
# │ V    │   280801 │
# │ R    │   610906 │
# │ L    │  1166332 │
# │ D    │   829092 │
# │ T    │   700698 │
# │ I    │   201319 │
# │ F    │   284724 │
# │ N    │   363474 │
# │ G    │   401067 │
# │ E    │   697631 │
# │ B    │   526513 │
# │ C    │  1148864 │
# │ S    │  1010327 │
# │ P    │   244061 │
# │ K    │   748454 │
# │ A    │  1079310 │
# └──────┴──────────┘
# (19 rows)

But if columns in the HAVING clause do not appear elsewhere, the query returns incorrect results:

SELECT substr(name, 1, 1) AS init 
FROM   employees 
GROUP  BY init 
HAVING SUM(id) > 200000; 
# ┌──────┐
# │ init │
# ├──────┤
# └──────┘
# (0 rows)

This is probably related to #33, since it appears pg_shard currently only considers quals and projections in deciding which columns to retrieve from the remote. The HAVING clause is evaluated locally, so if its columns are not retrieved it cannot be evaluated correctly.

We should either reject the use of HAVING or make sure it has the columns it needs.

CLUSTER command cannot be extended

ExtendDDLCommand is capable of extending index and table names in an ALTER TABLE table_name CLUSTER ON index_name statement, but when the equivalent CLUSTER table_name USING index_name is provided, DDL extension fails.

Within pg_shard, everything provided to ExtendDDLCommand comes from TableDDLCommandList, which may never produce a CLUSTER statement (it appears to favor ALTER TABLE), but as extend_ddl_commands.c appears to contain some support for this feature, we should investigate to either remove the support or flesh it out.

I first noticed this behavior during testing. I thought I removed my test before merging, but apparently not. I expected those lines to produce a CLUSTER statement with shard IDs in the table and index names, but instead we get the error about node 714 being unsupported.

Support sequences

pg_shard needs to support sequences and SERIAL/BIGSERIAL. The simplest way to do these for now is to have the sequence live on the master. Later we'll need to do something more complicated, but 9.5 will have machinery for that, I think.

PRIMARY KEY modifier ignored during DDL extension

Let's say I have a DDL command similar to the following:

CREATE TABLE employees
(
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    id BIGINT PRIMARY KEY,
    salary NUMERIC DEFAULT 0.00,
    start_date timestamp without TIME zone,
    resume text,
    CONSTRAINT sal_check CHECK (salary >= 0.00)
);

When passed to ExtendDDLCommand with a shard ID of 12345, we get:

CREATE TABLE employees
(
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    id BIGINT,
    salary NUMERIC DEFAULT 0.00,
    start_date timestamp without TIME zone,
    resume text,
    CONSTRAINT sal_check CHECK (salary >= 0.00)
);

While propagating the UNIQUE property of a PRIMARY KEY might be questionable in a sharded environment, the NOT NULL property isn't as ambiguous: we should definitely be propagating it.

Note that, as in #42, this is a theoretical problem. ExtendDDLCommand only technically needs to support the subset of DDL commands produced by TableDDLCommandList. I so far cannot get the latter to produce a CREATE TABLE statement with a PRIMARY KEY clause (it usually turns those into a CREATE TABLE statement followed by an ALTER TABLE statement), but if this ever changes we'll have a problem.

Clarify lack of DROP TABLE support

It seems there is code to handle DROP TABLE on distributed tables, but testing proves otherwise:

postgres=# CREATE TABLE nums(a int);
CREATE TABLE

postgres=# \d
       List of relations
 Schema | Name | Type  | Owner 
--------+------+-------+-------
 public | nums | table | amit

postgres=# SELECT master_create_distributed_table('nums', 'a');
 master_create_distributed_table 
---------------------------------

(1 row)

postgres=# SELECT master_create_worker_shards('nums', 2, 1);
 master_create_worker_shards 
-----------------------------

(1 row)

postgres=# \d
          List of relations
 Schema |    Name    | Type  | Owner 
--------+------------+-------+-------
 public | nums       | table | amit
 public | nums_10000 | table | amit

postgres=# DROP TABLE nums;
DROP TABLE

postgres=# \d
          List of relations
 Schema |    Name    | Type  | Owner 
--------+------------+-------+-------
 public | nums_10000 | table | amit
(1 row)

Did I miss a step or something?

Optimize operator expression construction

MakeOpExpression is used to build:

  • Equality comparisons against a hash value which replace other equality constraints in SELECT, UPDATE, and DELETE queries
  • Synthetic equality clauses built from the hash value of an INSERTed row's partition key
  • "Between" comparisons which represent the range of hash values contained within a given shard

@onderkalaci has extensively benchmarked performance characteristics of pg_shard and determined that avoiding this call to GetDefaultOpClass results in a huge performance boost, especially on multi-socket machines (probably due to lock contention).

We previously replaced all calls to GetDefaultOpClass with equivalent calls to lookup_type_cache, but the type cache does not contain the <= and >= operators needed for the "between" comparison in the list above. At the time we had already realized substantial improvements elsewhere, so we left this final call site, but now it is apparent we should remove it as well.

I had previously tried building an equivalent OR clause using the greater (or less) than operator and an equality clause (thus enabling the use of simple typecache lookups), but that has performance implications if the number of shards is large.

So the straightforward option here is to just cache the <= and >= operators ourselves (or cache the entire expression node returned by BuildBaseConstraint and clone it when needed). The need for equality operators can be directly fulfilled by lookup_type_cache responses, and we can get rid of all remaining calls to GetDefaultOpClass (and stop doing duplicate work looking up operators for int32 during every query).

Shards backed by foreign tables not marked as such

From @samay-sharma on November 21, 2014 15:12

Currently, the storage type is always set as ‘t’ in pgs_distribution_metadata.shard. We also have a SHARD_STORAGE_FOREIGN in distribution_metadata.h. I think we should use that for foreign tables.

Also, this does not seem to hamper anything. I tried with cstore_fdw and INSERT commands for foreign tables are routed to the correct worker shards. The worker node commands fail as INSERT is not yet implemented for cstore_fdw but pg_shard logic on the master seems to work correctly.

Plan bidirectional metadata sync

Citus and pg_shard should be able to sync distribution metadata back and forth without any user intervention. Determine what approaches could meet this need.

Prohibit non-IMMUTABLE functions in multi-shard SELECT queries

Though all (non-side-effecting) functions are safe to execute in a single-shard SELECT query, the same is not true for multi-shard SELECTs. In particular, a function such as random() or now() would return a different value for each shard and so the query results would not represent a consistent view of the world.

This may overlap somewhat with #47, but that issue is more concerned with single-shard ramifications. We've noticed this bug affects multi-shard queries as well, so they'll need similar treatment.
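
For example, against the limit_orders table from an earlier issue, each shard evaluates now() independently, so the query does not apply one consistent cutoff:

SELECT count(*) FROM limit_orders WHERE placed_at > now() - interval '5 minutes';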

Support STABLE functions in INSERT statements

We took a conservative approach in function evaluation for INSERT: we evaluate immutable expressions (such as 2 + 2 or char_length('postgres')) during planning but reject any other expressions.

This means certain idioms (like using now() to timestamp a row) are unavailable when using pg_shard. The next logical step would be to coerce STABLE functions into constants during INSERT: evaluating them on the master would ensure all replicas receive an identical value.
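
For example, this INSERT against the limit_orders table from an earlier issue is rejected today; with master-side evaluation, now() would be folded into a single constant before the statement is shipped to the shard placements:

INSERT INTO limit_orders VALUES (2038, 'GOOG', 5634, now(), 'buy', 999.00);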
