Comments (8)
Similarly, as mentioned in the pg_shard-users list, subqueries might be another work around to the function issue, but they are also not supported yet. (e.g. perform the function on data from a different table on the master and insert only constant values into the sharded table).
from pg_shard.
Since this function is available on the workers, and is deterministic based on the value of the existing row and the new HLL value to be added, there shouldn't be any issue with dispatching this expression through to the workers.
Are you sure this issue is due to the hll_hash_text
function and not the date
one? The HLL function is marked as IMMUTABLE
, which means it should be folded into a constant before being pushed to the workers…
This might be an argument that the error message should be clarified to say which expression is non-constant.
Is the HLL extension installed on the master? Does it know the details of these functions?
from pg_shard.
Is there a hard limitation preventing pg_shard from dispatching modifications for non-constant expressions?
At the moment, yes. pg_shard
performs UPDATE
s by grabbing a lock for a shard and updating all replicas of that shard. This means there is no first-class replication happening but rather a replica-by-replica execution of the query at hand. If that query had e.g. a now()
call, the output might differ among replicas, resulting in data divergence.
So unless we can fold an expression into a constant expression, modifications are unsafe if replication is in play.
from pg_shard.
To usefully combine or update HLL datastructures, you need to call hll_add
, which is not immutable.
I didn't think about replication. That's interesting. I assumed that each shard was responsible for its own replication (streaming WAL logs), not that it was something that pg_shard handled for me (if I'm understanding you correctly).
Thanks for taking the time to correct my misunderstanding.
from pg_shard.
Streaming replication replicates the entire database which is incompatible with our use of many small "logical" shards. Because of this we've been keeping an eye on BDR/UDR, but don't have a timeline or even any concrete designs at this point.
We could probably expand the use of functions to those which are STABLE
so long as we can ensure the snapshots on each replica are identical when we run the command… We'll definitely take your input into consideration during the next cycle, as we like the HLL extension a lot, too and want it to work well with our software.
from pg_shard.
Yeah, one really, really nice property of storing data as HLL data types is that mutation is idempotent, which has a nice resiliency in distributed systems.
from pg_shard.
Hm… so I just checked hll_add
and it is also listed as IMMUTABLE
…
We find the HLL extension very useful for a number of our customers, so I want to make sure it's working well with pg_shard
. I'm curious about the specifics of what you're doing, especially since you've earned yourself the distinction of being the first external party to open a pull request against pg_shard
😁 🏆!
Could you shoot me an email at engage at citusdata dot com to have a quick chat about the problem you're working on?
from pg_shard.
OK, so I've investigated what's going on here. Previously we had the assumption that IMMUTABLE
functions would be collapsed into constants during our call to eval_const_expressions
. Because of that, we expected all allowable function calls to have been reduced to constant expressions by the time our planner gets a hold of a query.
This is obviously true when transforming something like 2 + 2
into a constant 4
. But if the query were something like UPDATE table SET counter = counter + 1
then the presence of a Var
(counter
) would keep eval_const_expressions
from doing its thing (rightly so).
I think we can relax this check by replacing it with a call to contain_mutable_functions
instead, allowing immutable functions to be pushed down to remote nodes even if they can't be fully reduced at the master.
Because UPDATE
s execute against a single shard and its replicas, and because UPDATE
is one-at-a-time (due to locking), it's safe to do this pushdown.
We'll look into this during a future cycle.
from pg_shard.
Related Issues (20)
- DISTINCT and JSON field selection HOT 2
- Does pg_shard support user defined functions? HOT 3
- Does pg_shard encrypt the messages between nodes? HOT 3
- build issue HOT 2
- Support UNLOGGED tables HOT 2
- Schema support incomplete? HOT 1
- Support RETURNING in modification commands HOT 3
- sync-metadata does not support quoted column names HOT 1
- Zapier Test
- Default to CitusDB SELECT plan when within CitusDB
- Let CitusDB invalidate pg_shard's shard interval cache
- Make zero-shard SELECT consistent with CitusDB
- Support "UPSERT"
- Support prepared statements HOT 2
- 1.2.1 build error against PostgreSQL 9.4 HOT 5
- 1.2.1 build error against PostgreSQL 9.5 HOT 2
- cant read data from master [issue moved to pg_shard mailing list]
- Cannot read data in partition sets HOT 3
- Handling bigserial / auto-increment [Moved to mailing list] HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pg_shard.