Comments (7)
I just wanted to note that there is a concurrency bug in the script version that I have been using.
The problem is that the process table, pg_proc
, isn't MVCC-safe. When we do REPLACE FUNCTION
, we invalidate the cached copies of all of the old versions of the function, and in-flight copies fail.
This problem only occurs when we are doing REPLACE FUNCTION
calls, so if you create the function once, you'll be fine. I just wanted to make sure you're aware of the concurrency issue before you productize the trigger.
from pg_shard.
@rsolari — Does your copy of this script have a call to pg_advisory_lock
? It was added to guard the CREATE OR REPLACE
call because concurrent modifications caused problems…
Or is this a separate issue? It sounds as though you're saying the REPLACE
call trips up in-flight executions of the trigger that were otherwise happy…
from pg_shard.
@rsolari — Does your copy of this script have a call to pg_advisory_lock? It was added to guard the CREATE OR REPLACE call because concurrent modifications caused problems…
Yes, there is a lock around CREATE OR REPLACE
.
Or is this a separate issue? It sounds as though you're saying the REPLACE call trips up in-flight executions of the trigger that were otherwise happy…
Yep, that's what's happening. Each local process' cached version of the function gets invalidated, and in-flight copies fail.
from pg_shard.
@rsolari So the current (short-term) approach is to make parallel use safe, but still have the same failure mode (failure meaning the COPY
failed because of something beyond our control, not because of a bug in parallel access).
I was imagining you could do something like:
- Get a huge file to ingest
- Split it into n parts
- For each part, create a separate writer (
COPY
) process - Each writer uses
COPY
to ingest its file - If a file fails partway through (i.e. it ingests only 500 of 1000 lines), skip past the successful lines and retry
- After m retries of a file that has failed, take some other action (raise an error or skip a row?)
Assuming we provide a multiprocess-safe COPY
-compatible function that returns the number of rows successfully copied, what are you missing? Is your desired workflow significantly different from the above?
from pg_shard.
That workflow sounds like exactly what we want.
from pg_shard.
Hey @rsolari — I know you guys had some issues with the existing script apart from what you've said here, namely:
- Having to specify the full path to the file is annoying
- You wanted to change some of the
OPTIONS
provided toCOPY
The pull request (#82) I opened has a script that allows relative paths and supports most OPTIONS
for COPY
, but I was wondering if you also need the ability to explicitly specify what columns are in the input (if, for instance, you want to omit certain columns in your input file). This feature shows up in the COPY
syntax as the ( column_name [, ...] )
clause. Do you need support for this right now?
from pg_shard.
Thanks for checking in. We don't need support for specifying columns right now.
We only need support for specifying FORMAT
as text, the DELIMITER
, and the NULL
character. Here's our COPY
:
COPY my_table FROM :'filename' WITH(FORMAT text, DELIMITER ',', NULL '\N');
I looked over #82, and it looks like all of things we'd want supported are supported, which is awesome.
from pg_shard.
Related Issues (20)
- Allow IMMUTABLE functions in modification commands HOT 8
- DISTINCT and JSON field selection HOT 2
- Does pg_shard support user defined functions? HOT 3
- Does pg_shard encrypt the messages between nodes? HOT 3
- build issue HOT 2
- Support UNLOGGED tables HOT 2
- Schema support incomplete? HOT 1
- Support RETURNING in modification commands HOT 3
- sync-metadata does not support quoted column names HOT 1
- Zapier Test
- Default to CitusDB SELECT plan when within CitusDB
- Let CitusDB invalidate pg_shard's shard interval cache
- Make zero-shard SELECT consistent with CitusDB
- Support "UPSERT"
- Support prepared statements HOT 2
- 1.2.1 build error against PostgreSQL 9.4 HOT 5
- 1.2.1 build error against PostgreSQL 9.5 HOT 2
- cant read data from master [issue moved to pg_shard mailing list]
- Cannot read data in partition sets HOT 3
- Handling bigserial / auto-increment [Moved to mailing list] HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pg_shard.