Generalize COPY-to-INSERT trigger (pg_shard, closed)

citusdata commented on September 23, 2024

Generalize COPY-to-INSERT trigger

from pg_shard.

Comments (7)

rsolari commented on September 23, 2024

I just wanted to note that there is a concurrency bug in the script version that I have been using.

The problem is that the function catalog, pg_proc, isn't MVCC-safe. When we run CREATE OR REPLACE FUNCTION, we invalidate every backend's cached copy of the old version of the function, and in-flight COPYs fail.

This problem only occurs when we are doing REPLACE FUNCTION calls, so if you create the function once, you'll be fine. I just wanted to make sure you're aware of the concurrency issue before you productize the trigger.

jasonmp85 commented on September 23, 2024

@rsolari — Does your copy of this script have a call to pg_advisory_lock? It was added to guard the CREATE OR REPLACE call because concurrent modifications caused problems…

Or is this a separate issue? It sounds as though you're saying the REPLACE call trips up in-flight executions of the trigger that were otherwise happy…

rsolari commented on September 23, 2024

> @rsolari: Does your copy of this script have a call to pg_advisory_lock? It was added to guard the CREATE OR REPLACE call because concurrent modifications caused problems…

Yes, there is a lock around CREATE OR REPLACE.

> Or is this a separate issue? It sounds as though you're saying the REPLACE call trips up in-flight executions of the trigger that were otherwise happy…

Yep, that's what's happening. Each backend process's cached version of the function gets invalidated, and in-flight COPYs fail.

jasonmp85 commented on September 23, 2024

@rsolari So the current (short-term) approach is to make parallel use safe, but still have the same failure mode (failure meaning the COPY failed because of something beyond our control, not because of a bug in parallel access).

I was imagining you could do something like:

1. Get a huge file to ingest
2. Split it into n parts
3. For each part, create a separate writer (COPY) process
4. Each writer uses COPY to ingest its file
5. If a file fails partway through (e.g. it ingests only 500 of 1000 lines), skip past the successful lines and retry
6. After m retries of a file that has failed, take some other action (raise an error or skip a row?)

Assuming we provide a multiprocess-safe COPY-compatible function that returns the number of rows successfully copied, what are you missing? Is your desired workflow significantly different from the above?
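The workflow above could be driven by something like the following Python sketch. It assumes a hypothetical `copy_fn` that behaves like the proposed function: it takes a batch of input lines and returns how many leading lines were committed before it stopped. The names `split_into_parts`, `ingest_part`, and `parallel_ingest` are illustrative only, and threads stand in for the separate writer processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical stand-in for the proposed multiprocess-safe, COPY-compatible
# function: it ingests a batch of input lines and returns how many leading
# lines were committed before it stopped.
CopyFn = Callable[[Sequence[str]], int]


def split_into_parts(lines: Sequence[str], n: int) -> List[List[str]]:
    """Split the input into n contiguous parts whose sizes differ by at most 1."""
    size, extra = divmod(len(lines), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(list(lines[start:end]))
        start = end
    return parts


def ingest_part(part: List[str], copy_fn: CopyFn, max_retries: int) -> int:
    """Copy one part; after a partial failure, skip the committed lines and retry."""
    done = 0
    failures = 0
    while done < len(part):
        done += copy_fn(part[done:])
        if done < len(part):
            failures += 1
            if failures > max_retries:
                # "Some other action": this sketch simply gives up on the part.
                raise RuntimeError(
                    f"part still incomplete after {max_retries} retries "
                    f"({done}/{len(part)} lines ingested)"
                )
    return done


def parallel_ingest(lines: Sequence[str], copy_fn: CopyFn,
                    n_writers: int = 4, max_retries: int = 3) -> int:
    """Run one writer per part concurrently and return the total rows ingested."""
    parts = split_into_parts(lines, n_writers)
    with ThreadPoolExecutor(max_workers=n_writers) as pool:
        results = pool.map(lambda part: ingest_part(part, copy_fn, max_retries), parts)
        return sum(results)
```

In a real setup each writer would need its own database connection, and whether to raise or skip the offending row after m failed retries is a policy choice; this sketch just raises.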

rsolari commented on September 23, 2024

That workflow sounds like exactly what we want.

jasonmp85 commented on September 23, 2024

Hey @rsolari — I know you guys had some issues with the existing script apart from what you've said here, namely:

- Having to specify the full path to the file is annoying
- You wanted to change some of the OPTIONS provided to COPY

The pull request I opened (#82) has a script that allows relative paths and supports most OPTIONS for COPY. I was wondering whether you also need the ability to explicitly specify which columns are in the input (if, for instance, you want to omit certain columns from your input file). This feature shows up in the COPY syntax as the ( column_name [, ...] ) clause. Do you need support for this right now?

rsolari commented on September 23, 2024

Thanks for checking in. We don't need support for specifying columns right now.

We only need support for specifying FORMAT as text, the DELIMITER, and the NULL string. Here's our COPY:

COPY my_table FROM :'filename' WITH(FORMAT text, DELIMITER ',', NULL '\N');
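To make these three options concrete, here is a rough Python sketch of how one line of such an input file is interpreted in text format with DELIMITER ',' and NULL '\N'. It is a simplification for illustration, not PostgreSQL's parser: `parse_text_row` and `unescape` are hypothetical names, and octal and hex escapes are not handled. The one behavior it does mirror closely is that COPY matches each field against the NULL string before removing backslashes.

```python
from typing import List, Optional

# Single-character backslash escapes recognized in COPY text format.
# Octal (\123) and hex (\x41) escapes are deliberately not handled here.
ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}


def unescape(raw: str) -> str:
    """Remove backslash escapes; any other escaped character stands for itself."""
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            out.append(ESCAPES.get(raw[i + 1], raw[i + 1]))
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def parse_text_row(line: str, delimiter: str = ",",
                   null: str = "\\N") -> List[Optional[str]]:
    """Split one input line into fields; fields equal to `null` become None."""
    fields, current, i = [], [], 0
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            # Keep the escape sequence raw for now, so "\," is not a split point.
            current.append(line[i:i + 2])
            i += 2
        elif line[i] == delimiter:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(line[i])
            i += 1
    fields.append("".join(current))
    # The NULL string is matched against the raw field, before unescaping.
    return [None if raw == null else unescape(raw) for raw in fields]
```

For example, `parse_text_row(r"1,hello\, world,\N")` gives `['1', 'hello, world', None]`, while a field written as `\\N` comes back as the two-character string `\N` rather than NULL. An empty field stays an empty string, because only `\N` is treated as NULL here.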

I looked over #82, and it looks like all of the things we'd want supported are supported, which is awesome.
