Git Product home page Git Product logo

Comments (7)

rsolari avatar rsolari commented on September 23, 2024

I just wanted to note that there is a concurrency bug in the script version that I have been using.

The problem is that the process table, pg_proc, isn't MVCC-safe. When we do REPLACE FUNCTION, we invalidate the cached copies of all of the old versions of the function, and in-flight copies fail.

This problem only occurs when we are doing REPLACE FUNCTION calls, so if you create the function once, you'll be fine. I just wanted to make sure you're aware of the concurrency issue before you productize the trigger.

from pg_shard.

jasonmp85 avatar jasonmp85 commented on September 23, 2024

@rsolari — Does your copy of this script have a call to pg_advisory_lock? It was added to guard the CREATE OR REPLACE call because concurrent modifications caused problems…

Or is this a separate issue? It sounds as though you're saying the REPLACE call trips up in-flight executions of the trigger that were otherwise happy…

from pg_shard.

rsolari avatar rsolari commented on September 23, 2024

@rsolari — Does your copy of this script have a call to pg_advisory_lock? It was added to guard the CREATE OR REPLACE call because concurrent modifications caused problems…

Yes, there is a lock around CREATE OR REPLACE.

Or is this a separate issue? It sounds as though you're saying the REPLACE call trips up in-flight executions of the trigger that were otherwise happy…

Yep, that's what's happening. Each local process' cached version of the function gets invalidated, and in-flight copies fail.

from pg_shard.

jasonmp85 avatar jasonmp85 commented on September 23, 2024

@rsolari So the current (short-term) approach is to make parallel use safe, but still have the same failure mode (failure meaning the COPY failed because of something beyond our control, not because of a bug in parallel access).

I was imagining you could do something like:

  1. Get a huge file to ingest
  2. Split it into n parts
  3. For each part, create a separate writer (COPY) process
  4. Each writer uses COPY to ingest its file
  5. If a file fails partway through (i.e. it ingests only 500 of 1000 lines), skip past the successful lines and retry
  6. After m retries of a file that has failed, take some other action (raise an error or skip a row?)

Assuming we provide a multiprocess-safe COPY-compatible function that returns the number of rows successfully copied, what are you missing? Is your desired workflow significantly different from the above?

from pg_shard.

rsolari avatar rsolari commented on September 23, 2024

That workflow sounds like exactly what we want.

from pg_shard.

jasonmp85 avatar jasonmp85 commented on September 23, 2024

Hey @rsolari — I know you guys had some issues with the existing script apart from what you've said here, namely:

  • Having to specify the full path to the file is annoying
  • You wanted to change some of the OPTIONS provided to COPY

The pull request (#82) I opened has a script that allows relative paths and supports most OPTIONS for COPY, but I was wondering if you also need the ability to explicitly specify what columns are in the input (if, for instance, you want to omit certain columns in your input file). This feature shows up in the COPY syntax as the ( column_name [, ...] ) clause. Do you need support for this right now?

from pg_shard.

rsolari avatar rsolari commented on September 23, 2024

Thanks for checking in. We don't need support for specifying columns right now.

We only need support for specifying FORMAT as text, the DELIMITER, and the NULL character. Here's our COPY:

COPY my_table FROM :'filename' WITH(FORMAT text, DELIMITER ',', NULL '\N');

I looked over #82, and it looks like all of things we'd want supported are supported, which is awesome.

from pg_shard.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.