Git Product home page Git Product logo

Comments (8)

dimitri avatar dimitri commented on May 18, 2024

When custom writing the CSV for PostgreSQL, maybe you could use COPY directly? pgloader main advantage here will be handling of errors in the data...

Other than that, it should be possible to send in hstore from pgloader, I will have a look.

from pgloader.

reist avatar reist commented on May 18, 2024

Even though the format should be PostgreSQL-targetted CSV, the data comes from the web.
Even after some filtering, from time to time the data rows are broken. As can be expected, this mostly happens in the middle of the night or on holidays.

As loading happens into a temporary table, I changed its data type to text and do the conversion when inserting the data into the real table. Not the best solution, as there's a chance of failure at that stage, but still much better than without pgloader.

from pgloader.

reist avatar reist commented on May 18, 2024

In the end, quoting was the issue, which I fixed through awk.
I don't know clisp beyond what I had to learn to compile it and pgloader and make pgloader work, so I don't know how to do this as a column definition for pgloader. My experiments doing that caused those rows not to load or log errors, just disappear.

Anyway, the awk just checks whether the hstore column has quotes, doubles them up and encloses the string in quotes as well.

if (index($6,"\"") > 0) {gsub("\"","&&", $6); $6="\""$6"\""}

from pgloader.

dimitri avatar dimitri commented on May 18, 2024

I just made a test here, which works fine:

LOAD CSV
     FROM INLINE
     INTO postgresql://dim@localhost/pgloader?hs

     WITH truncate,
          fields terminated by '\t',
          fields not enclosed,
          fields escaped by backslash-quote

      SET work_mem to '128MB',
          standard_conforming_strings to 'on'

   BEFORE LOAD DO
    $$ create extension if not exists hstore; $$,
    $$ drop table if exists hs; $$,
    $$ CREATE TABLE hs
       (
          id     serial primary key,
          kv     hstore
       )
    $$;


1   email=>[email protected],a=>b
2   test=>value
3   a=>b,c=>"quoted hstore value",d=>other

Then I have:

select *, kv ? 'a', kv -> 'c' from hs;
 id |                         kv                         | ?column? |      ?column?       
----+----------------------------------------------------+----------+---------------------
  1 | "a"=>"b", "email"=>"[email protected]"               | t        | 
  2 | "test"=>"value"                                    | f        | 
  3 | "a"=>"b", "c"=>"quoted hstore value", "d"=>"other" | t        | quoted hstore value
(3 rows)

So I'm not sure I understand where your problem comes from actually.

from pgloader.

Jacob-Kroeze avatar Jacob-Kroeze commented on May 18, 2024

This is can work for naive load of json, too.
WITH fields not encloses, fields terminated by '0x02', */a control char/* fields escaped by backslash-quote

If we can have character-level control of the fields escaped by clause, there is more of a guarantee that json can load. For me, backslash-quote is ok.

See this for the idea: http://adpgtech.blogspot.com/2014/09/importing-json-data.html

from pgloader.

dimitri avatar dimitri commented on May 18, 2024

I would like to use a real JSON parser here and maybe provide a toolkit/DSL allowing to normalize the JSON data on the fly... Meanwhile I could also extend the escape option clause so that you can use whatever you want to...

from pgloader.

Jacob-Kroeze avatar Jacob-Kroeze commented on May 18, 2024

Extending the the escape option would be nice.

from pgloader.

dimitri avatar dimitri commented on May 18, 2024

As the CSV parser we are using in pgloader already allows such characters, allowing it in pgloader was indeed quite easy. It's now done with a new test case, see https://github.com/dimitri/pgloader/blob/master/test/csv-json.load for a full example of what I tested.

from pgloader.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.