Comments (8)
When custom-writing the CSV for PostgreSQL, maybe you could use COPY directly? pgloader's main advantage here would be its handling of errors in the data...
Other than that, it should be possible to send hstore data from pgloader; I will have a look.
from pgloader.
Even though the format should be PostgreSQL-targeted CSV, the data comes from the web.
Even after some filtering, the data rows are broken from time to time. As can be expected, this mostly happens in the middle of the night or on holidays.
As loading happens into a temporary table, I changed its column types to text and do the conversion when inserting the data into the real table. Not the best solution, as there's still a chance of failure at that stage, but much better than without pgloader.
In the end, quoting was the issue, which I fixed through awk.
I don't know Common Lisp beyond what I had to learn to compile pgloader and make it work, so I don't know how to express this as a column definition for pgloader. My experiments doing that caused those rows to disappear without loading and without logging any errors.
Anyway, the awk script just checks whether the hstore column has quotes, doubles them up, and encloses the whole string in quotes as well:
if (index($6, "\"") > 0) { gsub("\"", "&&", $6); $6 = "\"" $6 "\"" }
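The same quoting rule, as a hedged Python sketch (field 6 is assumed to be the hstore column; this mirrors the awk filter, it is not part of pgloader):

```python
def quote_csv_field(value: str) -> str:
    """Apply CSV-style quoting as the awk filter does: double any
    embedded double quotes, then wrap the whole field in quotes."""
    if '"' in value:
        return '"' + value.replace('"', '""') + '"'
    return value

# a field whose hstore text contains a quoted value
print(quote_csv_field('c=>"quoted hstore value"'))
# -> "c=>""quoted hstore value"""
```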
I just made a test here, which works fine:
LOAD CSV
FROM INLINE
INTO postgresql://dim@localhost/pgloader?hs
WITH truncate,
fields terminated by '\t',
fields not enclosed,
fields escaped by backslash-quote
SET work_mem to '128MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ create extension if not exists hstore; $$,
$$ drop table if exists hs; $$,
$$ CREATE TABLE hs
(
id serial primary key,
kv hstore
)
$$;
1 email=>[email protected],a=>b
2 test=>value
3 a=>b,c=>"quoted hstore value",d=>other
Then I have:
select *, kv ? 'a', kv -> 'c' from hs;
id | kv | ?column? | ?column?
----+----------------------------------------------------+----------+---------------------
1 | "a"=>"b", "email"=>"[email protected]" | t |
2 | "test"=>"value" | f |
3 | "a"=>"b", "c"=>"quoted hstore value", "d"=>"other" | t | quoted hstore value
(3 rows)
So I'm not sure I understand where your problem comes from actually.
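For reference, the hstore text format seen in that output can be mimicked with a rough Python sketch (deliberately simplified: it ignores escaped quotes inside quoted values, and is not pgloader or PostgreSQL code):

```python
import re

def parse_hstore(text: str) -> dict:
    """Parse a simplified hstore literal such as
    'a=>b,c=>"quoted hstore value",d=>other' into a dict."""
    pairs = re.findall(r'([\w.@]+)=>(?:"([^"]*)"|([^,]+))', text)
    return {key: quoted or bare for key, quoted, bare in pairs}

row3 = parse_hstore('a=>b,c=>"quoted hstore value",d=>other')
print('a' in row3)    # mirrors kv ? 'a'  -> True
print(row3.get('c'))  # mirrors kv -> 'c' -> quoted hstore value
```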
This can work for a naive load of JSON, too.
WITH fields not enclosed, fields terminated by '0x02', /* a control character */ fields escaped by backslash-quote
If we could have character-level control of the fields escaped by clause, there would be more of a guarantee that JSON can load. For me, backslash-quote is OK.
See this for the idea: http://adpgtech.blogspot.com/2014/09/importing-json-data.html
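The trick in that post can be sketched in Python: since the 0x02 control character can never appear in serialized JSON text, it is a safe, unambiguous field separator (the sample documents here are made up for illustration):

```python
import json

SEP = '\x02'  # control character: never occurs in serialized JSON

docs = [{"a": 1}, {"b": "x,y", "note": 'has "quotes" and, commas'}]

# write one line per row: id SEP json-document
lines = [f"{i}{SEP}{json.dumps(doc)}" for i, doc in enumerate(docs, 1)]

# reading the rows back, a plain split on SEP is unambiguous,
# no matter what quotes or commas the JSON contains
for line in lines:
    row_id, payload = line.split(SEP)
    print(row_id, json.loads(payload))
```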
I would like to use a real JSON parser here, and maybe provide a toolkit/DSL allowing you to normalize the JSON data on the fly... Meanwhile, I could also extend the escape option clause so that you can use whatever you want.
Extending the escape option would be nice.
As the CSV parser we use in pgloader already allows such characters, supporting this in pgloader was indeed quite easy. It's now done, with a new test case; see https://github.com/dimitri/pgloader/blob/master/test/csv-json.load for a full example of what I tested.