Comments (8)
@soneymathew I had a test case but it didn't reproduce the error. Anyway i'll try your suggestion.
from neo4j-etl.
I tested the following strategy for a table with ~1 million rows to complete the load in ~46 seconds
set the DB_URL so that it can be refered later
call apoc.static.set('DB_URL', '<connection url>')
not sure if this matters but I set the constraints required for the Node first
CREATE CONSTRAINT ON (t:<TableName>) ASSERT t.id IS UNIQUE;
use apoc procedures to load the data in parallel with a batchsize of 10k
CALL apoc.periodic.iterate('
CALL apoc.static.get("DB_URL") yield value WITH apoc.convert.toString(value) AS DB_URL
call apoc.load.jdbc(DB_URL,"<TableName>") YIELD row RETURN row',
'CREATE (c:<TableName>) SET c += row', {batchSize:10000, parallel:true})
from neo4j-etl.
it's odd the csv export / import should handle json correctly. I have a deeper look at the issue
What is the data type in your relational database?
Yep apoc.periodic.iterate rocks it :)
from neo4j-etl.
It's actually an interesting proposal. Let's discuss this.
So instead of doing the transformation + batching (optionally parallel) in Java, we would do it in apoc instead.
from neo4j-etl.
My initial exploration was to manually do what the ETL tool is facilitating currently.
but as I went thru multiple stages I learned that the APOC approach can be amazingly performant and skips the translation overhead of CSV creation and loading.
my sample dataset is a postgres schema containing a mix of the following
- normal tables/columns with column values as database supported base data types
- some columns are of datatype
text
with content as multiline JSON ( this is the data that messed with the ETL tool usage) - some columns are of datatype
text
with content as XML
on a side note, I noticed that the etl tool silently ignores unsupported datatypes Examples are oid
and tsvector
it will be cool to log it as errors or choices that can be pondered upon perhaps?
from neo4j-etl.
@soneymathew Can you help me to reproduce this issue? For example can you told me the sample database that you use?
Thank you
from neo4j-etl.
@mroiter-larus sorry I won't be able to share my database.
you could probably reproduce this by storing a big pretty printed json file content in a text column
Example: The mapping json generated is a good candidate for test data.
from neo4j-etl.
@mroiter-larus is it important that you need to reproduce it?
Users could be having bad data in database with non-printable characters or any multiline content that can break your CSV. There is a possible risk to data loss along the translations steps DB -> CSV -> LOADCSV step.
I raised this issue to explore if neo4j-etl
can be updated to bypass the CSV generation step all together.
I believe it will help your users from the following perspectives.
- Space savings (CSVs generated out of a large DB can be pretty large as it is uncompressed)
- Avoid risk of data lost in translation by cutting off the CSV step
- possibly better performance as you avoid the intermediate process and straight away import to neo4j from RDBMS
- leverage/experiment with better strategies like concurrent access, batching etc to increase the overall speed in importing
from neo4j-etl.
Related Issues (20)
- Mapping error when host contains backslash
- Neo4j ETL tool struggles with large datamodels
- SQLITE driver HOT 4
- Password is displayed in plain text when you enable "logs" HOT 1
- Better support for the neo4j+s URI scheme (Aura's default) when specifying a connection
- [PostgreSQL] hanging after "An illegal reflective access operation has occurred" HOT 2
- Error while loading MySQL data into Neo4j using ETL HOT 2
- Load Data Model - "Start Mapping" spins forever, never completes HOT 1
- Import data fails with Null Pointer Exception HOT 1
- Neo4j ETL Tool not working on Neo4j Desktop 1.4.9
- How to change the relation’s "start node" and "end node"?
- can't mapping mssql (sqlserver 13) HOT 1
- Can't map postgres to Neo4j graph using Neo4j ETL tool? HOT 1
- Bulk import fails using Neo4j Desktop on Windows due to `InvalidPathException`
- How to import data from views? HOT 2
- Certificate Errors preventing Neo4j ETL Graph App from starting HOT 1
- Error importing data from postgres to neo4J ETL tool HOT 1
- Too many slashes in file URI '////'
- Cannot drive neo4j-etl with my JAVA 8 environment
- Importing / Saving the visualized graph to json/neo4j friendly format
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from neo4j-etl.