Comments (5)
Hey @kylemarkwilliams,
can you provide the first 100 lines (head -100 nodes.csv) of your nodes and rels files?
It's probably something to do with delimiters.
Also, are you sure that there are no duplicate IDs?
from batch-import.
Hi @jexp
Thanks for the response. I will send you the first 100 lines of the nodes and rels via email since I can't attach them here.
I suspect you're right that it may be related to delimiters. My data is very messy (it was automatically extracted), so there may be all sorts of weird things going on with it. I've already stripped all non-ASCII characters, since the import was failing due to bad characters before I did that.
I suspect that there may be some double quotes (") in my data so for now I will try adding batch_import.csv.quotes=false to my batch.properties and see if that helps.
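A quick way to confirm the suspicion about stray quotes before changing the config is to list the lines that contain an odd number of double quotes. This is only a rough sketch: the function name lines_with_unbalanced_quotes is made up for illustration, and it assumes one record per line.

```python
from typing import Iterable, List


def lines_with_unbalanced_quotes(lines: Iterable[str], quote: str = '"') -> List[int]:
    """Return 1-based line numbers whose count of `quote` characters is odd."""
    return [
        line_number
        for line_number, line in enumerate(lines, start=1)
        if line.count(quote) % 2 == 1
    ]


if __name__ == "__main__":
    # Hypothetical sample rows; with a real export, pass an open file instead.
    sample = ['1\tAlice\n', '2\tBob "the builder\n', '3\t"Carol"\n']
    print(lines_with_unbalanced_quotes(sample))
```

An odd count does not prove a line is broken, but such lines are the first place to look if a quote-aware parser swallows several rows at once.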
Also, yes, I'm pretty sure there are no duplicate IDs. I ran select count(id) and select count(distinct id) on my relational database and got the same number from both.
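The same check can be run on the exported file itself rather than in the database. Here is a minimal sketch, assuming a tab-separated file with the ID in the first column and a header row; the function name find_duplicate_ids and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_ids(
    lines: Iterable[str], id_column: int = 0, has_header: bool = True
) -> Dict[str, int]:
    """Return {id: occurrences} for every ID that appears more than once."""
    counts: Counter = Counter()
    for index, line in enumerate(lines):
        if has_header and index == 0:
            continue  # skip the header row
        fields = line.rstrip("\r\n").split("\t")
        if id_column < len(fields):
            counts[fields[id_column]] += 1
    return {key: count for key, count in counts.items() if count > 1}


if __name__ == "__main__":
    # Hypothetical sample; with a real export, pass an open file instead.
    sample = ["id\tname\n", "1\tAlice\n", "2\tBob\n", "1\tCarol\n"]
    print(find_duplicate_ids(sample))
```

This keeps every ID in memory, so for tens of millions of rows it needs a machine with a few gigabytes to spare.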
Thanks
I managed to resolve this issue and will post the solution here (feel free to remove it if it's not appropriate). I followed the following steps to successfully import all 23 million nodes:
- Exported my data from MySQL using the --default-character-set=utf8 option
- Verified that my exported data had the correct number of tab-separated columns with awk -F"\t" 'NF!=$n {print}' $file, where $n was the number of columns and $file was the file I exported to
- I edited batch.properties to use a whole lot more memory
- I added batch_import.csv.quotes=false to batch.properties to ignore quotes (I exported from MySQL without quotes)
- I edited src/main/java/org/neo4j/batchimport/utils/Chunker.java and changed private static final int BUFSIZE = 32*1024; to private static final int BUFSIZE = 64*1024; since the import was failing on a node with particularly large properties
- I ran batch-import with -Dfile.encoding=UTF-8 as mentioned in the config file.
By doing this I managed to import 23 million nodes successfully in under 10 minutes.
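For anyone who prefers a script to the awk one-liner, the column check and a check for oversized values can be combined in one pass. This is a sketch under assumptions, not part of batch-import: the function name check_rows is hypothetical, and it assumes the BUFSIZE limit applies to a single field and is counted in characters, which I have not verified against Chunker.java.

```python
from typing import Iterable, List, Tuple

# Default buffer size quoted above; treating it as a per-field character
# limit is an assumption made for this sketch.
BUFSIZE = 32 * 1024


def check_rows(
    lines: Iterable[str], expected_columns: int, max_field_chars: int = BUFSIZE
) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for rows that look suspicious."""
    problems: List[Tuple[int, str]] = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != expected_columns:
            problems.append(
                (line_number, f"expected {expected_columns} columns, found {len(fields)}")
            )
        longest = max(len(field) for field in fields)
        if longest > max_field_chars:
            problems.append(
                (line_number, f"field of {longest} characters exceeds {max_field_chars}")
            )
    return problems


if __name__ == "__main__":
    # Hypothetical sample rows: one with an extra column, one with a huge value.
    sample = ["1\tAlice\n", "2\tBob\textra\n", "3\t" + "x" * 40000 + "\n"]
    for line_number, problem in check_rows(sample, expected_columns=2):
        print(line_number, problem)
```

To run it on a real export, open the file with encoding="utf-8" and pass the file object as lines; it is read one line at a time, so memory use stays small even for large files.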
Great! Well done.
Adapting batch.properties to your memory requirements is expected.
I can increase BUFSIZE to 1M or something.
It would be cool if you could convert this comment into a small blog post, perhaps also describing your domain and how you exported the data from MySQL? That would be awesome!
Sure, I will do that later this week and update this post with a link when it's done.