Comments (6)
Hi,
I don't think it is that easy. You do need the IDs, but that is not all.
What if there are nodes that are in the database but not in the import? Should they be deleted? If so, the relationships of those nodes would have to be deleted too, to keep the data consistent.
That also means that an incremental update would not be feasible.
How should updates to nodes and relationships be handled?
A possible solution could be to load the import temporarily in memory with an embedded server and compare it with the existing Neo4j database (also embedded).
Then you could iterate through both graphs, visiting every node and comparing its properties and relationships.
That would need a lot of resources and would probably not be as fast as importing into an empty database.
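To make the comparison idea concrete, here is a minimal sketch in plain Python rather than against two embedded Neo4j instances. It assumes both graphs fit in memory as dictionaries keyed by node ID and that relationships can be represented as (start, type, end) tuples; the function name `diff_graphs`, the `KNOWS` type, and the sample values are all made up for illustration.

```python
from typing import Dict, Set, Tuple

Props = Dict[str, str]
Rel = Tuple[int, str, int]  # (start node ID, relationship type, end node ID)


def diff_graphs(
    existing_nodes: Dict[int, Props],
    incoming_nodes: Dict[int, Props],
    existing_rels: Set[Rel],
    incoming_rels: Set[Rel],
) -> Dict[str, set]:
    """Compare an existing graph with an import and list the required changes."""
    existing_ids = set(existing_nodes)
    incoming_ids = set(incoming_nodes)

    delete_nodes = existing_ids - incoming_ids
    create_nodes = incoming_ids - existing_ids
    # Nodes present on both sides whose properties differ need an update.
    update_nodes = {
        node_id
        for node_id in existing_ids & incoming_ids
        if existing_nodes[node_id] != incoming_nodes[node_id]
    }

    # Relationships missing from the import are deleted, and so is any
    # relationship touching a deleted node, to keep the data consistent.
    delete_rels = (existing_rels - incoming_rels) | {
        rel
        for rel in existing_rels
        if rel[0] in delete_nodes or rel[2] in delete_nodes
    }
    create_rels = incoming_rels - existing_rels

    return {
        "delete_nodes": delete_nodes,
        "create_nodes": create_nodes,
        "update_nodes": update_nodes,
        "delete_rels": delete_rels,
        "create_rels": create_rels,
    }


# Hypothetical sample data: node 2 disappears, node 3 is new, node 1 changes.
existing = {0: {"name": "Michael"}, 1: {"name": "Selina", "age": "14"}, 2: {"name": "Rana"}}
incoming = {0: {"name": "Michael"}, 1: {"name": "Selina", "age": "15"}, 3: {"name": "Selma"}}
changes = diff_graphs(
    existing,
    incoming,
    {(0, "KNOWS", 1), (0, "KNOWS", 2)},
    {(0, "KNOWS", 1), (0, "KNOWS", 3)},
)
print(changes)
```

Even in this toy form, every node and relationship on both sides has to be held and visited, which is where the resource cost comes from.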
Maybe I am overcomplicating this, but I think that because of the (in theory) unbounded complexity of a graph database, there is no simple way to do this as there is in a relational database.
But how is this problem solved in an enterprise environment? Incremental loads and updates happen everywhere.
Regards,
Stephan
On 21.03.2013, at 15:42, Max De Marzi wrote:
We're always getting requests for this. Maybe a way to specify the node id and rel id that the import should start from.
from batch-import.
May I know how this issue has been solved?
With a config option; see the readme.
Hey Michael, I don't see any option in the documentation for keeping nodes unique.
For example, if I set
batch_import.keep_db=true
and run sample/import.sh twice, nodes and rels with the same properties are created again:
neo4j-sh (?)$ MATCH (a)-[r]->(b) RETURN a,b LIMIT 25;
+-------------------------------------------------------------------------------------+
| a                                                 | b                               |
+-------------------------------------------------------------------------------------+
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[1]{age:"14",name:"Selina"} |
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[2]{age:"6",name:"Rana"}    |
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[3]{age:"4",name:"Selma"}   |
| Node[1]{age:"14",name:"Selina"}                   | Node[2]{age:"6",name:"Rana"}    |
| Node[2]{age:"6",name:"Rana"}                      | Node[3]{age:"4",name:"Selma"}   |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[5]{age:"14",name:"Selina"} |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[6]{age:"6",name:"Rana"}    |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[7]{age:"4",name:"Selma"}   |
| Node[5]{age:"14",name:"Selina"}                   | Node[6]{age:"6",name:"Rana"}    |
| Node[6]{age:"6",name:"Rana"}                      | Node[7]{age:"4",name:"Selma"}   |
+-------------------------------------------------------------------------------------+
I would like to know which specific option to set in batch.properties so that nodes with the same properties don't get created twice.
In a nutshell, my question is: how can I use batch insert to make sure the same nodes/rels won't be created twice?
Thanks in advance!
Batch insertion is not about creating unique nodes, sorry. So far that has not been a focus because it would also reduce performance.
The only thing out of the box that I can think of is to control the node IDs externally (with id:id as the first column) and then use the same externally driven IDs again.
If you start doing index lookups during batch insertion, your performance will drop a lot.
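One way to read the "externally driven IDs" suggestion is to deduplicate before the import rather than during it. The following is only a sketch under several assumptions: nodes are identified by a `name` key, the key-to-ID mapping is kept between runs (here just a dict), and the nodes file is tab-separated with `id:id` as the first column. The helper names `assign_ids` and `to_tsv` are hypothetical and not part of batch-import.

```python
import csv
import io
from typing import Dict, Iterable, List


def assign_ids(
    records: Iterable[Dict[str, str]],
    known_ids: Dict[str, int],
    key_field: str = "name",
) -> List[Dict[str, object]]:
    """Return only records not seen in earlier runs, each with a stable ID.

    known_ids maps a record key to the ID it was given before; it is
    updated in place so the caller can persist it for the next run.
    """
    next_id = max(known_ids.values(), default=-1) + 1
    new_rows: List[Dict[str, object]] = []
    for record in records:
        key = record[key_field]
        if key in known_ids:
            continue  # already imported earlier, so do not emit it again
        known_ids[key] = next_id
        new_rows.append({"id:id": next_id, **record})
        next_id += 1
    return new_rows


def to_tsv(rows: List[Dict[str, object]], fields: List[str]) -> str:
    """Render rows as tab-separated text with id:id as the first column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id:id"] + fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


people = [
    {"name": "Michael", "age": "37"},
    {"name": "Selina", "age": "14"},
    {"name": "Rana", "age": "6"},
    {"name": "Selma", "age": "4"},
]
known: Dict[str, int] = {}
first_run = assign_ids(people, known)   # all four records are new
second_run = assign_ids(people, known)  # nothing new on the second run
print(to_tsv(first_run, ["name", "age"]))
```

Because the lookup happens in a plain dictionary outside the database, it avoids index lookups during the batch insertion itself, at the cost of having to store the mapping somewhere between runs.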
Okay!
Thanks a lot for your prompt reply! :)
Actually, I've built a graph DB with a large collection of words. Now I'm trying to integrate DBpedia and ran into this situation.