Allow batch importing to already existing database. about batch-import HOT 6 CLOSED

jexp commented on August 16, 2024

Allow batch importing to already existing database.

from batch-import.

Comments (6)

stephanf commented on August 16, 2024

Hi,

I don't think it is that easy. You do need the IDs, but that is not all.

What if there are nodes that are in the database but not in the import? Should they be deleted? If so, the relationships of those nodes should be deleted too, to keep the data consistent.

That also means that an incremental update would not be feasible.

And how should updates to nodes and relationships be handled?

A possible solution could be to temporarily load the import into memory with an embedded server and compare it with the existing Neo4j database (also embedded).

You could then iterate through both graphs, going through all nodes and comparing their properties and relationships.

That would need a lot of resources and would probably not be as fast as importing into an empty database.
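
To make the comparison idea above concrete, here is a minimal sketch in Python. It does not use any Neo4j API: both graphs are assumed to be already loaded into plain dictionaries and sets, and the names `diff_graphs` and `FATHER_OF`, along with the data shapes, are made up for illustration. It only shows what "compare nodes, properties and relationships" could produce as a change list.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory shapes, not a Neo4j API:
# nodes map a node id to its property dict, rels are (start id, type, end id).
Nodes = Dict[int, dict]
Rels = Set[Tuple[int, str, int]]


def diff_graphs(old_nodes: Nodes, old_rels: Rels,
                new_nodes: Nodes, new_rels: Rels) -> dict:
    """Compare an existing graph with an import and list the changes needed."""
    old_ids, new_ids = set(old_nodes), set(new_nodes)
    removed = old_ids - new_ids
    return {
        "add_nodes": sorted(new_ids - old_ids),
        "remove_nodes": sorted(removed),
        # Same id on both sides but different properties
        "update_nodes": sorted(
            i for i in old_ids & new_ids if old_nodes[i] != new_nodes[i]
        ),
        "add_rels": sorted(new_rels - old_rels),
        # Relationships missing from the import, plus any that touch a
        # removed node, so that no dangling relationship is left behind.
        "remove_rels": sorted(
            r for r in old_rels
            if r not in new_rels or r[0] in removed or r[2] in removed
        ),
    }


# Made-up example data
old_nodes = {0: {"name": "Michael"}, 1: {"name": "Selina"}, 2: {"name": "Rana"}}
old_rels = {(0, "FATHER_OF", 1), (0, "FATHER_OF", 2)}
new_nodes = {0: {"name": "Michael", "age": "37"}, 1: {"name": "Selina"},
             3: {"name": "Selma"}}
new_rels = {(0, "FATHER_OF", 1), (0, "FATHER_OF", 3)}

changes = diff_graphs(old_nodes, old_rels, new_nodes, new_rels)
```

Both graphs have to fit in memory at once for this to work, which is exactly the resource cost mentioned above.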

Maybe I am overcomplicating this, but I think that because of the (in theory) endless complexity of a graph database, there is no way as simple as with a relational database.

But how is this problem solved in an enterprise environment? Incremental loads and updates happen everywhere.

Regards,
Stephan

On 21.03.2013, at 15:42, Max De Marzi wrote:

We're always getting requests for this. Maybe a way to specify the node id and rel id that the import should start from.



robinloxley1 commented on August 16, 2024

May I know how this issue has been solved?


jexp commented on August 16, 2024

With a config option; see the readme.


aroyc commented on August 16, 2024

Hey Michael, I don't see any option in the documentation to keep nodes unique.
E.g., if I set

batch_import.keep_db=true
and run sample/import.sh twice, nodes and rels with the same properties are created again:

neo4j-sh (?)$ MATCH (a)-[r]->(b) RETURN a,b LIMIT 25;
+-------------------------------------------------------------------------------------+
| a                                                 | b                               |
+-------------------------------------------------------------------------------------+
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[1]{age:"14",name:"Selina"} |
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[2]{age:"6",name:"Rana"}    |
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[3]{age:"4",name:"Selma"}   |
| Node[1]{age:"14",name:"Selina"}                   | Node[2]{age:"6",name:"Rana"}    |
| Node[2]{age:"6",name:"Rana"}                      | Node[3]{age:"4",name:"Selma"}   |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[5]{age:"14",name:"Selina"} |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[6]{age:"6",name:"Rana"}    |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[7]{age:"4",name:"Selma"}   |
| Node[5]{age:"14",name:"Selina"}                   | Node[6]{age:"6",name:"Rana"}    |
| Node[6]{age:"6",name:"Rana"}                      | Node[7]{age:"4",name:"Selma"}   |
+-------------------------------------------------------------------------------------+

I want to know which specific option to set in batch.properties so that nodes with the same properties don't get created twice.
In a nutshell, my question is: how can I use batch insert to make sure the same nodes/rels won't be created twice?

Thanks in advance !


jexp commented on August 16, 2024

Batch insertion is not about creating unique nodes, sorry. So far that has not been a focus because it would also reduce performance.

The only thing out of the box that I can think of is to control the node IDs externally (with id:id as the first column) and then use the same externally driven IDs again.

If you start doing index lookups during batch insertion, your performance will drop a lot.
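
As a rough illustration of the externally controlled IDs mentioned above, the Python sketch below keeps a map from a natural key (here the `name` property) to a node id and reuses it across runs, so the same row always gets the same id. Everything in it is an assumption made for the example: the helper names `assign_ids` and `write_nodes`, the choice of `name` as key, and the tab-separated layout. Only the `id:id` first column comes from the comment above. How the importer treats an id that already exists in the store is not covered here, so check the readme before relying on it.

```python
import csv
import io
from typing import Dict, Iterable, List


def assign_ids(keys: Iterable[str], known: Dict[str, int]) -> Dict[str, int]:
    """Give each natural key a node id, reusing ids handed out on earlier runs."""
    next_id = max(known.values(), default=-1) + 1
    for key in keys:
        if key not in known:
            known[key] = next_id
            next_id += 1
    return known


def write_nodes(rows: List[dict], ids: Dict[str, int], key_field: str, out) -> None:
    """Write a tab-separated nodes file with the id column first."""
    fields = list(rows[0])
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id:id"] + fields)
    for row in rows:
        writer.writerow([ids[row[key_field]]] + [row[f] for f in fields])


# First run: two nodes. In practice `known` would be saved between runs
# (for example as a JSON file); that part is left out of this sketch.
known: Dict[str, int] = {}
rows = [{"name": "Michael", "age": "37"}, {"name": "Selina", "age": "14"}]
assign_ids((r["name"] for r in rows), known)

# Second run: same two nodes plus a new one. Existing ids are unchanged.
rows2 = rows + [{"name": "Rana", "age": "6"}]
assign_ids((r["name"] for r in rows2), known)

buf = io.StringIO()
write_nodes(rows2, known, "name", buf)
```

The key lookup happens in a plain dictionary before the import starts, so no index lookups are needed during the batch insertion itself.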


aroyc commented on August 16, 2024

Okay!
Thanks a lot for your prompt reply! :)

Actually, I've built a graph DB with a large collection of words. Now I'm trying to integrate DBpedia and ran into this situation.

