Entity Resolved Knowledge Graphs

This hands-on tutorial in Python demonstrates integration of Senzing and Neo4j to construct an Entity Resolved Knowledge Graph:

  1. Use three datasets describing businesses in Las Vegas: ~85K records, ~2% duplicates.
  2. Run entity resolution in Senzing to resolve duplicate business names and addresses.
  3. Parse results to construct a knowledge graph in Neo4j.
  4. Analyze and visualize the entity resolved knowledge graph.
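
As a rough illustration of step 3, the sketch below turns entity resolution output into node and edge lists that could later be loaded into a graph database. The JSON layout, the field names (`entity_id`, `records`, `data_source`, `record_id`), the `RESOLVES` relationship name, and the sample values are all made up for this example. The actual Senzing export and the tutorial notebooks may be structured differently.

```python
import json
from collections import Counter

# Hypothetical, simplified export: one JSON object per resolved entity.
# Field names and values are invented for illustration only.
SAMPLE_EXPORT = """\
{"entity_id": 1, "records": [{"data_source": "SOURCE_A", "record_id": "a-17"}, {"data_source": "SOURCE_B", "record_id": "b-03"}]}
{"entity_id": 2, "records": [{"data_source": "SOURCE_C", "record_id": "c-88"}]}
"""


def build_graph(export_text):
    """Parse JSON lines into a dict of nodes and a list of (src, rel, dst) edges."""
    nodes = {}
    edges = []

    for line in export_text.splitlines():
        if not line.strip():
            continue  # skip blank lines

        entity = json.loads(line)
        ent_key = f"entity:{entity['entity_id']}"
        nodes[ent_key] = {"label": "Entity"}

        for rec in entity["records"]:
            rec_key = f"record:{rec['data_source']}:{rec['record_id']}"
            nodes[rec_key] = {"label": "Record", "data_source": rec["data_source"]}
            # One edge from the resolved entity to each input record it absorbs.
            edges.append((ent_key, "RESOLVES", rec_key))

    return nodes, edges


def merged_entities(edges):
    """Return the entities that link to more than one input record."""
    counts = Counter(src for src, _rel, _dst in edges)
    return sorted(key for key, n in counts.items() if n > 1)


nodes, edges = build_graph(SAMPLE_EXPORT)
print(len(nodes), "nodes,", len(edges), "edges")
print("entities merging several records:", merged_entities(edges))
```

Entities that link to more than one record are the ones where duplicates were resolved, which is the part of the graph worth inspecting first.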

We'll walk through example code based on Neo4j Desktop and the Graph Data Science (GDS) library. The code runs Cypher queries on the graph to prepare data for downstream analysis and visualization with Jupyter, Pandas, Seaborn, and PyVis.
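
A minimal sketch of running a Cypher query from Python and collecting the rows for analysis might look like the following. It assumes the official `neo4j` Python driver (version 5.x, which provides `Driver.execute_query`). The node labels, relationship type, and property names in the query are hypothetical and may not match the schema used in the notebooks.

```python
from typing import Any

# Hypothetical labels and properties; adjust to the schema in your own graph.
RECORDS_PER_ENTITY = """
MATCH (e:Entity)-[:RESOLVES]->(r:Record)
RETURN e.uid AS entity_uid, count(r) AS num_records
ORDER BY num_records DESC
LIMIT $limit
"""


def fetch_rows(driver: Any, query: str, **params: Any) -> list:
    """Run a Cypher query and return each result record as a plain dict.

    `driver` is expected to behave like a neo4j 5.x Driver: its
    `execute_query` call returns (records, summary, keys).
    """
    records, _summary, _keys = driver.execute_query(query, **params)
    return [record.data() for record in records]


# Usage against a running database (connection details are placeholders):
#
#   from neo4j import GraphDatabase
#   import pandas as pd
#
#   with GraphDatabase.driver("bolt://localhost:7687",
#                             auth=("neo4j", "your-password")) as driver:
#       df = pd.DataFrame(fetch_rows(driver, RECORDS_PER_ENTITY, limit=10))
```

Returning plain dicts keeps the query step independent of Pandas, so the same rows can feed a DataFrame, a Seaborn plot, or a PyVis network.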

The code is simple to download and easy to follow, and is presented so you can try it with your own data. The tutorial takes about 35 minutes to run.

Before and After

Why? For one, the popularity of retrieval augmented generation (RAG) for making AI applications more robust has boosted interest in knowledge graphs (KGs). When the entities, relations, and properties in a KG draw on your domain-specific data to strengthen your AI app, compliance issues and audits rush to the foreground.

TL;DR: this is about making sense of the data coming from a connected world. During the transition from data integration to KG construction, you need to make sure the entities in your graph get resolved correctly. Otherwise, your AI app downstream will struggle with the kinds of details that get people concerned, very concerned, very quickly: e.g., billing, deliveries, voter registration, crucial medical details, credit reporting, industrial safety, security, and so on.

Highly recommended:

Prerequisites

In this tutorial we'll work in two environments. The configuration and coding are at a level that should be comfortable for most people working in data science. You'll need to know how to:

  • clone a public repo from GitHub
  • launch a server in the cloud
  • use Linux command lines
  • write some code in Python

Total estimated project time: 35 minutes.

Cloud computing budget: running Senzing in this tutorial cost a total of $0.04 USD.

Set up local environment

After cloning this repo, change into the ERKG directory and set up your local environment:

git clone https://github.com/DerwenAI/ERKG.git
cd ERKG

python3.11 -m venv venv
source venv/bin/activate

python3 -m pip install -U pip wheel setuptools
python3 -m pip install -r requirements.txt 

We're using Python 3.11 here, although this code should run with most of the recent Python 3.x versions.
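
As an optional sanity check after the setup above, this small sketch reports which Python version is running and whether a virtual environment is active. The helper names are made up for illustration, and the check relies only on the standard library.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def version_string() -> str:
    """Return the running Python version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


print(f"Python {version_string()}, virtual environment active: {in_virtualenv()}")
```

If the second value prints as `False`, the `source venv/bin/activate` step probably has not been run in the current shell.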

Run the tutorial notebooks

First, launch Jupyter:

./venv/bin/jupyter lab

Then follow the steps in these notebooks, in order:

  1. examples/datasets.ipynb
  2. examples/graph.ipynb
  3. examples/impact.ipynb

You can view the results -- an interactive visualization of the entity resolved knowledge graph -- by loading examples/big_vegas.2.html in a web browser. The full HTML+JavaScript is large and may take several minutes to load.

Deleting data

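If you need to clear the database and start over, run this in Neo4j Desktop. If the query fails with an error saying it can only be executed in an implicit transaction, prefix it with :auto and run it again: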

MATCH (n)
CALL {
  WITH n
  DETACH DELETE n
} IN TRANSACTIONS

See: https://neo4j.com/docs/cypher-manual/current/subqueries/subqueries-in-transactions/#delete-with-call-in-transactions
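
The same cleanup can be sketched from Python. This assumes the official `neo4j` driver, where `session.run()` executes in an auto-commit (implicit) transaction, which `CALL { ... } IN TRANSACTIONS` requires. The function name and the batch size are illustrative choices, not part of the tutorial.

```python
def clear_database(session, batch_size=10_000):
    """Delete all nodes and relationships in batches; return the query that ran.

    `session` is expected to behave like a neo4j Session: `run()` returns a
    result object with a `consume()` method.
    """
    if not isinstance(batch_size, int) or batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")

    query = (
        "MATCH (n) "
        "CALL { WITH n DETACH DELETE n } "
        f"IN TRANSACTIONS OF {batch_size} ROWS"
    )
    # consume() waits for the server to finish and discards the empty result.
    session.run(query).consume()
    return query


# Usage against a running database (connection details are placeholders):
#
#   from neo4j import GraphDatabase
#
#   with GraphDatabase.driver("bolt://localhost:7687",
#                             auth=("neo4j", "your-password")) as driver:
#       with driver.session(database="neo4j") as session:
#           clear_database(session)
```

Committing in batches keeps memory use bounded on the server, which matters more as the graph grows.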

Kudos

Many thanks to: @akollegger, @brianmacy

Contributors

@ceteri