
nallm's Introduction

Project NaLLM

Welcome to the NaLLM project repository, where we explore and demonstrate the synergies between Neo4j and Large Language Models (LLMs). The project focuses on three primary use cases: a natural language interface to a knowledge graph, creating a knowledge graph from unstructured data, and generating a report from both static data and data from an LLM.

This repository houses both backend and frontend code, designed and organized to facilitate an intuitive journey through our project.

Blog posts

During this project we're also writing blog posts where we deep dive into our learnings and explorations.

  1. https://medium.com/neo4j/harnessing-large-language-models-with-neo4j-306ccbdd2867
  2. https://medium.com/neo4j/knowledge-graphs-llms-fine-tuning-vs-retrieval-augmented-generation-30e875d63a35
  3. https://medium.com/neo4j/knowledge-graphs-llms-multi-hop-question-answering-322113f53f51
  4. https://medium.com/neo4j/knowledge-graphs-llms-real-time-graph-analytics-89b392eaaa95
  5. https://medium.com/neo4j/construct-knowledge-graphs-from-unstructured-text-877be33300a2

Repository Structure

Our repository is designed with an efficient and logical structure for ease of navigation:

  • Backend Code: The backend code is in the api folder. The main.py file contains all endpoints and their corresponding functions. All LLM functionality is split into separate components, each with its own purpose.

  • Frontend Code: The frontend code is organized into two folders, one for each use case, which can be found in ui/src. Each folder contains a separate React application that is independent of the other.

Running the Demos

To simplify the process of running the demos, we have incorporated scripts that generate Docker images. To use these, you'll need to:

  1. Navigate into the root directory.
  2. Create a .env file. You can use the env.example file as a template. (The OpenAI API key is optional and can be provided from the UI instead.)
  3. Run docker-compose up to build the images.

This will start the backend and frontend servers, and you can access the demos at the following URLs:

Please note that you'll need Docker installed on your machine to build and run these images. If you haven't already, you can download Docker from here.

Demo database

A demo database is running on demo.neo4jlabs.com. It contains a set of companies, their subsidiaries, people related to the companies, and articles that mention the companies. The database is a subset of the Diffbot knowledge graph. You can access it with the following credentials:

URI: neo4j+s://demo.neo4jlabs.com
username: companies
password: companies
database: companies
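
As a minimal sketch, the credentials above could be used from Python with the official neo4j driver (assumed to be installed with pip install neo4j). The helper name count_nodes and the counting query are illustrative and are not part of this repository.

```python
# Connection details copied from the credentials listed above.
DEMO_DB = {
    "uri": "neo4j+s://demo.neo4jlabs.com",
    "user": "companies",
    "password": "companies",
    "database": "companies",
}


def count_nodes(cfg=DEMO_DB):
    """Return the total number of nodes (hypothetical helper, needs network access)."""
    # Imported lazily so this module still loads when the driver is not installed.
    from neo4j import GraphDatabase

    auth = (cfg["user"], cfg["password"])
    with GraphDatabase.driver(cfg["uri"], auth=auth) as driver:
        with driver.session(database=cfg["database"]) as session:
            record = session.run("MATCH (n) RETURN count(n) AS total").single()
            return record["total"]
```

Calling count_nodes() opens a connection, runs one read query, and closes the driver again, which is enough to confirm that the credentials work from your machine.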

Graph schema

The database contains both structured information about organizations and people as well as news articles. The news articles are linked to the mentioned entity, while the actual text is stored in the Chunk nodes alongside their text-embedding-ada-002 vector representations.

Bring your own database

To run the project on your own database, follow these two steps:

  1. Set the appropriate database credentials in the .env file.
  2. Remove the Cypher examples in the api/fewshot_examples.py file, or replace them with examples appropriate to your database.
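
Before starting the containers it can help to check that the .env file defines every database setting. The sketch below assumes the variable names NEO4J_URL, NEO4J_USER, NEO4J_PASS and NEO4J_DATABASE, which are taken from the main.py excerpt quoted in one of the issues further down; the env.example file remains the authoritative list. The helper names are hypothetical.

```python
# Assumed variable names; check env.example for the authoritative list.
REQUIRED_KEYS = ("NEO4J_URL", "NEO4J_USER", "NEO4J_PASS", "NEO4J_DATABASE")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


sample = """
# database credentials (placeholder values)
NEO4J_URL=neo4j+s://example.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASS=secret
"""
print(missing_keys(sample))  # ['NEO4J_DATABASE']
```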

Contributing

We welcome contributions and feedback to improve our project and demonstrations. Please feel free to raise issues or submit pull requests.

Note

We want to emphasize that the field of AI and specifically LLMs is rapidly evolving. As such, the information, assumptions, and code contained within this repository are based on our current understanding and are subject to change as new data and technological advancements become available.

Thank you for your interest in our project. We hope you find this repository useful and informative. Stay tuned for more updates as we continue to explore the fascinating world of Neo4j and LLMs!

nallm's People

Contributors

jharris4, noahmay, oskarhane, tomasonjo

nallm's Issues

Unable to Generate a report

Hi,

The selector for picking a company does not appear (if it is supposed to appear), and when I click the Generate button it seems to be processing something but never finishes.

How can we evaluate the ability of each LLM to generate Cypher?

Since NaLLM is an interface library for LLMs and Neo4j, we should consider adding code that evaluates an LLM's ability to generate Cypher. It would be more convenient for users to quickly localize. It's just a suggestion. We can think about it.
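
One way such an evaluation could look, as a rough sketch only: compare the generated Cypher against a reference query for each question and report exact-match accuracy after whitespace normalization. The node labels, the test cases and the fake_llm stand-in are all hypothetical; a real harness would call the model under test and would probably compare query results rather than query text.

```python
import re


def normalize(cypher):
    """Collapse whitespace and drop a trailing semicolon (case is preserved)."""
    return re.sub(r"\s+", " ", cypher).strip().rstrip(";").strip()


def exact_match_accuracy(generate, cases):
    """Fraction of questions for which generate() reproduces the reference query."""
    hits = sum(normalize(generate(q)) == normalize(ref) for q, ref in cases)
    return hits / len(cases)


# Hypothetical evaluation set: (question, reference Cypher).
cases = [
    ("How many organizations are there?", "MATCH (o:Organization) RETURN count(o)"),
    ("List all people", "MATCH (p:Person) RETURN p.name"),
]


def fake_llm(question):
    """Stand-in for a real LLM call; it answers only the first question."""
    canned = {
        "How many organizations are there?": "MATCH (o:Organization)\nRETURN count(o);",
    }
    return canned.get(question, "")


print(exact_match_accuracy(fake_llm, cases))  # 0.5
```

Exact match is a strict and somewhat crude metric, since two different queries can return the same result, so it is best read as a lower bound.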

Unable to Access demo.neo4jlabs.com

I have run docker-compose up to build the images, and I can open the interface at http://localhost:4173/, where I can see three use cases. However, whenever I click on one of them I am unable to use it, and I also cannot access demo.neo4jlabs.com.
For example, I cannot click Import.

"Missing APOC Core plugin" with Neo4j Aura

I'm using Neo4j Aura with GCP and I'm getting the following error when starting the Docker container. I know that I need to install the APOC plugin, but I don't know how to do this with Neo4j Aura.

I can see that Neo4j Aura comes with some pre-installed APOC functions (https://neo4j.com/docs/aura/platform/apoc/?utm_source=Google&utm_medium=PaidSearch&utm_campaign=Evergreen&utm_content=EMEA-Search-SEMCE-DSA-None-SEM-SEM-NonABM&utm_term=&utm_adgroup=DSA&gad_source=1&gclid=CjwKCAjwvIWzBhAlEiwAHHWgvaUqu_woDxxDQp_CL52aSnn3V8Uoj4AEEOlqX0mvK5fv2pshoqBABhoC8YYQAvD_BwE), but I'm still getting the error.

[+] Running 2/0
✔ Container api Created 0.0s
✔ Container ui Created 0.0s
Attaching to api, ui
api | INFO: Will watch for changes in these directories: ['/api']
api | INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
api | INFO: Started reloader process [1] using StatReload
ui |
ui | > [email protected] preview
ui | > vite preview
ui |
ui | ➜ Local: http://localhost:4173/
ui | ➜ Network: http://172.18.0.2:4173/
api | Process SpawnProcess-1:
api | Traceback (most recent call last):
api | File "/api/src/driver/neo4j.py", line 70, in __init__
api | self.refresh_schema()
api | File "/api/src/driver/neo4j.py", line 116, in refresh_schema
api | node_props = [el["output"] for el in self.query(node_properties_query)]
api | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api | File "/api/src/driver/neo4j.py", line 116, in <listcomp>
api | node_props = [el["output"] for el in self.query(node_properties_query)]
api | ~~^^^^^^^^^^
api | KeyError: 'output'
api |
api | During handling of the above exception, another exception occurred:
api |
api | Traceback (most recent call last):
api | File "/usr/local/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
api | self.run()
api | File "/usr/local/lib/python3.11/multiprocessing/process.py", line 108, in run
api | self._target(*self._args, **self._kwargs)
api | File "/usr/local/lib/python3.11/site-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
api | target(sockets=sockets)
api | File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 61, in run
api | return asyncio.run(self.serve(sockets=sockets))
api | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
api | return runner.run(main)
api | ^^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
api | return self._loop.run_until_complete(task)
api | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
api | return future.result()
api | ^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
api | config.load()
api | File "/usr/local/lib/python3.11/site-packages/uvicorn/config.py", line 473, in load
api | self.loaded_app = import_from_string(self.app)
api | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/site-packages/uvicorn/importer.py", line 21, in import_from_string
api | module = importlib.import_module(module_str)
api | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
api | return _bootstrap._gcd_import(name[level:], package, level)
api | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api | File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
api | File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
api | File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
api | File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
api | File "<frozen importlib._bootstrap_external>", line 940, in exec_module
api | File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
api | File "/api/src/main.py", line 43, in <module>
api | neo4j_connection = Neo4jDatabase(
api | ^^^^^^^^^^^^^^
api | File "/api/src/driver/neo4j.py", line 72, in __init__
api | raise ValueError("Missing APOC Core plugin")
api | ValueError: Missing APOC Core plugin
api | Failed to write data to connection ResolvedIPv4Address(('35.242.209.97', 7687)) (ResolvedIPv4Address(('35.242.209.97', 7687)))
api | Failed to write data to connection IPv4Address(('p-ee3efe44-f7a2-0001.production-orch-0036.neo4j.io', 7687)) (ResolvedIPv4Address(('35.242.209.97', 7687)))

Allow custom cypher queries on refresh_schema()

In my case, a large schema (many properties) causes the prompt to exceed the allowed number of tokens.
I was able to work around this by having rel_query return:

RETURN {type: nodeLabels, properties: properties[0..5]} AS output

This suggests allowing custom Cypher queries in refresh_schema(). I'm sure there are other cases where this would be helpful.
Can you add me as a contributor so I can open a PR?
Thanks
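
For illustration, the same cap could also be applied on the Python side after the schema query returns. This is only a sketch: the row shape is assumed from the RETURN clause quoted above, and truncate_schema is a hypothetical helper, not a function in this repository.

```python
def truncate_schema(rows, max_props=5):
    """Keep at most max_props properties per schema entry.

    Mirrors the Cypher slice properties[0..5], whose upper bound is exclusive.
    """
    return [
        {"type": row["type"], "properties": row["properties"][:max_props]}
        for row in rows
    ]


# Hypothetical schema rows in the assumed {type, properties} shape.
rows = [
    {"type": "Organization", "properties": ["p%d" % i for i in range(8)]},
    {"type": "Person", "properties": ["name"]},
]
trimmed = truncate_schema(rows)
print([len(entry["properties"]) for entry in trimmed])  # [5, 1]
```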

Question: Database subset for Diffbot

How did you create and populate your demo database? In other words, how can I mix structured and unstructured data using the NaLLM import page? Did you define a schema and import the text?

Run docker-compose up to build the images. failed to solve: process "/bin/sh -c mkdir -p $FOLDER" did not complete successfully: exit code: 1

[+] Building 2.6s (2/2) FINISHED                                                                                                                                                            
 => [internal] load build definition from Dockerfile                                                                                                                                   0.0s
 => => transferring dockerfile: 646B                                                                                                                                                   0.0s
 => CANCELED [internal] load metadata for docker.io/library/python:3.11                                                                                                                2.5s
failed to solve: process "/bin/sh -c mkdir -p $FOLDER" did not complete successfully: exit code: 1

Question: is this a typo?

Hi

First, thanks a lot for sharing. I am very interested in Neo4j + LLM applications and have been trying to use your code.

I was just wondering if there is a bug on line 140 of
/NaLLM/tree/main/api/src/components/unstructured_data_extractor.py

        {"role": "system", "content": generate_system_message_with_schema()},

Should that line instead be

        {"role": "system", "content": generate_system_message()},

since it is for the case where no schema is given?

Thanks again

OpenAI RateLimit ERROR || Ample Quota left

Hi,

I have deployed the code as suggested, but when querying the knowledge graph I got the following error in the terminal/server log:

api | [{'role': 'user', 'content': 'Hi'}, {'role': 'user', 'content': 'Which company has generated maximum revenue?'}, {'role': 'user', 'content': 'Question to be converted to Cypher: Which company has generated maximum revenue?'}]
api | Retrying LLM call You exceeded your current quota, please check your plan and billing details.
api | Retrying LLM call You exceeded your current quota, please check your plan and billing details.
api | Retrying LLM call You exceeded your current quota, please check your plan and billing details.

Q. This shouldn't be the case, since I have enough quota left in my free-tier account for GPT 3.5. Kindly review.

Also, the RateLimitError raised with GPT 3.5 is not caught, whereas error handling appears to be implemented for GPT 4:

GPT 4:

api | results {'output': [{'message': 'Error: The model gpt-4 does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.'}], 'generated_cypher': None}

GPT 3.5:

api | INFO: 172.18.0.1:52382 - "POST /questionProposalsForCurrentDb HTTP/1.1" 500 Internal Server Error
api | ERROR: Exception in ASGI application
api | Traceback (most recent call last):
api | File "/api/src/llm/openai.py", line 33, in generate
api | completions = openai.ChatCompletion.create(
api | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
api | return super().create(*args, **kwargs)
api | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
api | response, _, api_key = requestor.request(
api | ^^^^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 230, in request
api | resp, got_stream = self._interpret_response(result, stream)
api | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api | File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 624, in _interpret_response
api | self._interpret_response_line(
api | File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 687, in _interpret_response_line
api | raise self.handle_error_response(
api | openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details.

This can be handled in openai.py:

# Proposed extra handler for the existing try block around the ChatCompletion call
except openai.error.RateLimitError as e:
    return f"Rate limit exceeded. Error: {e}"
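
A self-contained sketch of that idea follows. The log above already shows "Retrying LLM call" messages, so some retry logic exists in the project; the exponential backoff and the call_with_retry helper here are additions for illustration. RateLimitError is a local stand-in for openai.error.RateLimitError so that the example runs without the openai package.

```python
import time


class RateLimitError(Exception):
    """Local stand-in for openai.error.RateLimitError (assumption for this sketch)."""


def call_with_retry(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait and retry, then return a readable message."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == retries - 1:
                # Out of attempts: report the error instead of raising a 500.
                return f"Rate limit exceeded. Error: {e}"
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...


def always_limited():
    raise RateLimitError("You exceeded your current quota")


# Pass a no-op sleep so the demonstration finishes immediately.
print(call_with_retry(always_limited, sleep=lambda seconds: None))
# Rate limit exceeded. Error: You exceeded your current quota
```

Returning a message keeps the endpoint from failing with an unhandled exception; whether the caller should instead receive a structured error is a design choice for the maintainers.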

Need URGENT Help in this issue "Missing APOC Core plugin"

I have followed the instructions in Readme.md for setting up the Docker image to run the project, but I am getting a "Missing APOC Core plugin" error. Please help me solve this issue; I urgently need this project.

Device: Mac

I set up the .env file from the txt file downloaded while setting up AuraDB:

  • copied the URL, password, username, and Instance01 from the txt file into the .env file

api  | INFO:     Will watch for changes in these directories: ['/api']
api  | INFO:     Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
api  | INFO:     Started reloader process [1] using StatReload
ui   | 
ui   | > [email protected] preview
ui   | > vite preview
ui   | 
ui   |   ➜  Local:   http://localhost:4173/
ui   |   ➜  Network: http://172.18.0.2:4173/
api  | Process SpawnProcess-1:
api  | Traceback (most recent call last):
api  |   File "/api/src/driver/neo4j.py", line 70, in __init__
api  |     self.refresh_schema()
api  |   File "/api/src/driver/neo4j.py", line 116, in refresh_schema
api  |     node_props = [el["output"] for el in self.query(node_properties_query)]
api  |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api  |   File "/api/src/driver/neo4j.py", line 116, in <listcomp>
api  |     node_props = [el["output"] for el in self.query(node_properties_query)]
api  |                   ~~^^^^^^^^^^
api  | KeyError: 'output'
api  | 
api  | During handling of the above exception, another exception occurred:
api  | 
api  | Traceback (most recent call last):
api  |   File "/usr/local/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
api  |     self.run()
api  |   File "/usr/local/lib/python3.11/multiprocessing/process.py", line 108, in run
api  |     self._target(*self._args, **self._kwargs)
api  |   File "/usr/local/lib/python3.11/site-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
api  |     target(sockets=sockets)
api  |   File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 61, in run
api  |     return asyncio.run(self.serve(sockets=sockets))
api  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api  |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
api  |     return runner.run(main)
api  |            ^^^^^^^^^^^^^^^^
api  |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
api  |     return self._loop.run_until_complete(task)
api  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api  |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
api  |     return future.result()
api  |            ^^^^^^^^^^^^^^^
api  |   File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
api  |     config.load()
api  |   File "/usr/local/lib/python3.11/site-packages/uvicorn/config.py", line 473, in load
api  |     self.loaded_app = import_from_string(self.app)
api  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api  |   File "/usr/local/lib/python3.11/site-packages/uvicorn/importer.py", line 21, in import_from_string
api  |     module = importlib.import_module(module_str)
api  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api  |   File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
api  |     return _bootstrap._gcd_import(name[level:], package, level)
api  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api  |   File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
api  |   File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
api  |   File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
api  |   File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
api  |   File "<frozen importlib._bootstrap_external>", line 940, in exec_module
api  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
api  |   File "/api/src/main.py", line 43, in <module>
api  |     neo4j_connection = Neo4jDatabase(
api  |                        ^^^^^^^^^^^^^^
api  |   File "/api/src/driver/neo4j.py", line 72, in __init__
api  |     raise ValueError("Missing APOC Core plugin")
api  | ValueError: Missing APOC Core plugin
api  | Failed to write data to connection ResolvedIPv4Address(('34.126.64.110', 7687)) (ResolvedIPv4Address(('34.126.64.110', 7687)))
api  | Failed to write data to connection IPv4Address(('eaa50530.databases.neo4j.io', 7687)) (ResolvedIPv4Address(('34.126.64.110', 7687)))

Issue in main.py (api) - NEO4J_PASS

Hi, I saw that in line 45 of main.py database=os.environ.get("NEO4J_PASS", "companies") should be replaced by database=os.environ.get("NEO4J_DATABASE", "companies")

neo4j_connection = Neo4jDatabase(
    host=os.environ.get("NEO4J_URL", "neo4j+s://demo.neo4jlabs.com"),
    user=os.environ.get("NEO4J_USER", "companies"),
    password=os.environ.get("NEO4J_PASS", "companies"),
    database=os.environ.get("NEO4J_PASS", "companies"),
)
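
A sketch of the proposed correction, written as a small function so that the lookup can be checked in isolation. The neo4j_kwargs helper is hypothetical; the environment variable names and defaults are the ones quoted in this issue.

```python
import os


def neo4j_kwargs(env=os.environ):
    """Build the Neo4jDatabase keyword arguments from environment variables."""
    return {
        "host": env.get("NEO4J_URL", "neo4j+s://demo.neo4jlabs.com"),
        "user": env.get("NEO4J_USER", "companies"),
        "password": env.get("NEO4J_PASS", "companies"),
        # Read the database name from NEO4J_DATABASE, not NEO4J_PASS.
        "database": env.get("NEO4J_DATABASE", "companies"),
    }


# With the fix, a custom password no longer leaks into the database name.
print(neo4j_kwargs({"NEO4J_PASS": "s3cret"})["database"])  # companies
```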

Unable to import unstructured data

I have followed the steps and created a .env file in the root path and in the ui folder.
I then followed the suggestion from #68 (comment)
to set the Neo4j URL: NEO4J_URL=neo4j+s://demo.neo4jlabs.com:7687

However, after running docker-compose up, I am not able to import data.

The logs are shown below.
api | INFO: Started server process [8]
api | INFO: Waiting for application startup.
api | INFO: Application startup complete.
api | INFO: 172.20.0.1:34960 - "GET /hasapikey HTTP/1.1" 200 OK
api | INFO: 172.20.0.1:58074 - "GET /hasapikey HTTP/1.1" 200 OK
api | INFO: 172.20.0.1:45770 - "OPTIONS /data2cypher/data2cypher HTTP/1.1" 200 OK
api | INFO: 172.20.0.1:45770 - "POST /data2cypher/data2cypher HTTP/1.1" 404 Not Found
api | INFO: 172.20.0.1:47902 - "GET / HTTP/1.1" 404 Not Found
api | INFO: 172.20.0.1:53978 - "GET / HTTP/1.1" 404 Not Found

Has anyone else had this problem?

ValueError: Missing APOC Core plugin (AuraDB)

Hello, we are trying to use this project on our AuraDB instance and get the following error:
ValueError: Missing APOC Core plugin
APOC is actually available out of the box on Aura.
Thank you
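
Judging only from the tracebacks in the APOC issues above, the ValueError is raised while a KeyError: 'output' is being handled, which suggests that any failure during the schema refresh may be reported as a missing plugin. The sketch below reproduces that pattern with a hypothetical FakeDatabase class. It is not the repository's code, and the broad except clause is an assumption.

```python
class FakeDatabase:
    """Hypothetical stand-in that mimics the error pattern seen in the tracebacks."""

    def __init__(self, rows):
        self._rows = rows
        try:
            self.refresh_schema()
        except Exception:
            # Assumed behaviour: every failure is reported as a missing plugin.
            raise ValueError("Missing APOC Core plugin")

    def query(self, _cypher):
        # A real driver would run the query; here we return canned rows.
        return self._rows

    def refresh_schema(self):
        # Fails with KeyError if a row does not carry an "output" key.
        self.node_props = [el["output"] for el in self.query("<schema query>")]


try:
    FakeDatabase([{"unexpected_key": 1}])
except ValueError as err:
    print(err, "| caused by:", repr(err.__context__))
# Missing APOC Core plugin | caused by: KeyError('output')
```

If this reading is right, the message can appear even when APOC is installed, for example when the schema query returns rows in an unexpected shape, so inspecting the underlying exception is a reasonable first debugging step.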
