aplbrain / grand-cypher

Implementation of the Cypher language for searching NetworkX graphs

License: Apache License 2.0

Python 100.00%

cypher python neo4j grand networkx graph networks graph-database

grand-cypher's Introduction

GrandCypher

pip install grand-cypher
# Note: You will want a version of grandiso>=2.2.0 for best performance!
pip install -U 'grandiso>=2.2.0'

GrandCypher is a partial (and growing!) implementation of the Cypher graph query language written in Python, for Python data structures.

You likely already know Cypher from the Neo4j Graph Database. Use it with your favorite graph libraries in Python!

Usage

Example Usage with NetworkX:

from grandcypher import GrandCypher
import networkx as nx

GrandCypher(nx.karate_club_graph()).run("""
MATCH (A)-[]->(B)
MATCH (B)-[]->(C)
WHERE A.club == "Mr. Hi"
RETURN A.club, B.club
""")

Example Usage with SQL

Create your own "Sqlite for Neo4j"! This example uses grand-graph to run queries in SQL:

import grand
from grandcypher import GrandCypher

G = grand.Graph(
    backend=grand.backends.SQLBackend(
        db_url="my_persisted_graph.db",
        directed=True
    )
)

# use the networkx-style API for the Grand library:
G.nx.add_node("A", foo="bar")
G.nx.add_edge("A", "B")
G.nx.add_edge("B", "C")
G.nx.add_edge("C", "A")

GrandCypher(G.nx).run("""
MATCH (A)-[]->(B)-[]->(C)
MATCH (C)-[]->(A)
WHERE
    A.foo == "bar"
RETURN
    A, B, C
""")

Feature Parity

Feature	Support
Multiple `MATCH` clauses	✅
`WHERE`-clause filtering on nodes	✅
Anonymous `-[]-` edges	✅
`LIMIT`	✅
`SKIP`	✅
Node/edge attributes with `{}` syntax	✅
`WHERE`-clause filtering on edges	✅
Named `-[]-` edges	✅
Chained `()-[]->()-[]->()` edges	✅ Thanks @khoale88!
Backwards `()<-[]-()` edges	✅ Thanks @khoale88!
Anonymous `()` nodes	✅ Thanks @khoale88!
Undirected `()-[]-()` edges	✅ Thanks @khoale88!
Boolean Arithmetic (`AND`/`OR`)	✅ Thanks @khoale88!
`OPTIONAL MATCH`	🛣
`(:Type)` node-labels	✅ Thanks @khoale88!
`[:Type]` edge-labels	✅ Thanks @khoale88!
Graph mutations (e.g. `DELETE`, `SET`,...)	🛣


✅ = Supported	🛣 = On Roadmap	🔴 = Not Planned

Citing

If this tool is helpful to your research, please consider citing it with:

# https://doi.org/10.1038/s41598-021-91025-5
@article{Matelsky_Motifs_2021,
    title={{DotMotif: an open-source tool for connectome subgraph isomorphism search and graph queries}},
    volume={11},
    ISSN={2045-2322},
    url={http://dx.doi.org/10.1038/s41598-021-91025-5},
    DOI={10.1038/s41598-021-91025-5},
    number={1},
    journal={Scientific Reports},
    publisher={Springer Science and Business Media LLC},
    author={Matelsky, Jordan K. and Reilly, Elizabeth P. and Johnson, Erik C. and Stiso, Jennifer and Bassett, Danielle S. and Wester, Brock A. and Gray-Roncal, William},
    year={2021},
    month={Jun}
}

grand-cypher's Issues

NetworkXDialect does not work correctly with networkx.DiGraph

Because NetworkXDialect inherits from networkx.Graph, discrepancies between networkx.Graph and networkx.DiGraph propagate back to grand.Graph. One of them is that networkx.Graph.edges returns an EdgeView, while networkx.DiGraph.edges returns an OutEdgeView.

Below is a test that reproduces the issue:

def test_nx_edges(self):
    G = Graph(directed=True).nx
    H = nx.DiGraph()
    G.add_edge("1", "2")
    G.add_edge("2", "1")   # <<< this won't work with EdgeView for G
    G.add_edge("1", "3")
    H.add_edge("1", "2")
    H.add_edge("2", "1")   # <<< OutEdgeView returns this for H
    H.add_edge("1", "3")
    self.assertEqual(dict(G.edges), dict(H.edges))
    self.assertEqual(dict(G.edges()), dict(H.edges()))
    self.assertEqual(list(G.edges["1", "2"]), list(H.edges["1", "2"]))

The result is:

    def test_nx_edges(self):
        G = Graph(directed=True).nx
        H = nx.DiGraph()
        # H = nx.Graph()
        G.add_edge("1", "2")
        G.add_edge("2", "1")
        G.add_edge("1", "3")
        H.add_edge("1", "2")
        H.add_edge("2", "1")
        H.add_edge("1", "3")
>       self.assertEqual(dict(G.edges), dict(H.edges))
E       AssertionError: {('1', '2'): {}, ('1', '3'): {}} != {('1', '2'): {}, ('1', '3'): {}, ('2', '1'): {}}
E       - {('1', '2'): {}, ('1', '3'): {}}
E       + {('1', '2'): {}, ('1', '3'): {}, ('2', '1'): {}}
E       ?                              ++++++++++++++++
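
The same discrepancy can be reproduced with plain NetworkX, without grand. The following is a minimal sketch that assumes only that networkx is installed; `edge_keys` is a hypothetical helper name introduced for illustration.

```python
import networkx as nx


def edge_keys(graph):
    """Add the three edges from the test above and return the edge keys as a set."""
    graph.add_edge("1", "2")
    graph.add_edge("2", "1")
    graph.add_edge("1", "3")
    return set(graph.edges)


# EdgeView (undirected): ("2", "1") is the same edge as ("1", "2").
undirected = edge_keys(nx.Graph())
# OutEdgeView (directed): ("2", "1") is a distinct edge.
directed = edge_keys(nx.DiGraph())

print(sorted(undirected))
print(sorted(directed))
```

The undirected graph ends up with two edges and the directed graph with three, which matches the assertion failure shown above.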

Add aggregation functions

CONTAINS, ENDS WITH, and STARTS WITH

https://neo4j.com/docs/cypher-manual/current/clauses/where/#match-string-negation
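
One way to think about these three operators is as a lookup from the Cypher keyword to a Python string predicate. The sketch below is illustrative only: `STRING_OPERATORS` and `string_predicate` are hypothetical names, not part of grand-cypher, and treating non-string values as "no match" is a simplifying assumption.

```python
# Hypothetical mapping from Cypher string operators to Python predicates.
STRING_OPERATORS = {
    "CONTAINS": lambda value, operand: operand in value,
    "STARTS WITH": lambda value, operand: value.startswith(operand),
    "ENDS WITH": lambda value, operand: value.endswith(operand),
}


def string_predicate(operator, value, operand, negate=False):
    """Evaluate `value <operator> operand`; `negate` models a leading NOT."""
    # Assumption of this sketch: a non-string on either side never matches,
    # with or without NOT.
    if not isinstance(value, str) or not isinstance(operand, str):
        return False
    result = STRING_OPERATORS[operator](value, operand)
    return not result if negate else result


print(string_predicate("STARTS WITH", "Mr. Hi", "Mr"))
print(string_predicate("ENDS WITH", "Mr. Hi", "Mr", negate=True))
```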

Feature request: path groups

For example, the syntax:

MATCH p=(n1 {type: "compilation_unit"})-[]->(n2 {type: "class_declaration"})-[*2]->(n3 {type: "method_declaration"})-->(n4 {text: "Main", type:"identifier"})
RETURN p

The path type returned by Neo4j has start, end, and segments, and notably includes the nodes and edges traversed by the edge * operator.

I can still work around it, but I would very much like to have access to variable-length paths!

Edge Hopping

My use case is a perfect fit for this feature. I do not know the exact depth of a branch in advance, so I would like to search from a node all the way down, or down to a limit. It would be good if edge hopping (variable-length relationships) were supported.

From what I understand, the syntax is -[*min..max]-, where min and max are non-negative integers. The result is the set of subgraphs in which the branch extends between min and max hops from the starting node.
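
The -[*min..max]- semantics can be sketched without any graph library. This is a minimal illustration, not grand-cypher's implementation: `variable_length_paths` and the adjacency-dict input are hypothetical, and the rule that no edge repeats within a single path is an assumption made here so that cycles terminate.

```python
def variable_length_paths(adjacency, start, min_hops, max_hops):
    """Yield every directed path from `start` with min_hops..max_hops edges.

    `adjacency` maps a node to an iterable of its successors. No edge is
    traversed twice within a single path.
    """
    stack = [(start,)]
    while stack:
        path = stack.pop()
        hops = len(path) - 1
        if hops >= min_hops:
            yield path
        if hops >= max_hops:
            continue  # do not extend past the upper bound
        used_edges = set(zip(path, path[1:]))
        for successor in adjacency.get(path[-1], ()):
            if (path[-1], successor) not in used_edges:
                stack.append(path + (successor,))


adjacency = {"a": ["b"], "b": ["c"], "c": ["d"]}
paths = set(variable_length_paths(adjacency, "a", 1, 2))
print(sorted(paths))
```

Each yielded tuple lists the nodes along the path, so the intermediate nodes skipped over by the * operator stay available to the caller.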

License MIT or Apache 2.0?

I believe the license is Apache 2.0, but it shows up as MIT on PyPI because of this line: https://github.com/aplbrain/grand-cypher/blob/master/setup.py#L18

Is it possible to run node related queries instead of relation related queries?

I apologize for asking this from a place of relatively little understanding. I am working on a project that has brought me further into graphs than I have ever ventured before. For this project I need a way to query my graph for nodes that match a pattern. For the moment I am using https://geronimo-iia.github.io/networkx-query/ . However, I would like to use this library for the expansive capabilities of Cypher. Is it possible to make this query work within the supported syntax of the library?

MATCH (c:City)
WHERE c.name = "London"
RETURN c

I'm reading through the source code to try to answer this for myself but there are a lot of new concepts being introduced to me all at once so I figured it wouldn't hurt to ask. I appreciate the time and effort!
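
For reference, the intent of the query above can be written as a plain filter over node attributes. This sketch does not use grand-cypher and says nothing about what its parser accepts; `match_nodes` is a hypothetical helper, and storing labels in a `labels` set on each node is an assumption made for illustration.

```python
def match_nodes(nodes, label, **conditions):
    """Return ids of nodes that carry `label` and satisfy all equality conditions.

    `nodes` maps node id -> attribute dict. Labels are assumed to live in a
    `labels` set, a convention chosen only for this sketch.
    """
    return [
        node_id
        for node_id, attrs in nodes.items()
        if label in attrs.get("labels", set())
        and all(attrs.get(key) == value for key, value in conditions.items())
    ]


nodes = {
    "n1": {"labels": {"City"}, "name": "London"},
    "n2": {"labels": {"City"}, "name": "Paris"},
    "n3": {"labels": {"Person"}, "name": "London"},
}
london = match_nodes(nodes, "City", name="London")
print(london)
```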

Use grandiso limit arg to implement the cypher LIMIT keyword

This will improve the performance of queries with LIMIT arguments.
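
The performance argument is that a limit pushed into the search stops work early, whereas truncating a finished result list does not. The sketch below shows that effect with a generic generator standing in for the motif search; it does not call grandiso, and `lazy_matches` is a hypothetical name.

```python
from itertools import islice


def lazy_matches(candidates, predicate, examined):
    """Yield candidates that pass `predicate`, recording each one examined."""
    for candidate in candidates:
        examined.append(candidate)
        if predicate(candidate):
            yield candidate


examined = []
search = lazy_matches(range(1_000_000), lambda n: n % 2 == 0, examined)
# islice plays the role of LIMIT 3: the search stops after the third hit.
limited = list(islice(search, 3))

print(limited)
print(len(examined))
```

Only five of the million candidates are examined before the third match is found and the search stops.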

WHERE-clause boolean algebra

This involves AND/OR/NOT support (with order-of-operations to match that of Cypher) with parentheses. I think this might be pretty complicated because it will entail backtracking the entire structural match if clauses aren't met; it might make more sense to run OR operands in parallel, so that

MATCH (A)
WHERE (A.type = 1 AND B.type = 1) OR (B.type = 2)
RETURN A

becomes two queries:

MATCH (A)
WHERE (A.type = 1 AND B.type = 1)
RETURN A

MATCH (A)
WHERE (B.type = 2)
RETURN A

But I imagine this will get much more complicated for deeper nesting.
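
An alternative to splitting the query is to keep one structural search and evaluate the whole boolean expression against each candidate match afterwards. The sketch below assumes a tuple-based expression tree, which is a hypothetical representation and not grand-cypher's parser output; `evaluate` and `COMPARATORS` are likewise illustrative names.

```python
import operator

# Only equality and inequality are modelled in this sketch.
COMPARATORS = {"=": operator.eq, "<>": operator.ne}


def evaluate(condition, match):
    """Evaluate a nested condition tree against one structural match.

    `condition` is ("AND" | "OR", left, right), ("NOT", operand), or a leaf
    (comparator, entity, attribute, value). `match` maps entity name -> attrs.
    """
    head = condition[0]
    if head == "AND":
        return evaluate(condition[1], match) and evaluate(condition[2], match)
    if head == "OR":
        return evaluate(condition[1], match) or evaluate(condition[2], match)
    if head == "NOT":
        return not evaluate(condition[1], match)
    comparator, entity, attribute, value = condition
    return COMPARATORS[comparator](match[entity].get(attribute), value)


# (A.type = 1 AND B.type = 1) OR (B.type = 2)
condition = (
    "OR",
    ("AND", ("=", "A", "type", 1), ("=", "B", "type", 1)),
    ("=", "B", "type", 2),
)
print(evaluate(condition, {"A": {"type": 1}, "B": {"type": 1}}))
print(evaluate(condition, {"A": {"type": 1}, "B": {"type": 3}}))
```

Because nesting is handled by recursion, deeper expressions need no extra queries; the trade-off is that every structural match is generated before it is filtered.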

Graph mutations?

Graph mutations (updating, deleting, and creating vertices using Cypher) are a big engineering change, and will likely require a lot of corner-case tests.

I previously listed this as a "not-planned" feature but I wonder if users are interested in this capability existing? Perhaps @khoale88, I wonder what your current use-cases look like? I would be interested in adding this feature back into the roadmap if it will be useful!

Entity types

In Cypher, node and edge types are represented by :ColonNotation. For example,

(A:Neuron)-[AB:Synapse]->(B:Neuron)

NetworkX has no concept of entity "types," so this will be the first time that this codebase mandates a data schema (i.e., a type attribute on the entities in the graph). I'm not sure this is something I want to enforce, but if we do decide to use vertex/edge attributes like this, I'd like to open discussion in this issue to establish what schema we want to support.
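
To make the schema question concrete, here is a sketch of label checking under two candidate conventions: a single string under a `type` attribute, or a collection of labels under the same key. Both conventions, and the names `has_label` and `edge_matches`, are hypothetical options for discussion rather than a settled design.

```python
def has_label(attributes, label, key="type"):
    """Check a label against an entity's attribute dict.

    The value under `key` may be a single string or a collection of strings.
    """
    stored = attributes.get(key)
    if stored is None:
        return False
    if isinstance(stored, str):
        return stored == label
    return label in stored


def edge_matches(source, edge, target, source_label, edge_label, target_label):
    """Check one (source)-[edge]->(target) triple against (A:X)-[AB:Y]->(B:Z)."""
    return (
        has_label(source, source_label)
        and has_label(edge, edge_label)
        and has_label(target, target_label)
    )


neuron_a = {"type": "Neuron"}
neuron_b = {"type": {"Neuron", "Interneuron"}}
synapse = {"type": "Synapse"}
print(edge_matches(neuron_a, synapse, neuron_b, "Neuron", "Synapse", "Neuron"))
```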

Multi-hop graph relationships

In building multi-hop queries, is there currently a method to retrieve the node ids along with the attributes?

For example, for the query:

MATCH (A{id: "Vikings"})-[R*0..3]->(B{id: "England"})
RETURN A, R, B
LIMIT 1

The source and target nodes could be added as attributes. In looking at the code, it's also straightforward to add those in all cases or with multi-hop relationships. Lastly, the openCypher standard has startNode and endNode functions.

Curious if there are any thoughts on this use case.

OPTIONAL MATCH

@j6k4m8 do you have any idea how OPTIONAL MATCH should be implemented? I'm not sure whether isomorphic search can do this. It would be helpful for queries with variable-length (dynamic) relationships.

Support comments (C-style)

https://neo4j.com/docs/cypher-manual/current/syntax/comments/
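
One possible approach is to strip comments from the query text before it reaches the parser. The sketch below is a preprocessing step written for illustration, not grand-cypher code: `strip_comments` is a hypothetical name, and the regular expression assumes backslash-escaped quotes inside string literals.

```python
import re

_TOKEN = re.compile(
    r"""
    ("(?:\\.|[^"\\])*")      # double-quoted string literal (kept)
    | ('(?:\\.|[^'\\])*')    # single-quoted string literal (kept)
    | (/\*.*?\*/)            # block comment (dropped)
    | (//[^\n]*)             # line comment (dropped)
    """,
    re.DOTALL | re.VERBOSE,
)


def strip_comments(query):
    """Remove // and /* */ comments while leaving string literals untouched."""

    def keep_strings(match):
        # Groups 1 and 2 are string literals; anything else is a comment,
        # replaced by a space so neighbouring tokens do not fuse.
        return match.group(1) or match.group(2) or " "

    return _TOKEN.sub(keep_strings, query)


query = 'MATCH (A) // find nodes\nWHERE A.url == "http://x" /* keep */ RETURN A'
cleaned = strip_comments(query)
print(cleaned)
```

Matching string literals in the same pass is what prevents the `//` inside `"http://x"` from being treated as the start of a comment.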

Support for Equijoins

Hello, thank you for the great project :)

To what extent are equijoins supported?

Given that I have the following NetworkX graph:

import networkx as nx

G = nx.DiGraph()
G.add_node("x")
G.add_node("y")
G.add_node("z")
G.add_edge("x", "y")
G.add_edge("y", "x")
G.add_edge("x", "x")
G.add_edge("z", "x")

When I execute the following query:

MATCH (n)-->(n)
RETURN n

I get the result:

{Token('CNAME', 'n'): ['x', 'y']}

However, if I execute the same query on the equivalent graph in Neo4j, I only get the node x as the result, which to my understanding of Cypher is the correct result.

Therefore, to my understanding, equijoins are currently only supported when they are distributed over multiple MATCH clauses, and self-loops on nodes are not recognized. Is that correct?

For instance, the following query correctly recognizes two loops in the graph:

MATCH (n)-->(m)
MATCH (m)-->(n)
RETURN n, m

Neo4j additionally returns n=x, m=x as a result.

Many regards,
Felix
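
The expected equijoin behavior can be stated as a rule: when the same variable names both ends of an edge pattern, a match is valid only if both ends bind to the same node. The sketch below applies that rule to the edge list from the example graph; `match_single_edge` is a hypothetical function and not how grand-cypher resolves patterns.

```python
def match_single_edge(edges, source_var, target_var):
    """Match the pattern (source_var)-->(target_var) against a list of edges.

    When both variables share a name, the pattern is an equijoin and only
    self-loops qualify.
    """
    results = []
    for source, target in edges:
        if source_var == target_var and source != target:
            continue  # one variable cannot bind to two different nodes
        results.append({source_var: source, target_var: target})
    return results


edges = [("x", "y"), ("y", "x"), ("x", "x"), ("z", "x")]
self_loops = match_single_edge(edges, "n", "n")   # MATCH (n)-->(n)
all_edges = match_single_edge(edges, "n", "m")    # MATCH (n)-->(m)
print(self_loops)
print(len(all_edges))
```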