
gdl's Introduction


GDL - Graph Definition Language

Inspired by the popular graph query language Cypher, which is implemented in Neo4j, I started developing an ANTLR grammar to define property graphs. I added the concept of subgraphs to the language to support multiple, possibly overlapping property graphs in one database.

For me, this project is a way to learn more about ANTLR and context-free grammars. Furthermore, GDL is used for unit testing and graph definition in Gradoop, a framework for distributed graph analytics.

The project contains the grammar and a listener implementation which transforms GDL scripts into property graph model elements (i.e. graphs, vertices and edges).

There is also a Rust version of GDL available on GitHub.

Data model

The data model contains three elements: graphs, vertices and edges. Any element has an optional label and can have multiple attributes in the form of key-value pairs. Vertices and edges may be contained in an arbitrary number of graphs including zero graphs. Edges are binary and directed.
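The description above can be sketched as a small class hierarchy. This is a minimal illustration, not the classes shipped with GDL: all names (SketchElement, SketchGraph, SketchVertex, SketchEdge, graphIds) are hypothetical and only mirror the rules stated in the paragraph.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration classes; these are NOT the classes shipped with GDL.
abstract class SketchElement {
    final long id;
    final String label;  // optional: null means "no label"
    final Map<String, Object> properties = new LinkedHashMap<>();  // key-value pairs

    SketchElement(long id, String label) {
        this.id = id;
        this.label = label;
    }
}

class SketchGraph extends SketchElement {
    SketchGraph(long id, String label) {
        super(id, label);
    }
}

// Vertices and edges additionally track the graphs that contain them.
abstract class SketchGraphElement extends SketchElement {
    final Set<Long> graphIds = new HashSet<>();  // may stay empty (zero graphs)

    SketchGraphElement(long id, String label) {
        super(id, label);
    }

    void addTo(SketchGraph graph) {
        graphIds.add(graph.id);
    }
}

class SketchVertex extends SketchGraphElement {
    SketchVertex(long id, String label) {
        super(id, label);
    }
}

// Edges are binary and directed: exactly one source and one target vertex.
class SketchEdge extends SketchGraphElement {
    final long sourceId;
    final long targetId;

    SketchEdge(long id, String label, SketchVertex source, SketchVertex target) {
        super(id, label);
        this.sourceId = source.id;
        this.targetId = target.id;
    }
}
```

In this sketch a vertex that was never passed to addTo simply has an empty graphIds set, which corresponds to "contained in zero graphs".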

Language Examples

Define a vertex:

()

Define a vertex and assign it to variable alice:

(alice)

Define a vertex with label User:

(:User)

Define a vertex with label User, assign it to variable alice and give it some properties:

(alice:User {name : "Alice", age : 23})

Property values can also be null:

(alice:User {name : "Alice", age : 23, city : NULL})

Numeric property values can have specific data types:

(alice:User {name : "Alice", age : 23L, height : 1.82f, weight : 42.7d})

Property values can also be ordered lists:

(alice:User {name : "Alice", age : 23, codes: ["Java", "Rust", "Scala"]})

Define an outgoing edge:

(alice)-->()

Define an incoming edge:

(alice)<--()

Define an edge with label knows, assign it to variable e1 and give it some properties:

(alice)-[e1:knows {since : 2014}]->(bob)

Define multiple outgoing edges from the same source vertex (i.e. alice):

(alice)-[e1:knows {since : 2014}]->(bob)
(alice)-[e2:knows {since : 2013}]->(eve)

Define paths (four vertices and three edges are created):

()-->()<--()-->()

Define a graph with one vertex (graphs can be empty):

[()]

Define a graph and assign it to variable g:

g[()]

Define a graph with label Community:

:Community[()]

Define a graph with label Community, assign it to variable g and give it some properties:

g:Community {title : "Graphs", memberCount : 42}[()]

Define mixed path and graph statements (elements in the paths don't belong to a specific graph):

()-->()<--()-->()
[()]

Define a fragmented graph with variable reuse:

g[(a)-->()]
g[(a)-->(b)]
g[(b)-->(c)]

Define three graphs with overlapping vertex sets (e.g. alice is in g1 and g2):

g1:Community {title : "Graphs", memberCount : 23}[
    (alice:User)
    (bob:User)
    (eve:User)
]
g2:Community {title : "Databases", memberCount : 42}[
    (alice)
]
g3:Community {title : "Hadoop", memberCount : 31}[
    (bob)
    (eve)
]

Define three graphs with overlapping vertex and edge sets (e is in g1 and g3):

g1:Community {title : "Graphs", memberCount : 23}[
    (alice:User)-[:knows]->(bob:User),
    (bob)-[e:knows]->(eve:User),
    (eve)
]
g2:Community {title : "Databases", memberCount : 42}[
    (alice)
]
g3:Community {title : "Hadoop", memberCount : 31}[
    (bob)-[e]->(eve)
]

Query Expressions

As part of his thesis, Max extended the grammar to support MATCH .. WHERE .. statements analogous to Cypher. Besides defining a graph, it is now also possible to formulate a query that includes patterns, variable-length paths and predicates:

MATCH (alice:Person)-[:knows]->(bob:Person)-[:knows*2..2]->(eve:Person)
WHERE (alice.name = "Alice" AND bob.name = "Bob") 
OR (alice.age > bob.age)
OR (alice.age > eve.age)

Note that queries always start with the MATCH keyword, optionally followed by one or more WHERE clauses.

Usage examples

Add the dependency to your Maven project:

<dependency>
    <groupId>com.github.s1ck</groupId>
    <artifactId>gdl</artifactId>
    <version>0.3.8</version>
</dependency>

Create a database from a GDL string:

GDLHandler handler = new GDLHandler.Builder().buildFromString("g[(alice)-[e1:knows {since : 2014}]->(bob)]");

for (Vertex v : handler.getVertices()) {
    // do something
}

// access elements by variable
Graph g = handler.getGraphCache().get("g");
Vertex alice = handler.getVertexCache().get("alice");
Edge e = handler.getEdgeCache().get("e1");

Read predicates from a Cypher query:

GDLHandler handler = new GDLHandler.Builder().buildFromString("MATCH (a:Person)-[e:knows]->(b:Person) WHERE a.age > b.age");

// prints (((a.age > b.age AND a.__label__ = Person) AND b.__label__ = Person) AND e.__label__ = knows)
handler.getPredicates().ifPresent(System.out::println);

Create a database from an InputStream or an input file:

GDLHandler handler1 = new GDLHandler.Builder().buildFromStream(stream);
GDLHandler handler2 = new GDLHandler.Builder().buildFromFile(fileName);

Append data to a given handler:

GDLHandler handler = new GDLHandler.Builder().buildFromString("g[(alice)-[e1:knows {since : 2014}]->(bob)]");

handler.append("g[(alice)-[:knows]->(eve)]");

License

Licensed under the Apache License, Version 2.0.

gdl's People

Contributors

darthmax, dependabot[bot], florentind, foesi, lc0197, s1ck, taucontrib


gdl's Issues

Exception handling for GDL syntax errors

Usually, a user would expect the GDL reader to throw an exception when the input does not match the GDL syntax. The DefaultErrorStrategy only prints the offending syntax, while the BailErrorStrategy only throws an exception without printing the invalid input.
An additional ErrorStrategy that both prints syntax errors and throws exceptions would be helpful.

Support for alternative labels

We should support alternative labels in the MATCH clause.

E.g.

MATCH (me:Person|:Alien)
RETURN me

Which is equal to

MATCH (me)
WHERE me.__label__ = "Person" OR me.__label__ = "Alien"
RETURN me
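The rewrite described above can be sketched as a simple expansion from a list of labels to a disjunction of label predicates. This is only an illustration of the intended equivalence: the class and method names are hypothetical, and GDL itself builds predicate objects rather than strings.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of GDL: renders alternative labels as an OR of label predicates.
class LabelAlternativesSketch {
    static String expand(String variable, List<String> labels) {
        return labels.stream()
            // one equality predicate on the __label__ property per alternative
            .map(label -> variable + ".__label__ = \"" + label + "\"")
            .collect(Collectors.joining(" OR "));
    }
}
```

For the variable me and the labels Person and Alien, expand produces the WHERE condition shown in the second query.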

Support for GradoopID values

In order to match specific IDs, we have to be able to specify GradoopIds inside a query.

I would suggest the following markup:

MATCH (a)
WHERE a = GradoopId(0000-0000-00-000)
RETURN

Support labels with _

The Neo4j Cypher default for edge labels (or types) is upper-case snake case, e.g., :WORKS_AT.
Currently, this is not supported by GDL.

Bug in complex NOT predicates

Queries conjoining a NOT predicate are not handled correctly.
E.g., the query
MATCH (alice)-[r]->(bob) WHERE bob.age<30 AND NOT alice.age > 50
yields the predicate
(alice.age > 50 AND (NOT bob.age < 30))
in GDLLoader.java.

I think this is due to a FIFO/LIFO mix-up in GDLLoader::exitNotExpression.
Changing line 561 in GDLLoader.java from

Predicate not = new Not(currentPredicates.pop());

to

Predicate not = new Not(currentPredicates.removeLast());

fixed this issue for me; the test cases still passed after this change.
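The difference between the two calls can be reproduced with a plain java.util.ArrayDeque. The sketch below assumes that currentPredicates is a Deque filled at the tail in parse order; that is an assumption about GDLLoader, not something confirmed here. Strings stand in for predicate objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-alone illustration; the deque usage inside GDLLoader is assumed, not verified.
class DequeOrderSketch {
    // Returns {element taken by pop(), element taken by removeLast()}.
    static String[] popVersusRemoveLast() {
        Deque<String> viaPop = new ArrayDeque<>();
        Deque<String> viaRemoveLast = new ArrayDeque<>();
        for (String predicate : new String[] {"bob.age < 30", "alice.age > 50"}) {
            viaPop.add(predicate);         // add() appends at the tail
            viaRemoveLast.add(predicate);
        }
        // pop() is equivalent to removeFirst(): it returns the oldest entry.
        // removeLast() returns the entry that was appended most recently.
        return new String[] {viaPop.pop(), viaRemoveLast.removeLast()};
    }

    public static void main(String[] args) {
        String[] taken = popVersusRemoveLast();
        System.out.println("NOT wraps (pop):        " + taken[0]);
        System.out.println("NOT wraps (removeLast): " + taken[1]);
    }
}
```

Under that assumption, pop() hands the NOT the predicate on bob.age, which matches the wrong output reported above, while removeLast() hands it the predicate on alice.age.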

Access variable names directly via the bound elements

e.g.:

String gdlString = "(alice:Person)-e->(:Person)";
GDLHandler gdlHandler = new GDLHandler.Builder().buildFromString(gdlString);
for (Vertex v : gdlHandler.getVertices()) {
  System.out.println(v.getVariable());
}

output:

alice
null

Gdl should fail if two nodes get assigned the same id

E.g. (u1 {score: 1.0}), (u1 {score: 2.0}) shouldn't be valid, as it is most likely a copy-paste error.
On the other hand, this amounts to guessing the user's intent. Maybe a warning would also be a solution.

From a compiler point of view, this is the equivalent of:

a = 10;
a = 12;

Co-issued: @breakanalysis
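One possible shape for such a check is sketched below. It is a hypothetical registry, not existing GDL code, and it encodes one reading of the proposal: reusing a variable without properties stays valid, while redefining it with different properties fails.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; GDL does not ship this class.
class VariableRegistrySketch {
    private final Map<String, Map<String, Object>> defined = new HashMap<>();

    // Registers a variable. A bare reuse such as (u1) passes an empty property map.
    void define(String variable, Map<String, Object> properties) {
        Map<String, Object> existing = defined.get(variable);
        if (existing == null) {
            // first occurrence: remember the properties it was defined with
            defined.put(variable, new HashMap<>(properties));
        } else if (!properties.isEmpty() && !properties.equals(existing)) {
            // second definition with conflicting properties: most likely a mistake
            throw new IllegalStateException(
                "Variable '" + variable + "' is already defined with different properties");
        }
    }
}
```

Replacing the exception with a logged warning would give the softer behaviour mentioned above.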

Wrong predicates for unlabeled GraphElements

For MATCH (n) or MATCH ()-[e]->() the parser produces the predicates n.__label__ = 'DefaultVertexLabel' and e.__label__ = 'DefaultEdgeLabel', which is not the expected behaviour.

GDL to Maven Central

It would be great to reduce the configuration overhead of using GDL by publishing it to Maven Central instead of our "special" solution, as we discussed in our Gradoop team meeting.

introduce github actions

Instead of using the third-party Travis CI for integration testing, we can use the provided GitHub Actions to trigger testing with Maven.

Configurable error handler

Currently, the ANTLR parser uses the default error strategy org.antlr.v4.runtime.DefaultErrorStrategy, which prints each parse error to standard error but continues parsing. The user has no way of detecting or reacting to syntax errors.
If there were a way to specify an alternative error strategy, for example org.antlr.v4.runtime.BailErrorStrategy via org.antlr.v4.runtime.Parser.setErrorHandler(ANTLRErrorStrategy), the user could wrap the parser call in a try/catch block and catch org.antlr.v4.runtime.misc.ParseCancellationException.

Inversion of Comparators is counter-intuitive

The Comparator::getInverse that inverts a comparator looks like

...
case LT: return GT;
case GT: return LT;
case LTE: return GTE;
case GTE: return LTE;
...

However, the inverse of LT should be GTE, and similarly for the other inverses (GT -> LTE, LTE -> GT, GTE -> LT).
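The two readings of "inverse" can be kept apart by giving them separate names. The enum below is a stand-alone sketch, not GDL's Comparator: the names SketchComparator, negate and flip are hypothetical. negate is the logical complement asked for here, and flip is the operand swap that the quoted mapping computes.

```java
// Stand-alone sketch; not the Comparator enum shipped with GDL.
enum SketchComparator {
    EQ, NEQ, LT, GT, LTE, GTE;

    // Logical negation: NOT (a < b) is (a >= b).
    SketchComparator negate() {
        switch (this) {
            case EQ:  return NEQ;
            case NEQ: return EQ;
            case LT:  return GTE;
            case GT:  return LTE;
            case LTE: return GT;
            default:  return LT;    // GTE
        }
    }

    // Operand swap: (a < b) is the same as (b > a).
    SketchComparator flip() {
        switch (this) {
            case LT:  return GT;
            case GT:  return LT;
            case LTE: return GTE;
            case GTE: return LTE;
            default:  return this;  // EQ and NEQ are symmetric
        }
    }

    // Evaluates the comparison on two integers, used to check both mappings.
    boolean test(int a, int b) {
        switch (this) {
            case EQ:  return a == b;
            case NEQ: return a != b;
            case LT:  return a < b;
            case GT:  return a > b;
            case LTE: return a <= b;
            default:  return a >= b;  // GTE
        }
    }
}
```

For every comparator c and all operands, c.negate().test(a, b) is the opposite of c.test(a, b), whereas c.flip().test(b, a) equals c.test(a, b).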

Add support for <>

We currently use != for not equal. However, Cypher uses <>, hence we should support both?

Support injection for id generation

In a project where we're implementing Cypher, we're using GDL to generate test graphs. In order to test whether a GDL-defined graph is identical to a graph generated by our system, we need to control which ids GDL assigns to entities.

It would be lovely to support some kind of injection of an id-generator interface, or similar, which we can use to position the id generator in the correct state for our test case.
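A minimal sketch of what such an interface could look like is given below. Both SketchIdGenerator and SequentialIdGenerator are hypothetical names, and how a generator would be handed to GDLHandler.Builder is left open, since no such hook is described here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interface; GDL's actual id handling may look different.
interface SketchIdGenerator {
    long nextId();
}

// Hands out consecutive ids, starting from a caller-chosen value.
class SequentialIdGenerator implements SketchIdGenerator {
    private final AtomicLong next;

    SequentialIdGenerator(long start) {
        this.next = new AtomicLong(start);
    }

    @Override
    public long nextId() {
        return next.getAndIncrement();
    }
}
```

A test could then create new SequentialIdGenerator(100) to position the generator so that the first entity receives id 100, the second 101, and so on.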
