gbagan / gmark Goto Github PK

License: MIT License

C++ 98.20% Makefile 0.63% CMake 1.04% Dockerfile 0.09% C 0.04%

gmark's Introduction

gMark

gMark is a domain- and query language-independent framework targeting highly tunable generation of both graph instances and graph query workloads based on user-defined schemas.

For more details about gMark, please refer to our technical report: http://arxiv.org/abs/1511.08386

gMark was demonstrated at VLDB 2016, and the gMark research paper was published in the IEEE TKDE journal.

If you use gMark, please cite:

@article{BBCFLA17,
  author = {Bagan, G. and Bonifati, A. and Ciucanu, R. and Fletcher, G. H. L. and Lemay, A. and Advokaat, N.},
  title = {{gMark}: Schema-Driven Generation of Graphs and Queries},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {29},
  number = {4},
  pages = {856--869},
  year = {2017}
}

and/or

@article{BBCFLA16,
  author = {Bagan, G. and Bonifati, A. and Ciucanu, R. and Fletcher, G. H. L. and Lemay, A. and Advokaat, N.},
  title = {Generating Flexible Workloads for Graph Databases},
  journal = {PVLDB},
  volume = {9},
  number = {13},
  pages = {1457--1460},
  year = {2016}
}

How to use gMark

To compile the code:

cd demo/scripts
./compile-all.sh

The rest of this README assumes that the current directory is demo/scripts.

To generate an entire workflow, use the prepared script play.sh:

./play.sh

This executes the following three steps:

1. Generation of the graph instance and the query workload (in internal format), plus HTML reports for both

cd ../../src
./test -c ../use-cases/test.xml -g ../demo/play/play-graph.txt -w ../demo/play/play-workload.xml -r ../demo/play/

where the parameters are:

-c : the configuration file
-g : the output file for the graph instance
-w : the output file for the query workload generated on this instance (in internal format)
-r : the output directory for the html reports

and optionally

-a : to use aliases for the predicates in the generated graph and queries
-n : to specify the number of nodes in the graph (it overrides the parameter from the config file)
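The option contract above can be summarised in a short C++17 sketch. It is not gMark's own parsing code: the `GenOptions` struct and the `parse_options` function are hypothetical names. The sketch assumes, from the list above, that `-c`, `-g`, `-w`, `-r` and `-n` each take a value, while `-a` is a bare flag.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the documented options (not gMark's own code).
struct GenOptions {
    std::string config;         // -c : configuration file
    std::string graph;          // -g : output file for the graph instance
    std::string workload;       // -w : output file for the query workload
    std::string report_dir;     // -r : output directory for the HTML reports
    bool aliases = false;       // -a : use predicate aliases (takes no value)
    std::optional<long> nodes;  // -n : number of nodes, overrides the config file
};

// Parses arguments such as {"-c", "test.xml", "-a", "-n", "1000"}.
// Throws std::invalid_argument on unknown options or missing values.
GenOptions parse_options(const std::vector<std::string>& args) {
    GenOptions opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& flag = args[i];
        if (flag == "-a") {
            opts.aliases = true;  // bare flag, no value follows
            continue;
        }
        if (flag != "-c" && flag != "-g" && flag != "-w" && flag != "-r" && flag != "-n") {
            throw std::invalid_argument("unknown option: " + flag);
        }
        if (i + 1 >= args.size()) {
            throw std::invalid_argument("missing value for " + flag);
        }
        const std::string& value = args[++i];
        if (flag == "-c") opts.config = value;
        else if (flag == "-g") opts.graph = value;
        else if (flag == "-w") opts.workload = value;
        else if (flag == "-r") opts.report_dir = value;
        else opts.nodes = std::stol(value);  // -n
    }
    return opts;
}
```

For example, the step 1 command line corresponds to `parse_options({"-c", "../use-cases/test.xml", "-g", "../demo/play/play-graph.txt", "-w", "../demo/play/play-workload.xml", "-r", "../demo/play/"})`, with aliases off and the node count taken from the configuration file.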

The provided configuration files in the directory use-cases are:

test.xml : schema of a bibliographical graph database
shop.xml : schema of an online shop (our gMark encoding of the default schema from WatDiv)
social-network.xml : schema of a social network (our gMark encoding of the schema from LDBC SNB)
uniprot.xml : schema of a protein network (our gMark encoding of the schema extracted from UniProt)

2. Translation of the queries into the four concrete syntaxes

cd querytranslate
./test -w ../../demo/play/play-workload.xml -o ../../demo/play/play-translated

where the parameters are:

-w : the query workload in internal format generated at step 1.
-o : the output directory for the translations of the queries

3. Generation of the query workload interface

cd ../queryinterface
./test -w ../../demo/play/play-workload.xml -t ../../demo/play/play-translated -o ../../demo/play/play-interface

where the parameters are:

-w : the query workload in internal format generated at step 1.
-t : the translations of the queries generated at step 2.
-o : the output directory for the query workload interface
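To adapt play.sh to another use case, the following C++17 sketch rebuilds the three command lines shown above from a configuration file name and an output name. The function `workflow_commands` is hypothetical; it assumes the directory layout used in this README and prefixes every command with a `cd` relative to demo/scripts, so each line can be run on its own. The actual play.sh script may be written differently.

```cpp
#include <string>
#include <vector>

// Rebuilds the three steps listed above for a given use-case config file
// (e.g. "test.xml") and output name (e.g. "play"). Hypothetical helper.
std::vector<std::string> workflow_commands(const std::string& config,
                                           const std::string& name) {
    const std::string out_src = "../demo/" + name + "/";     // as seen from src/
    const std::string out_sub = "../../demo/" + name + "/";  // as seen from src/<tool>/
    const std::string workload = name + "-workload.xml";
    const std::string translated = name + "-translated";
    return {
        // Step 1: graph, workload in internal format, and HTML reports.
        "cd ../../src && ./test -c ../use-cases/" + config +
            " -g " + out_src + name + "-graph.txt -w " + out_src + workload +
            " -r " + out_src,
        // Step 2: translation of the workload into the concrete syntaxes.
        "cd ../../src/querytranslate && ./test -w " + out_sub + workload +
            " -o " + out_sub + translated,
        // Step 3: query workload interface built from steps 1 and 2.
        "cd ../../src/queryinterface && ./test -w " + out_sub + workload +
            " -t " + out_sub + translated + " -o " + out_sub + name + "-interface",
    };
}
```

With `workflow_commands("test.xml", "play")` the three strings match the commands above; other names write into demo/<name>/, assuming that directory exists.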

Provided examples

We provide several examples of generated graphs, query workloads in internal format, html reports, translated queries, and query workload interfaces.

You can find them in the directory demo, subdirectories test, test-a, shop, shop-a, social, social-a, uniprot, uniprot-a.

These scenarios correspond to the four configuration files from use-cases listed above. For each of them, we generated a version without aliases and a version with aliases (i.e., using integers as predicates or using the real-world predicates specified in the configuration file, respectively).
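The naming convention of the demo subdirectories can be written down as a small table. The C++17 sketch below is only an illustration: the `Scenario` struct and the `demo_scenarios` function are hypothetical, and the pairing of social-network.xml with demo/social is inferred from the directory names rather than documented.

```cpp
#include <string>
#include <utility>
#include <vector>

// One demo scenario: the use-case config it was generated from, its
// directory under demo/, and whether aliases (-a) were used.
struct Scenario {
    std::string config;
    std::string demo_dir;
    bool aliases;
};

std::vector<Scenario> demo_scenarios() {
    // Config file in use-cases/ -> base name of its demo subdirectory.
    const std::vector<std::pair<std::string, std::string>> use_cases = {
        {"test.xml", "test"},
        {"shop.xml", "shop"},
        {"social-network.xml", "social"},
        {"uniprot.xml", "uniprot"},
    };
    std::vector<Scenario> out;
    for (const auto& [config, name] : use_cases) {
        out.push_back({config, "demo/" + name, false});       // integer predicates
        out.push_back({config, "demo/" + name + "-a", true});  // aliased predicates
    }
    return out;
}
```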

gmark's Issues

Infinite graph file generation

I cloned and compiled the current version of gmark, and now I get strange behaviour when executing it:
gmark generates graph files endlessly, and most of the files contain fewer than 10 edges. For example, with the 'play.sh' script it first generates three larger files (probably because of the 'test.xml' definition) and then these smaller files. The filenames look like this (counting upwards):

'play-graph.txt0.txt'
'play-graph.txt1.txt'
'play-graph.txt2.txt'
...

gmark gets stuck in this generation loop until you kill it.
I also tried it with 'shop.sh' and 'test.sh', and on different operating systems. I always get the same behaviour.

Has anyone seen this behaviour before or has an idea what the problem could be?

gmark version: ebe0fd7
compilation: with 'demo/scripts/compile-all.sh' on Win10 64bit with Cygwin 2.8.0(0.309/5/3) and also on Ubuntu 14.04 LTS (VM on Windows host)
hardware: AMD Phenom II X6 1090T, 12GB RAM

Let me know if you need more information.

Segmentation fault

I compiled gmark on Ubuntu and ran the scripts. The output seems fine, but a segmentation fault occurs during the run. To reproduce the issue, I created a Travis configuration file. Its output also shows the error: https://travis-ci.org/FTSRG/gmark/builds/265882692#L518

Why does this segfault occur?

I'd be happy to submit a PR if you think Travis CI would be useful for gmark.

Cypher queries have empty UNIONs in them

The translated Cypher queries tend to contain multiple UNIONs with no right-hand side (RHS). See this query for example. These queries can't be parsed by databases that expose openCypher interfaces (e.g., Neo4j).
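One way to avoid dangling UNIONs is to join only the non-empty branches. The helper below is a hypothetical C++ sketch, not gMark's translator code in src/querytranslate. It assumes each branch is already a complete Cypher query and that all branches return the same columns, which openCypher requires for UNION.

```cpp
#include <string>
#include <vector>

// Joins Cypher query branches with UNION, skipping empty or
// whitespace-only branches so no UNION is left without a right-hand side.
std::string join_union(const std::vector<std::string>& branches) {
    std::string query;
    for (const std::string& branch : branches) {
        if (branch.find_first_not_of(" \t\n") == std::string::npos) {
            continue;  // nothing to union with: skip instead of emitting "UNION"
        }
        if (!query.empty()) {
            query += "\nUNION\n";  // separator only between two real branches
        }
        query += branch;
    }
    return query;
}
```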

Problem with query synthesis: ~/src/test segfaults

Hello,

I have tried to use the tool to generate queries, but I have problems changing the use cases. My first problem is that gmark segfaults on various "use cases" I design. For instance, here is a minimal example where gmark segfaults on my machine (I assumed that a single conjunct is allowed, but maybe it is not):

git clone https://github.com/graphMark/gmark.git
cd gmark
sed 's/<conjuncts min="3" max="4"/<conjuncts min="1" max="3"/' -i use-cases/test.xml
cd src
make
./test -c ../use-cases/test.xml -g ../demo/test/test-graph.txt -w ../demo/test/test-workload.xml -r ../demo/test/

Furthermore, I have tried several settings, but I cannot manage to create recursive queries. It seems to me that the relevant setting is the star setting, but it does not seem to work. For instance, none of the queries in the demo/ folder seems to be recursive, even though all the use-cases have
<multiplicity star="0.5"/>

Thanks in advance!

Wrong syntax in generated SQL queries

In all generated SQL queries, the conditions look like s2.label = isLocatedIn, without quotes around the predicate.
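A fix would render each label as an SQL string literal. This C++ sketch assumes, as in the report, that predicates are compared against a `label` column; `sql_literal` and `label_condition` are hypothetical helpers, not part of gMark. Standard SQL delimits string literals with single quotes and escapes an embedded quote by doubling it.

```cpp
#include <string>

// Wraps a predicate label in single quotes, doubling any embedded
// single quote, so it is a valid SQL string literal.
std::string sql_literal(const std::string& label) {
    std::string out = "'";
    for (char c : label) {
        if (c == '\'') {
            out += "''";  // escape ' as ''
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Builds a condition such as s2.label = 'isLocatedIn'.
std::string label_condition(const std::string& alias, const std::string& label) {
    return alias + ".label = " + sql_literal(label);
}
```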

Please add a license to this repo

Could you please add an explicit LICENSE file to the repo so that it's clear under what terms the content is provided, and under what terms user contributions are licensed?

Per GitHub docs on licensing:

[...] without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work. If you're creating an open source project, we strongly encourage you to include an open source license.

Thanks!
