LDBC SNB Interactive v1 workload implementations

Reference implementations of the LDBC Social Network Benchmark's Interactive workload (paper, specification on GitHub pages, specification on arXiv).

To get started with the LDBC SNB benchmarks, check out our introductory presentation: The LDBC Social Network Benchmark (PDF).

Notes

⚠️ Please keep in mind the following when using this repository.

  • The goal of the implementations in this repository is to serve as reference implementations which other implementations can be cross-validated against. Therefore, our primary objective when formulating the queries was readability, not absolute performance.

  • The default workload contains updates which are persisted in the database. Therefore, the database needs to be reloaded or restored from backup before each run. Use the provided scripts/backup-database.sh and scripts/restore-database.sh scripts to achieve this.

  • We expect most systems-under-test to use multi-threaded execution for their benchmark runs. To allow running the updates on multiple threads, the update stream files need to be partitioned accordingly by the generator. We have pre-generated these for commonly used partition numbers (1, 2, ..., 1024 and 24, 48, 96, ..., 768) and scale factors up to 1000.

Implementations

We provide three reference implementations:

Additional implementations:

For detailed instructions, consult the READMEs of the projects.

To build a subset of the projects, use Maven profiles, e.g. to build the reference implementations, run:

mvn clean package -DskipTests -Pcypher,postgres

User's guide

Building the project

This project uses Java 11.

To build the project, run:

scripts/build.sh

Inputs

The benchmark framework relies on the following inputs produced by the SNB Datagen:

  • Initial data set: the SNB graph in CSV format (social_network/{static,dynamic})
  • Update streams: the input for the update operations (social_network/updateStream_*.csv)
  • Substitution parameters: the input parameters for the complex queries, produced by the Datagen (substitution_parameters/)

Driver modes

For each implementation, it is possible to perform the run in one of the SNB driver's three modes: create validation parameters, validate, and benchmark. In all three modes, the execution should be started after the initial data set has been loaded into the system under test.

  1. Create validation parameters with the driver/create-validation-parameters.sh script.

    • Inputs:
      • The query substitution parameters are taken from the directory set in ldbc.snb.interactive.parameters_dir configuration property.
      • The update streams are the updateStream_0_0_{forum,person}.csv files from the location set in the ldbc.snb.interactive.updates_dir configuration property.
      • For this mode, the query frequencies are set to a uniform 1 value to ensure the best average test coverage.
    • Output: The results will be stored in the validation parameters file (e.g. validation_params.csv) set in the create_validation_parameters configuration property.
    • Parallelism: The execution must be single-threaded to ensure a deterministic order of operations.
  2. Validate against an existing reference output (called "validation parameters") with the driver/validate.sh script.

    • Input:
      • The query substitution parameters are taken from the validation parameters file (e.g. validation_params.csv) set in the validate_database configuration property.
      • The update operations are also based on the content of the validation parameters file.
    • Output:
      • The validation either passes or fails.
      • The per query results of the validation are printed to the console.
      • If the validation failed, the results are saved to the validation_params-failed-expected.json and validation_params-failed-actual.json files.
    • Parallelism: The execution must be single-threaded to ensure a deterministic order of operations.
  3. Run the benchmark with the driver/benchmark.sh script.

    • Inputs:
      • The query substitution parameters are taken from the directory set in ldbc.snb.interactive.parameters_dir configuration property.
      • The update streams are the updateStream_*_{forum,person}.csv files from the location set in the ldbc.snb.interactive.updates_dir configuration property.
        • To get 2n write threads, the framework requires n updateStream_*_forum.csv and n updateStream_*_person.csv files.
        • If you are generating the data sets from scratch, set ldbc.snb.datagen.serializer.numUpdatePartitions to n in the data generator to produce these.
      • The goal of the benchmark is to achieve the best (lowest possible) time_compression_ratio value while ensuring that the 95% on-time requirement is met (i.e. 95% of the queries can be started within 1 second of their scheduled time). If your benchmark run reports "failed schedule audit", increase this value (which lowers the time compression rate) until it passes. A sketch of the relevant properties is shown after this list.
      • Set the thread_count property to the size of the thread pool for read operations.
      • For audited benchmarks, ensure that the warmup and operation_count properties are set so that the warmup and benchmark phases last for 30+ minutes and 2+ hours, respectively.
    • Output:
      • Passed or failed the "schedule audit" (the 95% on-time requirement).
      • The throughput achieved in the run (operations/second).
      • The detailed results of the benchmark are printed to the console and saved in the results/ directory.
    • Parallelism: Multi-threaded execution is recommended to achieve the best result.
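
For illustration, a minimal sketch of the benchmark-related properties discussed above (the values are placeholders, not recommendations; warmup and operation_count are operation counts and must be sized so that the two phases meet the duration requirements):

thread_count=8
time_compression_ratio=0.02
warmup=500000
operation_count=5000000
ldbc.snb.interactive.parameters_dir=/path/to/substitution_parameters/
ldbc.snb.interactive.updates_dir=/path/to/social_network/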

For more details on validating and benchmarking, visit the driver's documentation.

Developer's guide

To create a new implementation, it is recommended to use one of the existing ones: the Neo4j implementation for graph database management systems and the PostgreSQL implementation for RDBMSs.

The implementation process looks roughly as follows:

  1. Create a bulk loader which loads the initial data set to the database.
  2. Implement the complex and short read queries (21 in total).
  3. Implement the 8 update queries.
  4. Test the implementation against the reference implementations using various scale factors.
  5. Optimize the implementation.

Data sets

Benchmark data sets

To generate the benchmark data sets, use the Hadoop-based LDBC SNB Datagen.

The key configurations are the following (a sketch of a configuration file follows the list):

  • ldbc.snb.datagen.generator.scaleFactor: set this to snb.interactive.${SCALE_FACTOR} where ${SCALE_FACTOR} is the desired scale factor
  • ldbc.snb.datagen.serializer.numUpdatePartitions: set this to the number of write threads used in the benchmark runs
  • serializers: set these to the required format, e.g. the ones starting with CsvMergeForeign or CsvComposite
    • ldbc.snb.datagen.serializer.dynamicActivitySerializer
    • ldbc.snb.datagen.serializer.dynamicPersonSerializer
    • ldbc.snb.datagen.serializer.staticSerializer
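
For illustration, the corresponding entries could look like this, shown in key:value form; the exact configuration file format and serializer class names depend on the Datagen version, so treat the values below as placeholders:

ldbc.snb.datagen.generator.scaleFactor:snb.interactive.1
ldbc.snb.datagen.serializer.numUpdatePartitions:4
ldbc.snb.datagen.serializer.dynamicActivitySerializer:<CsvMergeForeign dynamic activity serializer class>
ldbc.snb.datagen.serializer.dynamicPersonSerializer:<CsvMergeForeign dynamic person serializer class>
ldbc.snb.datagen.serializer.staticSerializer:<CsvMergeForeign static serializer class>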

Pre-generated data sets

Producing large-scale data sets requires non-trivial amounts of memory and computing resources (e.g. SF100 requires 24GB memory and takes about 4 hours to generate on a single machine). To mitigate this, we have pregenerated data sets using 9 different serializers and the update streams using 17 different partition numbers:

  • Serializers: csv_basic, csv_basic-longdateformatter, csv_composite, csv_composite-longdateformatter, csv_composite_merge_foreign, csv_composite_merge_foreign-longdateformatter, csv_merge_foreign, csv_merge_foreign-longdateformatter, ttl
  • Partition numbers: 2^k (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024) and 6×2^k (24, 48, 96, 192, 384, 768).

The data sets are available at the SURF/CWI data repository. We also provide direct links and a download script (which stages the data sets from tape storage if they are not immediately available).

We pre-generated validation parameters for SF0.1 to SF10 using the Neo4j reference implementation.

Test data set

The test data sets are placed in the cypher/test-data/ directory for Neo4j and in the postgres/test-data/ directory for the SQL systems.

To generate a data set with the same characteristics, see the documentation on generating the test data set.

Preparing for an audited run

Implementations of the Interactive workload can be audited by a certified LDBC auditor. The Auditing Policies chapter of the specification describes the auditing process and the required artifacts. If you are considering commissioning an LDBC SNB audit, please study the auditing process document and the audit questionnaire.

Determining the best TCR

  1. Select a scale factor and configure the driver/benchmark.properties file as described in the Driver modes section.
  2. Load the data set with scripts/load-in-one-step.sh.
  3. Create a backup with scripts/backup-database.sh.
  4. Run the driver/determine-best-tcr.sh script (a sketch of the full sequence is shown after this list).
  5. Once the "best TCR" value has been determined, test it with a full workload (at least 0.5 hours of warmup and at least 2 hours of benchmark time), and make further adjustments if necessary.
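
For illustration, the sequence above corresponds roughly to the following commands, run from the implementation's directory (this assumes driver/benchmark.properties has already been configured):

scripts/load-in-one-step.sh
scripts/backup-database.sh
driver/determine-best-tcr.sh
# once the best TCR is known, set it in the properties file and run the full workload
driver/benchmark.sh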

Recommendations

We have a few recommendations for creating audited implementations. (These are not requirements – implementations are allowed to deviate from these recommendations.)

  • The implementation should target a popular Linux distribution (e.g. Ubuntu LTS, CentOS, Fedora).
  • Use a containerized setup, where the DBMS is running in a Docker container.
  • Instead of specific hardware, target a cloud virtual machine instance (e.g. AWS r5d.12xlarge). Both bare-metal and regular instances can be used for audited runs.

Issues

Getting Q7 right

@ArnauPrat @alexaverbuch

I tried to make the specification of Q7 easier to follow by adding variable names such as message1, message2, etc. (both the figure and the text):

https://ldbc.github.io/ldbc_snb_docs_snapshot/bi-read-07.pdf

I have two Cypher implementations: one that does a two-step aggregation (as shown in the figure of the query cards) and another that does it in a single step.

MATCH
  (tag:Tag {name: $tag})<-[:HAS_TAG]-(message1:Message)-[:HAS_CREATOR]->(person1:Person)
MATCH
  (person1)<-[:HAS_CREATOR]-(message2:Message)-[:HAS_TAG]->(tag),
  (message2)<-[:LIKES]-(person2:Person)<-[:HAS_CREATOR]-(message3:Message)<-[l:LIKES]-(person3:Person)
WITH
  person1,
  person2,
  count(DISTINCT l) AS popularityScore
RETURN
  person1.id,
  sum(popularityScore) AS authorityScore
ORDER BY
  authorityScore DESC,
  person1.id ASC
LIMIT 100

The single-step variant:

MATCH
  (tag:Tag {name: $tag})<-[:HAS_TAG]-(message1:Message)-[:HAS_CREATOR]->(person1:Person)
MATCH
  (person1)<-[:HAS_CREATOR]-(message2:Message)-[:HAS_TAG]->(tag),
  (message2)<-[:LIKES]-(person2:Person)<-[:HAS_CREATOR]-(message3:Message)<-[l:LIKES]-(person3:Person)
RETURN
  person1.id,
  count(DISTINCT l) AS authorityScore
ORDER BY
  authorityScore DESC,
  person1.id ASC
LIMIT 100

I experimented on the SF0.1 data set. For the substitution parameter :param { tag: 'Yoko_Ono' }, both return the same results:

╒══════════════╤════════════════╕
│"person1.id"  │"authorityScore"│
╞══════════════╪════════════════╡
│987           │7959            │
├──────────────┼────────────────┤
│2199023255688 │5284            │
├──────────────┼────────────────┤
│15393162789569│4171            │
├──────────────┼────────────────┤
│296           │1224            │
├──────────────┼────────────────┤
│407           │1071            │
├──────────────┼────────────────┤
│4398046512201 │477             │
├──────────────┼────────────────┤
│6597069766998 │377             │
├──────────────┼────────────────┤
│2199023256459 │119             │
├──────────────┼────────────────┤
│13194139534376│24              │
├──────────────┼────────────────┤
│19791209300631│6               │
└──────────────┴────────────────┘

However, I don't think this is correct, so we should investigate this further.

Tidy properties files

The db property that specifies the database class can be included in the properties file, e.g.:

workload=com.ldbc.driver.workloads.ldbc.snb.bi.LdbcSnbBiWorkload
db=com.ldbc.impls.workloads.ldbc.snb.jdbc.bi.PostgresBi

This means that it doesn't have to be included in the create..., validate, benchmark scripts.

Also, filenames should be changed, e.g. the current postgres-create_validation_parameters.properties should be renamed to bi-create_validation_parameters.properties to give way to interactive- implementations.

Use numbers for storing dates

The current Cypher queries use strings to represent dates. We should consider using numbers to store the same information, yyyymmddHHMMSSmmm-style. Extracting years/months/days will then work like this:

WITH 20100914221133999 AS t
RETURN
  t/10000000000000 AS year,
  t/100000000000%100 AS month,
  t/1000000000%100 AS date

PostgreSQL schema and load script for Messages

I am working to restore the Interactive workload for PostgreSQL. While doing this, I noticed that the PostgreSQL implementation uses an odd schema for Messages: it uses a single post table for both Comments and Posts:

https://github.com/ldbc/ldbc_snb_implementations/blob/42425052fd187ae031676dca789a20d97eea06c1/postgres/load-scripts/schema.sql#L1-L16

For example, it has a field ps_country (an int), but also a ps_locationid (a bigint, not to be confused with the ps_locationip field, which is a varchar).

According to the schema, all Messages should have an isLocatedIn edge connecting them to a country, so ps_locationid is fine, but we don't need ps_country.

The specification prescribes the following schema for comments and posts, formatted by the CSV_MERGE_FOREIGN serializer:

comments: id | creationDate | locationIP | browserUsed | content | length | creator | place | replyOfPost | replyOfComment
posts:    id | imageFile | creationDate | locationIP | browserUsed | language | content | length | creator | Forum.id | place

This defines 10 fields for comments and 11 fields for posts, in contrast to the 14 fields defined by the schema.

For posts, the difference can be attributed to the following fields:

    ps_p_creatorid bigint
    ps_replyof bigint
    ps_country int

Currently, this is handled by the load script with awk calls that duplicate field #9 (so ps_creatorid and ps_p_creatorid get the same value) and leave the last two fields empty (so both ps_replyof and ps_country are set to NULL for all rows):

https://github.com/ldbc/ldbc_snb_implementations/blob/42425052fd187ae031676dca789a20d97eea06c1/postgres/load-scripts/load.sh#L7-L12

Suggestion: this is not a serious bug, but it makes the schema more difficult to comprehend and also introduces unnecessary duplication, so we should fix this sooner or later.

cc @jmarton

The interactive benchmark for PostgreSQL hangs using SF-1

Using the default interactive-benchmark.properties, the driver hangs after a certain number of operations (in the example below it is 4). This is the third time this has happened while running this benchmark. For the first two runs I was using Ubuntu 18.04 in VirtualBox and I let one of them run overnight for 16 hours. I suspected that the performance was impacted by the virtualization. However, I ran it on OS X Catalina this time and it also hung.

[nix-shell:~/github/ldbc_snb_implementations/postgres]$ java postgres-0.4.0-SNAPSHOT.jar com.ldbc.driver.Client -P interactive-benchmark.properties
ExecuteWorkloadMode  Driver Configuration
ExecuteWorkloadMode  Workload Start Time:	2019-10-20 - 20:17:54.985
Parameters:
	Name:                           LDBC-SNB
	DB:                             com.ldbc.impls.workloads.ldbc.snb.postgres.interactive.PostgresInteractiveDb
	Workload:                       com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbInteractiveWorkload
	Operation Count:                250
	Warmup Count:                   100
	Skip Count:                     0
	Worker Threads:                 1
	Status Display Interval:        00:01.000 (m:s.ms)
	Time Unit:                      MILLISECONDS
	Results Directory:              /Users/chaker/github/ldbc_snb_implementations/postgres/results
	Time Compression Ratio:         0.0010000
	Validation Creation Params:     null
	Database Validation File:       null
	Calculate Workload Statistics:  false
	Spinner Sleep Duration:         00:00.001 (m:s.ms) / 1 (ms)
	Print Help:                     false
	Ignore Scheduled Start Times:   true
	User-defined parameters:
		databaseName = ldbcsf1
		endpoint = localhost:5432
		jdbcDriver = org.postgresql.ds.PGPoolingDataSource
		ldbc.snb.interactive.LdbcQuery10_enable = true
		ldbc.snb.interactive.LdbcQuery10_freq = 40
		ldbc.snb.interactive.LdbcQuery11_enable = true
		ldbc.snb.interactive.LdbcQuery11_freq = 22
		ldbc.snb.interactive.LdbcQuery12_enable = true
		ldbc.snb.interactive.LdbcQuery12_freq = 44
		ldbc.snb.interactive.LdbcQuery13_enable = true
		ldbc.snb.interactive.LdbcQuery13_freq = 19
		ldbc.snb.interactive.LdbcQuery14_enable = true
		ldbc.snb.interactive.LdbcQuery14_freq = 49
		ldbc.snb.interactive.LdbcQuery1_enable = true
		ldbc.snb.interactive.LdbcQuery1_freq = 26
		ldbc.snb.interactive.LdbcQuery2_enable = true
		ldbc.snb.interactive.LdbcQuery2_freq = 37
		ldbc.snb.interactive.LdbcQuery3_enable = true
		ldbc.snb.interactive.LdbcQuery3_freq = 123
		ldbc.snb.interactive.LdbcQuery4_enable = true
		ldbc.snb.interactive.LdbcQuery4_freq = 36
		ldbc.snb.interactive.LdbcQuery5_enable = true
		ldbc.snb.interactive.LdbcQuery5_freq = 78
		ldbc.snb.interactive.LdbcQuery6_enable = true
		ldbc.snb.interactive.LdbcQuery6_freq = 434
		ldbc.snb.interactive.LdbcQuery7_enable = true
		ldbc.snb.interactive.LdbcQuery7_freq = 38
		ldbc.snb.interactive.LdbcQuery8_enable = true
		ldbc.snb.interactive.LdbcQuery8_freq = 5
		ldbc.snb.interactive.LdbcQuery9_enable = true
		ldbc.snb.interactive.LdbcQuery9_freq = 527
		ldbc.snb.interactive.LdbcShortQuery1PersonProfile_enable = true
		ldbc.snb.interactive.LdbcShortQuery2PersonPosts_enable = true
		ldbc.snb.interactive.LdbcShortQuery3PersonFriends_enable = true
		ldbc.snb.interactive.LdbcShortQuery4MessageContent_enable = true
		ldbc.snb.interactive.LdbcShortQuery5MessageCreator_enable = true
		ldbc.snb.interactive.LdbcShortQuery6MessageForum_enable = true
		ldbc.snb.interactive.LdbcShortQuery7MessageReplies_enable = true
		ldbc.snb.interactive.LdbcUpdate1AddPerson_enable = true
		ldbc.snb.interactive.LdbcUpdate2AddPostLike_enable = true
		ldbc.snb.interactive.LdbcUpdate3AddCommentLike_enable = true
		ldbc.snb.interactive.LdbcUpdate4AddForum_enable = true
		ldbc.snb.interactive.LdbcUpdate5AddForumMembership_enable = true
		ldbc.snb.interactive.LdbcUpdate6AddPost_enable = true
		ldbc.snb.interactive.LdbcUpdate7AddComment_enable = true
		ldbc.snb.interactive.LdbcUpdate8AddFriendship_enable = true
		ldbc.snb.interactive.parameters_dir = /tmp/ldbc-snb-1/substitution_parameters/
		ldbc.snb.interactive.short_read_dissipation = 0.2
		ldbc.snb.interactive.update_interleave = 49274
		ldbc.snb.interactive.updates_dir = /tmp/ldbc-snb-1/social_network/
		password = foo
		peer_identifiers =
		printQueryNames = false
		printQueryResults = false
		printQueryStrings = false
		queryDir = queries
		results_log = true
		user = postgres

ExecuteWorkloadMode
 --------------------
 --- Warmup Phase ---
 --------------------
ExecuteWorkloadMode  Scanning workload streams to calculate their limits...
WorkloadStreams  Scanned 0 of 0 - OFFSET
WorkloadStreams  Scanned 100 of 100 - RUN
ExecuteWorkloadMode  Loaded workload: com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbInteractiveWorkload
ExecuteWorkloadMode  Retrieving workload stream: LdbcSnbInteractiveWorkload
ExecuteWorkloadMode  Loaded DB: com.ldbc.impls.workloads.ldbc.snb.postgres.interactive.PostgresInteractiveDb
ExecuteWorkloadMode  Instantiating WorkloadRunner
WorkloadStatusThread  2019/10/20 21:17:50 +0100 Runtime [00:00.000 (m:s.ms)], Operations [0], Last [00:00.000 (m:s.ms)], Throughput (Total) [0.00] (Last 0s) [0.00]
WorkloadStatusThread  2019/10/20 21:17:51 +0100 Runtime [00:01.100 (m:s.ms)], Operations [4], Last [00:00.971 (m:s.ms)], Throughput (Total) [3.64] (Last 1s) [3.64]
.
.
.
WorkloadStatusThread  2019/10/20 21:44:06 +0100 Runtime [26:16.343 (m:s.ms)], Operations [4], Last [26:16.214 (m:s.ms)], Throughput (Total) [0.00] (Last 3s) [0.00]
WorkloadStatusThread  2019/10/20 21:44:08 +0100 Runtime [26:17.446 (m:s.ms)], Operations [4], Last [26:17.317 (m:s.ms)], Throughput (Total) [0.00] (Last 3s) [0.00]
WorkloadStatusThread  2019/10/20 21:44:09 +0100 Runtime [26:18.549 (m:s.ms)], Operations [4], Last [26:18.420 (m:s.ms)], Throughput (Total) [0.00] (Last 3s) [0.00]
WorkloadStatusThread  2019/10/20 21:44:10 +0100 Runtime [26:19.655 (m:s.ms)], Operations [4], Last [26:19.526 (m:s.ms)], Throughput (Total) [0.00] (Last 3s) [0.00]
WorkloadStatusThread  2019/10/20 21:44:11 +0100 Runtime [26:20.757 (m:s.ms)], Operations [4], Last [26:20.628 (m:s.ms)], Throughput (Total) [0.00] (Last 3s) [0.00]
WorkloadStatusThread  2019/10/20 21:44:12 +0100 Runtime [26:21.864 (m:s.ms)], Operations [4], Last [26:21.735 (m:s.ms)], Throughput (Total) [0.00] (Last 3s) [0.00]

The BI benchmark on the other hand works

Shutting down status thread...
ExecuteWorkloadMode  Shutting down workload...
ExecuteWorkloadMode  Shutting down completion time service...
ExecuteWorkloadMode  Shutting down metrics collection service...
ExecuteWorkloadMode
------------------------------------------------------------------------------
Operation Count:                        250
Duration:                               12:16.009.000 (m:s.ms.us)
Throughput:                             0.34 (op/s)
Start Time (Central European Time):     2019-10-20 - 21:00:53.946
Finish Time (Central European Time):    2019-10-20 - 21:13:09.955
------------------------------------------------------------------------------
    LdbcSnbBiQuery10TagPerson
        Units:              MILLISECONDS
        Count:              11
        Min:                70
        Max:                90
        Mean:               79.55
        50th Percentile:    79
        90th Percentile:    87
        95th Percentile:    87
        99th Percentile:    90
    LdbcSnbBiQuery11UnrelatedReplies
        Units:              MILLISECONDS
        Count:              11
        Min:                955
        Max:                1,533
        Mean:               1,133.36
        50th Percentile:    1,132
        90th Percentile:    1,224
        95th Percentile:    1,224
        99th Percentile:    1,533
    LdbcSnbBiQuery12TrendingPosts
        Units:              MILLISECONDS
        Count:              10
        Min:                907
        Max:                1,121
        Mean:               1,039.40
        50th Percentile:    1,057
        90th Percentile:    1,116
        95th Percentile:    1,121
        99th Percentile:    1,121
    LdbcSnbBiQuery13PopularMonthlyTags
        Units:              MILLISECONDS
        Count:              10
        Min:                207
        Max:                287
        Mean:               240.80
        50th Percentile:    226
        90th Percentile:    284
        95th Percentile:    287
        99th Percentile:    287
    LdbcSnbBiQuery14TopThreadInitiators
        Units:              MILLISECONDS
        Count:              10
        Min:                1,216
        Max:                3,615
        Mean:               1,615.60
        50th Percentile:    1,403
        90th Percentile:    1,691
        95th Percentile:    3,615
        99th Percentile:    3,615
    LdbcSnbBiQuery15SocialNormals
        Units:              MILLISECONDS
        Count:              11
        Min:                1
        Max:                7
        Mean:               2.55
        50th Percentile:    2
        90th Percentile:    3
        95th Percentile:    3
        99th Percentile:    7
    LdbcSnbBiQuery17FriendshipTriangles
        Units:              MILLISECONDS
        Count:              10
        Min:                4
        Max:                15
        Mean:               7.50
        50th Percentile:    6
        90th Percentile:    9
        95th Percentile:    15
        99th Percentile:    15
    LdbcSnbBiQuery18PersonPostCounts
        Units:              MILLISECONDS
        Count:              10
        Min:                14,872
        Max:                15,994
        Mean:               15,559.00
        50th Percentile:    15,676
        90th Percentile:    15,870
        95th Percentile:    15,994
        99th Percentile:    15,994
    LdbcSnbBiQuery19StrangerInteraction
        Units:              MILLISECONDS
        Count:              10
        Min:                24,159
        Max:                54,842
        Mean:               34,097.00
        50th Percentile:    29,309
        90th Percentile:    48,014
        95th Percentile:    54,842
        99th Percentile:    54,842
    LdbcSnbBiQuery1PostingSummary
        Units:              MILLISECONDS
        Count:              10
        Min:                1,241
        Max:                1,613
        Mean:               1,380.30
        50th Percentile:    1,323
        90th Percentile:    1,545
        95th Percentile:    1,613
        99th Percentile:    1,613
    LdbcSnbBiQuery20HighLevelTopics
        Units:              MILLISECONDS
        Count:              11
        Min:                1,052
        Max:                3,162
        Mean:               1,867.00
        50th Percentile:    1,302
        90th Percentile:    3,094
        95th Percentile:    3,094
        99th Percentile:    3,162
    LdbcSnbBiQuery21Zombies
        Units:              MILLISECONDS
        Count:              11
        Min:                2,515
        Max:                2,706
        Mean:               2,574.45
        50th Percentile:    2,558
        90th Percentile:    2,642
        95th Percentile:    2,642
        99th Percentile:    2,706
    LdbcSnbBiQuery22InternationalDialog
        Units:              MILLISECONDS
        Count:              11
        Min:                839
        Max:                1,467
        Mean:               990.91
        50th Percentile:    889
        90th Percentile:    1,246
        95th Percentile:    1,246
        99th Percentile:    1,467
    LdbcSnbBiQuery23HolidayDestinations
        Units:              MILLISECONDS
        Count:              11
        Min:                10
        Max:                35
        Mean:               22.09
        50th Percentile:    25
        90th Percentile:    32
        95th Percentile:    32
        99th Percentile:    35
    LdbcSnbBiQuery24MessagesByTopic
        Units:              MILLISECONDS
        Count:              11
        Min:                436
        Max:                3,003
        Mean:               1,123.36
        50th Percentile:    734
        90th Percentile:    2,282
        95th Percentile:    2,282
        99th Percentile:    3,003
    LdbcSnbBiQuery25WeightedPaths
        Units:              MILLISECONDS
        Count:              11
        Min:                6,505
        Max:                8,204
        Mean:               6,990.91
        50th Percentile:    6,812
        90th Percentile:    7,731
        95th Percentile:    7,731
        99th Percentile:    8,204
    LdbcSnbBiQuery2TopTags
        Units:              MILLISECONDS
        Count:              10
        Min:                108
        Max:                139
        Mean:               126.90
        50th Percentile:    128
        90th Percentile:    138
        95th Percentile:    139
        99th Percentile:    139
    LdbcSnbBiQuery3TagEvolution
        Units:              MILLISECONDS
        Count:              10
        Min:                342
        Max:                1,010
        Mean:               600.20
        50th Percentile:    470
        90th Percentile:    878
        95th Percentile:    1,010
        99th Percentile:    1,010
    LdbcSnbBiQuery4PopularCountryTopics
        Units:              MILLISECONDS
        Count:              10
        Min:                36
        Max:                128
        Mean:               78.70
        50th Percentile:    75
        90th Percentile:    114
        95th Percentile:    128
        99th Percentile:    128
    LdbcSnbBiQuery5TopCountryPosters
        Units:              MILLISECONDS
        Count:              10
        Min:                156
        Max:                232
        Mean:               190.80
        50th Percentile:    191
        90th Percentile:    215
        95th Percentile:    232
        99th Percentile:    232
    LdbcSnbBiQuery6ActivePosters
        Units:              MILLISECONDS
        Count:              10
        Min:                120
        Max:                265
        Mean:               178.50
        50th Percentile:    141
        90th Percentile:    256
        95th Percentile:    265
        99th Percentile:    265
    LdbcSnbBiQuery7AuthoritativeUsers
        Units:              MILLISECONDS
        Count:              9
        Min:                986
        Max:                1,074
        Mean:               1,045.67
        50th Percentile:    1,050
        90th Percentile:    1,071
        95th Percentile:    1,074
        99th Percentile:    1,074
    LdbcSnbBiQuery8RelatedTopics
        Units:              MILLISECONDS
        Count:              11
        Min:                85
        Max:                268
        Mean:               134.91
        50th Percentile:    120
        90th Percentile:    238
        95th Percentile:    238
        99th Percentile:    268
    LdbcSnbBiQuery9RelatedForums
        Units:              MILLISECONDS
        Count:              11
        Min:                710
        Max:                1,560
        Mean:               1,016.91
        50th Percentile:    739
        90th Percentile:    1,541
        95th Percentile:    1,541
        99th Percentile:    1,560
------------------------------------------------------------------------------

ExecuteWorkloadMode  Exporting workload metrics to /Users/chaker/github/ldbc_snb_implementations/postgres/results/LDBC-SNB-results.json...
ExecuteWorkloadMode  Shutting down database connector...
ExecuteWorkloadMode  Database connector shutdown successfully in: PT0.001S
ExecuteWorkloadMode  Workload completed successfully

I'm running the benchmarks using a 2019 MBP with 32GB of memory and Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz.

Failed when generating the JAR files for the implementations

Hi,
I tried to generate the JAR files for the implementations with the command mvn clean package -DskipTests and got this error:

[INFO] ------------------------< com.ldbc.snb:sparql >-------------------------
[INFO] Building SPARQL DB class 0.0.1-SNAPSHOT                            [5/5]
[INFO] --------------------------------[ jar ]---------------------------------
[WARNING] The POM for com.virtuoso.virtjdbc4:virtjdbc4:jar:3.0 is missing, no dependency information available
[WARNING] The POM for virtuoso:virtuoso-sesame4:jar:4.0.0 is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for implementations 0.0.1-SNAPSHOT:
[INFO] 
[INFO] implementations .................................... SUCCESS [  0.900 s]
[INFO] Common classes ..................................... SUCCESS [  4.234 s]
[INFO] Cypher BI class .................................... SUCCESS [  4.118 s]
[INFO] PostgreSQL DB class ................................ SUCCESS [  1.797 s]
[INFO] SPARQL DB class .................................... FAILURE [  0.859 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  12.137 s
[INFO] Finished at: 2019-08-06T19:04:51+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project sparql: Could not resolve dependencies for project com.ldbc.snb:sparql:jar:0.0.1-SNAPSHOT: The following artifacts could not be resolved: com.virtuoso.virtjdbc4:virtjdbc4:jar:3.0, virtuoso:virtuoso-sesame4:jar:4.0.0: Failure to find com.virtuoso.virtjdbc4:virtjdbc4:jar:3.0 in http://maven.stardog.com was cached in the local repository, resolution will not be reattempted until the update interval of stardog-public has elapsed or updates are forced
 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :sparql

It seems that the POMs for virtjdbc4 and virtuoso-sesame4 are missing, and Maven can't find either of them in the default repository.

Do you know how to fix it?
Thanks.

Naming convention for message fields

In the spirit of ldbc/ldbc_snb_docs#135 and ldbc/ldbc_snb_interactive_v1_driver#102

Due to historical reasons, there are some instances of variable names starting with postOrComment or
commentOrPost, instead of message. This is scattered among multiple query and DB driver implementations. See e.g. SPARQL:

$ ag '(commentorpost|postorcomment)' -l | sort
queries/interactive-complex-2.sparql
queries/interactive-complex-7.sparql
queries/interactive-complex-7-with-second.sparql
queries/interactive-complex-9.sparql
src/main/java/com/ldbc/impls/workloads/ldbc/snb/sparql/SparqlDb.java

And Cypher:

$ ag '(commentorpost|postorcomment)' -l  | sort
queries/interactive-complex-2.cypher
queries/interactive-complex-7.cypher
queries/interactive-complex-7-with-lists.cypher
queries/interactive-complex-9.cypher
src/main/java/com/ldbc/impls/workloads/ldbc/snb/cypher/CypherDb.java

Use embedded databases for testing

For testing, it'd be worth adding support for embedded databases, i.e. ones that run inside the application without setting up and loading an external database.

Possible candidates:

Rework ignore settings 'results' directories

The current ignore settings are not ideal: if the user copies the results/ directory of a benchmark run, it contains a .gitignore file that ignores the whole content of the directory. This makes it easy to lose benchmark data. Adding an ignore rule for results/* a level further up the hierarchy (e.g. in postgres/) would make more sense.

SPARQL Interactive Query 2 crashes

Query: https://github.com/ldbc/ldbc_snb_implementations/blob/master/sparql/queries/interactive-complex-2.sparql

#LdbcQuery2 / 945 -- Crashed 0 -- Incorrect 0 -- Currently processing LdbcQuery4...
com.ldbc.driver.DbException: org.openrdf.query.QueryEvaluationException: com.complexible.stardog.StardogException: com.complexible.stardog.plan.eval.ExecutionException: Variable used when already in scope: messageId
        at com.ldbc.impls.workloads.ldbc.snb.sparql.operationhandlers.SparqlListOperationHandler.executeOperation(SparqlListOperationHandler.java:49)
        at com.ldbc.impls.workloads.ldbc.snb.sparql.operationhandlers.SparqlListOperationHandler.executeOperation(SparqlListOperationHandler.java:18)
        at com.ldbc.driver.validation.DbValidator.validate(DbValidator.java:75)
        at com.ldbc.driver.client.ValidateDatabaseMode.startExecutionAndAwaitCompletion(ValidateDatabaseMode.java:111)
        at com.ldbc.driver.client.ValidateDatabaseMode.startExecutionAndAwaitCompletion(ValidateDatabaseMode.java:29)
        at com.ldbc.driver.Client.main(Client.java:53)
Caused by: org.openrdf.query.QueryEvaluationException: com.complexible.stardog.StardogException: com.complexible.stardog.plan.eval.ExecutionException: Variable used when already in scope: messageId
        at com.complexible.stardog.sesame.AbstractQuery.executeQuery(AbstractQuery.java:54)
        at com.complexible.stardog.sesame.StardogTupleQuery.evaluate(StardogTupleQuery.java:39)
        at com.ldbc.impls.workloads.ldbc.snb.sparql.operationhandlers.SparqlListOperationHandler.executeOperation(SparqlListOperationHandler.java:33)
        ... 5 more
Caused by: com.complexible.stardog.StardogException: com.complexible.stardog.plan.eval.ExecutionException: Variable used when already in scope: messageId
        at com.complexible.stardog.protocols.http.client.BaseHttpClient.checkResponseCode(BaseHttpClient.java:503)
        at com.complexible.stardog.protocols.http.client.BaseHttpClient.execute(BaseHttpClient.java:375)
        at com.complexible.stardog.protocols.http.client.HttpClientImpl.select(HttpClientImpl.java:202)
        at com.complexible.stardog.protocols.http.client.HttpConnection._select(HttpConnection.java:209)
        at com.complexible.stardog.api.impl.AbstractConnection.executeSelect(AbstractConnection.java:456)
        at com.complexible.stardog.api.impl.SelectQueryImpl.execute(SelectQueryImpl.java:38)
        at com.complexible.stardog.api.impl.SelectQueryImpl.execute(SelectQueryImpl.java:25)
        at com.complexible.stardog.sesame.AbstractQuery.executeQuery(AbstractQuery.java:47)
        ... 7 more
Caused by: com.stardog.stark.query.QueryExecutionFailure: com.complexible.stardog.plan.eval.ExecutionException: Variable used when already in scope: messageId
        ... 15 more

The reason is the return expression (?messageId AS ?messageId), which used to compile but apparently no longer does.

Interactive tests fail on Travis

As noted in 8bfa3b4, interactive tests fail on Travis:

They work locally from both Maven and IntelliJ, but fail on Travis with the following error: "could not find array type for data type character varying[]"

Additionally, I have pretty much the same PostgreSQL version as Travis does (9.6.2 locally vs. 9.6.4 on Travis). Anyway, I would like to focus on BI now, so I don't have time to dig into the cause of this.

Allow to generate message CSV for PostgreSQL

The PostgreSQL schema currently has a unified message table; however, the table is fed from separate CSV files.

To allow for easier experimenting, we are adding an option to the PostgreSQL converter (load.sh); an example invocation is shown after the list:

  • PG_CREATE_MESSAGE_FILE: control creating a unified message data file of posts and comments. Possible values:
    • no: don't create message file, as we traditionally did. This is the default.
    • create: create message file, with no guarantee on being sorted
    • sort_by_date: create message file, sorted by creation date
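
A hypothetical invocation, assuming the option is read from an environment variable like the other PG_* settings:

PG_CREATE_MESSAGE_FILE=sort_by_date ./load.sh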

PostgreSQL schema: remove duplicate colum in the 'post' table

The post table currently has two columns for the hasCreator relationship:

https://github.com/ldbc/ldbc_snb_implementations/blob/e7aff8b6403db4412ce849369d0b6be9dc9d7338/postgres/load-scripts/schema.sql#L10-L11

In the loading script, both columns are populated for posts with the same value ($9 in the snippet below)

https://github.com/ldbc/ldbc_snb_implementations/blob/e7aff8b6403db4412ce849369d0b6be9dc9d7338/postgres/load-scripts/load.sh#L7-L9

and only the first is populated for comments (see $7 and blank after it):

https://github.com/ldbc/ldbc_snb_implementations/blob/e7aff8b6403db4412ce849369d0b6be9dc9d7338/postgres/load-scripts/load.sh#L10-L12

Can we remove the ps_p_creatorid column from the PostgreSQL schema? What do you think about this?

Workload could not initialize due to missing parameters

Hi.
I tried to run ./interactive-create-validation-parameters.sh for cypher interactive workload, and I haven't changed anything except my user name and password in interactive-create-validation-parameters.properties.
I got the following error:

$./interactive-create-validation-parameters.sh 
Client  Client terminated unexpectedly
com.ldbc.driver.ClientException: Error loading Workload class: com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbI
nteractiveWorkload
        at com.ldbc.driver.client.CreateValidationParamsMode.init(CreateValidationParamsMode.java:55)
        at com.ldbc.driver.Client.main(Client.java:52)
Caused by: com.ldbc.driver.WorkloadException: Workload could not initialize due to missing parameters: [ldbc.snb.inter
active.LdbcUpdate2AddPostLike_enable, ldbc.snb.interactive.LdbcUpdate5AddForumMembership_enable, ldbc.snb.interactive.
LdbcQuery13_enable, ldbc.snb.interactive.LdbcQuery12_enable, ldbc.snb.interactive.LdbcQuery14_enable, ldbc.snb.interac
tive.LdbcUpdate6AddPost_enable, ldbc.snb.interactive.LdbcShortQuery6MessageForum_enable, ldbc.snb.interactive.LdbcUpda
te8AddFriendship_enable, ldbc.snb.interactive.LdbcQuery8_enable, ldbc.snb.interactive.LdbcQuery9_enable, ldbc.snb.inte
ractive.LdbcShortQuery5MessageCreator_enable, ldbc.snb.interactive.LdbcQuery10_enable, ldbc.snb.interactive.LdbcQuery1
1_enable, ldbc.snb.interactive.LdbcShortQuery4MessageContent_enable, ldbc.snb.interactive.LdbcUpdate7AddComment_enable
, ldbc.snb.interactive.LdbcShortQuery7MessageReplies_enable, ldbc.snb.interactive.LdbcQuery1_enable, ldbc.snb.interact
ive.LdbcShortQuery2PersonPosts_enable, ldbc.snb.interactive.LdbcShortQuery3PersonFriends_enable, ldbc.snb.interactive.
LdbcShortQuery1PersonProfile_enable, ldbc.snb.interactive.LdbcQuery5_enable, ldbc.snb.interactive.LdbcQuery3_enable, l
dbc.snb.interactive.LdbcQuery6_enable, ldbc.snb.interactive.LdbcQuery7_enable, ldbc.snb.interactive.LdbcQuery2_enable,
 ldbc.snb.interactive.LdbcQuery4_enable, ldbc.snb.interactive.LdbcUpdate1AddPerson_enable, ldbc.snb.interactive.LdbcUp
date3AddCommentLike_enable, ldbc.snb.interactive.LdbcUpdate4AddForum_enable]
        at com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbInteractiveWorkload.onInit(LdbcSnbInteractiveWorkload
.java:125)
        at com.ldbc.driver.Workload.init(Workload.java:55)
        at com.ldbc.driver.client.CreateValidationParamsMode.init(CreateValidationParamsMode.java:51)
        ... 1 more

I found that the missing parameters are in interactive-validate.properties and interactive-benchmark.properties, and the comments in interactive-benchmark.properties say that they are for debugging.
Should I copy these parameters to interactive-create-validation-parameters.properties? If not, what should I do?

Thanks.

Make QueryStore more flexible for updates

QueryStore is currently designed so that an update is implemented as a sequence of queries. For example, update 6 is implemented with a query for adding the post and another one for adding the tags.

        Update6AddPost("update6addpost"),
        Update6AddPostTags("update6addposttags"),

This is useful for some systems, e.g. SQL systems typically require insertions into multiple tables, which are best handled as separate queries. However, graph systems (Cypher/SPARQL implementations) do not require this and would best handle updates as a single query. This should be supported by the QueryStore class.

Note that this restriction is not imposed by the driver, which uses a single class for each update operation, e.g. LdbcUpdate6.
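
For illustration, a single-query formulation of update 6 in Cypher might look roughly like the following; the labels, relationship types, property names, and parameter names are assumptions based on the SNB schema, not the exact wording of the reference implementation:

MATCH (author:Person {id: $authorPersonId}),
      (forum:Forum {id: $forumId}),
      (country:Place {id: $countryId})
CREATE (p:Post:Message {id: $postId, creationDate: $creationDate,
                        content: $content, length: $length})
CREATE (forum)-[:CONTAINER_OF]->(p),
       (p)-[:HAS_CREATOR]->(author),
       (p)-[:IS_LOCATED_IN]->(country)
WITH p
UNWIND $tagIds AS tagId
MATCH (tag:Tag {id: tagId})
CREATE (p)-[:HAS_TAG]->(tag)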

Loading data fails

Hello,

I have a problem when I try to load the data into Virtuoso:

*** Error 42001: [OpenLink][Virtuoso ODBC Driver][Virtuoso Server]SR185: Undefined procedure DB.DBA.ft_set_file.
at line 1 of Top-Level:
ft_set_file ('comment_f', 'outputDir/comment_0_0.csv', delimiter => '|', skip_rows=>1)
SQL>

Am I doing something wrong?

I have tried so far:

  • Move CSV files to other location
  • Setup on an other server (First one Ubuntu next one Windows)
  • Only executed one line of code (in the SQL file)

PS.

I am doing a Master's research project at the VU Amsterdam to compare SQL with Prolog and CQL. After (literally) days of troubleshooting I finally managed to export the correct CSV files (merged foreign keys) from the Data generator. I want to set up Virtuoso so that I can test the queries that I develop. As a basis I use the SQL queries from the paper "D2.2.4 Benchmarking Complex Queries" (http://ldbcouncil.org/publications). I need some kind of environment where I can execute these queries; if someone has some kind of SQL dump that I can use, that would be great. I am also open to any suggestions.

Thanks,

Aron

Environment variables

The naming conventions, initialization approaches, etc. for environment variables are currently inconsistent.

The list of variables can be collected with ag:

$ ag '\$[A-Z_{][A-Z0-9_\}]+' -G '(cypher|postgres|sparql).*sh' -o --no-filename | sort -u
# edited for the sake of clarity
$FEATURE
${NEO4J_DATA_DIR}
$NEO4J_DB_DIR
$NEO4J_HOME
$NEO4J_VERSION
$PG_CSV_DIR
$PG_DB_NAME
$PG_USER
${POSTFIX}
$RDF_DATA_DIR
$RDF_DB
$STARDOG_INSTALL_DIR

Some examples:

  • POSTFIX: this is useless 99% of the time (it is set to _0_0.csv)
  • STARDOG_INSTALL_DIR: maybe this can be changed to STARDOG_HOME
  • RDF_DB vs. PG_DB_NAME are inconsistent
  • PG_CSV_DIR vs. RDF_DATA_DIR vs. NEO4J_DATA_DIR are inconsistent

Virtuoso implementation doesn't load the whole dataset

Hi,

I'm checking the results of the Virtuoso implementation of the SQL queries, as published in this repository, with SF1 of the SNB interactive workload.

Trying to manually issue complex query 1 with the parameters person.id = 2199023260527 and person.firstname = Lin, the output is http://pastebin.com/raw/B7hREJdv

Given the semantics of the query, I'd have expected the tuple related to person id 10995116285331 to appear with distance 1. In the relation, ids 10995116285331 and 2199023260527 are connected:

# awk 'BEGIN { FS = "|" } $2 == 10995116285331 && $1 == 2199023260527'  person_knows_person_0_0.csv;
2199023260527|10995116285331|2011-01-24T07:20:12.577+0000

and the name of the person is Lin:

# awk 'BEGIN { FS = "|" } $1 == 10995116285331' person_0_0.csv
10995116285331|Lin|Yang|male|1988-12-17|2011-01-21T15:25:36.713+0000|1.95.163.203|Internet Explorer|310

So I'm assuming the process of loading the initial data only partially fills the database:

SQL> SELECT COUNT(*) FROM knows;
count
INTEGER
_______________________________________________________________________________

180802

1 Rows. -- 1 msec.

Interestingly enough, when running the validation set generated from the read/write results of Neo4j against Virtuoso SQL, all queries pass except 2 instances of Q6 and Q14.

Did anyone else experience similar issues?

Issues running script import on cypher

Hi,

I am doing my degree work in Cypher and I have a problem executing the import-to-neo4j script: it runs but does nothing. I tried manually entering the commands found in the script and they do not do anything either; they take a very long time while running. I am working with the scale factor 0.1 data set, all of the above on Ubuntu 18.04.

Looking forward to your comments.

Greetings.

Postgres interactive validation crashes

The - somewhat garbled - error message is the following:

com.ldbc.driver.DbException: WITH RECURSIVE search_graph(link, depth) AS (dbcQuery13...
                SELECT 768 , 0
      UNION ALL
        (WITH sg(link,depth) as (select * from search_graph)
        SELECT distinct k_person2id, x.depth+1
        FROM knows, sg x
        WHERE x.link = k_person1id and not exists(select * from sg y where y.link = 19791209308155) and not exists( select * from sg y where y.link=k_person2id))
)
select max(depth) from (
select depth from search_graph where link = 19791209308155
union select -1) tmp;
org.postgresql.util.PSQLException: ERROR: recursive query "search_graph" column 1 has type integer in non-recursive term but type bigint overall
  Hint: Cast the output of the non-recursive term to the correct type.
  Position: 56
        at com.ldbc.impls.workloads.ldbc.snb.postgres.PostgresSingletonOperationHandler.executeOperation(PostgresSingletonOperationHandler.java:40)
        at com.ldbc.impls.workloads.ldbc.snb.postgres.PostgresSingletonOperationHandler.executeOperation(PostgresSingletonOperationHandler.java:14)
        at com.ldbc.driver.validation.DbValidator.validate(DbValidator.java:66)
        at com.ldbc.driver.client.ValidateDatabaseMode.startExecutionAndAwaitCompletion(ValidateDatabaseMode.java:111)
        at com.ldbc.driver.client.ValidateDatabaseMode.startExecutionAndAwaitCompletion(ValidateDatabaseMode.java:29)
        at com.ldbc.driver.Client.main(Client.java:53)
Processed 503 / 727 -- Crashed 1 -- Incorrect 16 -- Currently processing LdbcQuery12...

The corresponding line in the validation file is:

["com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcQuery13",768,19791209308155]|[3]
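
Following the hint in the error message, a minimal sketch of the fix is to cast the seed value in the non-recursive term to bigint so that it matches the type of k_person2id:

WITH RECURSIVE search_graph(link, depth) AS (
        SELECT 768::bigint, 0
      UNION ALL
        (WITH sg(link, depth) AS (SELECT * FROM search_graph)
        SELECT DISTINCT k_person2id, x.depth + 1
        FROM knows, sg x
        WHERE x.link = k_person1id
          AND NOT EXISTS (SELECT * FROM sg y WHERE y.link = 19791209308155)
          AND NOT EXISTS (SELECT * FROM sg y WHERE y.link = k_person2id))
)
SELECT max(depth) FROM (
  SELECT depth FROM search_graph WHERE link = 19791209308155
  UNION SELECT -1) tmp;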

SPARQL Interactive Query 13 crashes

https://github.com/ldbc/ldbc_snb_implementations/blob/master/sparql/queries/interactive-complex-13.sparql

#LdbcQuery53 / 945 -- Crashed 0 -- Incorrect 0 -- Currently processing LdbcQuery13...
com.ldbc.driver.DbException: org.openrdf.query.QueryEvaluationException: com.complexible.stardog.StardogException: Index: 1, Size: 1
        at com.ldbc.impls.workloads.ldbc.snb.sparql.operationhandlers.SparqlListOperationHandler.executeOperation(SparqlListOperationHandler.java:49)
        at com.ldbc.impls.workloads.ldbc.snb.sparql.operationhandlers.SparqlListOperationHandler.executeOperation(SparqlListOperationHandler.java:18)
        at com.ldbc.driver.validation.DbValidator.validate(DbValidator.java:75)
        at com.ldbc.driver.client.ValidateDatabaseMode.startExecutionAndAwaitCompletion(ValidateDatabaseMode.java:111)
        at com.ldbc.driver.client.ValidateDatabaseMode.startExecutionAndAwaitCompletion(ValidateDatabaseMode.java:29)
        at com.ldbc.driver.Client.main(Client.java:53)
Caused by: org.openrdf.query.QueryEvaluationException: com.complexible.stardog.StardogException: Index: 1, Size: 1
        at com.complexible.stardog.sesame.AbstractQuery.executeQuery(AbstractQuery.java:54)
        at com.complexible.stardog.sesame.StardogTupleQuery.evaluate(StardogTupleQuery.java:39)
        at com.ldbc.impls.workloads.ldbc.snb.sparql.operationhandlers.SparqlListOperationHandler.executeOperation(SparqlListOperationHandler.java:33)
        ... 5 more
Caused by: com.complexible.stardog.StardogException: Index: 1, Size: 1
        at com.complexible.stardog.protocols.http.client.BaseHttpClient.checkResponseCode(BaseHttpClient.java:503)
        at com.complexible.stardog.protocols.http.client.BaseHttpClient.execute(BaseHttpClient.java:375)
        at com.complexible.stardog.protocols.http.client.HttpClientImpl.select(HttpClientImpl.java:202)
        at com.complexible.stardog.protocols.http.client.HttpConnection._select(HttpConnection.java:209)
        at com.complexible.stardog.api.impl.AbstractConnection.executeSelect(AbstractConnection.java:456)
        at com.complexible.stardog.api.impl.SelectQueryImpl.execute(SelectQueryImpl.java:38)
        at com.complexible.stardog.api.impl.SelectQueryImpl.execute(SelectQueryImpl.java:25)
        at com.complexible.stardog.sesame.AbstractQuery.executeQuery(AbstractQuery.java:47)
        ... 7 more
Caused by: com.stardog.stark.query.QueryExecutionFailure: Index: 1, Size: 1
        ... 15 more

implementation issues

Dear team,

I find this an extremely exciting project and look forward to contributing to both snb and graphanalytics.

Being a novice, I am facing issues at a very early part of the implementation. I have cloned graphanalytics and now I'm stuck on which direction to take. I tried to clone graphanalytics_refrence but it throws an error saying it needs a username and password. I'm also interested in how granula and datagen can be installed.

Can you please point me to a reference manual that could be followed to take the implementation further?

Do not return null in Postgres BI Q13 results

The Postgres implementation for BI Q13 returns null in some cases. In the SF1 validation data set:

["com.ldbc.driver.workloads.ldbc.snb.bi.LdbcSnbBiQuery13PopularMonthlyTags","Iran",100]|[[2012,1,[["Genghis_Khan",16],["Sanath_Jayasuriya",14],["Andrey_Kolmogorov",9],["Alexander_Downer",8],["John_Howard",8]]],[2012,2,[["Paradorn_Srichaphan",9],["Adolf_Hitler",8],["John_Howard",7],["Karol_Kučera",5],["Muammar_Gaddafi",5]]],[2012,3,[["Ivan_Ljubičić",8],["John_Howard",8],["Arnold_Schwarzenegger",6],["Gloria_Estefan",6],["Adolf_Hitler",4]]],[2012,4,[["Enrique_Iglesias",66],["John_Rhys-Davies",33],["Tunku_Abdul_Rahman",9],["Adolf_Hitler",7],["Richard_Harris",6]]],[2012,5,[["Imelda_Marcos",47],["Hassan_II_of_Morocco",18],["Alejandro_Falla",12],["Mack_the_Knife",6],["Chiang_Kai-shek",5]]],[2012,6,[["Manuel_Noriega",36],["Bobby_Hull",16],["Slavoj_Žižek",7],["Sammy_Sosa",5],["Adolf_Hitler",4]]],[2012,7,[["John_Howard",6],["Adolf_Hitler",5],["Arnold_Schwarzenegger",4],["Augustine_of_Hippo",4],["Marc_Gicquel",4]]],[2012,8,[["Ehud_Olmert",13],["Augustine_of_Hippo",9],["Adolf_Hitler",8],["Terry_Wogan",5],["Arnold_Schwarzenegger",4]]],[2012,9,[["Oscar_Wilde",4],["Robert_Redford",3],["Anthony_Hopkins",2],["Black_Hole_Sun",2],["Charlemagne",2]]],[2011,1,[["Freddie_Mercury",24],["Mobutu_Sese_Seko",18],["Slavoj_Žižek",14],["Philip_K._Dick",4],["Botswana",3]]],[2011,2,[["Luis_Horna",25],["Tom_Gehrels",14],["Arnold_Schwarzenegger",6],["Robert_Fripp",5],["France",4]]],[2011,3,[["Sanath_Jayasuriya",14],["Haile_Gebrselassie",7],["Genghis_Khan",6],["John_Howard",5],["Kurt_Gödel",5]]],[2011,4,[["Gil_Kane",15],["Bette_Davis",4],["Jawaharlal_Nehru",4],["Martin_Scorsese",4],["Adolf_Hitler",3]]],[2011,5,[["Arnold_Schwarzenegger",4],["John_Howard",3],["Rembrandt",3],["Abkhazia",2],["Baden",2]]],[2011,6,[["Ghostface_Killah",34],["David_Hockney",23],["Jan_Hus",10],["Arnold_Schwarzenegger",7],["China",5]]],[2011,7,[["Ban_Ki-moon",7],["Joseph_Haydn",4],["Adolf_Hitler",3],["David_Foster",3],["John_Howard",3]]],[2011,8,[["Richard_Harris",5],["Adolf_Hitler",4],["George_Orwell",4],["John_Howard",4],["Arnold_Schwarzenegger",3]]],[2011,9,[["Kateryna_Bondarenko",15],["Fernando_González",10],["Arnold_Schwarzenegger",7],["John_Howard",6],["Azerbaijan",3]]],[2011,10,[["Muammar_Gaddafi",63],["Joe_Strummer",45],["Mariano_Rivera",42],["Dimitri_Tiomkin",25],["Sammy_Sosa",22]]],[2011,11,[["John_Howard",7],["Adolf_Hitler",4],["Richard_Harris",4],["Arnold_Schwarzenegger",3],["Bertolt_Brecht",3]]],[2011,12,[["Carl_Gustaf_Emil_Mannerheim",12],["Adolf_Hitler",11],["Edgar_Prado",8],["Sammy_Sosa",7],["Francis_of_Assisi",6]]],[2010,1,[["Charles,_Prince_of_Wales",1],["Herman_Melville",1],["Jorge_Luis_Borges",1],["Ruhollah_Khomeini",1]]],[2010,2,[["Aung_San_Suu_Kyi",1],["Balhae",1],["Belgian_Congo",1],["Derg",1],["German_Empire",1]]],[2010,3,[[null,57],["Boris_Yeltsin",1],["Jean-Jacques_Rousseau",1],["The_Wiz:_Original_Motion_Picture_Soundtrack",1],["Wolfgang_Amadeus_Mozart",1]]],[2010,4,[["Oscar_Wilde",3],["Arthur_C._Clarke",2],["Boys_for_Pele",2],["Dean_Martin",2],["Edvard_Munch",2]]],[2010,5,[["Paula_Abdul",11],["Jorge_Luis_Borges",3],["Leon_Trotsky",3],["Outside_the_Wall",3],["Elton_John",2]]],[2010,6,[["Rubén_Blades",11],["Building_a_Mystery",4],["Sergei_Eisenstein",4],["Siad_Barre",4],["14th_Dalai_Lama",2]]],[2010,7,[["Sammy_Sosa",8],["Joni_Mitchell",3],["Paul_Martin",3],["Saxe-Weimar-Eisenach",3],["William_Blake",3]]],[2010,8,[["Haile_Selassie_I",16],["Ivan_Ljubičić",10],["Donald_Trump",5],["Heaven_or_Las_Vegas",5],["Bruce_Springsteen",4]]],[2010,9,[["Sonia_Gandhi",27],["Honduras",4],["John_Howard",4],["Luís_Figo",3],["Adolf_Hitler",2]]],[2
010,10,[["Shapur_II",71],["Léopold_Sédar_Senghor",11],["Ho_Chi_Minh",9],["Iceland",6],["John_Howard",5]]],[2010,11,[["Yuvan_Shankar_Raja",6],["Paul_Capdeville",5],["John_Howard",4],["Dolores_del_Río",3],["Saudi_Arabia",3]]],[2010,12,[["Marc_Gicquel",11],["Chulalongkorn",4],["Jimi_Hendrix",3],["John_McEnroe",3],["Loyal_to_the_Game",3]]]]

We discovered this with @antaljanosbenjamin.

How to contribute TinkerPop implementation?

Hi.
Our group is working on a graph system that uses TinkerPop Gremlin as the query language, and we plan to use the LDBC SNB driver as our benchmark.
I have noticed that you have implemented part of JanusGraph. Since JanusGraph also uses Gremlin as the query language, should I continue to improve it directly on the basis of your JanusGraph folder, or create a new TinkerPop folder under the interactive folder?
By the way, the current implementation of JanusGraph is only able to import data but not to query, right?

Allow to generate only data files for PostgreSQL

To allow for easier experimenting, add an option to the PostgreSQL converter (load.sh); an example invocation is shown after the list:

  • PG_LOAD_TO_DB: controls whether to perform or skip the database loading phase. Possible values are:
    • load: loads to the database, as one might expect. This is the default.
    • skip: skip loading to the database.
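
A hypothetical invocation, assuming the option is read from an environment variable; combined with the PG_CREATE_MESSAGE_FILE option proposed above, this would produce only the merged message CSV without loading anything into the database:

PG_LOAD_TO_DB=skip PG_CREATE_MESSAGE_FILE=create ./load.sh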
