
aksw / lsq


Linked SPARQL Queries (LSQ): Framework for RDFizing triple store (web) logs and performing SPARQL query extraction, analysis and benchmarking in order to produce datasets of Linked SPARQL Queries

Home Page: http://lsq.aksw.org

License: Apache License 2.0

Languages: Java 98.46%, Shell 0.85%, Dockerfile 0.32%, Makefile 0.37%
Topics: semantic-web, sparql, rdf, analytics, benchmarking, queries

lsq's Introduction

Linked SPARQL Queries (LSQ) Framework

A framework for RDFizing query logs and benchmarking queries and graph patterns.

What's New in LSQ V2

2020-08-06 LSQ2 Pre-release

LSQ2 introduces significant improvements over the prior version in every aspect: ease of use, flexibility, modularity, and consistency of the data model and generated IDs.

  • Pretty CLI (thanks to picocli)
  • Easier yet more flexible to use: RDFization, static analysis and benchmarking are now decoupled
  • Named graph stream approach: Information for each query is grouped in its own named graph which allows easily selecting subsets with complete information for detailed analysis.
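The named-graph-per-query idea can be illustrated without any RDF library. The following is a minimal sketch, assuming quads are plain string records (the `Quad` type is hypothetical, not LSQ's or Jena's API): grouping the stream by graph name yields one complete description per query, and selecting a subset means picking whole groups.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical quad type for illustration; LSQ itself works with Jena datasets.
record Quad(String graph, String subject, String predicate, String object) {}

final class NamedGraphStreamSketch {

    // Group a flat quad stream into one entry per named graph, preserving input order.
    static Map<String, List<Quad>> groupByGraph(List<Quad> quads) {
        return quads.stream().collect(Collectors.groupingBy(
                Quad::graph, LinkedHashMap::new, Collectors.toList()));
    }

    // Select a subset of queries; each selected graph carries all of its information.
    static Map<String, List<Quad>> select(Map<String, List<Quad>> graphs, Set<String> wanted) {
        Map<String, List<Quad>> result = new LinkedHashMap<>();
        graphs.forEach((g, qs) -> { if (wanted.contains(g)) result.put(g, qs); });
        return result;
    }

    public static void main(String[] args) {
        List<Quad> quads = List.of(
                new Quad("urn:q1", "urn:q1", "lsqv:text", "\"SELECT ...\""),
                new Quad("urn:q2", "urn:q2", "lsqv:text", "\"ASK ...\""),
                new Quad("urn:q1", "urn:q1", "lsqv:hasRemoteExec", "urn:exec1"));
        System.out.println(select(groupByGraph(quads), Set.of("urn:q1")));
    }
}
```

Because every fact about a query lives in that query's graph, a filter never has to chase references into other parts of the output.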

Documentation

Detailed Documentation

For detailed documentation about setup, use and concepts of the LSQ command line tool please refer to our LSQ Website.

Quick Reference

Setup

This is a typical Maven project and can thus be built with mvn clean install.

For Ubuntu/Debian users: the build process creates a .deb package that can be conveniently installed after the build with

./reinstall-deb.sh (requires root access).

Quick Usage

A quick reference for the typical process is as follows:

lsq rx probe file.log
lsq rx rdfize -e http://server.from/which/the/log/is/from file.log > file.log.trig
lsq rx benchmark create -d myDatasetLabel -e http://localhost:8890/sparql -o > benchmark.conf.ttl
lsq rx benchmark prepare -c benchmark.conf.ttl -o > benchmark.run.ttl
lsq rx benchmark run -c benchmark.run.ttl *.log.trig

The -o option causes the settings to be written to the console. Omit -o to have LSQ auto-generate files.

Run with Docker

Example of running LSQ to RDFize a SPARQL log, with input and output files in the current working directory (replace $(pwd) with ${PWD} in Windows PowerShell):

docker run -it -v $(pwd):/data ghcr.io/aksw/lsq rx rdfize --endpoint=http://dbpedia.org/sparql virtuoso.dbpedia.log 

Build the Docker image from the source code:

docker build -t ghcr.io/aksw/lsq .

License

The source code of this repo is published under the Apache License Version 2.0.

lsq's People

Contributors

aidhog, aklakan, davidhaller, kasei, miguel76, saleem-muhammad, simonbin, vemonet

lsq's Issues

mvn install fails

I get an error when running maven on the development branch.
System is Ubuntu 18.04 with:

~$ mvn -version
Apache Maven 3.5.2
Maven home: /usr/share/maven
Java version: 10.0.1, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-11-openjdk-amd64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-23-generic", arch: "amd64", family: "unix"

Here is the log:

~/LSQ$ mvn install
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/maven/lib/guice.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] lsq-parent
[INFO] lsq-vocab-jena
[INFO] lsq-parser
[INFO] lsq-core
[INFO] lsq-cli
[INFO] lsq-debian-cli
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Building lsq-parent 1.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-javadoc-plugin:2.9.1:jar (attach-javadocs) @ lsq-parent ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable package
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar-no-fork (attach-sources) @ lsq-parent ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ lsq-parent ---
[INFO] Installing /home/usr/Code/ExplainQueries/LSQ/pom.xml to /home/usr/.m2/repository/org/aksw/simba/lsq/lsq-parent/1.0.1-SNAPSHOT/lsq-parent-1.0.1-SNAPSHOT.pom
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Building lsq-vocab-jena 1.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ lsq-vocab-jena ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/usr/Code/ExplainQueries/LSQ/lsq-vocab-jena/src/main/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.2:compile (default-compile) @ lsq-vocab-jena ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 2 source files to /home/usr/Code/ExplainQueries/LSQ/lsq-vocab-jena/target/classes
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ lsq-vocab-jena ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/usr/Code/ExplainQueries/LSQ/lsq-vocab-jena/src/test/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.2:testCompile (default-testCompile) @ lsq-vocab-jena ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.20:test (default-test) @ lsq-vocab-jena ---
[INFO] No tests to run.
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ lsq-vocab-jena ---
[INFO] Building jar: /home/usr/Code/ExplainQueries/LSQ/lsq-vocab-jena/target/lsq-vocab-jena-1.0.1-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-javadoc-plugin:2.9.1:jar (attach-javadocs) @ lsq-vocab-jena ---
[ERROR] MavenReportException: Error while creating archive: Unable to find javadoc command: The environment variable JAVA_HOME is not correctly set.
org.apache.maven.reporting.MavenReportException: Unable to find javadoc command: The environment variable JAVA_HOME is not correctly set.
    at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.executeReport (AbstractJavadocMojo.java:1885)
    at org.apache.maven.plugin.javadoc.JavadocJar.execute (JavadocJar.java:181)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:208)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:154)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:146)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:309)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:194)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:107)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:955)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:290)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:194)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:564)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)
Caused by: java.io.IOException: The environment variable JAVA_HOME is not correctly set.
    at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.getJavadocExecutable (AbstractJavadocMojo.java:3553)
    at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.executeReport (AbstractJavadocMojo.java:1881)
    at org.apache.maven.plugin.javadoc.JavadocJar.execute (JavadocJar.java:181)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:208)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:154)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:146)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:309)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:194)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:107)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:955)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:290)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:194)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:564)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar-no-fork (attach-sources) @ lsq-vocab-jena ---
[INFO] Building jar: /home/usr/Code/ExplainQueries/LSQ/lsq-vocab-jena/target/lsq-vocab-jena-1.0.1-SNAPSHOT-sources.jar
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ lsq-vocab-jena ---
[INFO] Installing /home/usr/Code/ExplainQueries/LSQ/lsq-vocab-jena/target/lsq-vocab-jena-1.0.1-SNAPSHOT.jar to /home/usr/.m2/repository/org/aksw/simba/lsq/lsq-vocab-jena/1.0.1-SNAPSHOT/lsq-vocab-jena-1.0.1-SNAPSHOT.jar
[INFO] Installing /home/usr/Code/ExplainQueries/LSQ/lsq-vocab-jena/pom.xml to /home/usr/.m2/repository/org/aksw/simba/lsq/lsq-vocab-jena/1.0.1-SNAPSHOT/lsq-vocab-jena-1.0.1-SNAPSHOT.pom
[INFO] Installing /home/usr/Code/ExplainQueries/LSQ/lsq-vocab-jena/target/lsq-vocab-jena-1.0.1-SNAPSHOT-sources.jar to /home/usr/.m2/repository/org/aksw/simba/lsq/lsq-vocab-jena/1.0.1-SNAPSHOT/lsq-vocab-jena-1.0.1-SNAPSHOT-sources.jar
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Building lsq-parser 1.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ lsq-parser ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.2:compile (default-compile) @ lsq-parser ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 14 source files to /home/usr/Code/ExplainQueries/LSQ/lsq-parser/target/classes
[INFO] /home/usr/Code/ExplainQueries/LSQ/lsq-parser/src/main/java/org/aksw/simba/lsq/parser/WebLogParser.java: /home/usr/Code/ExplainQueries/LSQ/lsq-parser/src/main/java/org/aksw/simba/lsq/parser/WebLogParser.java uses or overrides a deprecated API.
[INFO] /home/usr/Code/ExplainQueries/LSQ/lsq-parser/src/main/java/org/aksw/simba/lsq/parser/WebLogParser.java: Recompile with -Xlint:deprecation for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ lsq-parser ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/usr/Code/ExplainQueries/LSQ/lsq-parser/src/test/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.2:testCompile (default-testCompile) @ lsq-parser ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 1 source file to /home/usr/Code/ExplainQueries/LSQ/lsq-parser/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.20:test (default-test) @ lsq-parser ---
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit4/2.20/surefire-junit4-2.20.pom
[INFO] Failure detected.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] lsq-parent ......................................... SUCCESS [  1.034 s]
[INFO] lsq-vocab-jena ..................................... SUCCESS [  1.221 s]
[INFO] lsq-parser ......................................... FAILURE [  1.508 s]
[INFO] lsq-core ........................................... SKIPPED
[INFO] lsq-cli ............................................ SKIPPED
[INFO] lsq-debian-cli ..................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.957 s
[INFO] Finished at: 2018-06-12T13:52:04+02:00
[INFO] Final Memory: 34M/117M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.20:test (default-test) on project lsq-parser: Unable to generate classpath: org.apache.maven.artifact.resolver.ArtifactResolutionException: Unable to get dependency information for org.apache.maven.surefire:surefire-junit4:jar:2.20: Failed to retrieve POM for org.apache.maven.surefire:surefire-junit4:jar:2.20: Could not transfer artifact org.apache.maven.surefire:surefire-junit4:pom:2.20 from/to central (https://repo.maven.apache.org/maven2): java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
[ERROR]   org.apache.maven.surefire:surefire-junit4:jar:2.20
[ERROR] 
[ERROR] from the specified remote repositories:
[ERROR]   central (https://repo.maven.apache.org/maven2, releases=true, snapshots=false)
[ERROR] Path to dependency: 
[ERROR]         1) dummy:dummy:jar:1.0
[ERROR] 
[ERROR] 
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :lsq-parser
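Judging from the log, the javadoc error did not stop the build (lsq-vocab-jena still reports SUCCESS); the build failed when downloading surefire-junit4 from Maven Central. The message "the trustAnchors parameter must be non-empty" is typically reported when the trust store the JVM loads contains no trusted CA certificates, so HTTPS connections cannot be verified. A small diagnostic sketch using the standard JSSE API (the class name is hypothetical) counts the CA certificates the default trust manager accepts; run it with the same Java that Maven uses.

```java
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

final class TrustStoreCheck {

    // Count the CA certificates accepted by the JVM's default trust manager.
    // A result of 0 matches the "trustAnchors parameter must be non-empty" symptom.
    static int countDefaultTrustAnchors() {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init((KeyStore) null); // null means: use the JVM's default trust store
            int count = 0;
            for (TrustManager tm : tmf.getTrustManagers()) {
                if (tm instanceof X509TrustManager x509) {
                    count += x509.getAcceptedIssuers().length;
                }
            }
            return count;
        } catch (NoSuchAlgorithmException | KeyStoreException e) {
            throw new IllegalStateException("Could not initialise the default trust manager", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.home: " + System.getProperty("java.home"));
        System.out.println("Trusted CA certificates: " + countDefaultTrustAnchors());
    }
}
```

If this prints 0, the problem lies in the Java installation's trust store rather than in the LSQ build itself.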

Missing information in LSQ2

Move bgp to structural features: joinVertices, meanJoinVertexDegree, medianJoinVertexsDegree, etc.
Cross check with the paper

Rename selectivities to ratios because (a) that's what we are doing and (b) they are faster to compute.

  • bgpToTpRatio
  • Missing: joinNodeToTpRatio (the current implementation takes the ratio |joinVar| / |bgpSize| instead of |bgp_{joinVar}|{tp_bgp_joinVar}|
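For illustration, the join-vertex features named above can be computed from a plain list of triple patterns. This is a minimal sketch under stated assumptions, to be cross-checked with the paper: triple patterns are string triples (the `TriplePattern` type is hypothetical), a join vertex is taken to be any subject/object term shared by at least two triple patterns, and its degree is the number of triple patterns it occurs in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical triple pattern type; terms are plain strings such as "?s" or "<http://...>".
record TriplePattern(String s, String p, String o) {}

final class BgpStats {

    // Assumed definition: degree of a subject/object term = number of triple patterns mentioning it.
    static Map<String, Integer> vertexDegrees(List<TriplePattern> bgp) {
        Map<String, Integer> degrees = new HashMap<>();
        for (TriplePattern tp : bgp) {
            Set<String> terms = new HashSet<>(List.of(tp.s(), tp.o())); // count each TP once per term
            for (String t : terms) degrees.merge(t, 1, Integer::sum);
        }
        return degrees;
    }

    // Degrees of join vertices only (terms shared by at least two triple patterns), sorted ascending.
    static List<Integer> joinVertexDegrees(List<TriplePattern> bgp) {
        List<Integer> result = new ArrayList<>();
        for (int d : vertexDegrees(bgp).values()) if (d >= 2) result.add(d);
        Collections.sort(result);
        return result;
    }

    static double mean(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    // Median of an ascending list; average of the two middle values for even sizes.
    static double median(List<Integer> sorted) {
        int n = sorted.size();
        if (n == 0) return 0.0;
        return n % 2 == 1 ? sorted.get(n / 2) : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }
}
```

The number of join vertices is then simply `joinVertexDegrees(bgp).size()`.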

NullPointerException using rdfize

I get the NullPointerException copied below when using the rdfize command with the --slim option.

Turns out that SparqlMappers.fallbackToVisitor() (package org.aksw.jena_sparql_api.rx.io.resultset, artifact jenax-arq-rx, ver. 4.5.0-1-SNAPSHOT) calls SparqlStmtUtils.execAny() (package org.aksw.jenax.stmt.util, artifact jenax-arq-stmt, ver. 4.5.0-1-SNAPSHOT) with a value of null for the parameter cxtMutator, leading later to the exception.

java.lang.NullPointerException: Cannot invoke "java.util.function.Consumer.accept(Object)" because "cxtMutator" is null
    at org.aksw.jenax.stmt.util.SparqlStmtUtils.execAny(SparqlStmtUtils.java:471)
    at org.aksw.jena_sparql_api.rx.io.resultset.SparqlMappers.fallbackToVisitor(SparqlMappers.java:108)
    at org.aksw.jena_sparql_api.rx.io.resultset.SparqlMappers.lambda$createMapperQuad$9(SparqlMappers.java:198)
    at io.reactivex.rxjava3.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:130)
    at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:243)
    at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:131)
    at io.reactivex.rxjava3.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onSubscribe(FlowableFlatMap.java:115)
    at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable.subscribe(FlowableFromIterable.java:69)
    at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable.subscribeActual(FlowableFromIterable.java:47)
    at io.reactivex.rxjava3.core.Flowable.subscribe(Flowable.java:15917)
    at io.reactivex.rxjava3.internal.operators.flowable.FlowableFlatMap.subscribeActual(FlowableFlatMap.java:51)
    at io.reactivex.rxjava3.core.Flowable.subscribe(Flowable.java:15917)
    at io.reactivex.rxjava3.core.Flowable.subscribe(Flowable.java:15863)
    at io.reactivex.rxjava3.internal.operators.flowable.FlowableReduceWithSingle.subscribeActual(FlowableReduceWithSingle.java:58)
    at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
    at io.reactivex.rxjava3.internal.operators.single.SingleMap.subscribeActual(SingleMap.java:35)
    at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
    at io.reactivex.rxjava3.core.Single.blockingGet(Single.java:3644)
    at org.aksw.jena_sparql_api.rx.io.resultset.SparqlMappers.lambda$createMapperDataset$13(SparqlMappers.java:230)
    at java.base/java.util.function.Function.lambda$andThen$1(Function.java:88)
    at org.aksw.jena_sparql_api.rx.io.resultset.SparqlMappers.lambda$mapDatasetToConnection$2(SparqlMappers.java:81)
    at org.aksw.jena_sparql_api.rx.dataset.DatasetFlowOps.lambda$createItemMultiMapper$10(DatasetFlowOps.java:236)
    at org.aksw.commons.rx.op.RxOps.lambda$createParallelMapperOrdered$5(RxOps.java:60)
    at io.reactivex.rxjava3.internal.operators.parallel.ParallelMap$ParallelMapSubscriber.onNext(ParallelMap.java:116)
    at io.reactivex.rxjava3.internal.operators.parallel.ParallelRunOn$RunOnSubscriber.run(ParallelRunOn.java:275)
    at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:65)
    at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:56)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
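The trace shows a `null` reaching `cxtMutator.accept(...)`. A common defensive pattern is for the callee to substitute a no-op `Consumer` for a missing callback (alternatively, the caller passes a no-op instead of `null`). The sketch below only mimics the shape described in the report; `Context` and `execAny` here are hypothetical stand-ins, not the jenax API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class NullSafeMutatorSketch {

    // Hypothetical stand-in for an execution context.
    static final class Context {
        final Map<String, Object> settings = new HashMap<>();
    }

    // Hypothetical stand-in for an exec method that takes an optional context mutator.
    static Context execAny(Consumer<Context> cxtMutator) {
        // Treat null as "no customisation" instead of dereferencing it.
        Consumer<Context> mutator = cxtMutator != null ? cxtMutator : c -> { };
        Context cxt = new Context();
        mutator.accept(cxt);
        return cxt;
    }
}
```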

spring-core dependency

I got an error while building several LSQ sub-projects:
Maven was not able to find spring-core.

After changing

            <dependency>
                <groupId>org.springframework</groupId>
                <artifactId>spring-core</artifactId>
                <version>4.3.18</version>
            </dependency>

back to

            <dependency>
                <groupId>org.springframework</groupId>
                <artifactId>spring-core</artifactId>
                <version>4.3.5.RELEASE</version>
            </dependency>

the build process finishes.

This change is based on the commit 69a0d10

Integrate FedX

In the FedX source src/org/aksw/simba/start/QueryEvaluation.java make loadEndpoints fetch data from a file passed as a command line argument to LSQ.
Then use this endpoint for query execution.

As the FedX code is based on Sesame, whereas LSQ is Jena based, possibly this wrapper could be used: https://github.com/afs/JenaSesame
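As a starting point for the first step, reading endpoint URLs from a file given on the command line might look like the following. This is a hypothetical sketch: the one-URL-per-line format, `#` comments and the method name `loadEndpoints(Path)` are assumptions and do not reflect FedX's actual `QueryEvaluation` code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class EndpointFileLoader {

    // Assumed format: one endpoint URL per line; blank lines and '#' comments are skipped.
    static List<String> loadEndpoints(Path file) {
        try {
            return Files.readAllLines(file).stream()
                    .map(String::strip)
                    .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                    .toList();
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read endpoint file " + file, e);
        }
    }

    public static void main(String[] args) {
        // Usage: java EndpointFileLoader endpoints.txt
        loadEndpoints(Path.of(args[0])).forEach(System.out::println);
    }
}
```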

Add option to disable tracking "remote executions".

LSQ can actually be used for benchmarking any set of SPARQL queries, such as ones produced by a query generator.
Such queries do not correspond to a log file and thus do not have remote executions.

Right now one can RDFize a file containing multi-line SPARQL queries with:

lsq rx rdfize -m sparql multi-line-sparql-queries.rq

As a secondary issue, an explicit -m sparql is needed for multi-line SPARQL queries: for some reason, query format auto-detection (aka probing) yields the format sparql2, which is the 'one SPARQL query per line' (with encoded newlines) format.

However, in such cases, the output contains needless remoteExec resources:

<http://lsq.aksw.org/lsqQuery-0JZqkFLBIZX934e78cAKZTqP9UoOoOVGE6T0wh1fC48.AA.AA.s.GdjNBA.AA.AA.AA.AA.AA.qeAbhAAA.AA.AA.AA.HM5Tgg> {
    <http://lsq.aksw.org/remoteExec-_243>
            <http://lsq.aksw.org/vocab#endpoint>  <urn:grid-bench:512x512x16r> ;
            <http://lsq.aksw.org/vocab#query>  "SELECT ..." .
    <http://lsq.aksw.org/lsqQuery-0JZqkFLBIZX934e78cAKZTqP9UoOoOVGE6T0wh1fC48.AA.AA.s.GdjNBA.AA.AA.AA.AA.AA.qeAbhAAA.AA.AA.AA.HM5Tgg>
            <http://lsq.aksw.org/vocab#text>  "SELECT ..." ;
            <http://lsq.aksw.org/vocab#hasRemoteExec>  <http://lsq.aksw.org/remoteExec-_243> ;
}

In the benchmark setting, rather than expressing that a query was executed on a remote endpoint, the semantics is that a query was part of a query mix.

The triple

<http://lsq.aksw.org/remoteExec-_243> <http://lsq.aksw.org/vocab#endpoint>  <urn:grid-bench:512x512x16r> .

is not too bad but it should be something like:

<http://lsq.aksw.org/membership-243> :in <urn:grid-bench:512x512x16r> .

So LSQ needs a flag to switch modes.
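A minimal sketch of such a switch, assuming a hypothetical `ExecMode` enum and plain N-Triples-style output strings. The remote-execution triples mirror the output above, and the membership triple follows the proposal literally (its `:in` needs an empty-prefix declaration, which is not specified here).

```java
import java.util.List;

final class ExecRecordSketch {

    // Hypothetical mode flag: log-derived queries vs. queries that are only members of a query mix.
    enum ExecMode { REMOTE_EXEC, QUERY_MIX_MEMBERSHIP }

    // Emit the triples describing where query number seqId came from, depending on the mode.
    static List<String> describe(ExecMode mode, String queryIri, long seqId, String source) {
        return switch (mode) {
            case REMOTE_EXEC -> {
                String exec = "<http://lsq.aksw.org/remoteExec-_" + seqId + ">";
                yield List.of(
                        queryIri + " <http://lsq.aksw.org/vocab#hasRemoteExec> " + exec + " .",
                        exec + " <http://lsq.aksw.org/vocab#endpoint> " + source + " .");
            }
            // How the query itself links to the membership resource is left open in the issue.
            case QUERY_MIX_MEMBERSHIP -> List.of(
                    "<http://lsq.aksw.org/membership-" + seqId + "> :in " + source + " .");
        };
    }
}
```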

Instructions not clear

Hi, we are trying to reuse LSQ to generate logs for SPARQL queries.

I noticed that most of your commits (https://github.com/AKSW/LSQ/commits/develop) are just about improving the README/docs, but the documentation is still surprisingly unclear and somewhat out of date.

I had to write straightforward, up-to-date instructions to help our collaborators use it: https://github.com/vemonet/lsq-anal-sparql

You might want to reuse them to update your instructions and document how to run the LSQ process for people who are not developing LSQ (people who usually just want to provide 1 log file and 1 SPARQL endpoint URL to an executable, and get results).

An overall QueryHash needs to have the hash for ORDER BY closer to the front (before group by, having, etc)

Permutations of group by and projections (typically) have no impact on a result set (a binding does not have an order of variables), but a permutation of the order by clause changes the order of bindings.

Currently this query produces the following hash:

SELECT ?a COUNT(?b) FROM <http://dbpedia.org/sparql> FROM NAMED <urn:foo> { ?a ?b ?c } GROUP BY STR(?c) ?a ORDER BY DESC(?a) DESC(STR(?c)) LIMIT 10 OFFSET 2

Tjny8TXJKFa7hYkh6VBRiv_6S_tZn3Fm_vyP-JtwPFM/cm60CQ/AA/s/ReTPZw/MusTnQ/AA/AA/AQ/AA/ZjanBg3tcVCB7ffhA/AA/GrxMCookg6M/AA/AA/2+10

One of the members of AA/AA/AQ/AA is the Lehmer code for the permutation of the ORDER BY clause; it needs to be moved closer to the front.
(Another TODO is to write documentation of the hashing itself.)
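For reference, the Lehmer code of a permutation records, for each position, how many later elements are smaller; read in the factorial number system it gives a unique rank. The sketch below is a generic implementation of that encoding plus a hypothetical `assemble` helper that puts the ORDER BY rank first, as requested above. It is not LSQ's QueryHash code, and the component layout is made up.

```java
final class LehmerSketch {

    // Lehmer code: code[i] = number of j > i with perm[j] < perm[i].
    static int[] lehmerCode(int[] perm) {
        int[] code = new int[perm.length];
        for (int i = 0; i < perm.length; i++) {
            for (int j = i + 1; j < perm.length; j++) {
                if (perm[j] < perm[i]) code[i]++;
            }
        }
        return code;
    }

    // Rank = sum of code[i] * (n-1-i)!, evaluated Horner-style in the factorial number system.
    static long rank(int[] perm) {
        int[] code = lehmerCode(perm);
        long rank = 0;
        for (int i = 0; i < code.length; i++) {
            rank = rank * (code.length - i) + code[i];
        }
        return rank;
    }

    // Hypothetical layout: the order-sensitive ORDER BY component first, then components
    // whose permutations do not change the result.
    static String assemble(long orderByRank, String groupByPart, String havingPart, String projectionPart) {
        return String.join("/", Long.toString(orderByRank), groupByPart, havingPart, projectionPart);
    }
}
```

For example, the identity permutation `[0, 1, 2]` has rank 0 and the reversal `[2, 1, 0]` has rank 5, the largest of the 3! = 6 permutations.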

How to generate sparql log from using LSQ?

Hi, sorry for bothering you. I am working on a sparql system and I need some query log to test it. But I cannot find the command on the website about how to generate sparql log. Could you help me with it? Thank you so much!

Multiple remote query executions merged together due to timestamp clash

I noticed that in some of the published datasets there are issues with single instances of lsqv:RemoteExec that have multiple values for properties like lsqv:hostHash and lsqv:uri, which (conceptually) should be functional.
Further analysing the data and later the source code, I discovered that the problem is that if the timestamp is available (which I guess is most of the times) it is used (alongside the service id) to build the IRI for the remote execution.
The problem is exacerbated in the case of the dbpedia.3.5.1 log, because for some reason the timestamps are truncated at the hour and hence blocks of several executions are merged together.
But it easily happens in other cases too (certainly in the case of the bioportal log), because multiple query executions may be logged within the same second.

My suggestion is to either always use the sequential ID (the easiest hack, I guess) or add a mechanism to differentiate the IRIs when the timestamps are the same.
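The second suggestion could be sketched as follows, assuming a hypothetical IRI pattern of base + serviceId + "_" + timestamp (not necessarily LSQ's actual scheme). The first execution for a given service and timestamp keeps the plain IRI; later ones get a counter suffix. The counters depend on processing order, so the IRIs are only stable if the log is always processed in the same order.

```java
import java.util.HashMap;
import java.util.Map;

final class RemoteExecIriMinter {

    private final String base;
    // How many executions have been seen so far for each (serviceId, timestamp) key.
    private final Map<String, Integer> seen = new HashMap<>();

    RemoteExecIriMinter(String base) {
        this.base = base;
    }

    // Mint an IRI from service id and timestamp; disambiguate clashes with a "-n" suffix.
    String mint(String serviceId, String timestamp) {
        String key = serviceId + "_" + timestamp;
        int n = seen.merge(key, 1, Integer::sum); // 1 for the first occurrence of this key
        return n == 1 ? base + key : base + key + "-" + (n - 1);
    }
}
```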

Errors in some subjects and objects

I get errors when loading linkedgeodata with Fuseki.

Error example: Caused by: org.apache.jena.riot.RiotException: [line: 1402014, col: 35] Triples not terminated by DOT
On this line, the object is lsqr:-anon-1
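A likely explanation, going by the Turtle grammar: the local part of a prefixed name may not start with an unescaped `-`, so `lsqr:-anon-1` is not valid Turtle (`lsqr:\-anon-1` or the full IRI would be). A writer can avoid this by falling back to a full IRI whenever the local part would be invalid. The sketch below uses a deliberately conservative check rather than the full PN_LOCAL production, and assumes `lsqr:` stands for `http://lsq.aksw.org/`.

```java
import java.util.regex.Pattern;

final class TurtleTermWriter {

    // Conservative subset of Turtle's PN_LOCAL: starts with a letter, digit or '_',
    // followed only by letters, digits, '_' or '-'.
    private static final Pattern SAFE_LOCAL = Pattern.compile("[A-Za-z0-9_][A-Za-z0-9_-]*");

    // Write an IRI as prefix:local when that is safe, otherwise as <full IRI>.
    static String write(String iri, String prefix, String namespace) {
        if (iri.startsWith(namespace)) {
            String local = iri.substring(namespace.length());
            if (SAFE_LOCAL.matcher(local).matches()) {
                return prefix + ":" + local;
            }
        }
        return "<" + iri + ">";
    }
}
```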

Queries without remote execution information

Note: this is a question, not an actual issue
I would like to know why in the generated data there are queries that have no remote execution information, i.e. there are no outgoing triples with predicate lsqv:hasRemoteExec.
In the data from bioportal, while all 158,553 queries have lsqv:hasLocalExec , only 89,664 have lsqv:hasRemoteExec.
If I understand correctly the model, the remote execution is when the query was originally executed.
So, given that the queries are extracted from existing logs, I would expect that there is always a resource representing the remote execution.
Am I wrong?

build unsuccessful

[ERROR] Failed to execute goal on project lsq-vocab-jena: Could not resolve dependencies for project org.aksw.simba.lsq:lsq-vocab-jena:jar:2.0.0-SNAPSHOT: Could not find artifact org.apache.jena:jena-core:jar:4.5.0-SNAPSHOT in maven.aksw.snapshots (https://maven.aksw.org/archiva/repository/snapshots) During build process this error is generated. Any hints to resolve the issue please ...

lsq rdfize and lsq benchmark use different ID schema

The initial idea was to reuse information contained in the static RDFization in the benchmark process.
I.e. instead of trying to parse an ID out from an IRI it should be assembled from attributes.

However, for some reason RDFization and benchmark runs do not yield matching graph IRIs:

lsq rdfize:

<http://lsq.aksw.org/q-0ez272gh_Z4IFrwv_dbCEpkmz5ZnL5VaDwmja75SFPE>

lsq benchmark run:

lsqr:lsqQuery-0DgPrLmzT2lANsdflr3XIUpdNZgT09YRsOnbZ9gA7CE

Also, the prefix declarations are evidently not yet used consistently.

It may well be that the only difference is q- vs lsqQuery-.
In other query logs there is the issue of hex encoding vs Base64 encoding of checksums, but this might have been fixed already. Because Base64 yields shorter IRIs, it should be used.
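To make the size difference concrete: both suffixes above are 43 characters long, which is consistent with a 256-bit checksum in unpadded URL-safe Base64; the same checksum in hex takes 64 characters. The sketch assumes SHA-256 over the query text (what LSQ actually hashes is not stated here) and routes every IRI through one hypothetical helper, `queryIri`, which is one way to keep rdfize and benchmark consistent.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HexFormat;

final class QueryIds {

    // Assumption: SHA-256 over the UTF-8 query text.
    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Shorter encoding: URL-safe Base64 without '=' padding (43 characters for 32 bytes).
    static String base64Id(String text) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(sha256(text));
    }

    // Longer encoding: lowercase hex (64 characters for 32 bytes).
    static String hexId(String text) {
        return HexFormat.of().formatHex(sha256(text));
    }

    // Single place where query IRIs are assembled, so every command produces the same IRI.
    static String queryIri(String namespace, String localPrefix, String queryText) {
        return namespace + localPrefix + base64Id(queryText);
    }
}
```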

Lsq Enrichment gets stuck when adding many items to an RDF-backed List

List<LsqTriplePattern> bgpTps = bgpCtxRes.getTriplePatterns();
// This variant has to traverse the RDF list on every add
for(org.spinrdf.model.Triple tp : e.getValue()) {
// System.err.println("TP:" + tp);
    bgpTps.add(tp.as(LsqTriplePattern.class)); // ISSUE: bgpTps is a view over an RDF List; each add has to iterate all items
}

The solution is to use bgpTps.addAll() - FilteredList in aksw-commons 0.9.6 has been optimized for this case.
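Why the per-element loop degrades: if every add on a list view has to walk the whole underlying linked structure to find its end, appending n items costs about n²/2 node visits, while a bulk addAll can locate the end once. The toy class below only counts those visits to illustrate the cost model; it is a hypothetical model, not the aksw-commons FilteredList implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Toy model of a list view over a linked structure (such as an RDF list) with no cached tail.
final class TraversingListView {

    private final List<String> cells = new ArrayList<>(); // stands in for rdf:first/rdf:rest cells
    long visits = 0; // cells visited while looking for the end of the list

    private void walkToEnd() {
        visits += cells.size(); // a naive view follows every rdf:rest link
    }

    // Per-element append: walks the whole list every time.
    void add(String item) {
        walkToEnd();
        cells.add(item);
    }

    // Bulk append: walks to the end once, then links all new cells there.
    void addAll(Collection<String> items) {
        walkToEnd();
        cells.addAll(items);
    }
}
```

Appending 100 triple patterns one by one to an empty view costs 0 + 1 + ... + 99 = 4950 visits; a single addAll costs none.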

Skolemization of RDF models with blank nodes as leaf nodes fails

2021-10-11 10:54:50,001 [Thread-540] WARN  o.a.s.l.c.m.MainCliLsq: Enrichment of PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX  adhoc: <http://linked.opendata.cz/ontology/adhoc/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

CONSTRUCT 
  { 
    _:c0 adhoc:class ?Class .
    _:c0 adhoc:endpointUri "http://pharmgkb.bio2rdf.org/sparql" .
    _:c0 adhoc:numberOfInstances ?numberOfInstances .
  }
WHERE
  { { SELECT  ?Class (COUNT(?resource) AS ?numberOfInstances)
      WHERE
        { ?resource  rdf:type  ?Class }
      GROUP BY ?Class
    }
  }
 failed
java.lang.RuntimeException: Leaf nodes must not be blank nodes: _:Ba09a0b35X2D2286X2D4138X2D9101X2D961dbe84fe69
	at org.aksw.jena_sparql_api.conjure.algebra.common.ResourceTreeUtils.createGenericHash(ResourceTreeUtils.java:238)
	at org.aksw.jena_sparql_api.conjure.algebra.common.ResourceTreeUtils.createGenericHash(ResourceTreeUtils.java:225)

SPIN API not available during build

During build I get the following error:

[ERROR] Failed to execute goal on project lsq-model: Could not resolve dependencies for project org.aksw.simba.lsq:lsq-model:jar:2.0.0-SNAPSHOT: Could not find artifact org.topbraid:spin:jar:2.0.0 in com.topquadrant.internal (https://www.topquadrant.com/repository/spin)

It turns out the SPIN API is no longer available at https://www.topquadrant.com/repository/spin, so the repository references registered in the Maven Central repository are broken.

How to run the jar bundle

Hi, we are trying to run LSQ with the java build

Unfortunately, the documentation does not reflect the actual code, it says:

mvn -P bundle clean install

java -jar lsq-bundle/target/

But there is no lsq-bundle folder in the repository, and unfortunately, there is no further documentation on how to run and use the .jar file (all documentation is for the debian installed package)

The Debian package documentation mentions using lsq-cli, so we assume the documentation is confused and that we should use the lsq-cli subfolder to run LSQ.

So we tried running the lsq-cli jar file:

java -jar .\lsq-cli\target\lsq-cli-2.0.0-SNAPSHOT-jar-with-dependencies.jar

But it does not seem to have been packaged properly:

no main manifest attribute, in .\lsq-cli\target\lsq-cli-2.0.0-SNAPSHOT-jar-with-dependencies.jar

This is quite a common error when building Java packages: you need to declare the main class to use when the package is run.

For example here: https://github.com/AKSW/LSQ/blob/develop/lsq-cli/pom.xml#L98

In the build configuration we might need to add something like:

				<configuration>
					<archive>
						<manifest>
							<mainClass>org.aksw.simba.lsq.cli.main.MainCliLsq</mainClass>
						</manifest>
					</archive>
				</configuration>

Any idea how we can fix the Jar build and run LSQ with Java?

Note: running on Windows 10 with Java 15
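To check whether a built jar declares an entry point, the standard java.util.jar API can read its manifest; java -jar reports "no main manifest attribute" when the Main-Class attribute is missing. A small diagnostic sketch (the class name is hypothetical):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

final class MainClassCheck {

    // Return the Main-Class declared in the jar's manifest, if any.
    static Optional<String> mainClass(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest(); // null if the jar has no manifest
            if (manifest == null) return Optional.empty();
            return Optional.ofNullable(manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + jar, e);
        }
    }

    public static void main(String[] args) {
        // Usage: java MainClassCheck lsq-cli/target/lsq-cli-2.0.0-SNAPSHOT-jar-with-dependencies.jar
        System.out.println(mainClass(Path.of(args[0])).orElse("<no Main-Class attribute>"));
    }
}
```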

Incompatible versions of log4j and slf4j

Launching lsq as standalone from the jar (lsq-cli-2.0.0-SNAPSHOT-jar-with-dependencies.jar) I get the following exception.

Exception in thread "main" java.lang.NoSuchMethodError: 'void org.apache.logging.slf4j.Log4jLoggerFactory.<init>(org.apache.logging.slf4j.Log4jMarkerFactory)'
	at org.apache.logging.slf4j.SLF4JServiceProvider.initialize(SLF4JServiceProvider.java:54)
	at org.slf4j.LoggerFactory.bind(LoggerFactory.java:153)
	at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:141)
	at org.slf4j.LoggerFactory.getProvider(LoggerFactory.java:419)
	at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:405)
	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:354)
	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:380)
	at org.aksw.simba.lsq.cli.main.MainCliLsq.<clinit>(MainCliLsq.java:114)

The versions of slf4j (currently 1.8.0-beta4) and Log4j (currently 2.17.1) in use are incompatible.
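The NoSuchMethodError says that SLF4JServiceProvider tries to call a Log4jLoggerFactory constructor taking a Log4jMarkerFactory, and the Log4jLoggerFactory class that was actually loaded does not have it. A minimal reflection-based diagnostic, assuming nothing about LSQ beyond the class names in the stack trace, can confirm this and report where each class came from. The class Slf4jBindingCheck is hypothetical; note that inside a jar-with-dependencies every class reports the fat JAR as its location, so the location output is mainly useful with a thin classpath.

```java
import java.security.CodeSource;

public class Slf4jBindingCheck {

    // Where a class was loaded from; inside a fat JAR this is the fat JAR itself
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className, false, Slf4jBindingCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (the JDK) have no CodeSource
            return (src == null || src.getLocation() == null)
                ? "bootstrap (JDK)"
                : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    // True if className declares a constructor with exactly these parameter types
    static boolean hasConstructor(String className, String... paramTypeNames) {
        ClassLoader loader = Slf4jBindingCheck.class.getClassLoader();
        try {
            Class<?> cls = Class.forName(className, false, loader);
            Class<?>[] params = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < params.length; i++) {
                params[i] = Class.forName(paramTypeNames[i], false, loader);
            }
            cls.getDeclaredConstructor(params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String factory = "org.apache.logging.slf4j.Log4jLoggerFactory";
        String markers = "org.apache.logging.slf4j.Log4jMarkerFactory";
        // The constructor the NoSuchMethodError in the stack trace reports as missing
        System.out.println("Log4jLoggerFactory(Log4jMarkerFactory) present: "
            + hasConstructor(factory, markers));
        for (String name : new String[] {
                "org.slf4j.LoggerFactory",
                "org.apache.logging.slf4j.SLF4JServiceProvider",
                factory}) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the first line prints false with the LSQ JAR on the classpath, the mismatch is reproduced; a common cause for this kind of error is two artifacts (or two versions) providing classes in the same org.apache.logging.slf4j package, but that is an assumption to verify against the actual dependency tree.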

Annotate queries that mention spatial literals.

LSQ should annotate queries that contain spatial literals with at least a bounding box of the union of those literals.
This would allow browsing queries on a map and seeing which areas they affect.

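PREFIX geo:     <http://www.opengis.net/ont/geosparql#>
PREFIX geof:    <http://www.opengis.net/def/function/geosparql/>
PREFIX spatial: <http://jena.apache.org/spatial#>

SELECT  (count(*) AS ?c)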
WHERE
  { GRAPH <http://www.example.org/graph/1>
      { BIND("POLYGON((-78.75 -90, -78.75 -78.75, -67.5 -78.75, -67.5 -90, -78.75 -90))"^^geo:wktLiteral AS ?queryGeom)
        ?feature  spatial:intersectBoxGeom  ( ?queryGeom ) ;
                  geo:hasGeometry       ?featureGeom .
        ?featureGeom  geo:asWKT         ?featureGeomWkt
        FILTER geof:sfIntersects(?featureGeomWkt, ?queryGeom)
      }
  }
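The bounding-box annotation can be sketched as follows, assuming 2D WKT literals with plain decimal coordinates (no exponents, no Z/M values, no EMPTY geometries). It scans coordinate pairs with a regular expression instead of using a real WKT parser; an actual implementation would more likely rely on a geometry library such as JTS. The class and method names are hypothetical. Coordinates are kept in the order they appear in the literal, which for geo:wktLiteral without an explicit CRS (CRS84) means longitude, then latitude.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WktBoundingBox {

    // Axis-aligned box; x/y follow the coordinate order of the WKT literal
    static final class Box {
        final double minX, minY, maxX, maxY;

        Box(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        Box union(Box o) {
            return new Box(Math.min(minX, o.minX), Math.min(minY, o.minY),
                           Math.max(maxX, o.maxX), Math.max(maxY, o.maxY));
        }

        @Override
        public String toString() {
            return "Box[" + minX + ", " + minY + ", " + maxX + ", " + maxY + "]";
        }
    }

    // One 2D coordinate pair such as "-78.75 -90" (simplified number syntax)
    private static final Pattern PAIR =
        Pattern.compile("(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)");

    static Box boxOf(String wkt) {
        // Drop an optional leading CRS IRI, e.g. "<http://...> POLYGON(...)"
        String body = wkt.replaceFirst("^\\s*<[^>]*>\\s*", "");
        Matcher m = PAIR.matcher(body);
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        boolean found = false;
        while (m.find()) {
            double x = Double.parseDouble(m.group(1));
            double y = Double.parseDouble(m.group(2));
            minX = Math.min(minX, x);
            maxX = Math.max(maxX, x);
            minY = Math.min(minY, y);
            maxY = Math.max(maxY, y);
            found = true;
        }
        if (!found) {
            throw new IllegalArgumentException("no coordinates in: " + wkt);
        }
        return new Box(minX, minY, maxX, maxY);
    }

    // Bounding box of the union of all spatial literals found in one query
    static Box unionBox(List<String> wktLiterals) {
        Box acc = null;
        for (String wkt : wktLiterals) {
            Box b = boxOf(wkt);
            acc = (acc == null) ? b : acc.union(b);
        }
        return acc;
    }

    public static void main(String[] args) {
        String lit = "POLYGON((-78.75 -90, -78.75 -78.75, -67.5 -78.75, -67.5 -90, -78.75 -90))";
        System.out.println(unionBox(List.of(lit)));
    }
}
```

For the literal in the query above, this yields minX = -78.75, maxX = -67.5, minY = -90 and maxY = -78.75; with several literals the boxes are merged, which over-approximates the union but is enough for placing a query on a map.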

Maven install fails

When I run mvn clean install, I get this error:

Failed to execute goal on project lsq-vocab-jena: Could not resolve dependencies for project org.aksw.simba.lsq:lsq-vocab-jena:jar:2.0.0-SNAPSHOT: org.apache.jena:jena-core:jar:4.5.0-SNAPSHOT was not found in https://maven.aksw.org/archiva/repository/snapshots during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of maven.aksw.snapshots has elapsed or updates are forced -> [Help 1]

Make it available as a Docker container

Hi, I am not sure whether this project is still being developed. But if you want to make it easier for people to use your weird JAR file that does not run on Windows: publish a Docker image.

It is really easy to include a JAR file in a Docker image and publish it for free (it's 2021, not 2009 anymore).

It would solve all your software distribution issues: it would work out of the box on macOS, Windows and Linux, could easily be deployed on clusters, etc.

And it would spare everyone who mostly just wants to run LSQ from having to git clone and mvn build (which takes a lot of time and is error-prone).

You would no longer need to write and maintain multiple documentation pages and processes to bundle with Maven and then build a deb file: http://lsq.aksw.org/v2/setup.html

It would take maybe one day to produce the Docker image, which would save you multiple days in the future.

One cross-platform solution: Docker. LSQ is not something people use every day anyway, so you don't need to make a deb package out of it; calling it with docker run fits its use cases well.

Improve documentation of some vocabulary with examples

Although the ontology file contains labels and comments, the selectivity-related metrics in particular are probably not very clear.
In addition, it would make sense to generate a reference page from a Markdown table derived from the ontology file.

  • lsqo:in
  • lsqo:tpSelJoinVarRestricted
  • hypertree model of bgps
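The reference page idea can be sketched as follows, assuming the rdfs:label and rdfs:comment values have already been extracted from the ontology file (for example with Apache Jena). The Term class, the toMarkdown method and the example entry are hypothetical placeholders, not actual LSQ vocabulary content.

```java
import java.util.List;

public class VocabTable {

    // One ontology term with its label and comment (assumed already extracted)
    static final class Term {
        final String curie;
        final String label;
        final String comment;

        Term(String curie, String label, String comment) {
            this.curie = curie;
            this.label = label;
            this.comment = comment;
        }
    }

    // Escapes characters that would break a Markdown table cell
    static String cell(String s) {
        if (s == null) {
            return "";
        }
        return s.replace("|", "\\|").replace("\r", " ").replace("\n", " ").trim();
    }

    // Renders one table row per term, preserving the input order
    static String toMarkdown(List<Term> terms) {
        StringBuilder sb = new StringBuilder();
        sb.append("| Term | Label | Comment |\n");
        sb.append("|---|---|---|\n");
        for (Term t : terms) {
            sb.append("| `").append(cell(t.curie)).append("` | ")
              .append(cell(t.label)).append(" | ")
              .append(cell(t.comment)).append(" |\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical entry; real rows would come from the ontology file
        System.out.print(toMarkdown(List.of(
            new Term("ex:someProperty", "some property", "Placeholder comment"))));
    }
}
```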

Improve confusing / inaccurate exceptions

I started a wiki page for queries with errors. LSQ2 should catch most exceptions and continue anyway. However, in some cases this results in a generic "internal error" message even though LSQ2 itself is working fine and the actual problem is that an incorrect query made it into the benchmarking pipeline. There are many ways this can happen; an odd one was a bug in Jena's shallowClone (fixed by now), which raised an exception for perfectly valid queries.

In any case, exception handling should be improved so that the generic "internal error" message is no longer needed.

https://github.com/AKSW/LSQ/wiki/Erroneous-Queries
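One way to avoid a generic "internal error" is to report the pipeline stage together with the innermost cause of the exception. The sketch below is a hypothetical illustration in plain Java, not LSQ's actual error handling; the Failure class, the stage names and the simulated exception are assumptions.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class StageErrors {

    // Where a query failed and why, as an alternative to a generic "internal error"
    static final class Failure {
        final String stage;
        final String rootType;
        final String message;

        Failure(String stage, String rootType, String message) {
            this.stage = stage;
            this.rootType = rootType;
            this.message = message;
        }

        @Override
        public String toString() {
            return stage + " failed: " + rootType + ": " + message;
        }
    }

    // Follows getCause() to the innermost exception, guarding against cause cycles
    static Throwable rootCause(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable cur = t;
        while (cur.getCause() != null && seen.add(cur)) {
            cur = cur.getCause();
        }
        return cur;
    }

    // Builds a specific message from the stage name and the root cause
    static Failure describe(String stage, Throwable t) {
        Throwable root = rootCause(t);
        String msg = root.getMessage() != null ? root.getMessage() : "(no message)";
        return new Failure(stage, root.getClass().getSimpleName(), msg);
    }

    public static void main(String[] args) {
        // Simulated failure: a wrapper exception around the real cause
        RuntimeException wrapped = new RuntimeException("internal error",
            new IllegalArgumentException("simulated: query could not be executed"));
        System.out.println(describe("benchmark", wrapped));
    }
}
```

Recording such a Failure per query, instead of a fixed message, would also make it easier to tell LSQ bugs apart from invalid queries in the logs.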

vocabulary file

If I am not mistaken, LSQ_Vocab.ttl has a FromNamed instance as a feature, which represents the FROM NAMED clause, but there is no feature for the FROM clause.

Optimize prefixes throws exception if there are multiple FROM (NAMED) clauses with the same IRI

SELECT DISTINCT  ?instance
FROM <http://data.semanticweb.org/conference/eswc/2012/complete>
FROM <http://data.semanticweb.org/conference/eswc/2012/complete>
WHERE
  { ?instance  rdf:type  foaf:Person }
ORDER BY ?instance

The Jena Query class contains the following code:

    public void addNamedGraphURI(String uri)
        ...
        if ( namedGraphURIs.contains(uri) )
            throw new QueryException("URI already in named graph set: "+uri) ;
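A possible workaround is to drop repeated graph IRIs, keeping the first occurrence, before they are added back to the query. The sketch below assumes that a repeated FROM or FROM NAMED IRI carries no extra meaning for LSQ's purposes, which is an assumption rather than something the issue establishes; the class name is hypothetical, and the Jena calls are only mentioned in a comment, not invoked.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupeGraphIris {

    // Removes repeated IRIs while keeping the order of first occurrence
    static List<String> dedupe(List<String> iris) {
        return new ArrayList<>(new LinkedHashSet<>(iris));
    }

    public static void main(String[] args) {
        List<String> from = List.of(
            "http://data.semanticweb.org/conference/eswc/2012/complete",
            "http://data.semanticweb.org/conference/eswc/2012/complete");
        // Each IRI in the result could then be passed to Jena's Query.addGraphURI
        // (or addNamedGraphURI) without triggering the duplicate check shown above
        System.out.println(dedupe(from));
    }
}
```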
