
dbpedia-live-mirror's Introduction

DBpedia Live Mirror

As it runs, DBpedia Live continuously generates compressed N-Triples files that contain the added and deleted triples.

This tool downloads those files and updates a local Virtuoso triple store.
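The following minimal Java sketch shows how one changeset id (in the format of the sync log further down, e.g. 2015-08-27-15-000201) could be turned into download URLs. Several parts of it are assumptions, not documented by this project:
  • the base URL http://live.dbpedia.org/changesets;
  • the <base>/YYYY/MM/DD/HH/NNNNNN.<part>.nt.gz layout;
  • the part names, which are taken from the changeset-documentation issue below.
ChangesetUrls is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a changeset id such as "2015-08-27-15-000201"
// to the URLs of its gzipped N-Triples parts.
// ASSUMPTION: files are published as <base>/YYYY/MM/DD/HH/NNNNNN.<part>.nt.gz
final class ChangesetUrls {
    // ASSUMPTION: part names as listed in the issue about the four changeset parts
    static final String[] PARTS = {"added", "removed", "clear", "reinserted"};

    static String forPart(String base, String changesetId, String part) {
        String[] f = changesetId.split("-");
        if (f.length != 5) {
            throw new IllegalArgumentException("Unexpected changeset id: " + changesetId);
        }
        // year/month/day/hour directories, then the sequence number as file name
        return base + "/" + f[0] + "/" + f[1] + "/" + f[2] + "/" + f[3] + "/"
                + f[4] + "." + part + ".nt.gz";
    }

    static List<String> allParts(String base, String changesetId) {
        List<String> urls = new ArrayList<>();
        for (String part : PARTS) {
            urls.add(forPart(base, changesetId, part));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : allParts("http://live.dbpedia.org/changesets", "2015-08-27-15-000201")) {
            System.out.println(url);
        }
    }
}
```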

Virtuoso (Enterprise or Open Source) Setup

DBpedia Live updates the triple store in several named graphs. The following are enabled:

  • http://live.dbpedia.org — contains data extracted from Wikipedia in real time
  • http://static.dbpedia.org — contains external datasets and other useful data that cannot be extracted from Wikipedia
  • http://dbpedia.org/resource/classes# — contains the up-to-date DBpedia ontology
  • http://dbpedia.org — virtual graph group that contains all of the graphs above

To create graph groups in your local Virtuoso, you can adapt and run this script.
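As a rough idea of what such a script contains, the Java sketch below generates isql statements that create the http://dbpedia.org graph group and add the three member graphs. It assumes Virtuoso's DB.DBA.RDF_GRAPH_GROUP_CREATE and DB.DBA.RDF_GRAPH_GROUP_INS procedures; check their signatures against the documentation for your Virtuoso version. GraphGroupScript is a hypothetical name and is not the script referred to above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical generator for an isql script that creates a graph group
// and inserts its member graphs.
// ASSUMPTION: Virtuoso's DB.DBA.RDF_GRAPH_GROUP_CREATE / _INS procedures.
final class GraphGroupScript {
    static String build(String group, List<String> members) {
        StringBuilder sql = new StringBuilder();
        // Second argument is the "quiet" flag; as we understand it, a nonzero
        // value avoids an error when the group already exists.
        sql.append("DB.DBA.RDF_GRAPH_GROUP_CREATE ('").append(group).append("', 1);\n");
        for (String member : members) {
            sql.append("DB.DBA.RDF_GRAPH_GROUP_INS ('").append(group)
               .append("', '").append(member).append("');\n");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("http://dbpedia.org", Arrays.asList(
                "http://live.dbpedia.org",
                "http://static.dbpedia.org",
                "http://dbpedia.org/resource/classes#")));
    }
}
```

The printed statements can be pasted into an isql session.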

Execution

To execute from source:

  1. Download the code from the repo
    git clone https://github.com/dbpedia/dbpedia-live-mirror.git
  2. Set up your Virtuoso instance and the mirror-live.ini file.
  3. Download the latest dump and load it into Virtuoso.
  4. Copy lastDownloadDate.dat.default to lastDownloadDate.dat and set the date to match the dump file.
  5. Run one of the scripts in the bin/ folder:
    • sh bin/liveSync.sh — applies existing triple patches and waits until new ones are published
    • sh bin/liveSyncOnce.sh Onetime — applies existing triple patches and exits
    • sh bin/ontologySync.sh — keeps the DBpedia ontology up to date
    • sh bin/ontologySync.sh Onetime — updates the DBpedia ontology to the latest version and exits
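The two run modes above can be modelled with a small polling loop. In continuous mode the loop applies every published changeset and then sleeps before polling again. In Onetime mode it stops as soon as it has caught up. This is a hypothetical Java sketch, not the project's LiveSync class. ChangesetSource, ChangesetApplier and SyncLoop are illustrative names.

```java
// Hypothetical source of changeset ids: returns the next unapplied id,
// or null when the mirror is up to date with the last published changeset.
interface ChangesetSource {
    String next();
}

// Hypothetical sink that applies one changeset to the local store.
interface ChangesetApplier {
    void apply(String changesetId);
}

final class SyncLoop {
    /** Returns the number of changesets applied before exiting. */
    static int run(ChangesetSource source, ChangesetApplier applier,
                   boolean oneTime, long sleepMillis) {
        int applied = 0;
        while (true) {
            String id;
            // Drain every changeset that is already published.
            while ((id = source.next()) != null) {
                applier.apply(id);
                applied++;
            }
            if (oneTime) {
                return applied; // "Onetime": exit once caught up
            }
            // Continuous mode: caught up, so wait for new changesets.
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return applied;
            }
        }
    }
}
```

In this model, liveSync.sh corresponds to oneTime = false and the Onetime argument to oneTime = true. How the project maps these internally is not documented here.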

Jar files are not distributed at the moment but will be made available on request.

Dependencies

  • Maven 3
  • Java 7

Contact

DBpedia Developers mailing list

dbpedia-live-mirror's People

Contributors

jimkont, kurzum, mgns, skovorodkin, tallted

dbpedia-live-mirror's Issues

Repository failing

The build seems to fail to resolve a module from the repository:

-bash-4.1$ [ERROR] Failed to execute goal on project live-mirror: Could not resolve dependencies for project org.dbpedia:live-mirror:jar:1.1-SNAPSHOT: Failed to collect dependencies at com.openlink.virtuoso:virtjdbc4:jar:7-20140918: Failed to read artifact descriptor for com.openlink.virtuoso:virtjdbc4:jar:7-20140918: Could not transfer artifact com.openlink.virtuoso:virtjdbc4:pom:7-20140918 from/to aksw (http://maven.aksw.org/repository/internal): Connect to maven.aksw.org:80 [maven.aksw.org/139.18.2.226] failed: Connection timed out -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

liveSync.sh failing

Lately, liveSync.sh has been failing with:
[INFO LiveSync] Up-to-date with last published changeset, sleeping for a while ;)
-bash-4.1$ [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:java (default-cli) on project live-mirror: An exception occured while executing the Java class. null: InvocationTargetException: File ./lastPublishedFile.txt not fount! ./lastPublishedFile.txt (Too many open files) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

Any idea what might be wrong?

Thx,
Hari
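The "(Too many open files)" part of that message means the process ran out of file descriptors, so the marker file could not be opened. The file was not actually missing. A common cause is file or connection handles that are opened in a loop and never closed. Raising the limit with ulimit -n only postpones the failure if handles leak. The hypothetical Java sketch below shows the usual remedy for the marker file itself, try-with-resources. It assumes the file holds a single changeset id on its first line. It is not the project's code, and the leak may well be elsewhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical reader/writer for a last-published marker file.
// ASSUMPTION: the first line holds the id of the last published changeset.
final class LastPublished {
    static String read(Path file) throws IOException {
        // try-with-resources closes the reader on every path, including exceptions,
        // so polling this file repeatedly does not leak file descriptors.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line = in.readLine();
            if (line == null) {
                throw new IOException("Empty file: " + file);
            }
            return line.trim();
        }
    }

    static void write(Path file, String value) throws IOException {
        // Files.write opens, writes and closes the file in a single call.
        Files.write(file, value.getBytes(StandardCharsets.UTF_8));
    }
}
```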

Create ontology feeder

Use the DBpedia Live feeder to get notifications of ontology updates.
At the moment, we update at regular intervals.

Is it normal that the latest dump file is only 123K?

Hi. I am trying to get a local instance of DBpedia Live running here.

However, the latest dump file on http://live.dbpedia.org/dumps/ (dbpedia_2017_02_21.nt.gz) seems to be only 123K in size, whereas the others are around 6G. Any idea whether this is normal?

That is, if I load this dump, will the scripts simply pull everything in? Or should I start from the next-to-last dump (dbpedia_2016_09_26.nt.gz)?

In any case, a mention of this in the README would definitely help.

Thanks in advance!

Maven virtjdbc4 dependency fails

Hello,
The dependency below fails to resolve from http://maven.aksw.org/archiva/repository/internal:

groupId: com.openlink.virtuoso
artifactId: virtjdbc4
version: 7-20140918

Checking the site shows that the file is unavailable:
HTTP ERROR 404
Problem accessing /repository/internal/com/openlink/virtuoso/virtjdbc4/7-20140918/virtjdbc4-7-20140918.jar. Reason:
Resource does not exist

Can this be fixed, or can the jar be made available so that I can run liveSync.sh?

Thanks,
Hari
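The 404 path above follows the standard Maven 2 repository layout: the group id with dots turned into slashes, then the artifact id, the version, and artifactId-version.jar. The small Java sketch below rebuilds that path from the coordinates, which can help when checking whether the artifact exists in another repository. MavenPath is a hypothetical helper.

```java
// Builds the relative path of a jar in a Maven 2 layout repository.
final class MavenPath {
    static String jarPath(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Prints com/openlink/virtuoso/virtjdbc4/7-20140918/virtjdbc4-7-20140918.jar
        System.out.println(jarPath("com.openlink.virtuoso", "virtjdbc4", "7-20140918"));
    }
}
```

If you have the Virtuoso JDBC driver jar locally, you may be able to satisfy the dependency without the remote repository. One way is to install the jar into your local repository with mvn install:install-file, passing -Dfile pointing at your jar, -DgroupId=com.openlink.virtuoso, -DartifactId=virtjdbc4, -Dversion=7-20140918 and -Dpackaging=jar.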

Create DBpedia-style data dumps from DBpedia Live

Is it possible to create DBpedia-style data dumps from the DBpedia Live tables?

I am using an application that relies on DBpedia data dumps. The ones available on the downloads page are a year old, while DBpedia Live gives me access to more recent data. Is there a simple way to export this data as dumps similar to the ones on the downloads page?

Difference between REINSERTED and ADD

Hi,

What is the difference between the "reinsert" and "add" operations in the syncing process? I looked at the codebase, in particular the ChangesetExecutor.java file, and the two seem to perform the same SPARQL 'INSERT DATA INTO' operation. My assumption is that 'reinserted' updates a triple statement: if the fact <person_uri> :hasAge "12" exists, a reinserted file with the statement <person_uri> :hasAge "15" will update the literal value of property :hasAge to "15". I'm asking because I'm trying to repurpose this code to support SPARQL updates on my own Virtuoso server.

if (changeset.triplesReinserted() > 0) {
    boolean status_a = executeAction(changeset.getReinserted(), Action.ADD);
    logger.info("Patch " + changeset.getId() + " REINSERTED " + changeset.triplesReinserted() + " resources");
    status = status && status_a;
}
if (changeset.triplesAdded() > 0) {
    boolean status_a = executeAction(changeset.getAdditions(), Action.ADD);
    logger.info("Patch " + changeset.getId() + " ADDED " + changeset.triplesAdded() + " triples");
    status = status && status_a;
}

Can someone clarify whether the two are indeed different operations?
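Here is one possible reading of the snippet above. It is an interpretation, not documented behaviour:
  • clear drops every triple of a resource;
  • reinserted carries the full current description of those cleared resources, so putting it back is an ordinary insert.
That would explain why both branches call Action.ADD. The Java sketch below models this reading on an in-memory set. Triples are plain string lists. Clearing by subject is an assumption, and so is the clear, delete, reinsert, add ordering; the sync log below only confirms CLEARED before DELETED. ChangesetModel is hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// In-memory model of applying one changeset under the assumptions above.
final class ChangesetModel {
    // The local store; each triple is a [subject, predicate, object] list.
    final Set<List<String>> store = new LinkedHashSet<>();

    static List<String> t(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    void apply(Set<String> clearedSubjects, Set<List<String>> removed,
               Set<List<String>> reinserted, Set<List<String>> added) {
        // 1. clear (ASSUMPTION): drop every triple whose subject was cleared
        for (Iterator<List<String>> it = store.iterator(); it.hasNext(); ) {
            if (clearedSubjects.contains(it.next().get(0))) {
                it.remove();
            }
        }
        // 2. removed: delete individual triples
        store.removeAll(removed);
        // 3. reinserted: restore the description of cleared resources;
        //    after step 1 this is a plain insert, matching Action.ADD above
        store.addAll(reinserted);
        // 4. added: insert new triples
        store.addAll(added);
    }
}
```

Under this reading, a reinsert on its own does not change "12" to "15". The update happens because the clear step removes the old triple first.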

Documentation for the four parts of a changeset

Is the meaning of the four parts of a changeset (added, clear, reinserted, removed) documented somewhere?

Ideally, is there a paper that describes this in detail?

DBpedia live sync failing

2016-05-06-03-000017-bash-4.1$ sh bin/liveSync.sh
[INFO Global] Options file read successfully
[INFO ChangesetExecutor] Patch 2015-08-27-15-000201 CLEARED 160 resources
[INFO ChangesetExecutor] Patch 2015-08-27-15-000201 DELETED 977 triples
[WARN ChangesetExecutor] Error in query execution:
org.dbpedia.extraction.live.mirror.sparul.SPARULException: org.dbpedia.extraction.live.mirror.sparul.SPARULException: virtuoso.jdbc4.VirtuosoException: COL..: Insert stopped because out of seg data here or elsewhere host 0 key RDF_QUAD slice 0
at org.dbpedia.extraction.live.mirror.sparul.SPARULVosExecutor.execSQLWrapper(SPARULVosExecutor.java:74)
at org.dbpedia.extraction.live.mirror.sparul.SPARULVosExecutor.executeSPARUL(SPARULVosExecutor.java:29)
at org.dbpedia.extraction.live.mirror.changesets.ChangesetExecutor.executeSparulWrapper(ChangesetExecutor.java:140)
at org.dbpedia.extraction.live.mirror.changesets.ChangesetExecutor.executeAction(ChangesetExecutor.java:96)
at org.dbpedia.extraction.live.mirror.changesets.ChangesetExecutor.applyChangeset(ChangesetExecutor.java:54)
at org.dbpedia.extraction.live.mirror.LiveSync.main(LiveSync.java:181)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.dbpedia.extraction.live.mirror.sparul.SPARULException: virtuoso.jdbc4.VirtuosoException: COL..: Insert stopped because out of seg data here or elsewhere host 0 key RDF_QUAD slice 0
at org.dbpedia.extraction.live.mirror.sparul.SPARULVosExecutor.execSQL(SPARULVosExecutor.java:88)
at org.dbpedia.extraction.live.mirror.sparul.SPARULVosExecutor.execSQLWrapper(SPARULVosExecutor.java:40)
... 11 more
Caused by: virtuoso.jdbc4.VirtuosoException: COL..: Insert stopped because out of seg data here or elsewhere host 0 key RDF_QUAD slice 0
at virtuoso.jdbc4.VirtuosoResultSet.process_result(Unknown Source)
at virtuoso.jdbc4.VirtuosoResultSet.<init>(Unknown Source)
at virtuoso.jdbc4.VirtuosoStatement.sendQuery(Unknown Source)
at virtuoso.jdbc4.VirtuosoStatement.executeQuery(Unknown Source)
at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464)
at org.dbpedia.extraction.live.mirror.sparul.SPARULVosExecutor.execSQL(SPARULVosExecutor.java:86)
... 12 more
[WARN ChangesetExecutor] Tried to ADD 772 but failed, splitting into chunks to spot the error
[WARN ChangesetExecutor] Error in query execution:

[ERROR ChangesetExecutor] Cannot ADD triple:
<http://dbpedia.org/resource/Alexander_Tachie_Mensah> <http://dbpedia.org/ontology/wikiPageExtracted> "2015-08-27T13:24:03Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
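The log line "splitting into chunks to spot the error" names a common technique. When a batch insert fails, the batch is split and the halves are retried until the individual failing triples are isolated. The hypothetical Java sketch below shows the idea. BatchInserter stands in for whatever sends one INSERT to the store; this is not the project's ChangesetExecutor logic. Isolating a triple this way shows where a batch failed, not necessarily why.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the component that sends one INSERT to the store.
interface BatchInserter {
    /** Returns true if the whole batch was inserted successfully. */
    boolean insert(List<String> triples);
}

final class ChunkedInsert {
    /** Inserts the batch; on failure, bisects it and returns the triples that fail alone. */
    static List<String> insertIsolatingFailures(BatchInserter db, List<String> triples) {
        List<String> failed = new ArrayList<>();
        insert(db, triples, failed);
        return failed;
    }

    private static void insert(BatchInserter db, List<String> triples, List<String> failed) {
        if (triples.isEmpty() || db.insert(triples)) {
            return; // nothing to do, or the whole chunk succeeded
        }
        if (triples.size() == 1) {
            failed.add(triples.get(0)); // smallest unit: report this triple
            return;
        }
        int mid = triples.size() / 2;
        insert(db, triples.subList(0, mid), failed);
        insert(db, triples.subList(mid, triples.size()), failed);
    }
}
```

In a set-based RDF store, sending a triple again does not create a duplicate. That keeps this retry strategy simple even when a failed batch was partly applied.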
