
DBpedia-Links

A repository that contains links and alternative classifications for DBpedia. Other database owners can contribute links in the links folder. The link framework runs each day and validates all link contributions. An overview of the linksets and current errors can be seen at the LinkViz.

About

Links are the key enabler for retrieval of related information on the Web of Data, and DBpedia is one of the central interlinking hubs in the Linked Open Data (LOD) cloud. The DBpedia-Links repository maintains linksets between DBpedia and other LOD datasets. Systems for maintenance, update and quality assurance of the linksets are in place and can be explored further.

In this README, we describe how to download and use the links, how to run all available tools, and where to find the most important documentation. If questions remain, please use the GitHub issue tracker. For further feedback, feel free to use the DBpedia Discussion mailing list.

Why upload your links?

All links you contribute will be loaded (after a quality check) into the main DBpedia datasets and will therefore link to your data. Users of DBpedia can then find your data more easily. We will also be able to tell you which other external databases link to your data.

Repository license

All data in the repository's links folder is provided under CC0. All software is provided under the Apache 2.0 License.

Please cite our paper:

@inproceedings{DojchinovskiDBpediaLinks,
  author = {Dojchinovski, Milan and Kontokostas, Dimitris and R{\"o}ßling, Robert and Knuth, Magnus and Hellmann, Sebastian},
  booktitle = {Proceedings of the SEMANTiCS 2016 Conference (SEMANTiCS 2016)},
  title = {DBpedia Links: The Hub of Links for the Web of Data},
  year = 2016
}

How to contribute links to DBpedia

If you are interested in contributing links and want to learn more about the project, please visit the how-to wiki page for more detailed information.

How to create/update links for one dataset

If you want to update links for one dataset, either create a new folder or update/patch an existing linkset, then send a pull request. Please follow the how-to to learn how to create a patch for the dataset; the patch will be applied automatically on the next release.

To make sure that your dataset follows the conventions described in the how-to, run the framework and validation for your contributed folder before sending the pull request, as described below.

How to download the monthly link release

If you want to download the current or older releases of the given links, please go here and click on the corresponding month.

How to download the daily link snapshot

The publishing process is automated via a cronjob that runs all given downloads, scripts, LIMES/SILK configurations, patches, etc., to generate the linksets. It is executed daily on our own server, and the results are published at http://downloads.dbpedia.org/links/snapshot.
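For scripted access, fetching a file from the snapshot location could look like the following minimal Python sketch (the flat file layout under /snapshot and the example filename are assumptions — check the directory listing first):

```python
import urllib.request
from pathlib import Path

SNAPSHOT_BASE = "http://downloads.dbpedia.org/links/snapshot"

def snapshot_url(filename):
    """Build the URL of one file in the daily snapshot.
    The flat layout under /snapshot is an assumption."""
    return f"{SNAPSHOT_BASE}/{filename}"

def download_snapshot_file(filename, dest_dir="."):
    """Fetch one snapshot file into dest_dir and return its path."""
    dest = Path(dest_dir) / filename
    urllib.request.urlretrieve(snapshot_url(filename), dest)
    return dest
```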

Please check out the how-to for more information on the automated process and how to set it up, run it and customize it.

Overview of current linksets

An overview of the linksets and current errors can be seen at the LinkViz.

How to run the link extraction framework

Install

mvn clean install

Tests are deactivated by default. The tests validate the links, so activating them performs a full run of the software. To skip them explicitly:

mvn clean install -DskipTests=true

Running

Create a Snapshot

mvn exec:java -Dexec.mainClass="org.dbpedia.links.CLI" -Dexec.args="--generate"

Create a Snapshot and Run Scripts (increases runtime immensely)

mvn exec:java -Dexec.mainClass="org.dbpedia.links.CLI" -Dexec.args="--generate --scripts true"

Run Everything for One Folder (e.g., your contributed link folder)

mvn exec:java -Dexec.mainClass="org.dbpedia.links.CLI" -Dexec.args="--generate --scripts true --basedir links/dbpedia.org/YOUR_PROJECT"

Description of further tools in the repo and how to access/execute them

  • backlinks.py This script can be executed with Python 3 (note that rdflib must be installed). On start, it prompts for a full folder path; enter the full path to the linkset's destination main folder. All N-Triples files found there are read and compared with every other linkset within the links folder, to check whether subjects of the given linkset are contained in other linksets. All matching triples are stored in a backlinks.nt file.
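The core idea — find triples in other linksets whose subject also occurs in your linkset — can be sketched in plain Python as follows (a simplified illustration that extracts N-Triples subjects with a regex; the real backlinks.py uses rdflib and may behave differently):

```python
import re

# An N-Triples line starts with the subject IRI in angle brackets.
SUBJECT = re.compile(r'^<([^>]+)>')

def subjects(nt_text):
    """Collect all subject IRIs occurring in N-Triples text."""
    found = set()
    for line in nt_text.splitlines():
        m = SUBJECT.match(line.strip())
        if m:
            found.add(m.group(1))
    return found

def backlinks(target_nt, other_linksets):
    """Return lines from other linksets whose subject also appears
    as a subject in the target linkset."""
    wanted = subjects(target_nt)
    hits = []
    for nt_text in other_linksets:
        for line in nt_text.splitlines():
            m = SUBJECT.match(line.strip())
            if m and m.group(1) in wanted:
                hits.append(line.strip())
    return hits
```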

Contributors (alphabetically)

akirsche, chile12, dr0i, dvcama, holycrab13, indigo-lab, jimkont, kurzum, m1ci, mgns, rpod, tallted, vivianazuluaga


Issues

metadata.ttl description and download link

<#1> a void:Linkset ;
    dc:author <amit> ;
    dbp:ntriplefilelocation <http://downloads.dbpedia.org/2016-04/links/nytimes_links.ttl.bz2> ;
    void:objectsTarget <http://data.nytimes.com/> ;
    dct:description "nytimes information copied statically" .

  1. extend description
  2. test whether full external URI download works
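As a first step toward point 1, a naive textual check could flag descriptions that are missing expected fields (the list of required properties is an assumption based on the example above, and this is not a full Turtle parse):

```python
# Properties the example linkset description above uses; treated
# as required fields for this illustrative check only.
REQUIRED_PROPERTIES = [
    "dc:author",
    "dbp:ntriplefilelocation",
    "void:objectsTarget",
    "dct:description",
]

def missing_properties(ttl_text):
    """Return required properties not mentioned in the metadata.ttl
    text. Purely textual; a real validator should parse the Turtle."""
    return [p for p in REQUIRED_PROPERTIES if p not in ttl_text]
```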

update NYTimes semantic URLs

I transferred this issue from the extraction framework GitHub.
Originally posted by @VladimirAlexiev

http://dbpedia.org/page/3M says that nyt:3M is
http://data.nytimes.com/N38115567920937854392.
However, that is long obsolete and doesn't resolve.

If you go to https://tools.wmflabs.org/sqid/#/browse?type=properties
and do "filter labels" by "new york" or "nyt", you'll see 5 new URL patterns that work.

(Note: I wrote this in some notes Dec 2016, so don't remember what is "odd" about P3221, and have no clue how to go from N38115567920937854392 to nytd_org/3M Company)

Generate a simple website for backlinks

  • protect with .htaccess and .htpasswd
  • sync to the download server
  • the table should have at least two columns: linkset name and download link for the backlinks
  • optionally, the table could also link to the linkset in the repo and on the links download server

structure download folder better

Problem: http://downloads.dbpedia.org/links/ has too many folders:

2016-10/ 10-Oct-2016 08:23 -
links/ 17-Oct-2016 17:32 -
tools/ 17-Oct-2016 17:32 -
types/ 17-Oct-2016 17:31 -
README.md

It should only contain:

2016-10/ 10-Oct-2016 08:23 -

Proposed solution: clone the repo in a non-public place, then symlink only the folder with the links via ln -s.

fix the implementation

  • find the error that produces 0 links (links can still be generated if the metadata is wrong; this is a message for Jan)
  • remove static duplicates from the metadata.ttl ntriplefilelocation
  • change the order of execution in java-maven generatelinks

Linkset Visualization

Below is a list of improvements for LinkViz.

  • could you move the tool to links/tools/linkviz?
  • LinkViz as a github.io page

Above the table, add a summary ("X links to Y datasets"):

  • number of total links in the latest release
  • number of datasets

Add the following fields to the overview table:

  • Max links for every dataset

data for the tasks below

  • @akirsche will provide the links to github and the metadata.ttl in a new data file

Improve the leftmost column "Linkset"

  • informative linkset names in the first column, ideally the rdfs:label from metadata.ttl; if the label is missing, use the current name and display a warning that the label is missing
  • link to the GitHub repo folder, displayed as an edit pen
  • check for valid metadata.ttl
  • it was unclear to us that the title can be clicked to open the extra info

For each linkset chart / info pane (on the right side):

  • add the content of metadata.ttl to the right pane

  • if any data is missing in metadata.ttl, display it as a warning

  • fix the error with the Firefox browser and d3.js

Validate all turtle in the repo in CI

We had a few syntax errors in some files that took time to identify and fix. The idea is to create a Travis CI script that takes all RDF files in the repo and tries to validate them with rapper. Travis has a new container-based infrastructure where we can use normal bash scripts to do the validation:
http://docs.travis-ci.com/user/migrating-from-legacy/

This approach will also validate all incoming pull requests
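Such a CI step could be sketched in Python (the file extensions and the rapper flags shown are assumptions to adapt; rapper must be installed, e.g. from the raptor2 utilities):

```python
import subprocess
from pathlib import Path

def rdf_files(repo_root):
    """Collect candidate RDF files under the repo; the extension
    list is an assumption about the repo layout."""
    exts = {".ttl", ".nt", ".rdf"}
    return sorted(p for p in Path(repo_root).rglob("*") if p.suffix in exts)

def validate_with_rapper(path):
    """Try to parse one file with rapper; True if it parses cleanly.
    -g lets rapper guess the syntax, -c only counts triples."""
    result = subprocess.run(["rapper", "-g", "-c", str(path)],
                            capture_output=True)
    return result.returncode == 0
```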

update readme

  • choose a good example for each contribution method and put it in the README
  • static methods should be marked as discouraged
  • merge the README from tools/java-maven

Missing Validation

  • URIs for linksets should start with a letter; there is an old Jena incompatibility, and if linksets are called <#1> the code will not work, so they should be renamed accordingly
  • scripts should always have a dedicated output file as a parameter
  • each linkset can have a combination of several ntriplefiles, scripts, sparqlqueries and linkConfs, but only one endpoint
  • check whether the SPARQL endpoint is active
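The first bullet could be implemented roughly like this (the helper name is hypothetical; the framework's actual validation code may differ):

```python
def linkset_uri_ok(uri):
    """Check that the local name of a linkset URI (after '#' or the
    last '/') starts with a letter, per the Jena-compatibility rule."""
    local = uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
    return bool(local) and local[0].isalpha()
```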
