cygri / void

An RDF schema and associated documentation for expressing metadata about RDF datasets

Home Page: http://www.w3.org/TR/void/

void's Issues

property names for void:Dataset - void:Linkset - void:Dataset

Raised by: the gang of four ;)

Content: need to decide if it is 

void:Dataset <-[void:from]- void:Linkset - [void:to] -> void:Dataset

OR rather

void:Dataset -[void:contains]-> void:Linkset -[void:target]-> void:Dataset

Original issue reported on code.google.com by Michael.Hausenblas on 12 Jun 2008 at 1:17

Explain difference between SPARQL datasets and voiD datasets

The term "dataset" is used in voiD (inherited from the LOD community), but it
is also used in SPARQL in a different sense (as "RDF Dataset"; see the
definition at http://www.w3.org/TR/rdf-sparql-query/#sparqlDataset).

We should have a section somewhere that clearly explains the difference.

If we get an extra section on describing SPARQL endpoints (aligned with the 
SPARQL WG's output), 
that might be a good place.

Original issue reported on code.google.com by [email protected] on 21 Sep 2009 at 4:37

Mirroring of datasets should be describable with voiD

How should the relationship between a dataset and its mirror be described?
E.g., I have a copy of some part of DBpedia (but not an exact copy). How
should I link my dataset to DBpedia in the voiD description, so that people
can use my endpoint as an alternative to DBpedia and know what caveats
apply when doing so?

Original issue reported on code.google.com by [email protected] on 25 Sep 2009 at 9:09

SPARQL endpoint description terms/voc

The group acknowledges the need for such terms but has decided to wait and see
what the SPARQL WG does. There are some existing proposals such as [1], and we
have compiled some use cases which will be given to the SPARQL WG as input.
We will liaise with the WG on this issue.


[1] http://ontologi.es/sparql

Original issue reported on code.google.com by Michael.Hausenblas on 27 Aug 2009 at 1:23

Merged into: #43

[deleted issue]

[deleted issue]

Lift some text from LDOW09 paper into voiD Guide?

The voiD Guide is very brief in its definition of datasets and linksets. I
wrote some text about this for the LDOW'09 paper; it became Sections 3.1 and
3.2. Is some of this worth lifting into the Guide?

The paper is here: 
http://events.linkeddata.org/ldow2009/papers/ldow2009_paper20.pdf

Original issue reported on code.google.com by [email protected] on 10 Aug 2009 at 7:08

provenance information

Raised by: [http://semanticweb.org/wiki/Olaf_Hartig Olaf Hartig]

Content:
A description of an RDF dataset should contain information about the 
provenance of the data in the dataset.

See also:
* http://community.linkeddata.org/MediaWiki/index.php?MetaLOD#Provenance_information

Original issue reported on code.google.com by Michael.Hausenblas on 12 Jun 2008 at 1:13

Blocking: #109

Proposal: URI lookup endpoints

This has been requested by Hugh Glaser and Ian Millard from ECS Southampton to
model their CRS (Co-reference Service) deployments.

The idea is to add another method of accessing datasets, in addition to linked
data (simply dereferencing the URIs), RDF data dumps, and SPARQL endpoints, all
of which can already be described in voiD.

The proposal is to add one more property which can be used on datasets:

:MyDS a void:Dataset;
    void:uriLookupEndpoint <http://example.com/lookup?uri=>;
    .

The semantics of this: There is a service that will give you RDF triples about 
a certain URI x. To 
invoke it, you URL-encode x and append the result to the lookup endpoint URI, 
and look up the 
resulting URI.
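
The invocation rule above can be sketched in a few lines of Python. This is only an illustration of the "URL-encode and append" step. The function name `lookup_url` and the DBpedia URI used as input are assumptions made for the example; the endpoint value is the one from the proposal.

```python
from urllib.parse import quote


def lookup_url(endpoint: str, uri: str) -> str:
    """Build the URI to dereference for a void:uriLookupEndpoint value."""
    # safe="" makes quote() percent-encode ":" and "/" as well,
    # so the whole URI survives as a single query-string value.
    return endpoint + quote(uri, safe="")


# Hypothetical input URI, chosen only for illustration.
print(lookup_url("http://example.com/lookup?uri=",
                 "http://dbpedia.org/resource/Berlin"))
```

A client would then issue an ordinary HTTP GET on the resulting URI and expect RDF triples about the original URI in the response.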

Examples of services that can be described as a voiD dataset using this 
approach:

* Sindice and other search engines that offer URI lookup APIs
* Hugh Glaser's CRS (Co-reference Service)
* Converter services that translate from other data formats into RDF
* Theoretically: caches that serve linked data for someone else (using the 
original URIs)

Open issue: Should the object of void:uriLookupEndpoint be an RDF resource or a 
literal?

Real-world example of a voiD description for a CRS:

<http://dblp.explorer.com/id/crs> a void:Dataset;
    foaf:homepage <http://dblp.rkbexplorer.com/crs/>;
    rdfs:label "Coreference Resolution Service for the DBLP dataset at rkbexplorer.com";
    void:vocabulary <http://www.rkbexplorer.com/ontologies/coref>;
    void:uriLookupEndpoint <http://dblp.rkbexplorer.com/crs/export/?type=uri&term=>;
    .

Original issue reported on code.google.com by [email protected] on 17 Dec 2008 at 2:26

Should void:Linkset be a subclass of void:Dataset?

A linkset is a kind of dataset, isn't it? Or not? Maybe the two classes are 
disjoint? We should discuss 
this and maybe express it explicitly in the vocabulary spec.

Original issue reported on code.google.com by [email protected] on 10 Dec 2008 at 11:51

Back up user story for using void in the bio2rdf project

The talk given by Marc from Bio2RDF in the HCLS call in November 2008
resulted in a possible story of using voiD in Bio2RDF. But we did not have
enough information to understand their real problems, and their work was also
just at the starting point. We keep the story here for future reference.

Example Use Case - the Bio2RDF dataset: (JUN)
Basically, they have a lot of datasets now. For the sake of performance,
they are partitioning these datasets and storing them on separate SPARQL
endpoints. It is quite possible that every SPARQL endpoint hosts a dataset
from a different source. At the same time, they want to be able to trace
the statements about a thing back to each of the SPARQL endpoints.

For example, they want to find out which SPARQL endpoint provides statements
about a UniProt gene. What they do now is create a map of how the data
from different endpoints are linked with each other by properties, and
use this map to guide their query processing.

How can this be supported using voiD? I am thinking of the following process:
1. Bio2RDF gets the URI of a data resource, say :uniprot_genexx
2. they query for the dataset to which this data resource belongs, say
:uniprot_ds
3. they search for the links from/to :uniprot_ds and find the
linksets; one of them could be :uniprotToKegg_ls
4. they search for the datasets that :uniprot_ds is linked with/to, say
one of them is :kegg_ds
5. then they can send SPARQL queries to :kegg_ds to find out what it
says about :uniprot_genexx

Maybe, the linkset :uniprotToKegg_ls contains all the owl:sameAs statements
about :uniprot_genexx, which will help with the queries.
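
The lookup process described above can be sketched with plain Python dictionaries standing in for a voiD description. This is a rough illustration, not Bio2RDF's actual implementation. The dataset and linkset names come from the story; the URI prefixes, the endpoint URLs, and the idea of finding a resource's dataset by URI prefix are all assumptions made for the example.

```python
# Hypothetical voiD-like metadata, held in plain dicts for illustration.
DATASETS = {
    ":uniprot_ds": {"uri_prefix": "http://example.org/uniprot/",
                    "endpoint": "http://example.org/uniprot/sparql"},
    ":kegg_ds": {"uri_prefix": "http://example.org/kegg/",
                 "endpoint": "http://example.org/kegg/sparql"},
}
LINKSETS = {
    ":uniprotToKegg_ls": {"targets": (":uniprot_ds", ":kegg_ds")},
}


def dataset_of(resource_uri):
    """Find the dataset a resource belongs to (assumed: by URI prefix)."""
    for name, meta in DATASETS.items():
        if resource_uri.startswith(meta["uri_prefix"]):
            return name
    return None


def linked_datasets(dataset):
    """Follow every linkset that involves the dataset to its other end."""
    found = []
    for linkset in LINKSETS.values():
        if dataset in linkset["targets"]:
            found.extend(t for t in linkset["targets"] if t != dataset)
    return found


def endpoints_to_query(resource_uri):
    """Collect the SPARQL endpoints worth asking about the resource."""
    home = dataset_of(resource_uri)
    if home is None:
        return []
    return [DATASETS[d]["endpoint"] for d in linked_datasets(home)]


print(endpoints_to_query("http://example.org/uniprot/genexx"))
```

With real voiD data, the two dictionaries would be replaced by queries against the voiD description itself.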

Original issue reported on code.google.com by [email protected] on 12 Dec 2008 at 3:07

Clarify that rdf:type does not create linksets

Minor point: rdf:type can also be seen as "linking to external resources" 
because the objects of 
rdf:type statements reside on a different domain. We should clarify that this 
does not constitute a 
linkset, and that the void:vocabulary construct is to be used for classes.

Original issue reported on code.google.com by [email protected] on 10 Nov 2009 at 2:15

Which linking property is used in a void:Linkset?

There should be a property to be used in Linksets, which specifies the kind of 
links that are 
contained in the Linkset. Are they sameAs links? Are they seeAlso links? Are 
they foaf:knows links?

Proposal:

void:linkPredicate a rdf:Property;
    rdfs:domain void:Linkset;
    rdfs:range rdf:Property;
    rdfs:label "link predicate";
    rdfs:comment "The linkset contains links of this RDF predicate.";
    .

Original issue reported on code.google.com by [email protected] on 23 Jun 2008 at 7:25

Describe the graphs in a SPARQL endpoint

SPARQL endpoints provide access to an "RDF dataset", that is, a default graph 
and zero or more 
named graphs.

voiD could be handy for characterising the contents of these different graphs. 
This can be 
helpful for SPARQL user interfaces, for query optimization, and for federation.

I suppose one would define several void:subsets of the main void:Dataset, and 
then express that 
certain of these subsets reside in certain named graphs or in the default 
graph. The SPARQL 
endpoint URI (specified via void:sparqlEndpoint) could be used as a connecting 
node to tie the 
different graphs together.

Some of this will probably be provided by the SPARQL vocabulary under 
development in the 
SPARQL WG. So we could just wait and see, or try to anticipate their design.

A strawman from Gregory Williams is pasted below. I assume that the top-level 
blank node 
would be the SPARQL endpoint URI.

[] a sd:Service ;
    sd:dataset [
        sd:defaultGraph [
            sd:datasetDescription <uri of voiD data>
        ] ;
        sd:namedGraph [
            sd:graphName <uri of named graph> ;
            sd:datasetDescription <uri of voiD data>
        ]
    ] .

Original issue reported on code.google.com by [email protected] on 21 Sep 2009 at 5:50

Blocking: #85, #82, #89

Adding a type for VoiD description documents?

Some vocabularies define a class for documents that mainly use the vocabulary. 
For example, FOAF 
has foaf:PersonalProfileDocument. Shall we add something similar to void, e.g. 
void:DatasetDescription? This would be a subclass of foaf:Document.

Benefit: It might encourage people to add metadata to their voiD files, e.g.

<> a void:DatasetDescription;
    foaf:primaryTopic :MyDataset;
    dc:modified "2009-02-15"^^xsd:date;
    .

Original issue reported on code.google.com by [email protected] on 16 Feb 2009 at 12:16

DS/LS picture in Section 2 is confusing

Stuart Williams commented on the picture at the start of Section 2:

| The diagram at the start of section 2 is actually a little confusing. It
| looks like it presents two datasets :DS1 and :DS2 (each being a collection
| of statements) and that each dataset 'contains' a named subset, :LS1 and :LS2
| respectively, of linksets - in the example expressing populations of links
| using foaf:knows, rdfs:seeAlso and owl:sameAs properties. However, as the
| later examples unfold, :LS1 and :LS2 are not 'subgraphs' of their
| respective graphs, they are (optionally) named linkset resources that act
| as statement subjects for some statements describing *a* particular
| linkset. Indeed even for the example illustrated there are (or would be)
| three defined linkset nodes (two in :DS1 and one in :DS2) and the regions
| that are demarcated as :LS1 and :LS2 don't exist quite as presented (AFAICT).

Opinions? Does it make sense that all the three arrows originate/end in the 
same two :LS 
resources?

Original issue reported on code.google.com by [email protected] on 27 Aug 2009 at 8:50

Licenses vs. waivers, and align with Talis licensing recommendations

Talis is spending considerable resources to improve the general situation 
around dataset licensing. 
They've contributed to the PDDL effort, there was a high-profile article by Ian 
Davis [1], and several 
Talis folks did a tutorial on dataset licensing at ISWC [2].

They tie licenses to void:Datasets, but at least in Ian's proposal, it doesn't 
use dc:license, but a 
separate vocabulary, the Waiver vocabulary. See [1] for example use.

Should we align this somehow? Document the use of the Waiver vocabulary? At 
least I think we 
should create consensus with them about the proper way of marking up different 
licenses/waivers 
in voiD.

[1] http://blogs.talis.com/nodalities/2009/07/linked-data-public-domain.php
[2] http://www.opendatacommons.org/events/iswc-2009-legal-social-sharing-data-web/

Original issue reported on code.google.com by [email protected] on 10 Nov 2009 at 12:47

Guide section on dataset partitioning could be improved

Jiri Prochazka complained about the Guide section on dataset partitioning:

“Another thing - dataset partitioning. Combination of dataset
categorization and partitioning led me to great confusion - I have
thought voiD also wanted to categorize the data in the dataset.
Better to put a notice that partitioning should be used carefully and
that it was designed for mirroring of datasets.”

I think the section should be improved to make the following points clear:

- partitioning is about the case where a voiD author wants to say something 
that applies to just 
a part of the dataset, and wants to stress that it does *not* apply to all of it
- a partition is itself a void:Dataset and can be described using any of the 
usual properties
- list more examples why we would want to have partitions: different 
provenance, different 
publication dates, different SPARQL endpoints, different dumps, different 
vocabularies used, 
different topic
- probably also worth mentioning that the same mechanism can also be used to 
*aggregate* 
datasets

Original issue reported on code.google.com by [email protected] on 2 Feb 2009 at 1:31

void:feature example abuses dcterms:format

Pointed out by Simon Renhardt on public-lod:

The example void:features in section 1.5 use dcterms:format in a poor way. It's 
probably more 
intended to be used on documents (rather than on features), and its object 
should be a resource 
rather than a literal.

We should probably change the example to something a bit more neutral, and make 
clearer that it's 
just an example and not something that everyone should do.

We should work towards collecting things that people want to express as 
void:features, and maybe 
include a list of predefined features in the next voiD version.

Original issue reported on code.google.com by [email protected] on 30 Jan 2009 at 4:59

Better discussion of void:linkPredicate

void:linkPredicate is currently only mentioned in passing. I'd like to see a 
fuller discussion.

- What does it mean if no linkPredicate is stated for a linkset?
- What does it mean if a linkPredicate is stated?
- What does it mean if several linkPredicates are stated?

What can a consumer of the data expect in each of the cases above?

Original issue reported on code.google.com by [email protected] on 10 Nov 2009 at 2:13

Expressing URI patterns for a dataset

VoiD should have the ability to express that a dataset contains URIs of a 
certain shape. For 
example, DBpedia has resources of this shape:

http://dbpedia.org/resource/*

Knowing this can be useful to find datasets that contain information about a 
given URI. Note that 
Semantic Sitemaps can express this using the <sc:linkedDataPrefix> element. 
This is especially 
useful for integrating several SPARQL endpoints, because knowing which SPARQL 
endpoint has 
information about what URIs can save a lot of processing resources.

Proposal:

:MyDataset a void:Dataset;
        void:uriPattern "^http://example\\.com/data/".

The value of void:uriPattern would be a regular expression. Making this a regex
has one nice advantage. Assuming we want to know which dataset contains
information about a resource http://example.com/data/12341234, we can run a
SPARQL query like this:

PREFIX void: <http://rdfs.org/ns/void#>

SELECT ?dataset
WHERE {
    ?dataset a void:Dataset .
    ?dataset void:uriPattern ?pattern .
    FILTER(REGEX("http://example.com/data/12341234", ?pattern))
}

This will return all datasets matching the URI.
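
For a client that has already loaded the patterns, the same matching can be done locally. The following is a minimal Python sketch. The `PATTERNS` table and the `:DBpedia` entry are hypothetical, and it assumes the patterns stay within the regex syntax shared by Python's `re` module and the XPath flavour used by SPARQL.

```python
import re

# Hypothetical table: dataset name -> void:uriPattern value.
PATTERNS = {
    ":MyDataset": r"^http://example\.com/data/",
    ":DBpedia": r"^http://dbpedia\.org/resource/",
}


def datasets_for(uri):
    """Return the datasets whose URI pattern matches the given URI."""
    # re.search, like SPARQL's REGEX, matches anywhere in the string
    # unless the pattern itself is anchored with ^ or $.
    return [name for name, pattern in PATTERNS.items()
            if re.search(pattern, uri)]


print(datasets_for("http://example.com/data/12341234"))
```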

Two other possibilities would be:

        void:uriPattern "http://example.com/data/{id}";
        void:uriPrefix "http://example.com/data/";

The first one uses URI Templates (see [1]), the second one is a much simpler 
prefix-based 
solution.
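
To compare the two alternatives, here is a small Python sketch of how a consumer might test a URI against each form. It is deliberately simplified: it only handles plain `{name}` placeholders, and it assumes a placeholder stands for exactly one path segment, which is narrower than what URI Templates allow. The function names are made up for the example.

```python
import re


def matches_prefix(prefix, uri):
    """Prefix-based check, as with the proposed void:uriPrefix."""
    return uri.startswith(prefix)


def template_to_regex(template):
    """Turn a template with {name} placeholders into an anchored regex."""
    # Split on the placeholders, escape the literal parts, and let each
    # placeholder match one non-empty path segment (an assumption).
    literal_parts = re.split(r"\{[^}]*\}", template)
    body = "[^/]+".join(re.escape(part) for part in literal_parts)
    return re.compile("^" + body + "$")


pattern = template_to_regex("http://example.com/data/{id}")
print(bool(pattern.match("http://example.com/data/12341234")))
print(matches_prefix("http://example.com/data/",
                     "http://example.com/data/12341234"))
```

The prefix form needs no parsing at all, while the template form can also describe URIs with a variable part in the middle.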

[1] http://bitworking.org/news/URI_Templates

Original issue reported on code.google.com by [email protected] on 11 Dec 2008 at 10:25

Align voiD with LRDD

Check whether it makes sense to enable LRDD-based discovery for voiD, based
on [1] and [2].

[1] http://linkeddata.deri.ie/tr/2009-discovery
[2] http://uldis.deri.ie/

Original issue reported on code.google.com by Michael.Hausenblas on 13 Aug 2009 at 10:17

Dataset topic is the intersection of dc:subjects?

Section 1 of the Guide says: "Note: It is assumed that the intersection of all 
resources defines the 
topic." And it gives the following example:

    :DBLP a void:Dataset; 
          dcterms:subject <http://dbpedia.org/resource/Computer_science> ;  
          dcterms:subject <http://dbpedia.org/resource/Journal> ;  
          dcterms:subject <http://dbpedia.org/resource/Proceedings> .  

Stuart Williams commented on this:

| Conjunctive use of dcterms:subject... whilst I think I understand the
| pragmatic appeal, given the example I think that the intersection of
| Computer Science (some conceptual domain of study/investigation); a Journal
| (a form of publication); and Proceedings (a different form of publication
| usually arising from a workshop or conference and IIUC disjoint with
| Journals) is empty. Yes I know that's very anal (and maybe big 'O'ist) of
| me. I think that you have several dimensions squeezed into one - here
| computer science truly is a subject domain, but journal and proceedings
| really are more modes or categories of publication than subject
| domains.

I think he's right. I don't remember why we put that sentence in there. I would 
prefer to simply 
strike it.

Original issue reported on code.google.com by [email protected] on 27 Aug 2009 at 8:44

backlink to voiD description

Raised by: [http://rdf.ecs.soton.ac.uk/person/21 Hugh Glaser]

Content: backlink from any part of the dataset to the voiD description

See also:

Original issue reported on code.google.com by Michael.Hausenblas on 12 Jun 2008 at 1:09

void:Dataset sub-classes and external mapping

I was thinking that void:Dataset might get sub-classed to dcmitype:Dataset 
(see <http://dublincore.org/documents/dcmi-type-vocabulary/>) *but* - 
sometimes it may not be a true dataset but just a service 
(dcmitype:Service).

See, there are different kinds of providers of Linked Data at the moment 
(which might be defined as sub-classes of void:Dataset):
- Those that have RDF stored somewhere and just export it. The storage 
might be a native RDF store or a relational database (which doesn't even 
have to store the RDF directly but may convert on-the-fly from a regular 
schema). Data may be produced locally or come from somewhere else, like a 
dump of data that is to be exposed as Linked Data.
- Those that call a Web service and wrap its response as RDF on-the-fly.
- Those that call a Web service and cache the results for later requests.

I'm not sure what value there is in defining these sub-classes. Describing
the costs of making certain calls (in the case of wrappers) or the
up-to-dateness of the data (in the case of RDFised dumps) might hold more
value. But the point is that a wrapper is actually a service and not a dataset
itself (then again DBpedia is a service which exposes their dataset).

Thoughts?

Original issue reported on code.google.com by [email protected] on 19 Jan 2009 at 10:54

Void v2 - creation of void:isPartOf or such

As per my discussion on #swig:

http://chatlogs.planetrdf.com/swig/2009-05-27.html#T16-08-26

Original issue reported on code.google.com by [email protected] on 27 May 2009 at 3:41

Proposal: Describing dataset contents by giving a typical RDF snippet

Raised by: [http://moustaki.org/foaf.rdf#c4dm Yves Raimond]

Content:
It should be possible to describe the content of a SPARQL endpoint
(proposal: add a void:example property).

See also:
* http://community.linkeddata.org/MediaWiki/index.php?MetaLOD#Requirements
* http://blog.dbtune.org/post/2008/06/12/Describing-the-content-of-RDF-datasets

Original issue reported on code.google.com by Michael.Hausenblas on 12 Jun 2008 at 1:03

Change frequency

A property void:changeFrequency could give an estimate of update times,
just like <changefreq> in sitemaps
<http://www.sitemaps.org/protocol.php#changefreqdef> (which is also used in
the semantic sitemap extension:
<http://sw.deri.org/2007/07/sitemapextension/#xml-tags>).

Maybe this should even be a scovo:Dimension to be used for void:statItem?

Someone please set this as an enhancement request.

Original issue reported on code.google.com by [email protected] on 20 Jan 2009 at 9:19

Can we use SPARQL aggregates to calculate statistics?

The idea was mentioned to me by Alex Passant. It's something we should explore, 
especially since 
aggregates are going to be part of SPARQL2.
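
As a rough illustration of the idea, the Python sketch below pairs a SPARQL 1.1 aggregate query (held as a string, not executed here) with the equivalent computation over a small in-memory list of triples. The sample triples and the choice of "triples per predicate" as the statistic are assumptions made for the example; nothing here prescribes which statistics voiD should publish.

```python
from collections import Counter

# SPARQL 1.1 aggregate query that would compute the same per-predicate
# counts when sent to an endpoint that supports aggregates.
TRIPLES_PER_PREDICATE = """
SELECT ?p (COUNT(*) AS ?triples)
WHERE { ?s ?p ?o }
GROUP BY ?p
"""


def triples_per_predicate(triples):
    """Local equivalent of the query: count triples for each predicate."""
    return Counter(p for _s, p, _o in triples)


# Hypothetical sample data.
SAMPLE = [
    (":alice", "foaf:knows", ":bob"),
    (":alice", "foaf:name", "Alice"),
    (":bob", "foaf:knows", ":alice"),
]

stats = triples_per_predicate(SAMPLE)
print(len(SAMPLE), stats["foaf:knows"], stats["foaf:name"])
```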

Original issue reported on code.google.com by [email protected] on 10 Feb 2009 at 2:07

GNU license URLs

The licensing section of the VoiD Guide lists this URI for the GFDL:
http://www.gnu.org/copyleft/fdl.html

This should be:
http://www.gnu.org/licenses/fdl.html

A link to the list of all GNU licenses might also be a useful addition:
http://www.gnu.org/licenses/

Original issue reported on code.google.com by [email protected] on 18 Jun 2009 at 11:10

Alignment with DARQ/DOSE

This issue explores alignment between voiD and the DARQ Service Description 
ontology (DOSE – 
Description Of a SErvice).

An example DOSE description:
http://darq.sourceforge.net/index.html#Service_Descriptions

The DOSE ontology definition (incomplete, e.g. doesn't have selectivity):
http://darq.svn.sourceforge.net/viewvc/darq/darq/trunk/Schema/dose.n3?revision=9&view=markup

=== SUMMARY OF DOSE ===

The main item in a DOSE description is an sd:Service, which is a SPARQL service 
(the ontology 
comment suggests that it could be renamed to sd:Endpoint).

An endpoint can have these properties: sd:url (as a literal), sd:isDefinitive 
(not clear what it 
means), sd:totalTriples, and a number of sd:Capabilities and 
sd:RequiredBindings.

An sd:Capability states that the service can process queries that include a
certain triple pattern. A capability can be described in more detail using
these properties: sd:predicate (predicate of the triple pattern), sd:triples
(how many of the triples exist), sd:sofilter (SPARQL FILTER expression
involving ?subject and ?object that holds for all triples; typically used to
constrain URIs using regexes), sd:subjectSelectivity and sd:objectSelectivity
(selectivity if the subject/object is known).

sd:RequiredBindings come in two flavours, subjectBinding and objectBinding. A
subjectBinding instance declares that a certain property may only occur with a
fixed subject (that is, no variable) in the query.

=== ALIGNMENT VOID/DOSE ===

The simplest alignment would be to state that the object of void:sparqlEndpoint 
is an sd:Service. 
When a voiD description includes a SPARQL endpoint, then further DOSE 
descriptions can simply 
be added using the endpoint URI.

Another option would be to add triple pattern statistics directly to voiD. A 
dataset could have 
TripleSets. They would be something similar to a LinkSet, but not requiring 
external URIs. A triple 
set could have: predicate, number of triples, URI templates for subject and 
object, selectivity 
information (modelled as another stat?). Adding a RequiredBindings mechanism to 
voiD seems a 
bit more difficult since it's really a property of the SPARQL access mechanism 
and not of the 
dataset as such.

:myDataset void:contains [
    a void:TripleSet;
    void:predicate foaf:knows;
    void:subjectTemplate "http://sn.example.com/user/{username}";
    void:objectTemplate "http://sn.example.com/user/{username}";
    void:statItem [
        scovo:dimension void:numberOfTriples;
        rdf:value 25432;
    ];
    void:statItem [
        scovo:dimension void:objectSelectivity;
        rdf:value 0.00002;
    ];
] .

Original issue reported on code.google.com by [email protected] on 11 Jul 2008 at 7:11

Clarify use of void:vocabulary

Which URI exactly do I use for any given vocabulary?

We say, the one that's the object of isDefinedBy triples for the vocab terms, 
but often isDefinedBy is 
not used in real-world vocabularies.

Should we say "downloadable location"? Should we say "namespace URI"? What 
about trailing 
hashes, leave them or remove them?

I would prefer having some really clear guidance in the Guide.

Original issue reported on code.google.com by [email protected] on 10 Nov 2009 at 12:26

Linking a dataset to its website

In the current examples, owl:sameAs is used to link a dataset to its website.
Yet, I think it would be better to distinguish between the dataset itself and
the website / homepage that hosts / gives information about it.

foaf:topic is a solution (domain = owl:Thing / range = foaf:Document), but as
this is an IFP, we cannot have 2 different datasets linked to dbpedia.org, for
example. We would need to be careful with that and use, e.g.,
http://wiki.dbpedia.org/Downloads30#titles rather than http://dbpedia.org to
make the link for each 'subdataset' hosted by DBpedia.

Original issue reported on code.google.com by [email protected] on 13 Jun 2008 at 7:34

voiD's versioning capabilities

As pointed out by Jacco van Ossenbruggen on 2009-02-11:

'One of the key things I would like to see added is some simple versioning.
For interlinking, statistics and publishing this seems a crucial feature.
If you could only make the version of a Dataset explicit by adding a
void:version datatype property, this would also allow you to say
that, for example, a Linkset interlinks two specifically versioned
Datasets, or that a Dataset uses a specific version of a vocabulary, etc.'

Original issue reported on code.google.com by Michael.Hausenblas on 11 Feb 2009 at 10:55

HTTP link header using voiD discovery

As recently discussed on #swig [1], one could use a 'Link header to point from
the RDF document response to the void description' (via Shepard).

Maybe we should address this in the voiD Guide (e.g. in section 4, Consuming
Process).
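
On the consuming side, discovery via the Link header could look roughly like the Python sketch below. It is a simplified illustration only. The link relation `describedby` is an assumption (the issue does not settle on a relation type), the example header is made up, and the regular expression handles only the common `<target>; rel="value"` form rather than the full header grammar.

```python
import re

# Matches one '<target>; rel="value"' pair (simplified, see lead-in).
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def void_links(link_header, rel="describedby"):
    """Return the link targets whose relation equals the given rel."""
    return [target for target, found_rel in LINK_RE.findall(link_header)
            if found_rel == rel]


# Hypothetical header value as it might appear on an RDF document response.
header = ('<http://example.com/void.ttl>; rel="describedby", '
          '<http://example.com/next>; rel="next"')
print(void_links(header))
```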

Related:
 + http://esw.w3.org/topic/FindingResourceDescriptions
 + http://www.mnot.net/drafts/draft-nottingham-http-link-header-03.txt

[1] http://chatlogs.planetrdf.com/swig/2009-01-11#T12-38-19

Original issue reported on code.google.com by Michael.Hausenblas on 11 Jan 2009 at 1:00

way to relate a Dataset to a text search service over the dataset

A discussion on the public-lod mailing list [1] raises the issue that linked
datasets benefit greatly in usability from having full-text search over their
data.
Many datasets already provide this feature (with some notable exceptions),
and we should provide a way to include the search service of a dataset in
its voiD description.

One point to bear in mind may be that the service might be external to the
dataset itself - for example, if the dataset is deployed as RDFa, Google
could be used to provide the search service, using the site: syntax for
restricting results to a particular domain.

It may be worth looking at http://schemas.talis.com/2005/service/schema#
which provides generic terms for describing web services. 

I'd probably like something more obvious and specific though - subclassing
from void:feature perhaps, e.g. void:textSearchService.

[1] http://lists.w3.org/Archives/Public/public-lod/2009Feb/0058.html

Original issue reported on code.google.com by [email protected] on 9 Feb 2009 at 8:58

Which vocabulary is used in a dataset

VoiD should be able to express which vocabulary is used in a dataset. Example 
use cases would 
be queries like:

“Give me datasets that contain FOAF data.”
“Does MusicBrainz use the Music Ontology?”
“Which vocabulary or ontology is used by the largest number of datasets?”

This information is also useful for tools like query builders, which can use
it to pre-populate fields for selecting classes and properties.

We already have the void:linkPredicate property for linksets, but that has a
much narrower scope and doesn't address the use cases above.

Proposal 1:

Have a new property void:vocabulary with domain void:Dataset and range 
owl:Ontology. It links a 
dataset to a vocabulary. The vocabulary would be something like the formal FOAF 
spec, that is, 
the object of rdfs:isDefinedBy for the FOAF terms.

Proposal 2:

Maybe we want to list not just used vocabularies, but something more 
fine-grained: some of the 
classes and properties used in the dataset? I guess this will never be an 
exhaustive list, but only 
the most important/frequent ones:

void:prominentClass
void:prominentProperty

Original issue reported on code.google.com by [email protected] on 4 Dec 2008 at 12:32

voiD guide section 4.1 needs copy-editing

Noticed a few typos and so on that I'd like to fix.

Original issue reported on code.google.com by [email protected] on 28 May 2009 at 8:28

Issues with statistics in voiD

We are aware of the following issues with the statistics mechanism as it stands 
today:

1. In SCOVO, scovo:Items are grouped into scovo:Datasets, and there seems to be 
an implicit 
assumption that all items in such a dataset share the same dimensions. As 
described here, we 
attach items directly to a void:Dataset, which leads to mixing of items of 
different dimensionality. 
On the other hand, the correct SCOVO modelling would lead to awkwardly complex 
notation for 
simple statistics.

2. We encourage the use of classes and properties in places where SCOVO 
requires an instance of 
scovo:Dimension. This breaks the symmetry of the SCOVO model. SCOVO would 
require us to 
create a scovo:Dimension for each class or property. This would be quite 
verbose.

3. Because of the issues above, SPARQLing for statistics can be awkward. It 
will often require a 
verbose check to make sure that an item has only certain dimensions and no 
others.

Two possible approaches for fixing these issues:

1. Adapt SCOVO to better suit our needs, e.g. by making it a bit less verbose 
(esp. around 
definition of dimensions), making it easier to query (e.g. 
scovo:numberOfDimensions on the 
dataset, scovo:domainObject property for connecting dimensions to the domain, 
removing the 
subclassing of dcterms:Event) (downside: will still be verbose)

2. Create a new mechanism based on simple properties like void:numberOfTriples 
and having a 
powerful mechanism for specifying void:subsets (downside: how to do attribution 
of statistics 
with this is completely unclear)

We decided not to take action on those issues until after the first release of 
the Guide.

Original issue reported on code.google.com by [email protected] on 14 Jan 2009 at 9:10

linking void:DataSet to data access points

Is the sitemaps ontology developed prior to sitemaps.xml usable? How do we
link void:DataSet to data access points (SPARQL endpoints, REST interfaces,
etc.)?

Original issue reported on code.google.com by [email protected] on 1 Jul 2008 at 1:42

Adding a list of built-in standard feature instances

Should voiD 2.0 define a list of “standard” feature URIs that people can 
just use so they don't 
need to define their own?

Example:

void:ContentNegotiation a void:Feature .

Ideally, such a list would be composed from examples found in the wild but I 
don't know if 
void:Feature is used at all out there in the wild.

A starting point might be to trawl the Linking Open Data wiki pages for dataset 
descriptions such 
as "It supports X, Y and Z". Or to look at the homepages of some datasets for 
similar descriptions.

Original issue reported on code.google.com by [email protected] on 27 Aug 2009 at 1:42

Comments re voiD guide from Stuart Williams

From: "Williams, Stuart (HP Labs, Bristol)"
Date: Mon, 23 Feb 2009 16:22:28 +0000

Subject: voID

Hello Richard, Michael,

Just had a rapid read through the voID Guide
(http://rdfs.org/ns/void-guide) and thought that I'd offer some comments
for whatever they may be worth...

I think that it would be useful to comment on the difference between the
SPARQL conceptualisation of dataset (a default graph and collection of
named graphs) and the voID conceptualisation of a dataset (which I think is
a single graph - though 

Section 1:
Conjunctive use of dcterms:subject... whilst I think I understand the
pragmatic appeal, given the example I think that the intersection of
Computer Science (some conceptual domain of study/investigation); a Journal
(a form of publication); and Proceedings (a different form of publication
usually arising from a workshop or conference and IIUC disjoint with
Journals) is empty. Yes I know that's very anal (and maybe big 'O'ist) of
me. I think that you have several dimensions squeezed into one - here
computer science truly is a subject domain, but journal and proceedings
really are more modes or categories of publication than subject
domains. I certainly think that the range of dcterms:subject should be
something like skos:Concept (not looked at skos of late). But I think that
composite subjects are hard.

Section 2:
The 'subset' property could do with being renamed 'hasSubset' or
'isSubSetOf' - I think that the sense of it is the former, but at least for
me the directionality does not stick in my memory for long.

The diagram at the start of section 2 is actually a little confusing. It
looks like it presents two datasets :DS1 and :DS2 (each being a collection
of statements) and that each dataset contains a named subset, :LS1 and :LS2
respectively, of linksets - in the example expressing populations of links
using foaf:knows, rdfs:seeAlso and owl:sameAs properties. However, as the
later examples unfold, :LS1 and :LS2 are not 'subgraphs' of their
respective graphs, they are (optionally) named linkset resources that act
as statement subjects for some statements describing *a* particular
linkset. Indeed even for the example illustrated there are (or would be)
three defined linkset nodes (two in :DS1 and one in :DS2) and the regions
that are demarcated as :LS1 and :LS2 don't exist quite as presented (AFAICT).

Section 3:
I don't quite understand how you could attribute a value to
void:numberOfDocuments. Taking a deliberately obtuse stand, a dataset
contains triples; numbers of distinct subjects and objects make sense,
but numbers of documents doesn't seem to me to be a dimension of such a
dataset.

In the voID ontology, void:LinkSet is defined to be a subclass of
void:Dataset which gives some syntactic convenience in the re-use of
statistical properties (and probably some others too) but I'm not convinced
that ontologically a :LinkSet is a subclass of a :Dataset - particularly in
the form given where an instance of :LinkSet really can only establish that
a single given property is used to link between a pair of :Datasets.

Section 5.1:
Hmmm... lots of scope for confusion.

    <document.rdf> dcterms:isPartOf <void.ttl#MyDataset> .

Kind of curious from the point of view of having previously established
sparql, uriLookup and dump endpoints why one would be remotely interested
in <document.rdf> as being a part of the dataset (unless separately it was
a dataset in its own right with its own set of endpoints etc).

Original issue reported on code.google.com by Michael.Hausenblas on 4 Mar 2009 at 12:13

Alignment with MetaVocab

MetaVocab is a proposed “vocabulary for describing vocabularies”, dating back
to 2002. It has found fairly widespread use because it is a documented module
for RSS 1.0 feeds:

http://web.resource.org/rss/1.0/modules/admin/

and because it is used in the popular FOAF-a-matic. The vocabulary itself
should be defined here, but seems to be a victim of bitrot:

http://webns.net/mvcb/

The two key terms of the vocabulary are generatorAgent (pointing to a URI that
identifies the software that generated an RDF document) and errorReportsTo
(usually a mailto: URI for contacting the webmaster responsible for an RDF
document).

GeneratorAgent triples could be attached to void:Datasets and void:Linksets to
point to the software that generated the dataset. ErrorReportsTo could also be
provided with a linkset as a nice shortcut for getting in touch with the
publisher. We could put examples into the Guide.

On the other hand, the vocabulary appears to suffer from some bitrot and isn't
properly published according to best practice guidelines. Therefore, it might
be better to first contact the authors and get them to fix it, or just copy the
terms into our own namespace.

Original issue reported on code.google.com by [email protected] on 18 Jul 2008 at 12:06

how is quality represented

One wants not only a functional description but also to know about the quality
of a dataset. How are we going to deal with this? Use a review vocabulary such
as http://hyperdata.org/xmlns/rev/ or invent new entities for it?

Original issue reported on code.google.com by Michael.Hausenblas on 23 Jan 2009 at 10:55

Are void:uriRegexPatterns anchored?

The Guide and vocabulary documentation leave this question unclear: Does the
URI pattern "http://example.com/" match only the single URI
"http://example.com/", or does it match any URI containing the string, e.g.,
"http://example.com/myresource"?

This should be stated explicitly, and all examples should clearly communicate
this.
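
The two readings can be made concrete with a minimal Python sketch. It assumes Python's `re` flavour purely for illustration (the issue does not say which regex dialect applies), and the helper names `matches_anchored` and `matches_unanchored` are hypothetical.

```python
import re

# Pattern string taken from the issue. Note that the unescaped "." is a
# regex metacharacter and matches any character, not just a literal dot.
PATTERN = "http://example.com/"


def matches_anchored(pattern: str, uri: str) -> bool:
    """Anchored reading: the pattern must match the whole URI."""
    return re.fullmatch(pattern, uri) is not None


def matches_unanchored(pattern: str, uri: str) -> bool:
    """Unanchored reading: the pattern may match anywhere inside the URI."""
    return re.search(pattern, uri) is not None


for uri in ("http://example.com/", "http://example.com/myresource"):
    print(uri, matches_anchored(PATTERN, uri), matches_unanchored(PATTERN, uri))
```

Under the anchored reading only "http://example.com/" matches; under the unanchored reading both URIs match, which is exactly the ambiguity the documentation should resolve.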

Original issue reported on code.google.com by [email protected] on 17 Nov 2009 at 3:25

Alignment with DC Collections Application Profile

http://dublincore.org/groups/collections/collection-application-profile/

This describes an interesting meta-model for collections, has some terms that
could be useful to align to, and is also interesting from an editorial POV
because the voiD Guide is, in some sense, also an “application profile” of
several vocabularies.

Original issue reported on code.google.com by [email protected] on 2 Feb 2009 at 1:12

what datasets accept

Some datasets are actively seeking data. How can I indicate what my dataset
will accept, and how to submit it? RDF Forms?

Original issue reported on code.google.com by [email protected] on 19 Jun 2008 at 5:22

Example resources for linksets?

We have void:exampleResource to give example URIs for a dataset. Now I have a
scenario where I want examples for linksets. How to do that? The items in a
linkset are triples, not simple resources, so it's not so easy.

The simplest answer would be: Just take your example triple and give either the
subject or the object (depending on what you consider the better example) with
void:exampleResource. For linksets that are hosted inside a larger dataset,
probably the resource which is within the DS should be given, not the target.

If this design is acceptable, then we should briefly document it in the next
Guide.
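
The selection rule proposed above could be sketched as follows. This is only an illustration under stated assumptions: the function name `pick_example_resource`, the prefix-based test for "within the DS", and all example.org / example.com URIs are hypothetical and not part of voiD.

```python
from typing import Optional, Tuple

# A link is modelled as a plain (subject, predicate, object) tuple of URIs.
Triple = Tuple[str, str, str]


def pick_example_resource(link: Triple, host_prefix: Optional[str] = None) -> str:
    """Choose which end of an example link to publish via void:exampleResource.

    If the linkset is hosted inside a larger dataset (identified here by a
    hypothetical URI prefix), prefer the resource inside that dataset over
    the link target. Otherwise fall back to the subject.
    """
    subject, _predicate, obj = link
    if host_prefix is not None:
        if subject.startswith(host_prefix):
            return subject
        if obj.startswith(host_prefix):
            return obj
    return subject


link = (
    "http://example.org/data/alice",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://example.com/people/alice",
)
print(pick_example_resource(link, host_prefix="http://example.org/"))
```

Falling back to the subject when no hosting dataset is known is an arbitrary choice; the issue leaves that decision to the publisher.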

Original issue reported on code.google.com by [email protected] on 15 Feb 2009 at 11:57

[deleted issue]

[deleted issue]

Alignment with SIOC

Looking at the SIOC ontology, 2 classes might be of interest for voiD:

sioc:Space : A Space is defined as being a place where data resides. [...]
I guess it will make sense to subclass void:DataSet from it.

siocs:Service (http://rdfs.org/sioc/services#Service) : Service is web service
associated with a Site or part of it.
Actually it says 'Site' but in the related siocs:has_service property, there is
no range so that a void:SparqlEndpoint could be a subclass of it, linked to the
DataSet using this siocs:has_service.
There are also service_endpoint / service_protocol properties that could be used.




Original issue reported on code.google.com by [email protected] on 13 Jun 2008 at 7:25

Describe relationship between inferred datasets

This is a requirement from Andy Gibson, as quoted below.
{{{
I'm now at the point where I want to describe the relationship between:

1) A graph (dataset) of asserted triples
    - let's say a SKOS vocabulary

2) (optional) A graph containing some 'semantics'
    - like an OWL ontology, which may of course be a part of 1)
    - let's say I've extended SKOS with some of my own Classes / Properties
and I'm inferring broader/narrower relationships

3) A graph of inferred triples obtained by some reasoning process
    - I can combine 1) with several different 2)s and get very different 3)s

This would allow me to effectively add metadata to a graph of *inferred*
triples about the dataset from which they were derived, what semantics were
applied to generate them and which reasoner was used etc. Crucially, I
would be able to find out if a dataset has been through any sort of
reasoning process, including any reasoning process inherent in a SPARQL
endpoint.

Like I said, this seems to me to be in the scope of VoID, and I thought it
was worth raising. Would be glad to hear your thoughts. If you think it's
valid perhaps you could forward it to the others.

Best regards, 
Andrew
}}}
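
The relationship requested above (asserted graph, optional semantics graph, resulting inferred graph) can be pictured as a small data structure. This is a hedged sketch, not a proposal for voiD terms: the `Derivation` class, the `has_been_reasoned` helper, the reasoner label, and every example.org URI are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Derivation:
    """Provenance of one graph of inferred triples (item 3 in the list above)."""
    asserted: str             # URI of the asserted dataset (item 1)
    semantics: Optional[str]  # URI of the optional 'semantics' graph (item 2)
    reasoner: str             # free-text label for the reasoning process


# Hypothetical records: one asserted graph combined with two different
# semantics graphs yields two different inferred graphs.
DERIVATIONS: Dict[str, Derivation] = {
    "http://example.org/inferred/a": Derivation(
        "http://example.org/skos-vocab", "http://example.org/ontology-a", "reasoner-x"),
    "http://example.org/inferred/b": Derivation(
        "http://example.org/skos-vocab", "http://example.org/ontology-b", "reasoner-x"),
}


def has_been_reasoned(graph_uri: str) -> bool:
    """True if the graph is recorded as the output of some reasoning process."""
    return graph_uri in DERIVATIONS


print(has_been_reasoned("http://example.org/inferred/a"))
print(has_been_reasoned("http://example.org/skos-vocab"))
```

A consumer could look up any dataset URI in such a record to learn which asserted data, semantics and reasoner produced it, which is the query the requirement calls crucial.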

Original issue reported on code.google.com by [email protected] on 13 Aug 2009 at 8:45