Home Page: https://w3c.github.io/rdf-ucr/

License: Other

rdf-ucr's Introduction

Specification 'rdf-ucr'

This is the repository for use cases for the RDF-star Working Group. It is being used to collect information on use cases and to drive the progress of the working group. A complete use case description will contain sufficient information about representation, querying, and reasoning to drive the activities of the working group and be checked against the group's recommendations.

A working group NOTE may be created from the use case descriptions.

Each use case will be assigned a member of the working group's Use Case Task Force to help the process of creating a complete use case description.

Contributing and Updating Use Cases

The first step in contributing a use case is to create an issue in this repository that will be used to track discussion on the use case. The use case issue should include at least a brief description of your use case, and you are encouraged to include more information. Comments in the use case issue will be used for discussion of the use case.

Each use case will have a wiki page that contains a clean version of the use case for those not interested in how it was developed.

All information submitted to this repository is publicly available. Please read CONTRIBUTING.md, about licensing contributions.

RDF-star Working Group Repositories

There are ten RDF 1.2 and twelve SPARQL 1.2 documents produced by the RDF-star Working Group.

RDF 1.2 Documents:

SPARQL 1.2 Documents:

The working group also has a use case repository collecting use cases for quoted triples and how they can be handled by RDF 1.2 and SPARQL 1.2.

Code of Conduct

W3C functions under a code of conduct.

rdf-ucr's Issues

RDF-star for Wikibase/Wikidata

See https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Wikidata for a single document on this use case.

Contact information:

Your name: @Tpt
How to contact you: Contacts in my GitHub profile + Tpt on W3C IRC

Note that I am not representing Wikibase/Wikidata/Wikimedia in any way, I just wanted to describe this use case.

Brief Description of your use case:

Wikibase is the software that powers Wikidata. Wikibase uses its own data model but provides an RDF mapping. Wikibase contains a native reification system. Each main "snak" (aka triple) like "USA president JoeBiden" can be annotated with "qualifiers" like "start date January 20th 2021" or "predecessor DonaldTrump", "references" (i.e. blank nodes describing a source) and a "rank" (a processing annotation that can have three values "preferred"/"normal"/"deprecated"). Wikibase calls this full construction a "statement".

The current RDF encoding uses a specific RDF node to encode each statement. For example (Wikibase uses opaque identifiers, I have tweaked the RDF to make it more readable):

wd:USA a wikibase:Item ;
    p:president wds:JoeBidenPresidencyStatement , wds:DonaldTrumpPresidencyStatement . # p:X are relations between a subject and a statement. The statement subject is the triple subject (here "USA") and the statement predicate is the relation predicate (here "president")

wds:JoeBidenPresidencyStatement a wikibase:Statement  ;
     ps:president wd:JoeBiden ; # ps:X are relations between a statement and an object. The statement object is the triple object (here "JoeBiden") and the statement predicate is the relation predicate (here "president")
     wikibase:rank wikibase:PreferredRank ;
     pq:start_date "2021-01-20"^^xsd:dateTime ; # A qualifier
     pq:predecessor wd:DonaldTrump ; # A qualifier
     prov:wasDerivedFrom wdref:a_reference , wdref:an_other_reference .

wds:DonaldTrumpPresidencyStatement a wikibase:Statement  ;
     ps:president wd:DonaldTrump ;
     wikibase:rank wikibase:NormalRank ;
     pq:start_date "2017-01-20"^^xsd:dateTime ;
     pq:end_date "2021-01-20"^^xsd:dateTime .

wd:USA wdt:president wd:JoeBiden . # For statements with the "best" rank a direct edge is inserted in the RDF with the "wdt:" prefix.

Note that in the previous example the direct triple wd:USA wdt:president wd:JoeBiden has been generated because the statement rank is "preferred". Statements about the older presidencies also exist but have only the "normal" rank, so direct triples are not generated for them.
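The rank rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Wikibase's implementation: the in-memory tuple layout, the `RANK_ORDER` table and the `direct_triples` helper are hypothetical, and the "capital" rows are invented sample data. The assumed rule is that, per subject and property, only statements holding the best non-deprecated rank produce a direct triple.

```python
from collections import defaultdict

# Hypothetical in-memory statements: (subject, property, value, rank).
STATEMENTS = [
    ("USA", "president", "JoeBiden", "preferred"),
    ("USA", "president", "DonaldTrump", "normal"),
    ("USA", "capital", "WashingtonDC", "normal"),
    ("USA", "capital", "Philadelphia", "deprecated"),
]

# Assumed ordering: "preferred" beats "normal"; "deprecated" never yields a direct triple.
RANK_ORDER = {"preferred": 2, "normal": 1}


def direct_triples(statements):
    """Return the (subject, property, value) triples carried by best-rank statements."""
    groups = defaultdict(list)
    for subject, prop, value, rank in statements:
        if rank in RANK_ORDER:  # deprecated statements are skipped entirely
            groups[(subject, prop)].append((RANK_ORDER[rank], value))
    result = []
    for (subject, prop), candidates in groups.items():
        best = max(score for score, _ in candidates)
        result.extend((subject, prop, value) for score, value in candidates if score == best)
    return sorted(result)


print(direct_triples(STATEMENTS))
```

With this sample data, the "normal" presidency statement yields no direct triple because a "preferred" one exists for the same subject and property.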

Paper about the design of the Wikibase RDF encoding: "Reifying RDF: What Works Well With Wikidata?"

What you want to be able to do:

It would be great to have a concise RDF syntax to encode this use case.

What is the role of RDF-star quoted triples in your use case:

They might be used to simplify the RDF encoding. For example one might hope to write:

<< wd:USA wd:president wd:JoeBiden >>  a wikibase:Statement  ;
     wikibase:rank wikibase:PreferredRank ;
     pq:start_date "2021-01-20"^^xsd:dateTime ;
     pq:predecessor wd:DonaldTrump ;
     prov:wasDerivedFrom wdref:a_reference , wdref:an_other_reference .

<< wd:USA wd:president wd:DonaldTrump >>  a wikibase:Statement  ;
     wikibase:rank wikibase:NormalRank ;
     pq:start_date "2017-01-20"^^xsd:dateTime ;
     pq:end_date "2021-01-20"^^xsd:dateTime .

wd:USA wd:president wd:JoeBiden .

Why it is hard or impossible to do what you want to do without quoted triples:

Wikidata needs reification to encode statements.

How you want quoted triples to behave in your use case:

(For example, do you want the precise syntax of subjects, predicates, and objects in quoted triples to be important?)

The RDF-star encoding written above is only valid if the existence of a quoted triple does not imply the assertion of the triple itself. Indeed, we would like this to be in the Wikidata RDF graph:

<< wd:USA wd:president wd:DonaldTrump >>  a wikibase:Statement  ;
     wikibase:rank wikibase:NormalRank ;
     pq:start_date "2017-01-20"^^xsd:dateTime ;
     pq:end_date "2021-01-20"^^xsd:dateTime .

But the triple wd:USA wd:president wd:DonaldTrump should not be in the graph.

We also need to be able to distinguish two statements on the same base triple. We can't merge the following two statements because it would make the start date, end date pairs meaningless:

<< wd:Russia wd:president wd:VladimirPutin >>  a wikibase:Statement  ;
     wikibase:rank wikibase:NormalRank ;
     pq:start_date "1999-12-31"^^xsd:dateTime ;
     pq:end_date "2008-05-07"^^xsd:dateTime .

<< wd:Russia wd:president wd:VladimirPutin >>  a wikibase:Statement  ;
     wikibase:rank wikibase:PreferredRank ;
     pq:start_date "2012-05-07"^^xsd:dateTime .
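To see why the two statements must stay distinct, here is a small Python sketch. It is an illustration only: the dictionaries and the helpers `merge_by_triple` and `keep_separate` are hypothetical and say nothing about how RDF-star itself should identify statements. Keying the qualifiers by the base triple alone pools the dates, while one identifier per statement keeps each date pair together.

```python
# Hypothetical data: two statements about the same base triple, each with its own qualifiers.
BASE = ("Russia", "president", "VladimirPutin")
STATEMENTS = [
    {"triple": BASE, "start_date": "1999-12-31", "end_date": "2008-05-07"},
    {"triple": BASE, "start_date": "2012-05-07"},
]


def merge_by_triple(statements):
    """Key qualifiers by the triple alone: values from all statements pool together."""
    merged = {}
    for statement in statements:
        slot = merged.setdefault(statement["triple"], {})
        for key, value in statement.items():
            if key != "triple":
                slot.setdefault(key, set()).add(value)
    return merged


def keep_separate(statements):
    """Give every statement its own identifier so its qualifiers stay grouped."""
    return {f"statement{i}": s for i, s in enumerate(statements, start=1)}


merged = merge_by_triple(STATEMENTS)
# Two start dates and one end date, with nothing saying which start the end belongs to.
print(sorted(merged[BASE]["start_date"]), sorted(merged[BASE]["end_date"]))

separate = keep_separate(STATEMENTS)
print(separate["statement1"]["start_date"], separate["statement1"]["end_date"])
```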

An example RDF graph that shows part of your use case:

The Wikidata graph exposed by the Wikidata Query Service.

Representation of Language Tags in the Abstract Syntax

Provide sufficient information so that a member of the working group's Use Case Task Force can contact you and enhance your description so that it can be used by the working group to guide their activities. You do not have to fill out all the information requested.

** Contact information

Your name: Gregg Kellogg
How to contact you: @gkellogg

** Brief Description of your use case:

As an aggregator of RDF information, I want to have a predictable number of triples when parsing triples where literals may vary only in the case of the language tag element. I would also like the serialized (possibly canonicalized) form to use the BCP47 formatting recommendations, so that the language tag en-us might canonically be represented as en-US.

[ISO639-1] recommends that language codes be written in lowercase ('mn' Mongolian).
[ISO15924] recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' Cyrillic).
[ISO3166-1] recommends that country codes be capitalized ('MN' Mongolia).

When aggregating data, input can be combined from different documents in which different conventions for formatting language tags are in use, leading to the potential duplication of data.

*** What you want to be able to do:

When parsing a document that may be composed of several overlapping triples, I would like the resulting graph to have a unique abstract representation for otherwise equal language tags. As it is, the following Turtle can generate either one or two triples in the abstract representation, depending on whether the implementation chooses to normalize language tags, e.g., to lower case.

_:a rdf:value "foo"@en-us, "foo"@en-US .

Implementations that normalize language tags will produce a single triple; those that do not will produce two triples.
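The difference can be reproduced with a minimal Python sketch. The tuple representation of literals and the `build_graph` helper are assumptions made for illustration; no real RDF library is involved.

```python
# Hypothetical parsed triples: (subject, predicate, (lexical form, language tag)).
PARSED = [
    ("_:a", "rdf:value", ("foo", "en-us")),
    ("_:a", "rdf:value", ("foo", "en-US")),
]


def build_graph(triples, normalize):
    """Collect triples into a set, optionally lower-casing language tags first."""
    graph = set()
    for subject, predicate, (lexical, lang) in triples:
        if normalize:
            lang = lang.lower()
        graph.add((subject, predicate, (lexical, lang)))
    return graph


print(len(build_graph(PARSED, normalize=False)))  # 2
print(len(build_graph(PARSED, normalize=True)))   # 1
```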

*** What is the role of RDF-star quoted triples in your use case:

Not related to quoted triples.

*** Why it is hard or impossible to do what you want to do without quoted triples:

Not related to quoted triples.

*** How you want quoted triples to behave in your use case:
(For example, do you want the precise syntax of subjects, predicates, and objects in quoted triples to be important?)

From the start, RDF should have mandated a normalized form for language tags in literals, ideally based on BCP47 formatting. It would also be acceptable if all parsers normalized language tags to lower case for the abstract representation. Concrete syntaxes which can perform canonicalization could then require a particular form for language tags without danger of potentially serializing different graphs, depending on how they were parsed on input.

*** An example RDF graph that shows part of your use case:

_:a rdf:value "foo"@en-us, "foo"@en-US .

If changed to require normalizing to lower case, this would be the same as the following:

_:a rdf:value "foo"@en-us .

N-Triples/N-Quads canonicalization could then either represent using that lower case form, or use BCP47 formatting.
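As a rough illustration of the BCP47 formatting option, the following Python sketch applies the case conventions of RFC 5646, section 2.1.1: lower case everywhere, except that two-letter subtags become upper case and four-letter subtags become title case when they are neither the first subtag nor placed after a singleton. The function name `format_language_tag` is hypothetical, and the sketch does not validate tags against the language subtag registry.

```python
def format_language_tag(tag: str) -> str:
    """Apply BCP47 case conventions to a language tag (formatting only, no validation)."""
    formatted = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        if index == 0 or after_singleton:
            formatted.append(subtag.lower())        # primary language, extensions, private use
        elif len(subtag) == 2:
            formatted.append(subtag.upper())        # region code, e.g. "US"
        elif len(subtag) == 4:
            formatted.append(subtag.capitalize())   # script code, e.g. "Cyrl"
        else:
            formatted.append(subtag.lower())
        if len(subtag) == 1:
            after_singleton = True                  # everything after "x", "u", ... stays lower case
    return "-".join(formatted)


print(format_language_tag("en-us"))
print(format_language_tag("MN-cYRL-mn"))
```

A parser could keep the lower-case form in the abstract syntax for comparison and call a function like this only when serializing.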

rdf-ucr NOTE repo should be added to the common rdf-star README.md

RDF 1.2 NOTEs section should then be parallel to the RDF 1.2 recommendations
rdf-ucr note should be listed in RDF 1.2 NOTEs section
Any other RDF 1.2 Notes should be added to the RDF 1.2 NOTEs section
Common README.md should be updated across all RDF 1.2 & SPARQL 1.2 repos

RDF‐star for Artsdata.ca

See https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Artsdata.ca for a clean version of this use case.

Contact information

Your name: Gregory Saumier-Finch
How to contact you: [email protected]

Brief Description of your use case

Artsdata.ca is a Knowledge Commons (ODI pattern) project. Artsdata.ca aims to empower the Canadian arts sector to actively promote a more fair and equitable digital ecosystem. Visit the website for more information http://kg.artsdata.ca.

This use case seeks to represent information typically found in the cultural sector about performing arts productions, works, upcoming events, people, places and organizations. It also aims to carefully track provenance from multiple data sources (due to numerous sector-wide community contributors) and organize the information, which at times may be conflicting and overlapping, into an actionable knowledge graph that facilitates data re-use.

What you want to be able to do

The system should capture statements about upcoming events, works, people, places, organizations and metadata about each statement. Metadata includes provenance and other meta information. Specifically, the system needs to be able to distinguish between identical statements (identical subject-predicate-object triples) in order to capture metadata for each.

What is the role of RDF-star quoted triples in your use case:

RDF-star has a role in recording metadata on triples and is used in conjunction with named graphs. Artsdata.ca uses a combination of named graphs and RDF-star in order to manage triple metadata.

Artsdata.ca uses Named Graphs to group together triples from external data feeds. These graphs are replaced frequently and are in sync with the external data feed. In other words, the graphs are secondary storage in sync with the master data of the external source. Triples can only be added, updated and deleted in the external data source. There can be automated transformations when ingesting the external data source, but the data always flows from the external data source to the internal data source, not the other way around. Edits are never made in the Artsdata graph, and nothing flows back out to the external source.

Further, in order to facilitate data re-use, Artsdata.ca provides several services, including an IRI minting service, a reconciliation service and a query service (SPARQL). In Artsdata.ca, triples across graphs can have identical subject-predicate patterns with sometimes different or even contradictory values in the object position. Therefore, in order to provide these services, an internal selection of triples is made based on a common data model, data shape conformance, and several other factors. This is similar to the Wikidata best rank approach (Wikidata triples using the predicate prefix wdt:). The selected triples are copied into a separate graph to avoid being deleted or changed when the source graphs are synced with their data sources. These selected triples form a collection of master data graphs that are unique to Artsdata.ca. RDF-star is used to track metadata of these triples inside the master data graphs.

In other words:

RDF-star is used to record metadata on select triples used for minting new URIs, and deemed authoritative for answering questions posed to the knowledge graph.
Named graphs are used to record provenance metadata on large collections of statements synced with external sources and manage their lifecycle.

Why it is hard or impossible to do what you want to do without quoted triples:

When answering a query to Artsdata, the JSON-LD response needs to have easy-to-parse metadata about individual triples. The client application (front-end) will want to display this metadata next to each triple (like a footnote). Metadata should be nested within the relevant objects.

AFAIK graphs cannot be added within graphs. Otherwise I could use graphs to track metadata inside an existing graph.

It is important that this metadata be part of the triples, exported with the triples, and copied over when the triples are copied, not something that is non-standard like a named graph which requires quads.

Turtle syntax is used for many activities including importing and exporting triples.

In this use case JSON-LD-Star will need to be used with JSON-LD Framing to create the tree layout needed to meet different client needs.

How you want quoted triples to behave in your use case:

Example of an event with 2 locations from 4 datasets. Each dataset has provenance metadata such as source website, downloadUrl and dateCreated. Datasets have their most recent version loaded into a separate named graph.

adr:Event1  a schema:Event .

<< adr:Event1 schema:location  adr:Place1 >>  prov:wasDerivedFrom  adr:Dataset#1, adr:Dataset#2 .

<< adr:Event1 schema:location  adr:Place2 >>  prov:wasDerivedFrom  adr:Dataset#3, adr:Dataset#4 .

An algorithm is run to assert the best location for this event, knowing that only a single location should be chosen. The location adr:Place1 is selected and asserted in the graph because it is the more precise location (Place1 is the theatre within the arts centre building Place2). The triple was asserted by adr:AssertionAlgorithm#1. The AssertionAlgorithm#1 looks for the best place for the event using several factors including containedInPlace relations and ranking of the source dataset.

The other location is claimed by datasets 3, 4 but is not asserted in the graph.

Here is the result:

adr:Event1  a schema:Event ; 
    schema:location  adr:Place1 .

<< adr:Event1 schema:location  adr:Place1 >>  prov:wasDerivedFrom  adr:Dataset#1, adr:Dataset#2 ;
      prov:wasGeneratedBy [ a prov:Activity ;
               prov:used adr:AssertionAlgorithm#1 ;
               prov:endedAtTime "2023-09-20T20:00:00-04:00"^^xsd:dateTime ] .

<< adr:Event1 schema:location  adr:Place2 >>  prov:wasDerivedFrom  adr:Dataset#3, adr:Dataset#4 .
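The selection step can be sketched as follows. This is a hypothetical simplification, not the actual AssertionAlgorithm#1: it looks only at containedInPlace links and ignores the ranking of source datasets, and the names `CONTAINED_IN`, `ancestors` and `most_precise` are made up for the example.

```python
# Hypothetical containedInPlace links (child -> parent): the theatre sits inside the arts centre.
CONTAINED_IN = {"Place1": "Place2"}


def ancestors(place, contained_in):
    """Return every place that transitively contains `place`."""
    seen = []
    current = contained_in.get(place)
    while current is not None and current not in seen:  # guard against cycles
        seen.append(current)
        current = contained_in.get(current)
    return seen


def most_precise(candidates, contained_in):
    """Pick the candidate nested most deeply; ties keep the first candidate."""
    return max(candidates, key=lambda place: len(ancestors(place, contained_in)))


print(most_precise(["Place2", "Place1"], CONTAINED_IN))
```

Only the chosen location would then be asserted, while the provenance of both candidates stays attached to the quoted triples.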

An example RDF graph that shows part of your use case:

The GraphDB Workbench https://db.artsdata.ca/sparql

Use case "RELEVEN STructured Assertion Record model"

This non-trivial use case is work in progress.

Re-Evaluating the Eleventh Century with Linked Events and Entities
Tara Andrews (@tla), Carla Ebel, Nina Richards (@NinaBrundke)
Linked Pasts 7, Ghent (online), 16.12.2021
[Presentation, 73 pages]
https://ucloud.univie.ac.at/index.php/s/jhJ60pyadwDmPg8

Page 10:

Digital Humanities/LOD:
– While LOD triples are useful to most researchers, historians work with conflicting ideas
– Factoids, stored as triples, too positivistic
– Need to capture data beyond factoids

Page 11:

● Aim: develop and test new ways how digital data about historical phenomena can be created and curated
● Aim: find a way to map conflicting data
● -> make data most helpful and usable for historians
● By using the STAR model

Page 13:

STructured Assertion Record model
● As modified LOD model
● Subject, predicate, object
● Source: primary source material/provenance
● Authority: Scholar who interprets the source

Page 14:

It allows to make assertions
● Link subject, predicate and object to sources and authorities
● The authority of one assertion can be the subject of another assertion
● → different views can be mapped, visualized, exported

RDF-star for labelled property graphs

See https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-labelled-property-graphs for the current status of this use case.

Taken from w3c/rdf-star#33

** Brief Description of your use case:

As a KG vendor, we want Stardog customers to have easy to use means to attach properties to edges in their RDF graph or load property graph data with edge properties. Here "easy" specifically means that neither the customer nor the database should have to wreck the data model (and queries) to use any of the workarounds available in plain RDF for that purpose (like the RDF reification).

*** What you want to be able to do:

We want to be able to easily assert properties on edges and query them using SPARQL.

Also we want to enable customers to store that annotated statement in any named graph they want so we don't want to use named graphs for representing statement-level metadata.

*** What is the role of RDF-star quoted triples in your use case:

RDF quoted triples will be the subject of properties on edges.

*** Why it is hard or impossible to do what you want to do without quoted triples:

Using RDF reification or other approaches requires changes to the data model and, particularly, complex SPARQL queries to retrieve the data.

Regarding named graphs, there's a very simple argument why we want to keep both annotated triples and named graphs. We regularly see people wondering if they should manage different parts of their data in i) separate datasets (i.e. separate physical databases inside a server instance) ii) separate named graphs inside one dataset. There are pros and cons to both. Sometimes the choice isn't clear.

So far they've been able to just take data stored in the default graph of database X and move it into a named graph inside Y. Importantly, they won't need to change queries (or apps); they only need a connection string to a different database and a different query dataset (i.e. FROM in SPARQL). The latter can be defined outside of queries as defined in the SPARQL Protocol. Now, we don't want RDF-star to limit that flexibility: if you want to take a bunch of triples with annotations and move them into a named graph, that should be similarly easy.

*** How you want quoted triples to behave in your use case:

*** An example RDF graph that shows part of your use case:

For example, if the customer has :pavel :worksAt :Stardog edge in the data and wants to add ... :since 2011 to it, neither they nor the database should have to transform it into a bunch of different triples like [] rdf:subject :pavel ; rdf:predicate :worksAt ... (and then also rewrite queries so that ?s :worksAt :Stardog still returns :pavel).

As a further example we want to be able to have data like

<< :a skos:closeMatch :b >> :score 0.9 .

and queries like

?x a :Type1 .
?y a :Type2 .
<< ?x skos:closeMatch ?y >> :score ?score

with subsequent filtering and aggregation on ?score.
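The intent of such a query can be mimicked in plain Python, as a sketch only. The `TYPES` and `EDGE_PROPS` tables, the extra sample edges and the `close_matches` helper are hypothetical stand-ins for a store that keeps edge properties keyed by the edge itself; they are not Stardog's API.

```python
from statistics import mean

# Hypothetical node types and edge properties, keyed by the (subject, predicate, object) edge.
TYPES = {":a": ":Type1", ":d": ":Type1", ":b": ":Type2", ":c": ":Type2"}
EDGE_PROPS = {
    (":a", "skos:closeMatch", ":b"): {":score": 0.9},
    (":a", "skos:closeMatch", ":c"): {":score": 0.25},
    (":d", "skos:closeMatch", ":b"): {":score": 0.75},
}


def close_matches(min_score):
    """Rows (?x, ?y, ?score) for Type1-to-Type2 close matches at or above min_score."""
    rows = []
    for (x, predicate, y), props in EDGE_PROPS.items():
        if predicate != "skos:closeMatch":
            continue
        if TYPES.get(x) == ":Type1" and TYPES.get(y) == ":Type2":
            score = props.get(":score")
            if score is not None and score >= min_score:  # the FILTER step
                rows.append((x, y, score))
    return sorted(rows)


rows = close_matches(0.5)
print(rows)
print(mean(score for _, _, score in rows))  # the aggregation step
```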

Add rdf-ucr to list of WG-repositories

https://github.com/w3c/rdf-ucr is missing from this list:
https://www.w3.org/groups/wg/rdf-star/tools

Integrating different ontology designs through entailment upon triple terms

[This is a recurring usage pattern suggested to be captured as a UC in telecon 240725.]

Situation

With RDF-star, it becomes possible to model data using simple binary relationships, and incrementally capture more concrete circumstances behind them, described as resources that reify (with rdf:reifies) the simple relationships (encoded as triple terms).

Example:

<Alice> :bought <SomeComputer> {| :date "2014-12-15"^^xsd:date |} .

being shorthand for:

<Alice> :bought <SomeComputer> .
_:r1 rdf:reifies <<( <Alice> :bought <SomeComputer> )>> ;
    :date "2014-12-15"^^xsd:date .

When concrete circumstances are known to be needed beforehand, this case is usually instead modeled differently, e.g. as an N-ary relationship, following the general relationship or association class design.

That could look like:

<purchase1> a :Purchase ;
    :date "2014-12-15"^^xsd:date ;
    :buyer <Alice> ;
    :seller <ComputerStore> ;
    :item <SomeComputer> ;
    :cost 2500 ;
    :currency :USD .

This is sometimes perceived as an obstacle in RDF, since the simpler form is often more desirable, and may initially appear good enough. Only later on are further needs of detail discovered (perhaps in production, when change is expensive). (See e.g. PROV-O and examples thereof in #23.)

Requirement

In the above case, in order to remove the need for remodeling the first example (such as when integrating it with the data in the second), it would be beneficial to be able to use semantic technology (OWL) to map these models, to avoid a rewrite of existing queries and readjustments by external data consumers.

Since triple terms are new in RDF 1.2, a rule for triple terms to entail their constituent properties appears to be needed in order to do so using the existing OWL 2 standard and tooling.

This has been proposed before by @Antoine-Zimmermann as "RDF-reification interpretations", which seems adequate:

An ^azinterpretation 𝓘 = (Δ, 𝓟, 𝓘S, 𝓘L, 𝓘T, 𝓘EXT, rs, rp, ro) is an ^azRDF-reification interpretation if it additionally satisfies the following:
𝓘S(rdf:subject) = rs;
𝓘S(rdf:predicate) = rp;
𝓘S(rdf:object) = ro.

(That was referenced in the "Seeking Consensus" table from 2024-01.)

Note

For this use case, it is not necessary to reuse the "classic reification" terms. It seems frugal to do so, but if there are reasons for minting new properties, that should also work with the suggested solution.

Solution

Here follows an example of how to map the two models above.

Entailed relationship description

Let's assume that the suggested entailment is part of RDF semantics. Let us also name the reifying resource as <purchase1>.

Then the reification triple above:

<purchase1> rdf:reifies <<( <Alice> :bought <SomeComputer> )>> .

entails:

<purchase1> rdf:reifies [ rdf:subject <Alice> ; rdf:predicate :bought ; rdf:object <SomeComputer> ] .

Ontology

Define a "rolification" property (used to match class membership in property chains):

:Purchase rdfs:subClassOf [ owl:onProperty _:RolifiedPurchase ; owl:hasSelf true ] .

Define a class for the :bought relationship, additionally tied to another "rolified" property:

_:BoughtRelationship owl:equivalentClass [ owl:onProperty rdf:predicate ;
                                            owl:hasValue :bought ] ;
    rdfs:subClassOf [ owl:onProperty _:RolifiedBoughtRelationship ; owl:hasSelf true ] .

Define subproperty chain axioms:

_:boughtRelation rdfs:subPropertyOf rdf:reifies ;
    owl:propertyChainAxiom (_:RolifiedPurchase rdf:reifies _:RolifiedBoughtRelationship) .

:buyer rdfs:domain :Purchase ;
    owl:propertyChainAxiom (_:boughtRelation rdf:subject) .

:item rdfs:domain :Purchase ;
    owl:propertyChainAxiom (_:boughtRelation rdf:object) .

Entailed purchase description

From this it is entailed that <purchase1> is a full-fledged :Purchase entity:

    <purchase1> a :Purchase ;
        :date "2014-12-15"^^xsd:date ;
        :buyer <Alice> ;
        :item <SomeComputer> .

(See live example using the OWL-RL online service.)
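The combined effect of the proposed entailment and the ontology can be imitated with two hand-coded rules. The Python below is a sketch under stated assumptions, not an OWL reasoner: triple terms are modelled as nested tuples, the `entail` helper is hypothetical, and `<purchase1>` is assumed to be typed :Purchase in the input (my reading of the rolified chain, which starts from members of :Purchase).

```python
# Hypothetical input graph as (subject, predicate, object) tuples.
# A triple term is a nested tuple in the object position.
TERM = ("Alice", ":bought", "SomeComputer")
GRAPH = {
    ("purchase1", "rdf:type", ":Purchase"),  # assumption: needed for the rolified chain to start
    ("purchase1", "rdf:reifies", TERM),
    ("purchase1", ":date", "2014-12-15"),
}


def entail(graph):
    """Apply the two hand-coded rules once and return the enlarged graph."""
    derived = set(graph)
    purchases = {s for s, p, o in graph if p == "rdf:type" and o == ":Purchase"}
    for s, p, o in graph:
        if p != "rdf:reifies" or not isinstance(o, tuple):
            continue
        subj, pred, obj = o
        # Rule 1 (proposed RDF-reification semantics): the triple term exposes its parts.
        derived |= {(o, "rdf:subject", subj), (o, "rdf:predicate", pred), (o, "rdf:object", obj)}
        # Rule 2 (effect of the property chains): a :Purchase reifying a :bought
        # triple gets :buyer and :item from the term's subject and object.
        if s in purchases and pred == ":bought":
            derived |= {(s, ":buyer", subj), (s, ":item", obj)}
    return derived


result = entail(GRAPH)
print(sorted(t for t in result if t[0] == "purchase1" and t[1] in (":buyer", ":item")))
```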

Note

The <purchase1> could also reify <Alice> :shoppedAt <ComputerStore> (making it another case for many-to-many); which with more rules can entail <purchase1> :seller <ComputerStore>.

See full examples in this gist.

Remarks

Conversely, with a chain axiom defined for the simple property:

:bought owl:propertyChainAxiom ( [ owl:inverseOf :buyer ] :item ) .

the simple relationship can be entailed from the purchase description:

<Alice> :bought <SomeComputer> .

Note

Note that full rdf:reifies relationships are not entailed in the converse example. That may be beyond what is achievable, and should be taken into account when modeling in use cases like this.

RDF-Star: Some biological database use cases

See https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-explanation-and-provenance-in-biological-data for a clean version of this use case

** Contact information

Your name: Jerven Bolleman
How to contact you: https://www.sib.swiss/directory

** Brief Description of your use case:

In UniProt we want to refer to triples to explain or "attribute" why they were added to the UniProtKB graphs.
These triples are always asserted and we might have multiple explanations/attributions or none at all.
The explanations and attributions are themselves complicated resources named by an IRI.

At this moment we use RDF reification with consistent IRIs for each triple.

<Q14739#SIP9E6E0C5B850FBF4F> up:fullName "3-beta-hydroxysterol Delta (14)-reductase" .
<#_9297DC3B72792B1A_up.fullName_D4E77F494F58CEA9> rdf:type rdf:Statement ;
  rdf:subject <Q14739#SIP9E6E0C5B850FBF4F> ;
  rdf:predicate up:fullName ;
  rdf:object "3-beta-hydroxysterol Delta (14)-reductase" ;
  up:attribution <Q14739#attribution-XX> .

<Q14739#attribution-XX> up:manual true ;
  up:evidence ECO:0000303 ;
  up:source citation:16784888 .

This syntax is inconvenient and also hard to optimize in general. This is important when the RDF graphs are 100+ billion triples in size.

The example above is evidence to support why a certain protein is described with a specific name.

As the data is extremely large, we cannot afford to maintain mappings that depend on the order of visitation inside a single file to derive a temporary IRI (e.g. in RDF/XML the rdf:ID uniqueness constraint is violated, and is expensive to check for in UniProt when using it for reification quads). In other words, the identity function for deriving an id for a triple should be stateless and safe to invoke multiple times; we should not be forced to gather all triples that use a triple reference into one co-localized set.
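A stateless identity function of this kind could look like the Python sketch below. It is an assumption-laden illustration and not UniProt's actual scheme: the `triple_id` name, the use of SHA-256, the 16-character truncation and the `#_` prefix are all hypothetical, and the terms are assumed to arrive in an escaped, N-Triples-like form so that a newline can serve as separator.

```python
import hashlib


def triple_id(subject: str, predicate: str, obj: str) -> str:
    """Derive an identifier from the triple's own terms, using no shared state."""
    # Assumes escaped terms, so a raw newline cannot appear inside any of them.
    canonical = "\n".join((subject, predicate, obj))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"#_{digest[:16]}"


first = triple_id("<Q14739#SIP9E6E0C5B850FBF4F>", "up:fullName",
                  '"3-beta-hydroxysterol Delta (14)-reductase"')
again = triple_id("<Q14739#SIP9E6E0C5B850FBF4F>", "up:fullName",
                  '"3-beta-hydroxysterol Delta (14)-reductase"')
print(first == again)  # the same triple gives the same id, whatever the visiting order
```

Because the result depends only on the three terms, it can be computed independently in different files or processes without a lookup table.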

Our use cases for un-asserted triples are extremely rare and would preferably be described explicitly as "inversions" of the normal case, or as explicit non-membership of a class, e.g. something like the following:

uniprot:P1 owl:disjointWith <things_named_X> .
<<uniprot:P1 owl:disjointWith <things_named_X> >> rdfs:comment "P1 does not have an Xthingy so should not be called an X" .

For other databases we might want to do things like:

ex:1 ex:likes ex:2 .
<< ex:1 ex:likes ex:2 >> ex:confidence ex:high .

and then use the "star" syntax for quickly selecting the triples we have a high confidence for.

*** What you want to be able to do:

Talk about why triples are added to the dataset and how confident our users should be in trusting them.

*** What is the role of RDF-star quoted triples in your use case:

Quoted triples (or content-identified triples) would replace the use case for RDF reification by allowing a more convenient and clearer way to talk about "edges" in an RDF graph.

*** Why it is hard or impossible to do what you want to do without quoted triples:

Reification is not only a lot of typing to get right; it is also difficult for SPARQL engines to optimize in the general case.

*** How you want quoted triples to behave in your use case:
(For example, do you want the precise syntax of subjects, predicates, and objects in quoted triples to be important?)

They must be transparent for OWL reasoning. UniProt is re-used and re-mixed in many different end user databases. These databases might use different identifiers and map them with owl:sameAs, e.g. often http://identifiers.org/uniprot/X owl:sameAs http://purl.uniprot.org/uniprot/X. Given that sameAs relation, all queries should be able to use either of these identifiers and get the "same" result.

SELECT * WHERE { << <http://purl.uniprot.org/uniprot/X> ?p ?o >> }

and

SELECT * WHERE { << <http://identifiers.org/uniprot/X> ?p ?o >> }

must return the same results in an owl:sameAs aware setting.

RDF-star for recording commit deltas

See https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-recording-commit-deltas-to-an-RDF-graph for the current status of this use case.

Taken from https://w3c.github.io/rdf-star/UCR/rdf-star-ucr.html#annotate-commit-deltas

I want to annotate commit deltas to an RDF graph, e.g.:

r:47e1cf2 a :Commit ; 
     :delete <<:bob :age 23>> ;
     :add <<:bob :age 24>>, <<:bob :gender :male>> .

So that a triple can be searched for across commit history in SPARQL-star.
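What such a search amounts to can be sketched in Python. This is a hypothetical in-memory model, not SPARQL-star: the `COMMITS` list, the earlier commit id `r:1a2b3c4` and the `history` helper are invented for the example, with triple terms represented as plain tuples.

```python
# Hypothetical commit log, oldest first; r:1a2b3c4 is an invented earlier commit.
COMMITS = [
    {"id": "r:1a2b3c4", "add": {(":bob", ":age", 23)}, "delete": set()},
    {
        "id": "r:47e1cf2",
        "add": {(":bob", ":age", 24), (":bob", ":gender", ":male")},
        "delete": {(":bob", ":age", 23)},
    },
]


def history(triple, commits):
    """List (commit id, action) pairs for every commit that touched the triple."""
    events = []
    for commit in commits:
        if triple in commit["delete"]:
            events.append((commit["id"], "delete"))
        if triple in commit["add"]:
            events.append((commit["id"], "add"))
    return events


print(history((":bob", ":age", 23), COMMITS))
```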

RDF-star for CIDOC-CRM events

See https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-CIDOC-CRM-events for the current description of this use case.
Taken from #12 (comment)

** Brief Description of your use case:

Quoted triples may be useful for representing events using a method similar to that used in labelled property graphs. For example, a marriage is an event that can be represented as a quoted triple (usually asserted as well) with added information, such as place, time, officiant, etc.

*** What you want to be able to do:

Represent the effect of an event along with other information about it, so that a marriage asserts the marriage between two people and the other information is attached to the quoted triple. This more closely couples the state information with the event information.

*** What is the role of RDF-star quoted triples in your use case:

The quoted triple is used as the subject of the information associated with the event.

*** Why it is hard or impossible to do what you want to do without quoted triples:

Without quoted triples events have to be represented using something like RDF reification, where there is a node (often a blank node) for the event that is linked to information about the event. The effect of the event is then a separate triple, and the connection between the event and its effect is not part of the RDF model.

*** How you want quoted triples to behave in your use case:

What matters in events is the objects involved, so using different names for the same object should not affect the event.

*** An example RDF graph that shows part of your use case:

Here is an example of how CIDOC-CRM is used to represent events without using quoted triples, here the birth and castration of an historical figure. In this example genders are classes, not values of a gender property.

@prefix ex: <http://example.org/tla/ontologies/2023/3/gender-assignment/> .
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://example.org/tla/ontologies/2023/3/gender-assignment> .

ex:Gender_Eunuch rdf:type crm:E55_Type ;
                 rdfs:label "Eunuch" .

ex:Gender_Male rdf:type crm:E55_Type ;
               rdfs:label "Male" .

ex:Ioannes_68 rdf:type crm:E21_Person ;
              rdfs:label "John the Orphanotrophos" .

ex:Paphlagonian_family rdf:type crm:E74_Group ;
                       rdfs:label "Family of John, Michael, & brothers" .


ex:Ioannes_68_gender_birth rdf:type crm:E17_Type_Assignment ;
                           crm:P14_carried_out_by ex:Paphlagonian_family ;
                           crm:P183_ends_before_the_start_of ex:Ioannes_68_gender_castration ;
                           crm:P41_classified ex:Ioannes_68 ;
                           crm:P42_assigned ex:Gender_Male ;
                           rdfs:label "Birth gender assignment" .

ex:Ioannes_68_gender_castration rdf:type crm:E17_Type_Assignment ;
                                crm:P14_carried_out_by ex:Paphlagonian_family ;
                                crm:P41_classified ex:Ioannes_68 ;
                                crm:P42_assigned ex:Gender_Eunuch ;
                                rdfs:label "Castration gender assignment" .

Here is the same information utilizing quoted triples.

@prefix ex: <http://example.org/tla/ontologies/2023/3/gender-assignment/> .
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://example.org/tla/ontologies/2023/3/gender-assignment> .

ex:Gender_Eunuch rdf:type crm:E55_Type ;
                 rdfs:label "Eunuch" .

ex:Gender_Male rdf:type crm:E55_Type ;
               rdfs:label "Male" .

ex:Ioannes_68 rdf:type crm:E21_Person ;
              rdfs:label "John the Orphanotrophos" .

ex:Paphlagonian_family rdf:type crm:E74_Group ;
                       rdfs:label "Family of John, Michael, & brothers" .


<< ex:Ioannes_68 rdf:type ex:Gender_Male >> crm:P14_carried_out_by ex:Paphlagonian_family ;
                             crm:P183_ends_before_the_start_of << ex:Ioannes_68 rdf:type ex:Gender_Eunuch >> ;
                             rdfs:label "Birth gender assignment" .

ex:Ioannes_68 rdf:type ex:Gender_Eunuch {| crm:P14_carried_out_by ex:Paphlagonian_family ;
                                           rdfs:label "Castration gender assignment" |} .

Representing triple origin information during Federated SPARQL querying

See https://github.com/w3c/rdf-ucr/wiki/Capturing-triple-origin-in-SPARQL-star for a version of this use case.

** Contact information

Your name: Ruben Taelman
How to contact you: [email protected]

** Brief Description of your use case:

When executing a Federated SPARQL Query (i.e., a query across multiple SPARQL endpoints), users may want to know which sources contributed to which query results.

*** What you want to be able to do:

When executing a Federated SPARQL Query, I want to annotate triples with the source they originate from.

*** What is the role of RDF-star quoted triples in your use case:

For example, the following query could produce all triples with corresponding ?source URL.

SELECT * WHERE {
  ?s ?p ?o.
  << ?s ?p ?o >> :federatedSource ?source.
}

*** Why it is hard or impossible to do what you want to do without quoted triples:

This could be achieved using named graphs, but semantics may clash with other usages of named graphs.

*** How you want quoted triples to behave in your use case:

(For example, do you want the precise syntax of subjects, predicates, and objects in quoted triples to be important?)

N/A

*** An example RDF graph that shows part of your use case:

N/A

Similar to the "Combination of RDF-star and graph-level metadata (named graphs)" use case, this use case has the limitation that triples inside named graphs cannot be annotated.
For instance, users may want the following, but it is not possible because RDF-star can only annotate triples:

SELECT * WHERE {
  GRAPH ?g { ?s ?p ?o }.
  << GRAPH ?g { ?s ?p ?o } >> :federatedSource ?source.
}

If extending RDF-star to named graphs is not desired, then this limitation could be worked around as follows (alternatives may be possible):

SELECT * WHERE {
  ?s ?p ?o.
  << ?s ?p ?o >> :federatedSource [ :federatedSourceUrl ?source ; :federatedSourceGraph ?g ] .
}
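
As a rough illustration of the annotation this use case asks for, the following Python sketch merges results from several endpoints and records, per triple, which sources contributed it. The endpoint URLs, the `merge_with_sources` helper, and the tuple representation of triples are assumptions made for illustration; they are not part of SPARQL or of any federation engine.

```python
from collections import defaultdict


def merge_with_sources(results_by_source):
    """Map each (s, p, o) triple to the set of sources that returned it.

    results_by_source: dict of source URL -> iterable of (s, p, o) tuples.
    This mimics `<< ?s ?p ?o >> :federatedSource ?source` outside of RDF.
    """
    origins = defaultdict(set)
    for source, triples in results_by_source.items():
        for triple in triples:
            origins[triple].add(source)
    return dict(origins)


# Hypothetical endpoints and data, for illustration only.
results = {
    "http://example.org/sparql/a": [("ex:alice", "foaf:knows", "ex:bob")],
    "http://example.org/sparql/b": [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:name", '"Bob"'),
    ],
}
origins = merge_with_sources(results)
```

Keying on the whole triple is what a quoted triple provides in RDF-star: the origin is attached to the statement itself, not to a graph that happens to contain it.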

RDF-star for contextualizing historical assertions

See https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-contextualizing-historical-assertions for the current status of this use case.

Contact information

Tara Andrews, University of Vienna
[email protected]

Brief Description of your use case:

In the RELEVEN project (https://releven.univie.ac.at/) we are developing a data model for recording contextual information about triples, so that users of our datastore can understand who has asserted the information in the triples and what they are basing these assertions on. (We actually call this the STAR model, for STructured Assertion Record, since the project proposal was submitted in February 2020 and we hadn't heard of RDF* yet...)

What you want to be able to do:

The motivation is to record information about historical figures even when the information we have is contradictory and cannot be definitively resolved one way or another (which, after all, frequently happens), and to be able to do this without causing naive validation errors in ontology-based software.

Why it is hard or impossible to do what you want to do without quoted triples:

At the moment our model is based on regular RDF, and specifically on the CIDOC-CRM entity E13 Attribute Assignment. The subject and object of the original triple become the objects of relationships P140 assigned attribute to and P141 assigned respectively, and the predicate is reified as an E55 Type to become the object of relationship P177 assigned property type. We can then use the predicates P14 carried out by to indicate who is responsible for the assertion, and P17 was motivated by to indicate the evidence for the assertion (e.g. a text passage, an inscription on an object, or even another assertion).

While our current approach works fine for representing the data, we don't have a good way to validate the content of the attribute assignments - that is, to make sure that the subject, reified predicate, and object conform to the specification of the ontology. The range of the predicates P140 and P141 is set intentionally broadly (to E1 CRM Entity) and the range of P177 can likewise be any reified predicate. For the time being we have to handle this in the application logic, taking care not to allow users to create assertions of invalid triples.
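
The E13 pattern described above can be sketched in Python as a function that turns one base triple into the triples of an attribute assignment. This is only an illustration: the `attribute_assignment` helper and the `ex:` identifiers are hypothetical, and the `crm:` local names are spelled from the labels used in the text, so the names in the published CIDOC-CRM RDF may differ.

```python
def attribute_assignment(node, subject, property_type, obj, asserted_by, evidence):
    """Return the triples of one E13 Attribute Assignment for a base triple.

    subject / obj are the subject and object of the original triple;
    property_type is the original predicate reified as an E55 Type.
    """
    return [
        (node, "rdf:type", "crm:E13_Attribute_Assignment"),
        (node, "crm:P140_assigned_attribute_to", subject),
        (node, "crm:P141_assigned", obj),
        (node, "crm:P177_assigned_property_type", property_type),  # assumed local name
        (node, "crm:P14_carried_out_by", asserted_by),
        (node, "crm:P17_was_motivated_by", evidence),
    ]


# Hypothetical identifiers, for illustration only.
assignment = attribute_assignment(
    "ex:assertion_1",
    "ex:Konstantinos_Doukas",
    "ex:has_gender_type",
    "ex:Gender_Male",
    "ex:Anna_Komnene",
    "ex:Alexiad_passage",
)
```

Nothing in these six triples ties the subject and object to the domain and range of the reified predicate, which is the validation gap described above.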

What is the role of RDF-star quoted triples in your use case:

The role of quoted triples would be to allow us to have the validation on the base triples, and still be able to attach the information about authority and context that we currently express via P14 and P17 properties.

How you want quoted triples to behave in your use case:

I don't understand this question well enough to be able to answer.

An example RDF graph that shows part of your use case:

The attached image shows a pair of gender assignments of Konstantinos Doukas; according to Anna Komnene he was male (presumably from birth) but according to Michael the Syrian, he was castrated (thus assigned to the eunuch category) sometime during the reign of the emperor Botaneiates.

Cataloguing Use Cases From The National Library Of Sweden

See https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Annotations-as-Miscellaneous-Marginalia
https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Detailed-Provenance-in-Cooperative-Union-Cataloguing
https://github.com/w3c/rdf-ucr/wiki/Describing-a-Union-of-Changes-to-a-Named-Graph for clean versions of scenarios for this collection of use cases.

Contact information

Name: Niklas Lindström (@niklasl)
Organization: National Library of Sweden

Brief Description of our Use Cases

The National Library of Sweden serves the Swedish cooperative union catalog (Libris), which has different audiences both nationally and internationally. To overcome the silo effect of old technology, and to interoperate with different metadata standards, we have developed a cataloging system based on RDF, using linked vocabularies and datasets.

We have encountered a set of overlapping use cases in this catalog, based on needs for descriptive metadata, and by extension projects and data pipelines depending upon that. We believe that RDF-star may provide an effective means for dealing with these cases.

What we want to be able to do

Manage and reason (in the general sense) about descriptions, as sets of triples, about library resources; in whole or part, including suggestions and redactions thereof.
Reason about the versions of these descriptions.
Establish and maintain (teach) descriptive best practices to work for convergent datasets across libraries and other knowledge institutions, utilizing effective interoperability of shared metadata.

Why it is hard or impossible to do what we want to do without quoted triples

RDF Statement reification could be used, but is unwieldy, especially in order to keep annotations coordinated with assertions. There is no syntax support for it apart from rdf:ID on predicate elements in RDF/XML.

We use named graphs for effectively working with "record"-sized sets of facts, in a single source (our system), commonly about one main entity. But for detailed provenance they are too coarse-grained. Multiple such "records" about the same thing are hard to succinctly display and edit as a combination of description sources. Also, since named graphs have no defined formal semantics (neither what the name denotes, nor what is considered in the union graph of a published dataset), formal interoperability isn't possible today.

Thus, RDF-star annotations appear to fill the gap here, but their semantics remain to be tested in practice.

Various patterns for qualification conflate metadata sources (triples as "occurrences of facts"), logical facts (the statement which has a truth value) and the events or entities that these facts conceptually describe. This is the kind of "creative modelling" that tends to lead to divergent practices and weak interoperability across applications.

If RDF-star semantics can work to clarify and unify design patterns here this would be a major argument in its favor.

What is the role of RDF-star quoted triples in our use cases

Detailed provenance.
Proposed facts.
Additional "marginalia" of detailed or obscure facts that don't fit the fixed shape of a specific application profile.
Aggregate views of historical facts.

How we want quoted triples to behave in our use cases

As far as I can see, referentially transparent (at least for annotations).

We don't need referential opacity for quoted triples since we treat owl:sameAs (and owl:differentFrom if ever used) to be about the reference, not the sense (as in Sense and reference). We are very careful of ingesting data using owl:sameAs because of that, due to the obvious risk for conflation of identity it entails.

In the same way, no opacity is needed to prevent datatype entailment on quoted triples. Any encoded, lexical representation difference is an implementation detail, and not a semantically relevant difference. ("Provenance" here is about "who said what, where", not "how (was it encoded)". The moment a quote occurs in a graph, it is expressed within that context.)

We mainly need "opacity", as in "separate worlds", between graphs, until we deem them truthful and put them in the union graph. Quotation of suggested assertions is enough; we only "let in" owl:sameAs assertions that we are certain are aliases of the exact same identity. (I'm not sure even these have to be referentially opaque in the linguistic sense; which seems to be supported by Carroll, Bizer, Hayes, Stickler - Named Graphs (2005), notably p. 6 and 7, along with this email.)

Of course, this differs from the view in the CG report, and we need to work out if our use cases would work the same in either interpretation.

Example RDF graphs that shows parts of our use cases

I have added draft scenarios with example data to the wiki:

[EDIT: fixed links broken when pages were moved]

Use cases compiled by CG

Several interesting use cases were compiled by the RDF-star CG:

Talking About Multiple Triples at Once

Contact information

Name: Niklas Lindström (@niklasl)
Organization: National Library of Sweden

Brief Description

I want to describe a bunch of triples together (often describing one resource or a chain thereof) succinctly, for instance to assign when and where their occurrences were discovered.

What you want to be able to do

Assert provenance (and possibly other marginalia) about multiple triples from a common source. Often, as in the case of RDF lists or blank nodes, these triples share a subject or are chained together, comprising an "integral subgraph", if you will (or a rooted tree in graph theoretical terms).

What is the role of RDF-star quoted triples in your use case

It is at odds with current practices of using named graphs for this. It theoretically will provide missing semantics, which is promising. But in its current design (in the CG report) it becomes unwieldy, both syntactically, and in its reliance upon types, not tokens, for what is expressed.

Since a triple term denotes itself, any connection to an occurrence must be through an explicit relation, and not be a fact about the abstract triple itself, which is mathematically platonic in nature.

Also, using blank nodes is not uncommon in these cases, which raises other questions (e.g. how to quote an RDF list, or a "person named Alice born in 1852"). Representing that as disjoint quoted triples quickly becomes as untenable for humans as reading N-Triples.

Why it is hard or impossible to do what you want to do without quoted triples

It is not impossible, using named graphs. But the semantics thereof are undefined, and storing this as multiple named graphs today is cumbersome, implementation-dependent and requires assumptions of interpretations to hold.

How you want quoted triples to behave in your use case

I have not seen any practical cases where opacity is required for a combination of asserted and quoted, i.e. annotated data. For unasserted "suggestions" in our real use cases we would require transparent semantics (to be able to navigate to and understand the suggestions).

I would ideally be able to quote all constituent parts of the blank node expressions below. Otherwise, only the arc with the blank node would be quoted, and lots of "dangling triples" would be in the asserted graph.

The problem of quoted bnodes with lots of "dangling, asserted facts" might be handled by user convention, along the lines of "all bnodes only linked to from a quoted triple are to be practically taken as belonging to the quote". But that is cumbersome and brittle.

It is conceivable that other use cases would prefer to "quarantine" chunks from external sources or automatically computed suggestions (e.g. using machine learning). We would use actual literals for that, probably in combination with blank nodes (thus increasing the number of triples in the chunk). But if named graphs were to have conditional "opacity" (if they are "accepted" or treated separately from the active interpretation), this would be a useful alternative. (Literals of course allow for quoting only certain subjects or objects, for instance.)

Example 1: annotating a description of something unknown

To quote something described but unknown, you can do this in Notation 3:

<charlesdodgson> :says { [] :name "Alice" ; :birthDate "1852" } .

This in TriG:

<charlesdodgson> :says _:g1 .
_:g1 { [] :name "Alice" ; :birthDate "1852" }

But in Turtle-star, you have to do this:

<charlesdodgson> :says << _:b1 :name "Alice" >> , << _:b1  :birthDate "1852" >> .

Example 2: Annotating Chunks of Triples

This is bad practice (since an abstract triple is not an occurrence in itself):

  << _:b1 :givenName "Alice" >> dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
    dc:date "2023-10-23" .
  << _:b1 :familyName "Liddell" >> dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
    dc:date "2023-10-23" .
  << _:b1 :birthDate "1852-05-04" >> dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
    dc:date "2023-10-23" .

This is more correct:

<< _:b1 :givenName "Alice" >> rdfg:subGraphOf _:d1 .
<< _:b1 :familyName "Liddell" >> rdfg:subGraphOf _:d1 .
<< _:b1 :birthDate "1852-05-04" >> rdfg:subGraphOf _:d1 .
_:d1 dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
  dc:date "2023-10-23" .

Given RDF 1.1 Semantics, which defines:

A subgraph of an RDF graph is a subset of the triples in the graph. A triple is identified with the singleton set containing it, so that each triple in a graph is considered to be a subgraph.

The above is OKish but not 1:1, since a triple identified does not (necessarily) mean denoted. Cf. (from the same section):

For example, an IRI used as a graph name identifying a named graph in an RDF dataset may refer to something different from the graph it identifies.

This is already possible, but means something else(?):

_:d1 {
  [] :givenName "Alice" ;
    :familyName "Liddell" ;
    :birthDate "1852-05-04" .
}
_:d1 dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
  dc:date "2023-10-23" .
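
The `rdfg:subGraphOf` pattern shown earlier in this example is repetitive enough to generate mechanically. The following Python sketch renders it as Turtle-star text; the `annotate_chunk` helper is hypothetical, and the output assumes that prefixes such as `rdfg:` and `dc:` are declared elsewhere.

```python
def annotate_chunk(triples, chunk_node, annotations):
    """Render Turtle-star lines that link each quoted triple to one chunk node.

    triples: list of (s, p, o) strings already in Turtle syntax.
    annotations: list of (predicate, object) pairs stated once, on the chunk.
    """
    lines = [
        f"<< {s} {p} {o} >> rdfg:subGraphOf {chunk_node} ."
        for s, p, o in triples
    ]
    body = " ;\n  ".join(f"{p} {o}" for p, o in annotations)
    lines.append(f"{chunk_node} {body} .")
    return "\n".join(lines)


text = annotate_chunk(
    [
        ("_:b1", ":givenName", '"Alice"'),
        ("_:b1", ":familyName", '"Liddell"'),
        ("_:b1", ":birthDate", '"1852-05-04"'),
    ],
    "_:d1",
    [
        ("dc:source", "<https://en.wikipedia.org/wiki/Alice_Liddell>"),
        ("dc:date", '"2023-10-23"'),
    ],
)
```

The provenance is stated once on the chunk node, so the number of statements grows by one per triple instead of by one per triple and annotation.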

Example 3: RDF Lists

Unsurprisingly, the cons nature of ordered lists as triples unravels at the seams here.

You cannot easily quote the entire list, just its association. So this:

<report> bibo:authorList (<a> <b> <c>) {| dc:source <a> |} .

Means this:

<report> bibo:authorList _:l0 .
<< <report> bibo:authorList _:l0 >> dc:source <a> .
_:l0 rdf:first <a>; rdf:rest (<b> <c>) .

Instead of the preferred:

<report> bibo:authorList (<a> <b> <c>) .
_:g1 { <report> bibo:authorList (<a> <b> <c>) }
_:g1 dc:source <a> .
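
To make the gap concrete, the following Python sketch expands a collection into its `rdf:first`/`rdf:rest` triples and counts how many of them fall outside the single annotated arc. The `expand_list` helper and the generated blank node labels are assumptions for illustration only.

```python
def expand_list(items, prefix="_:l"):
    """Expand an RDF collection into rdf:first/rdf:rest triples.

    Returns (head, triples); blank node labels are generated from `prefix`.
    """
    if not items:
        return "rdf:nil", []
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", item))
        triples.append((nodes[i], "rdf:rest", rest))
    return nodes[0], triples


head, cells = expand_list(["<a>", "<b>", "<c>"])
# The annotation only covers this one triple ...
annotated = ("<report>", "bibo:authorList", head)
# ... while every list cell triple stays outside the quoted triple.
uncovered = len(cells)
```

For a three-item list, one triple is annotated and six are not, which is why a graph-level construct is preferred in this example.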

Here is a combo of one "chosen" list and a "suggested" list, using suggested new syntax for unasserted, annotated triples:

<report> bibo:authorList (<a> <b> <c>) {| dc:source <a> ; ex:disputedBy <c> |} ,
  -- (<c> <b> <a>) {| dc:source <c> |} .

Preferably meaning:

<report> bibo:authorList (<a> <b> <c>) .
_:g1 { <report> bibo:authorList (<a> <b> <c>) }
_:g1 dc:source <a> ; ex:disputedBy <c> .
_:g2 { <report> bibo:authorList (<c> <b> <a>) }
_:g2 dc:source <c> .
