vladimiralexiev / rdf2rml



RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation

Batchfile 0.12% Ruby 26.54% Shell 1.79% Perl 63.77% Makefile 5.44% C 2.34%

rdfpuml plantuml rdf2rml r2rml r2rml-mapping uml-diagram visualization graphviz

rdf2rml's Introduction


Introduction
Installation
- Docker Image
- Debian Repo
Change Log
To Do Tasks

Introduction

See these publications:

RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation. Vladimir Alexiev. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016: Presentation, HTML, PDF, Video
Generation of Declarative Transformations from Semantic Models. Vladimir Alexiev. In European Data Conference on Reference Data and Semantics (ENDORSE 2023), Mar 2023: paper, presentation, video

RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required.

rdfpuml makes true diagrams directly from Turtle examples using PlantUML and GraphViz. Diagram readability is of prime concern, and rdfpuml introduces various diagram control mechanisms using triples in the puml: namespace. Special attention is paid to inlining and visualizing various Reification mechanisms (described with PRV). We give examples from Getty CONA, Getty Museum, AAC (mappings of museum data to CIDOC CRM), Multisensor (NIF and FrameNet), EHRI (Holocaust Research into Jewish social networks), Duraspace (Portland Common Data Model for holding metadata in institutional repositories), Video annotation.

If the example instances include embedded source field names, they can describe a mapping precisely. I’ve implemented a few more tools to generate transformations:

rdf2rml generates R2RML transformations for RDBMS tables or SQL queries. Compared to R2RML, this reduces complexity about 15-fold and is competitive with the dedicated DSL YARRRML.
rdf2sparql generates OntoRefine or TARQL transformations from CSV/TSV that take the form of SPARQL UPDATE (for direct GraphDB loading) or CONSTRUCT (for conversion to RDF). It subsumes the two deprecated tools rdf2tarql and rdf2ontorefine.

See http://twitter.com/hashtag/rdfpuml for news, diagrams and announcements.

Citation

If you use this software, please cite it as shown above.

Github shows a link “About> Cite this repository” (see about-citation-files)
CITATION.cff describes both the software and the above publications. It’s a YAML CFF file, see https://citation-file-format.github.io/
CITATION.bib describes only the above publications. It’s a bibtex file

Documentation

rdfpuml.md or rdfpuml.html
rdf2rml.md or rdf2rml.html
rdf2sparql.md or rdf2sparql.html (subsumes rdf2tarql and rdf2ontorefine)

Related Work

The following works use or mention this software:

V. Alexiev, A. Kiryakov, P. Tarkalanov (2017) euBusinessGraph: Company and economic data for innovative products and services. 13th International Conference on Semantic Systems (Semantics 2017)
L. Zhuhadar, M. Ciampa (2017). Leveraging learning innovations in cognitive computing with massive data sets: Using the offshore Panama papers leak to discover patterns. Computers in Human Behavior. doi:10.1016/j.chb.2017.12.013
C. Debruyne, D. Lewis, D. O’Sullivan (October 2018). Generating Executable Mappings from RDF Data Cube Data Structure Definitions. In Confederated International Conferences “On the Move to Meaningful Internet Systems” (OTM 2018), pages 333-350. doi:10.1007/978-3-030-02671-4_21
V. Alexiev (2018). Museum Linked Open Data: Ontologies, Datasets, Projects (invited report). In Digital Presentation and Preservation of Cultural and Scientific Heritage (DIPP 2018). Volume 8, pages 19-50. Burgas, Bulgaria, September 2018
A.D. Junior (2019). A Jigsaw Puzzle Metaphor for Representing Linked Data Mappings. PhD Thesis, Knowledge and Data Engineering Group (KDEG), Trinity College, Dublin, Ireland
V. Alexiev, P. Tarkalanov, N. Georgiev, L. Pavlova (2020). Bulgarian Icons in Wikidata and EDM. Digital Presentation and Preservation of Cultural and Scientific Heritage (DIPP 2020).
Matjaz Rihtar. https://github.com/mrihtar/rdfgraph: inspired by rdfpuml, written in Python 2.7, uses Redland’s librdf library. I worked with Matjaz in the euBusinessGraph project.

Installation

Check out this repo and add rdf2rml/bin to your path. Install the following prerequisites:

both tools: Perl. Tested with version 5.22 on Windows (cygwin and Strawberry).
rdfpuml:
- GraphViz
- PlantUML. You need a recent version for new features like arrow length and color. I’m currently running 1.2018.10beta7. See in particular plantuml class diagrams.
- Perl modules: use cpan or cpanm to install them: RDF::Trine RDF::Query Encode FindBin Slurp
- RDF::Prefixes::Curie. This is my own module located in ./lib, and rdfpuml needs FindBin to locate it.
rdf2rml:
- Apache Jena: riot, update. Tested with version 3.1.0 of 2016-05-10.
- cat, grep, rm

Docker Image

If you prefer to work with Docker so you don’t need to install software manually, you can use this rdf2rml image from the public Nexus (Docker Registry) of Ontotext. To run it, use:

docker run -v <directory>:/files --rm docker-registry.ontotext.com/rdf2rml:latest

Where <directory> is the local directory holding your .ttl files. It was made on 31 May 2023 and uses the following versions:

rdf2rml: 31 May 2023, with fixed issue 22
PlantUML: 1.2023.7
Jena: 4.8.0

Note: pull request 7 of 17 Sep 2019 by Jem Rayfield (@jazzyray) dockerizes the installation and makes extra changes related to input/output and configuration. However, it has not been merged yet.

Debian Repo

Jonas Smedegaard (@jonassmedegaard, dr at jones fullstop dk) has volunteered for some of the tasks below. His development is at https://salsa.debian.org/debian/rdf2rml/branches. To adopt changes, do something like this.

To merge all commits in the salsa/develop branch:

cd rdf2rml    # i.e. your local clone of your Github project
git remote add salsa https://salsa.debian.org/debian/rdf2rml.git
git fetch salsa
git merge salsa/develop

To adopt only single commits from the salsa/develop branch, issue remote and fetch as above, then issue:
```
git cherry-pick $commit1 $commit2 $commit3
    
```

Change Log

2023-06-07 rdf2sparql.pl: minimize binds in `delete` clause

Issue 27: minimize the delete clause to include only necessary binds:

--filterColumn variable prebind
templated GRAPH URL and its constituent variables

2023-06-06 rdf2sparql.pl: global `--filter` options

Issue 26: add command-line options --filterColumn and --filter, which are useful for handling both initial loading and data updates. See global filtering and test/graphs-crunchbase

2023-06-01 rdfpuml.pl: remove Carp::Always

Issue 2 remove Carp::Always since it produces a stack trace that’s too verbose

2023-05-17 rdf2sparql.pl: Conditional Nodes

Support “Conditional Nodes”, i.e. URLs that are conditional on the existence of some fields.
issue 22 fixed (2023-05-31)

2023-05-05 rdfpuml.pl: don’t mangle round brackets

issue 21: Round brackets in fields (eg “(name)”) and URLs (eg <type/(type)>) are not mangled to square brackets anymore

2023-04-29 rdfpuml.pl: puml:option

issue 18 Add puml:option for left to right direction etc

2023-04-19 rdf2sparql.pl: per-model filter, dynamic graph

issue 19 Implement filter function, see test/filter-content
issue 20 Allow dynamic graph (computed from a data column), see test/graphs-crunchbase

2022-08-23 rdf2sparql.pl: add datatype to var name instead of UPPERCASING

Datatype attachment eg strdt(?var,xsd:date) now outputs to ?var_xsd_date to avoid conflict with input field names in ALL_UPPERCASE

2022-08-23 rdfpuml.pl: handle blank-node types; add shell scripts

issue 10 Handle blank-node types that occur on owl:Restriction (see test/blank-node)
Duplicate rdfpuml.bat, puml.bat as shell scripts rdfpuml, puml for use in Makefiles across Linux and Windows

2022-08-15 rdf2sparql.pl: merge to one tool

Merge rdf2tarql and rdf2ontorefine to one tool rdf2sparql

2022-04-08 rdf2ontorefine.pl: generate OntoRefine Update queries

Add script to generate OntoRefine SPARQL Update queries from model.

2021-09-02 rdfpuml.pl: Unicode Processing

Use Perl option -C when invoking for proper Unicode processing. See doc section rdfpuml.html#Unicode

2020-09-17 rdf2rml: logicalTable

Use URL for logicalTable instead of blank node, so that R2RML generated from different models for different tables can be merged more easily. Warning: this assumes that all instances of one subjectMap use the same query.

2020-06-01 rdf2tarql.pl: generate TARQL scripts

Add rdf2tarql.pl script to generate TARQL script (CSV-RDF conversion) from model.

2020-06-01 rdf2rml: improve scripts, SQL query/table propagation

Improve script to abort if the first pipeline step (“update”) fails
Improve script to work on Cygwin (invokes the Jena tools as riot.bat and update.bat)
Filter out harmless warnings from Jena update’s error log for datatypes like xsd:integer, xsd:date etc since the mention of a source field doesn’t match the syntax of such literals.
If a node has a single outgoing link and no SQL query/table (puml:label), propagate that property backward across the link into the node (previously that was done only for incoming links)

2020-05-30 rdf2rml: handle inverse edge

When an edge Y-P-X is recorded in the RDB table of X (as a foreign key) or in an association table, it is awkward to specify that table in the node Y. So I added this SPARQL UPDATE clause:

If a node ?y has no SQL, is not Inlined, has a single outgoing edge, then add the SQL of its counterparty ?x as default
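The rule above can be restated in a few lines of Python, purely for illustration. The tool itself implements it as a SPARQL UPDATE clause; the function name and the plain-dict representation of nodes, SQL and edges below are hypothetical. The sketch makes a single pass, so SQL is not propagated along chains of nodes.

```python
from collections import defaultdict


def default_sql(sql: dict, inlined: set, edges: list) -> dict:
    """Give each node ?y the SQL of ?x when ?y has no SQL of its own,
    is not inlined, and has exactly one outgoing edge ?y ?p ?x.

    sql     -- {node: SQL query or table}
    inlined -- nodes rendered inline (no subject map of their own)
    edges   -- (subject, property, object) links between nodes
    """
    outgoing = defaultdict(list)
    for s, _p, o in edges:
        outgoing[s].append(o)
    result = dict(sql)
    for y, targets in outgoing.items():
        if y not in sql and y not in inlined and len(targets) == 1:
            x = targets[0]
            if x in sql:  # the counterparty must itself have SQL
                result[y] = sql[x]
    return result
```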

2018-11-14 rdfpuml.pl: avoid puml:stereotype class node

I often define puml:stereotype for some classes in prefixes.ttl. If the class is not used in a particular Turtle file, rdfpuml should not emit a disconnected puml class for it.

stereotypes(): Avoid emitting
has_statements_different_from(): Check that a node has statements other than puml:stereotype

2018-06-29 rdfpuml.pl bug: class and puml:InlineProperty

When a type is also used with puml:InlineProperty, it caused this error:

Can't locate object method "uri_value" via package "RDF::Trine::Node::Literal" at rdfpuml.pl line 261.
   main::puml_qname(RDF::Trine::Node::Literal=ARRAY(0x4fd0920)) called at rdfpuml.pl line 279
   main::puml_node2(RDF::Trine::Node::Literal=ARRAY(0x4fd0920)) called at rdfpuml.pl line 128

An inline is converted to a literal, but rdf:type is always assumed to be a URL. Test: ./test/regression/type-inlineProperty.ttl

2018-04-05 rdfpuml.pl: Arrow Attributes

Add arrow attributes (dotted, dashed, bold) and length. Test: ./test/regression/arrowLen.ttl

2018-02-25 rdfpuml.pl: Arrow Color

Support arrow color (named or hex)

2017-08-25 rdfpuml.pl: decorative arrows

Fix unicode of “decorative arrows” on links going to a Reified Relation:

left => "←", right => "→", up => "↑", down => "↓"

2016-02-10 rdfpuml.pl: blank nodes, hidden links

support blank nodes
support new puml “hidden” links that can sometimes help the layout: http://plantuml.com/class-diagram#layout

To Do Tasks

Help needed for the following tasks. Post bugs and enhancement requests to this repo!

Near-term

Modularize and Package Better

Regression Tests

sort is added at various places to make the tool more deterministic, i.e. independent of the order of RDF statements in the input file. However, this will interfere with the ability to control the layout, especially of disconnected components (see layout_new_line)
Some regression tests are added.

rdf2rml: disentangle inverse edge

In the case Y-P-X described above:

Also need to record ?y puml:property ?p so this prop name can be added to ?y’s subject map
When making ?map, take puml:property into account
But ?map is made many times, and copy-paste is no good…
Also, this should be done in some cases but not others…
So it’s better to record ?y puml:map ?map …

Release on CPAN

Add Unicode tests

Add ttl with non-ASCII chars: Accented, Cyrillic, French, etc.

Accented: ”Rudolf Mössbauer” in ./test/TRR/societyMember.ttl

Prefixes

Allow specifying the prefixes file

See #7

Eliminate Curie.pm

./lib/RDF/Prefixes/Curie.pm remembers @base and uses that for URL shortening. Once perlrdf#131 is fixed, eliminate this dependency (local module)

Remember prefixes from input file

rdfpuml shortens URLs using prefixes only from prefixes.ttl, but should also use prefixes defined in the individual input file.

Support more RDF Formats

Now it only supports Turtle, because it concatenates prefixes.ttl to the main file. If it can collect all prefixes from RDF files, such concatenation won’t be needed

Batch Processing

Issue #1: plantuml is slow to start up, so we’d like to process a bunch of puml files at once. The best way is to have a smarter script or Makefile that uses the following http://plantuml.com/command-line features:

Keep the intermediate puml files (the current Makefile doesn’t preserve them)
Run plantuml on a whole folder (with -r[ecurse] it can even recurse through subfolders)
Use -checkmetadata to skip png files that don’t need to be regenerated. (The whole puml text is stored in the png, so plantuml can quickly check that there are no changes)
The Makefile should start plantuml only once, if any of the puml files is newer than its respective png file
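A minimal sketch of such a driver in Python, assuming a `plantuml` launcher on the PATH and puml/png files side by side in one folder (both assumptions, not how the current Makefile works). It compares timestamps itself and then starts plantuml once for the whole folder with -checkmetadata, so unchanged diagrams are skipped.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def any_stale(folder: Path) -> bool:
    """True if some *.puml in folder is newer than its *.png (or has no png)."""
    for puml in folder.glob("*.puml"):
        png = puml.with_suffix(".png")
        if not png.exists() or puml.stat().st_mtime > png.stat().st_mtime:
            return True
    return False


def plantuml_command(folder: Path) -> list[str]:
    # A single plantuml process for the whole folder; -checkmetadata lets
    # plantuml skip any png whose embedded puml source has not changed.
    return ["plantuml", "-checkmetadata", str(folder)]


def rebuild(folder: Path) -> None:
    """Start plantuml (assumed to be on PATH) at most once per folder."""
    if any_stale(folder):
        subprocess.run(plantuml_command(folder), check=True)
```

Usage would be e.g. `rebuild(Path("test"))` after rdfpuml has written the puml files.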

“Manual” Batching

Before I discovered the -checkmetadata option, I had the idea that rdfpuml could put several diagrams in one puml file:

@startuml file1.png
  # made from file1.ttl
@enduml
@startuml file2.png
  # made from file2.ttl
@enduml

However, this interferes with make processing that regenerates only png for changed ttl files, and makes things less modular overall.

Mid-Term

Upgrade to use Attean

Trine (Perl RDF) is end-of-life. Attean is the next generation.

Integrate in Emacs `org-mode`

Write Turtle, see diagram (easy to do)

Node colors, icons, tooltips

See ./ideas

More arrow types and styles

See arrows and arrows-2 from https://github.com/anoff/blog/tree/master/static/assets/plantuml/diagrams:

Arrow styles and colors (bold, dashed etc): https://mrhaki.blogspot.com/2016/12/plantuml-pleasantness-get-plantuml.html
plantuml -pattern regexes:

dotted|dashed|plain|bold|hidden|norank|single|thickness

Extra Layout Options

Local layout options are described in Help on Layout:

“hidden” makes a constraint between two nodes, but does not draw the link (rdfpuml already implements this)
norank ignores a link for layout purposes (same as graphviz constraint=false)
“together” groups classes as if they were in the same package (i.e. puts them in a graphviz cluster)

Global options include (eg see this diagram):

And there are a lot more undocumented features: https://forum.plantuml.net/7095

Custom Reification

Ability to describe custom reification situations using the Property Reification Vocabulary (PRV)

Use MindMap/WBS for Hierarchies

Plantuml now has MindMap and WBS (or OBS) diagrams that use a simple bulleted syntax to draw hierarchies.

It would be nice to use this to draw hierarchies of individuals, in particular taxonomies.

Here are examples of the two styles:

Mindmap
WBS

Long-Term

rdf2soml to Generate Semantic Object Models

A new tool rdf2soml to generate Ontotext Platform SOML from RDF examples.

What’s missing? Most importantly: property cardinality and virtual inverses.

PlantUML can show arrow cardinalities, and this simple and natural PlantUML code:

X "0:1" -left-> "1:m" Y : prop/\ninvProp

Is depicted as follows:

We have two options for expressing this in triples:

Cardinality With RDF*

##### model triples
:X :prop :Y.
##### puml triples
<< :X :prop :Y >>
  puml:arrow puml:left; # direction
  puml:min 1; puml:max puml:inf; # cardinality
  puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"]. # virtual inverse

Pros: very natural
Cons:
- Perl RDF doesn’t support RDF*, and few editors support it either.
- Annotating a triple does not assert it, so we need to assert it as well

Cardinality With Blank Node

##### model triples
:X :prop :Y.
##### puml triples
:X puml:left :Y. # direction
:X :prop [ # a puml:Cardinality; # may need this marker class to skip the node from the diagram
  puml:min 1; puml:max puml:inf; # cardinality
  puml:object :Y; # only needed if X has several relations "prop" and they need different annotations
  puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"] # virtual inverse
].

rdf2shape to Describe & Generate RDF Shapes

Visualize RDF Shapes (SHACL and ShEx)

Issue #8: discussion with Thomas Francart of Sparna

I developed this SHACL to PlantUML converter, in Java, based on TopQuadrant SHACL lib, and the result is at https://shacl-play.sparna.fr/play/draw and code at https://github.com/sparna-git/shacl-play/tree/master/shacl-diagram

I don’t have a strong opinion on the example you provide, an alternative idea that comes to my mind is

:node1 :link [
  rdf:value :node2;
  puml:min 1 ;
  puml:max 2 ;
].

But this changes the structure of the example graph itself, which might not be convenient

Generate transformations for other than relational sources

R2RML works great for RDBMS, but how about other sources? Extend rdf2rml to generate:

RML: extends R2RML to handle RDB, XML, JSON, CSV
XSPARQL: extends XQuery with SPARQL construct and JSON input
DONE tarql: handles TSV/CSV with SPARQL construct
DONE OntoRefine: transformation of TSV/CSV and direct loading to GraphDB with SPARQL Update

rdf2rml's People


jazzyray steady137 aahmadai scvflare

rdf2rml's Issues

use config file

@jazzyray:
#3 adds the use of a config file plantuml.cfg. I see the following issues:

It should check for the presence of such a file, or should have an option. It should not die if a config file is not present.
hide empty methods etc are necessary for the diagram to look ok. So it's better to leave these in: they could be specified in the config file, but what's the benefit given that they must always be present?
Rather than concating the config file, it should use the -config option as per https://forum.plantuml.net/4266/configuration-file-specification (another option is to use #include inside the file, but I think it's slightly less convenient)
It should document the feature
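A minimal sketch of the first and third points, assuming a `plantuml` launcher on the PATH. The function name is hypothetical and this is not the code of #3.

```python
from __future__ import annotations

from pathlib import Path


def plantuml_args(puml_file: str, config: str | None = "plantuml.cfg") -> list[str]:
    """Build a plantuml command line with an optional config file.

    The config file is passed with -config only if it exists, so a missing
    file does not abort the run. The fixed "hide empty ..." lines stay in
    the generated puml, since every diagram needs them.
    """
    args = ["plantuml"]
    if config and Path(config).is_file():
        args += ["-config", config]
    args.append(puml_file)
    return args
```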

Add hyperlinks to diagram

With SVG output, PlantUML diagrams can contain links: https://plantuml.com/link

For nodes and edges, this should be straightforward to add.

For inlined content with more than one URI, probably PlantUML doesn't support it -- i.e., only one link per row is allowed. However, having nodes and edges linked would already be a great improvement.

`_IF_BOUND` for Conditional Nodes is wrong

@SCVFlare pointed out that https://github.com/VladimirAlexiev/rdf2rml/blob/master/test/conditional-node/organizations.ru#L21 is wrong:
bind(if(bound(coalesce(?country_code,?region,?postal_code,?address),"address",?UNDEF) as ?address_IF_BOUND)

it needs an extra )
bound() takes only a variable, not an expression, as argument, so we get
MALFORMED QUERY: Encountered " "coalesce" "coalesce "" at line 111, column 23. Was expecting one of: <VAR1>

So instead of coalesce(), we need to expand to a series of ||:

bind(if(bound(?country_code) || bound(?region) || bound(?postal_code) || bound(?address),
  "address",?UNDEF) as ?address_IF_BOUND)

At least now we get the chance not to require ? before the extra variables.

We could do it with _IF_BOUND_n() where n is 1,2,3,4,5 but that's a bit ugly:

#define _IF_BOUND_1(x,y1)             bind(if(bound(?y1),#x,?UNDEF) as ?x##_IF_BOUND)
#define _IF_BOUND_2(x,y1,y2)          bind(if(bound(?y1)||bound(?y2),#x,?UNDEF) as ?x##_IF_BOUND)
#define _IF_BOUND_3(x,y1,y2,y3)       bind(if(bound(?y1)||bound(?y2)||bound(?y3),#x,?UNDEF) as ?x##_IF_BOUND)
#define _IF_BOUND_4(x,y1,y2,y3,y4)    bind(if(bound(?y1)||bound(?y2)||bound(?y3)||bound(?y4),#x,?UNDEF) as ?x##_IF_BOUND)
#define _IF_BOUND_5(x,y1,y2,y3,y4,y5) bind(if(bound(?y1)||bound(?y2)||bound(?y3)||bound(?y4)||bound(?y5),#x,?UNDEF) as ?x##_IF_BOUND)

Welcome to some CPP magic. Googling how to iterate over __VA_ARGS__ returns a number of relevant hits:

https://codecraft.co/2014/11/25/variadic-macros-tricks/ explains a basic idea
https://groups.google.com/g/comp.std.c/c/d-6Mj5Lko_s is the basic idea, but in a bigger form and with less explanation
https://gist.github.com/kbauer/d651bae52ab2f72b8d1e uses https://gist.github.com/kbauer/09e2a4fb916a9524374f to assemble a general solution, but is rather complex
https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,-tips,-and-idioms is a bag of mind-twisting tricks including recursion
https://embeddedartistry.com/blog/2020/07/27/exploiting-the-preprocessor-for-fun-and-profit/ explains the boost.preprocessor library
https://www.boost.org/doc/libs/master/libs/preprocessor/doc/index.html is the documentation of that library. It includes MANY things, including https://www.boost.org/doc/libs/master/libs/preprocessor/doc/ref/overload.html to pick a macro variant by arity

The trick is to do it as simply as possible:

If possible, don't use external macros
Do it with a minimum of auxiliary macros
A limit of max 5 args is enough
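For reference, this is the expansion every _IF_BOUND variant should produce. The sketch below generates it with a plain Python function instead of CPP, which is a different technique from the macro approach discussed here (e.g. the script that writes the .ru file could emit the bind directly). The name if_bound is hypothetical. Because it loops over its arguments, it needs no arity limit.

```python
def if_bound(name: str, *fields: str) -> str:
    """Expand _IF_BOUND(name, f1, ..., fn) into a SPARQL bind.

    bound() accepts only a variable, so each field gets its own bound()
    call and the calls are joined with ||.
    """
    if not fields:
        raise ValueError("need at least one field")
    test = "||".join(f"bound(?{field})" for field in fields)
    return f'bind(if({test},"{name}",?UNDEF) as ?{name}_IF_BOUND)'
```

For example, `if_bound("address", "country_code", "region", "postal_code", "address")` yields the same text as `_IF_BOUND_4(address,country_code,region,postal_code,address)`.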

special display of rdf:List

Continuing #10, the display of rdf:Lists is pretty ugly because it puts siblings (list members) at different ranks in the diagram, and thus obscures what is going on. Lists are involved in:

OWL constructs (eg intersectionOf, unionOf)
SHACL constructs (eg sh:and, sh:or)

It would be nicer to have a special display: a horizontal box with dots inside, something like

PlantUML YAML (but horizontal):
or JSON (pretty similar)
dot record-based nodes:

This won't be easy for several reasons.

RDF processing

most lists in the wild don't have rdf:type rdf:List, so they need to be recognized by rdf:first, rdf:rest
need special processing of the list RDF: to consume it all, produce diagram markup, and delete the triples
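The RDF-processing part could look roughly like this sketch. It works on plain (subject, predicate, object) string tuples rather than on any particular RDF library; the function name and data representation are assumptions. It collects each list's members in order and removes the list triples, so the diagram code could then draw one box per list.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def extract_lists(triples):
    """Split triples into RDF lists and the remaining triples.

    Returns ({head: [member, ...]}, remaining_triples). List nodes are
    recognized by rdf:first/rdf:rest, not by rdf:type rdf:List; a head is
    a list node that is not the rdf:rest of another node.
    """
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    tails = set(rest.values())
    lists, consumed = {}, set()
    for head in first:
        if head in tails:
            continue
        members, node = [], head
        while node in first and node not in consumed:  # guard against cycles
            members.append(first[node])
            consumed.add(node)
            node = rest.get(node, NIL)
        lists[head] = members
    # Drop the list structure (first/rest and any rdf:type rdf:List) so the
    # list nodes are not drawn again as ordinary nodes.
    remaining = [
        (s, p, o) for s, p, o in triples
        if not (s in consumed and (p in (FIRST, REST) or (p == RDF + "type" and o == RDF + "List")))
    ]
    return lists, remaining
```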

PlantUML

PlantUML can mix JSON structures into class/object diagrams, eg see
but those cannot refer to other class/object nodes i.e. they are trees not graphs
(unlike standalone JSON diagrams, JSON embedded in class/object diagram never displays links, only nested tree structures)

So currently I don't see any way to do this with PlantUML. I could do it with Graphviz dot (or with https://plantuml.com/dot) but I can't afford to rework (downgrade) my tool to use dot instead of PlantUML.

Discussion at https://forum.plantuml.net/15481/possible-link-elements-from-two-jsons-with-both-jsons-embeded?show=15567#c15567

Support OWL reification

Another feature request: could the OWL vocabulary for axiom annotations be supported, similarly to RDF reification?

Terms used are owl:Axiom, owl:annotatedSource, owl:annotatedProperty, and owl:annotatedTarget.

Discussed, e.g., here: https://stackoverflow.com/questions/45610092/owl-reification-vs-rdf-reification

Keep round brackets in fields

Currently a field like

dc:title "(title)"

Is rendered with square brackets to avoid puml interpreting it as a method.

Prepend {field} to each field to avoid this: plantuml/plantuml#536
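A minimal sketch of emitting attribute lines with the {field} modifier, matching the PlantUML syntax used in the "don't mangle" issue further down this page. The function name is hypothetical.

```python
def field_line(node_id: str, text: str) -> str:
    """Emit one attribute line of a class node.

    The {field} modifier tells PlantUML to treat the text as a field even
    if it contains parentheses, so "(title)" can be kept as-is instead of
    being mangled to "[title]".
    """
    return f"{node_id} : {{field}} {text}"
```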

rdf2sparql: allow dynamic graph (from data)

Currently rdf2sparql can only handle named graphs hardcoded in the ttl source of the transformation as

# GRAPH <...>

We need to be able to specify the URI of the named graphs from a variable in the source table, i.e. a templated URL, e.g.:

# GRAPH <...(graph)>

Since clear graph allows only constants, use delete where graph {?s ?p ?o}

syntax error in openrefine generated SPARQL

https://github.com/VladimirAlexiev/rdf2rml/blob/master/doc/rdf2sparql.pod#generated-sparql
There's a white lie here:

delete where {graph $GRAPH {?s ?p ?o}};
insert {graph $GRAPH {
  <Insert Patterns>
}}
where {

Since #20 is implemented, the pattern looks like this (test/graphs-crunchbase/organizations.ru):

delete {graph ?graph_organizations_uuid_URL {?_s_ ?_p_ ?_o_}}
where {
  service <rdf-mapper:ontorefine:PROJECT_ID> {
    # binds
  }
  ?graph_organizations_uuid_URL {?_s_ ?_p_ ?_o_}};  ## BUG
insert {graph ?graph_organizations_uuid_URL {
}}
where {
  service <rdf-mapper:ontorefine:PROJECT_ID> {
    # binds
  }
};

The reason is that:

It now computes the graph URL from the binds
It cannot use delete where with a computed URL because that's a non-simple where clause
After the graph is computed, it needs to lookup the triples in that graph that need to be deleted

But there's a bug: the word graph is missing in the line indicated.
Also need to fix the documentation.

Add some screenshots to the README

Everyone loves screenshots. Attract users.

Improve dockerization

Hi @SCVFlare!
I added your info about Docker version here: https://github.com/VladimirAlexiev/rdf2rml#docker-image.
However:

Please upgrade the image to use the latest version, after the #22 fix
Does it make sense to add a Dockerfile as per #7? And maybe some more of those modifications?

Add option `left to right direction`

@mariapoveda, @rgcmme
I made a diagram from one of your examples:

Details in https://github.com/VladimirAlexiev/rdf2rml/tree/master/test/saref4city.
Had to tweak prefixes and base to make it look better.
If there's interest to visualize other SAREF examples like this, we can automate it.

https://github.com/VladimirAlexiev/rdf2rml/tree/master/test/saref4city#why-the-rigmarole:
need to add option left to right direction to my tool.

Maybe something like this in Turtle

[] puml:option "left to right direction".

where the text is directly dumped at the beginning of the puml.
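A minimal sketch of this approach: each puml:option value is dumped right after @startuml, and a multi-line value keeps its internal line order. The function name is hypothetical, the option values are assumed to be already read from the Turtle, and the fixed header lines are copied from the sample puml output in the "don't mangle" issue further down this page.

```python
from __future__ import annotations

# Fixed header lines, as in the sample puml output of the "don't mangle" issue.
HEADER = ["hide empty members", "hide circle", "skinparam classAttributeIconSize 0"]


def puml_prologue(options: list[str]) -> str:
    """Start a diagram: @startuml, each puml:option value verbatim, then HEADER."""
    lines = ["@startuml"]
    for option in options:
        # Triple-quoted Turtle strings usually start and end with a newline.
        lines.extend(option.strip("\n").splitlines())
    lines.extend(HEADER)
    return "\n".join(lines)
```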

Pros of this simple approach:

This will also handle skinparam and !pragma
"left to right direction" is neither skinparam, nor pragma (https://forum.plantuml.net/17002/how-to-set-left-to-right-direction-from-command-line) so needs to be emitted as text.
Cons: the order of puml:option statements is not guaranteed, but some puml options may need to come in a particular order
but hey! the user can use a multiline string like this:

[] puml:option """
left to right direction
!pragma layout smetana
""".

Ignore \r in newlines

A turtle string with \r\n results in plantuml failure.

Treat all of these as a single newline: (\n|\r|\r\n|\n\r).
Allow multiple consecutive newlines (of the same kind), eg \r\r\n is two newlines.
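One way to do this is a single regular expression that tries the two-character sequences before the single characters, so \r\n and \n\r each count as one newline while \r\r\n counts as two. A minimal sketch in Python (the function name is hypothetical):

```python
import re

# Alternatives are tried left to right at each position, so "\r\n" and
# "\n\r" are consumed as one newline before "\r" or "\n" alone can match.
_NEWLINE = re.compile(r"\r\n|\n\r|\r|\n")


def normalize_newlines(text: str) -> str:
    """Replace every newline variant with a single \\n."""
    return _NEWLINE.sub("\n", text)
```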

cc @jazzyray

Overriding inlining of rdf:type

I have an ontology with both classes and individuals. It would be good to not inline some cases of rdf:type, to emphasise how various individuals belong to different parts of the class taxonomy.

Is there a way to request the opposite of puml:InlineProperty? Preferably, this would be applied only to selected rdf:type triples (the default format with inlining of rdf:type helps a lot to avoid busy diagrams).

Input from stdin

It would be convenient to be able to pipe a file to rdfpuml instead of referring to a file.

(This occurred to me while setting up for using rdfpuml in Emacs with org-babel, as it would be easier to send a Turtle source block to rdfpuml with a :stdin header, avoiding the need to write a temporary file.)

Fails on rdf:type with anonymous node

(Running with Strawberry Perl on Windows 10.)

I find that the tool fails to produce a diagram with a common OWL pattern: when an individual is a member of an anonymous class. The error message is

Can't locate object method "uri_value" via package "RDF::Trine::Node::Blank" at bin/rdf2rml/bin/rdfpuml.pl line 275.

Here is an example that fails (from https://rds.posccaesar.org/ontology/plm/rdl/PCA_100003953/):

rdl:PCA_100003953  a  owl:NamedIndividual , lis:Scale ;

## the next "a", i.e., rdf:type, seems to be the problem ##
        a           [ a                  owl:Restriction ;
                      owl:allValuesFrom  rdl:PCA_100003891 ;
                      owl:onProperty     [ owl:inverseOf  lis:datumUOM ]
                    ] ;
        rdfs:label  "radian per second squared" ;
        om:symbol   "rad/s2" .

If the a on the second line in this example is replaced with, e.g., :x, the diagram is produced. So, I guess this has to do with some special handling of rdf:type.

remove Carp::Always

With use strict; use warnings; use autodie;
I don't think we need use Carp::Always (http://search.cpan.org/~ferreira/Carp-Always-0.13/lib/Carp/Always.pm). It'll show a stack trace on every die but who needs this for a missing file?

cc @jonassmedegaard

rdf2sparql: implement filter function

In the query generated automatically from the ttl, we need to filter by a string, but the current version makes this impossible: when a filter macro is used, it adds a variable name to the SPARQL.

Can you please handle it?

batch processing

See https://github.com/VladimirAlexiev/rdf2rml#421-batch-processing.
cc @jonassmedegaard

add `--filter` options (conditional processing by updated_at timestamp)

https://github.com/VladimirAlexiev/rdf2rml/blob/master/doc/rdf2sparql.pod#generated-sparql

Crunchbase tables include an updated_at timestamp in every row, which we compare to a global timestamp (recorded in the database) to find only the updated rows. This is an extra filter generated by a slightly more complex script (not published).

The extra clauses added to SPARQL are:

  service <rdf-mapper:ontorefine:PROJECT_ID> {
    bind(?c_updated_at as ?updated_at) 
    # other binds
  }
  <cb> cb:updatedAt ?UPDATED_AT_DATETIME.
  bind(replace(str(?UPDATED_AT_DATETIME),'T',' ') as ?UPDATED_AT)
  filter(?updated_at > ?UPDATED_AT)

We can control this with command-line options

perl -S rdf2sparql.pl --filterColumn updated_at \
  --filter "<cb> ex:updatedAt ?UPDATED_AT_DT bind(replace(str(?UPDATED_AT_DT),'T',' ') as ?UPDATED_AT) filter(?updated_at > ?UPDATED_AT)"

Notes:

The prebind bind(?c_updated_at as ?updated_at) should always be added, even if that column is not used in the mapping
I use options rather than extra triples in prefixes.ttl because rdf2sparql doesn't read any triples (works purely on the turtle text)
Document after https://github.com/VladimirAlexiev/rdf2rml/blob/master/doc/rdf2sparql.pod#filtering

On "Visualize RDF Shapes (SHACL and ShEx)"

Hi !

What are your ideas about "Visualize RDF Shapes (SHACL and ShEx)"? I'm thinking about parsing SHACL in Java using https://github.com/TopQuadrant/shacl and then outputting PlantUML text. Did you have any specific plans for this? (I would have been happy to do that using RDF4J but I can't find a SHACL object model in RDF4J.)

Cheers

Document handling of blank node types

Document handling of blank node types in rdfpuml.pod: #10 (comment)

should also sanitize prop and datatype URLs

The file star-wars-luke-skywalker.ttl defines

@prefix voc: <https://swapi.co/vocabulary/>.

but it's not in a prefixes.ttl file, which causes full URLs in the puml output:

_https_swapi_co_resource_vehicle_30_ : <https://swapi.co/vocabulary/manufacturer> "Aratech Repulsor Company"
_https_swapi_co_resource_vehicle_30_ : <https://swapi.co/vocabulary/maxAtmospheringSpeed> "360"^^<http://www.w3.org/2001/XMLSchema#integer>

plantuml then mangles some of the URLs by removing // and emits them in italic:

We need to sanitize not just subject URLs (as above, done in puml_node()), but also prop and datatype URLs.

Low priority, because it's much preferable for prop and datatype URLs to be prefixed (shortened).
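A minimal sketch of applying the same kind of sanitization to property and datatype URLs. The rule (every run of non-alphanumeric characters becomes one underscore) is an assumption inferred from the sample output above, not necessarily what puml_node() does. Only absolute URLs are touched, so relative templated URLs like <type/(type)> are left alone.

```python
import re

# An absolute URL in angle brackets, i.e. <scheme:...>.
ABSOLUTE_URL = re.compile(r"<[A-Za-z][A-Za-z0-9+.-]*:[^<>\s]*>")


def sanitize(url: str) -> str:
    # Assumed rule, inferred from the sample output: each run of
    # non-alphanumeric characters becomes a single underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", url)


def sanitize_attribute(text: str) -> str:
    """Apply the subject-URL rule to property and datatype URLs as well."""
    return ABSOLUTE_URL.sub(lambda m: sanitize(m.group(0)), text)
```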

absolute URL interpreted as relative URL when there's a `@prefix` equal to `@base`

@base         <https://ontotext.com/knowledge-graph/>.
@prefix OTKG: <https://ontotext.com/knowledge-graph/>.

<researchProject/(acronym)> a s:ResearchProject;
  s:sameAs         <(wikidata)>.

This is rendered as the CURIE OTKG:(wikidata), i.e. the field (wikidata) is interpreted as a URL relative to @base.

This makes sense for templated URLs like <researchProject/(acronym)> because the start is not a URL scheme like http:.
But for a URL that consists of a field alone, it's reasonable to assume that will be an absolute URL.

It's the RDF parser that decides that (wikidata) is a relative URL. How to fool it into not using @base?

I could fake it as https://(wikidata) but that's ugly as hell: how could I know that's the right URL scheme?
If I fake it as /(wikidata), that would use the host from @base, and <https://ontotext.com/(wikidata)> is even worse.

I think the safest option is to EXCLUDE the @prefix that is equal to @base.

(I used it because GraphDB remembers prefixes as namespaces but does not remember the base,
so OTKG: is used to shorten URLs when displaying resources.)
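The Python sketch below illustrates the resolution behaviour described above. It uses urllib.parse.urljoin as a stand-in for the Turtle parser's relative-IRI resolution, since urljoin follows RFC 3986 for these simple cases. It also shows the proposed workaround of dropping any prefix whose namespace equals @base. The helper name prefixes_for_display is hypothetical.

```python
from __future__ import annotations

from urllib.parse import urljoin

BASE = "https://ontotext.com/knowledge-graph/"

# How relative IRIs resolve against @base
templated = urljoin(BASE, "researchProject/(acronym)")
field_only = urljoin(BASE, "(wikidata)")   # the field is treated as relative
rooted = urljoin(BASE, "/(wikidata)")      # "/" keeps only the host of @base


def prefixes_for_display(prefixes: dict[str, str], base: str) -> dict[str, str]:
    """Drop any prefix whose namespace equals @base (the proposed workaround)."""
    return {name: ns for name, ns in prefixes.items() if ns != base}


print(templated)
print(field_only)
print(rooted)
print(prefixes_for_display({"OTKG": BASE, "s": "http://schema.org/"}, BASE))
```

With OTKG: removed from the display prefixes, field_only is no longer shortened to the misleading CURIE OTKG:(wikidata).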

TODO: document in rdfpuml page

rdfpuml: don't mangle `(...)` to `[...]`

See tests/parens-not-brackets.
See https://plantuml.com/class-diagram, section "Adding methods".

Take a simple model with field names embedded in a URL and an attribute:

<person/(id)> a s:Person;
  s:additionalType <type/(type)>;
  s:name "(name)".

<type/(type)> a puml:InlineProperty.

rdfpuml mangles (...) to [...] to avoid parasitic compartments (horizontal lines),
since PlantUML treats a member containing parentheses as a method and moves it into a separate methods compartment:

The system checks for parenthesis to choose between methods and fields.

@startuml
hide empty members
hide circle
skinparam classAttributeIconSize 0
class _person_id_ as "<person/(id)>"
_person_id_ : a s:Person
_person_id_ : s:additionalType <type/[type]>
_person_id_ : s:name "[name]"
@enduml

This renders the field values with square brackets instead of the original parentheses.

We can use this new feature from plantuml/plantuml#536 (comment):

You can use {field} and {method} modifiers to override default behaviour of the parser about fields and methods.

@startuml
hide empty members
hide circle
skinparam classAttributeIconSize 0
class _person_id_ as "<person/(id)>"
_person_id_ : {field} a s:Person
_person_id_ : {field} s:additionalType <type/(type)>
_person_id_ : {field} s:name "(name)"
@enduml

This renders the field values with their original parentheses.

minimize binds in `delete` clause

Currently the delete clause (used with OntoRefine Update) uses the same binds as the subsequent insert clause.
E.g. test/graphs-crunchbase/organizations.ru:

delete {graph ?graph_organizations_uuid_URL {?_s_ ?_p_ ?_o_}}
where {
  service <rdf-mapper:ontorefine:PROJECT_ID> {
    bind(?c_updated_at as ?updated_at)
    bind(?c_uuid as ?uuid)
    bind(?c_name as ?name)
    bind(?c_permalink as ?permalink)
    bind(?c_cb_url as ?cb_url)
    bind(?c_rank as ?rank)
    bind(?c_created_at as ?created_at)
    bind(?c_legal_name as ?legal_name)
    bind(?c_roles as ?roles)
    bind(?c_domain as ?domain)
    bind(?c_homepage_url as ?homepage_url)
    bind(?c_country_code as ?country_code)
    bind(?c_state_code as ?state_code)
    bind(?c_region as ?region)
    bind(?c_city as ?city)
    bind(?c_address as ?address)
    bind(?c_postal_code as ?postal_code)
    bind(?c_status as ?status)
    bind(?c_short_description as ?short_description)
    bind(?c_category_list as ?category_list)
    bind(?c_num_funding_rounds as ?num_funding_rounds)
    bind(?c_total_funding_usd as ?total_funding_usd)
    bind(?c_total_funding as ?total_funding)
    bind(?c_total_funding_currency_code as ?total_funding_currency_code)
    bind(?c_founded_on as ?founded_on)
    bind(?c_last_funding_on as ?last_funding_on)
    bind(?c_closed_on as ?closed_on)
    bind(?c_employee_count as ?employee_count)
    bind(?c_email as ?email)
    bind(?c_phone as ?phone)
    bind(?c_facebook_url as ?facebook_url)
    bind(?c_linkedin_url as ?linkedin_url)
    bind(?c_twitter_url as ?twitter_url)
    bind(?c_logo_url as ?logo_url)
    bind(?c_alias1 as ?alias1)
    bind(?c_alias2 as ?alias2)
    bind(?c_alias3 as ?alias3)
    bind(?c_primary_role as ?primary_role)
    bind(?c_num_exits as ?num_exits)
    bind(iri(concat("graph/organizations/",?uuid)) as ?graph_organizations_uuid_URL)
    bind(iri(concat("cb/agent/",?uuid)) as ?cb_agent_uuid_URL)
    bind(strdt(?cb_url,xsd:anyURI) as ?cb_url_xsd_anyURI)
    bind(strdt(?rank,xsd:integer) as ?rank_xsd_integer)
    bind(REPLACE(?created_at,' ','T') as ?created_at_FIXDATE)
    bind(strdt(?created_at_FIXDATE,xsd:dateTime) as ?created_at_FIXDATE_xsd_dateTime)
    bind(REPLACE(?updated_at,' ','T') as ?updated_at_FIXDATE)
    bind(strdt(?updated_at_FIXDATE,xsd:dateTime) as ?updated_at_FIXDATE_xsd_dateTime)
    ?roles_SPLIT1 spif:split (?roles ',').
    bind(LCASE(REPLACE(REPLACE(REPLACE(?roles_SPLIT1, "[^\\p{L}0-9]", "_"), "_+", "_"), "^_|_$", "")) as ?roles_SPLIT1_URLIFY)
    bind(iri(concat("cb/organizationRole/",?roles_SPLIT1_URLIFY)) as ?cb_organizationRole_roles_SPLIT1_URLIFY_URL)
    bind(strdt(?homepage_url,xsd:anyURI) as ?homepage_url_xsd_anyURI)
    bind(LCASE(REPLACE(REPLACE(REPLACE(?status, "[^\\p{L}0-9]", "_"), "_+", "_"), "^_|_$", "")) as ?status_URLIFY)
    bind(iri(concat("cb/organizationStatus/",?status_URLIFY)) as ?cb_organizationStatus_status_URLIFY_URL)
    ?category_list_SPLIT1 spif:split (?category_list ',').
    bind(LCASE(REPLACE(REPLACE(REPLACE(?category_list_SPLIT1, "[^\\p{L}0-9]", "_"), "_+", "_"), "^_|_$", "")) as ?category_list_SPLIT1_URLIFY)
    bind(iri(concat("cb/industry/",?category_list_SPLIT1_URLIFY)) as ?cb_industry_category_list_SPLIT1_URLIFY_URL)
    bind(strdt(?num_funding_rounds,xsd:integer) as ?num_funding_rounds_xsd_integer)
    bind(strdt(?total_funding_usd,xsd:decimal) as ?total_funding_usd_xsd_decimal)
    bind(strdt(?total_funding,xsd:decimal) as ?total_funding_xsd_decimal)
    bind(strdt(?founded_on,xsd:date) as ?founded_on_xsd_date)
    bind(strdt(?last_funding_on,xsd:date) as ?last_funding_on_xsd_date)
    bind(strdt(?closed_on,xsd:date) as ?closed_on_xsd_date)
    bind(if(?employee_count in ("other","not provided","unknown"),?UNDEF,?employee_count) as ?employee_count_IFNOTNULL)
    bind(LCASE(REPLACE(REPLACE(REPLACE(?employee_count_IFNOTNULL, "[^\\p{L}0-9]", "_"), "_+", "_"), "^_|_$", "")) as ?employee_count_IFNOTNULL_URLIFY)
    bind(iri(concat("cb/employeeCount/",?employee_count_IFNOTNULL_URLIFY)) as ?cb_employeeCount_employee_count_IFNOTNULL_URLIFY_URL)
    bind(strdt(?facebook_url,xsd:anyURI) as ?facebook_url_xsd_anyURI)
    bind(strdt(?linkedin_url,xsd:anyURI) as ?linkedin_url_xsd_anyURI)
    bind(strdt(?twitter_url,xsd:anyURI) as ?twitter_url_xsd_anyURI)
    bind(strdt(?logo_url,xsd:anyURI) as ?logo_url_xsd_anyURI)
    bind(LCASE(REPLACE(REPLACE(REPLACE(?primary_role, "[^\\p{L}0-9]", "_"), "_+", "_"), "^_|_$", "")) as ?primary_role_URLIFY)
    bind(iri(concat("cb/organizationRole/",?primary_role_URLIFY)) as ?cb_organizationRole_primary_role_URLIFY_URL)
    bind(strdt(?num_exits,xsd:integer) as ?num_exits_xsd_integer)
  }
  <cb> cb:updatedAt ?UPDATED_AT_DT bind(replace(str(?UPDATED_AT_DT),'T',' ') as ?UPDATED_AT) filter(?updated_at > ?UPDATED_AT)
  graph ?graph_organizations_uuid_URL {?_s_ ?_p_ ?_o_}};

This works and doesn't slow down the query, since all binds are executed in memory.
However, it's a bit unsatisfactory since it complicates the query.

Pare the delete clause down to only the necessary binds:

?updated_at, specified with --filterColumn
?graph_organizations_uuid_URL (and its constituent variables), which comes from the templated URL # GRAPH <graph/organizations/(uuid)>

delete {graph ?graph_organizations_uuid_URL {?_s_ ?_p_ ?_o_}}
where {
  service <rdf-mapper:ontorefine:PROJECT_ID> {
    bind(?c_updated_at as ?updated_at)
    bind(?c_uuid as ?uuid)
    bind(iri(concat("graph/organizations/",?uuid)) as ?graph_organizations_uuid_URL)
  }
  <cb> cb:updatedAt ?UPDATED_AT_DT bind(replace(str(?UPDATED_AT_DT),'T',' ') as ?UPDATED_AT) filter(?updated_at > ?UPDATED_AT)
  graph ?graph_organizations_uuid_URL {?_s_ ?_p_ ?_o_}};
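Choosing the "necessary binds" is a dependency closure. Start from the required variables (?updated_at and the GRAPH variable) and walk the binds backwards. Keep each bind whose target is needed, and mark the variables in its expression as needed too.

Below is a minimal Python sketch. It assumes the binds are available as ordered (variable, expression) pairs in which each variable is bound before it is used. The function names are hypothetical, and triple patterns such as spif:split are not handled.

```python
from __future__ import annotations

import re

VAR = re.compile(r"\?(\w+)")  # SPARQL variable names referenced in an expression


def needed_binds(binds: list[tuple[str, str]], required: set[str]) -> list[tuple[str, str]]:
    """Keep only the binds needed to compute the required variables."""
    needed = set(required)
    keep = []
    for target, expr in reversed(binds):
        if target in needed:
            keep.append((target, expr))
            needed.update(VAR.findall(expr))  # its inputs become needed too
    keep.reverse()  # restore original order, since order of binds matters
    return keep


def render(binds: list[tuple[str, str]]) -> str:
    return "\n".join(f"bind({expr} as ?{target})" for target, expr in binds)


binds = [
    ("updated_at", "?c_updated_at"),
    ("uuid", "?c_uuid"),
    ("name", "?c_name"),
    ("graph_organizations_uuid_URL", 'iri(concat("graph/organizations/",?uuid))'),
    ("cb_agent_uuid_URL", 'iri(concat("cb/agent/",?uuid))'),
]
print(render(needed_binds(binds, {"updated_at", "graph_organizations_uuid_URL"})))
```

On this subset of the crunchbase binds, the sketch keeps exactly the three binds of the pared-down delete clause, including bind(?c_uuid as ?uuid).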

Binds are tracked in:

@where  = ('','',''); # Array of WHERE strings, since order of binds matters:
  # [0] OntoRefine prebinds
  # [1] Normal binds inside OntoRefine service
  # [2] Binds after (outside) OntoRefine service

This task can be done by further subdividing @where (all delete binds are also needed by insert):

  # [0] OntoRefine --filterColumn prebind and GRAPH variable: used for both DELETE and INSERT
  # [1] OntoRefine prebinds: used for INSERT only
  # [2] Normal binds inside OntoRefine service: used for INSERT only
  # [3] Binds after (outside) OntoRefine service: used for INSERT only
  # [4] Binds after (outside) OntoRefine service: used for both DELETE and INSERT
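The following minimal Python sketch shows how the five slots could be assembled into the DELETE and INSERT WHERE bodies. The slot constants and where_clauses are hypothetical names. Keeping the slots in index order (so [0] always comes first inside the service) is an assumption, based on the note that the order of binds matters.

```python
from __future__ import annotations

# Hypothetical names for the five proposed @where slots
FILTER_AND_GRAPH, PREBINDS, SERVICE_BINDS, OUTSIDE_INSERT, OUTSIDE_BOTH = range(5)


def where_clauses(where: list[str], service: str) -> tuple[str, str]:
    """Return (delete_where, insert_where) bodies built from the five slots."""
    def wrap(inside: str, outside: str) -> str:
        return f"service <{service}> {{\n{inside}}}\n{outside}"

    # DELETE needs only the slots marked "used for both DELETE and INSERT"
    delete = wrap(where[FILTER_AND_GRAPH], where[OUTSIDE_BOTH])
    # INSERT uses every slot, in index order
    insert = wrap(where[FILTER_AND_GRAPH] + where[PREBINDS] + where[SERVICE_BINDS],
                  where[OUTSIDE_INSERT] + where[OUTSIDE_BOTH])
    return delete, insert


slots = ["  bind(?c_updated_at as ?updated_at)\n",
         "  bind(?c_name as ?name)\n",
         "  bind(strdt(?rank,xsd:integer) as ?rank_xsd_integer)\n",
         "",
         "filter(?updated_at > ?UPDATED_AT)\n"]
delete_where, insert_where = where_clauses(slots, "rdf-mapper:ontorefine:PROJECT_ID")
print(delete_where)
```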

PNG output is truncated

Thank you for open-sourcing this tool, including the containerized image.

I am trying to visualize graphs from the domain data.vlaanderen.be. An example of input is https://data.vlaanderen.be/ns/persoon.ttl. This turtle file does not contain any puml: term. I am just trying to see what the visualization looks like without any configuration.

I run docker run -v <directory>:/files --rm docker-registry.ontotext.com/rdf2rml:latest.

The resulting PNG output is truncated.

How can I prevent the truncation?

enable the use of Unicode in URLs

<person/LarsWikström> a s:Person;
  s:name "Lars Wikström";
  s:worksFor <org/Triona>.

results in this PlantUML and a syntax error:

class _person_LarsWikstr�m_ as "<person/LarsWikström>"
class _person_LarsWikstr�m_ <<(P,pink)>>
show _person_LarsWikstr�m_ circle
_person_LarsWikstr�m_ -down-> _org_Triona_ : s:worksFor

This happens even when invoking perl -C rdfpuml.pl as instructed at https://rawgit2.com/VladimirAlexiev/rdf2rml/master/doc/rdfpuml.html#unicode
The bug is in sanitize()
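Here is a minimal Python sketch of a Unicode-aware sanitize(). On str patterns, Python's \W treats ö as a word character, so the node id keeps the letter instead of corrupting it.

Whether PlantUML accepts non-ASCII identifiers is an assumption here. The hypothetical fold_accents option therefore shows an ASCII fallback that strips combining accents (ö becomes o). This is not the rdfpuml Perl implementation.

```python
from __future__ import annotations

import re
import unicodedata


def sanitize(term: str, fold_accents: bool = False) -> str:
    """Turn an RDF term such as '<person/LarsWikström>' into a PlantUML node id."""
    if fold_accents:
        # Hypothetical fallback: decompose letters (ö -> o + U+0308)
        # and drop the combining marks
        term = unicodedata.normalize("NFKD", term)
        term = "".join(ch for ch in term if not unicodedata.combining(ch))
    # Collapse every run of non-word characters (Unicode-aware) into "_"
    return re.sub(r"\W+", "_", term)


print(sanitize("<person/LarsWikström>"))
print(sanitize("<person/LarsWikström>", fold_accents=True))
print(sanitize("<https://swapi.co/resource/vehicle/30/>"))
```

On ASCII input this produces the same ids seen earlier, such as _https_swapi_co_resource_vehicle_30_ and _person_id_.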


RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation

Table of Contents

Introduction

Citation

Documentation

Related Work

Installation

Docker Image

Debian Repo

Change Log

2023-06-07 rdf2sparql.pl: minimize binds in delete clause

2023-06-06 rdf2sparql.pl: global --filter options

2023-06-01 rdfpuml.pl: remove Carp::Always

2023-05-17 rdf2sparql.pl: Conditional Nodes

2023-05-05 rdfpuml.pl: don’t mangle round brackets

2023-04-29 rdfpuml.pl: puml:option

2023-04-19 rdf2sparql.pl: per-model filter, dynamic graph

2022-08-23 rdf2sparql.pl: add datatype to var name instead of UPPERCASING

2022-08-23 rdfpuml.pl: handle blank-node types; add shell scripts

2022-08-15 rdf2sparql.pl: merge to one tool

2022-04-08 rdf2ontorefine.pl: generate OntoRefine Update queries

2021-09-02 rdfpuml.pl: Unicode Processing

2020-09-17 rdf2rml: logicalTable

2020-06-01 rdf2tarql.pl: generate TARQL scripts

2020-06-01 rdf2rml: improve scripts, SQL query/table propagation

2020-05-30 rdf2rml: handle inverse edge

2018-11-14 rdfpuml.pl: avoid puml:stereotype class node

2018-06-29 rdfpuml.pl bug: class and puml:InlineProperty

2018-04-05 rdfpuml.pl: Arrow Attributes

2018-02-25 rdfpuml.pl: Arrow Color

2017-08-25 rdfpuml.pl: decorative arrows

2016-02-10 rdfpuml.pl: blank nodes, hidden links

To Do Tasks

Near-term

Modularize and Package Better

Regression Tests

rdf2rml: disentangle inverse edge

Release on CPAN

Add Unicode tests

Prefixes

Allow specifying the prefixes file

Eliminate Curie.pm

Remember prefixes from input file

Support more RDF Formats

Batch Processing

“Manual” Batching

Mid-Term

Upgrade to use Attean

Integrate in Emacs org-mode

Node colors, icons, tooltips

More arrow types and styles

Extra Layout Options

Custom Reification

Use MindMap/WBS for Hierarchies

Long-Term

rdf2soml to Generate Semantic Object Models

Cardinality With RDF*

Cardinality With Blank Node

rdf2shape to Describe & Generate RDF Shapes

Visualize RDF Shapes (SHACL and ShEx)

Generate transformations for other than relational sources
