robstewart57 / rdf4h

rdf4h is a library for working with RDF in Haskell
License: BSD 3-Clause "New" or "Revised" License
I have rdf4h-1.2.5 installed and want to upgrade it to the latest rdf4h-1.2.7. I run cabal install rdf4h --reinstall and it fails with:
src/Text/RDF/RDF4H/ParserUtils.hs:46:33:
Couldn't match expected type `network-2.5.0.0:Network.URI.URI'
with actual type `URI'
In the `rqURI' field of a record
In the expression:
Request
{rqURI = uri, rqMethod = GET,
rqHeaders = [Header HdrConnection "close"], rqBody = B.empty}
In an equation for `request':
request uri
= Request
{rqURI = uri, rqMethod = GET,
rqHeaders = [Header HdrConnection "close"], rqBody = B.empty}
Failed to install rdf4h-1.2.7
cabal: Error: some packages failed to install:
rdf4h-1.2.7 failed during the building phase. The exception was:
ExitFailure 1
It should probably be noted that cabal also decided to upgrade to the network-uri-2.6.0.1 package.
[10 of 10] Compiling Main ( src/Rdf4hParseMain.hs, dist/build/rdf4h/rdf4h-tmp/Main.dyn_o )
Linking dist/build/rdf4h/rdf4h ...
Preprocessing test suite 'test-rdf4h' for rdf4h-1.3.2...
testsuite/tests/Test.hs:5:18:
Could not find module ‘Data.RDF.TriplesGraph_Test’
Perhaps you meant
Data.RDF.TriplesGraph (needs flag -package-key rdf4h-1.3.2@rdf4h_H6kO2G7c2mkK5Er9o3qrbC)
Use -v to see a list of the files searched for.
testsuite/tests/Test.hs:6:18:
Could not find module ‘Data.RDF.MGraph_Test’
Perhaps you meant
Data.RDF.MGraph (needs flag -package-key rdf4h-1.3.2@rdf4h_H6kO2G7c2mkK5Er9o3qrbC)
Use -v to see a list of the files searched for.
testsuite/tests/Test.hs:7:18:
Could not find module ‘Data.RDF.PatriciaTreeGraph_Test’
Perhaps you meant
Data.RDF.PatriciaTreeGraph (needs flag -package-key rdf4h-1.3.2@rdf4h_H6kO2G7c2mkK5Er9o3qrbC)
Use -v to see a list of the files searched for.
testsuite/tests/Test.hs:8:18:
Could not find module ‘Text.RDF.RDF4H.XmlParser_Test’
Perhaps you meant
Text.RDF.RDF4H.XmlParser (needs flag -package-key rdf4h-1.3.2@rdf4h_H6kO2G7c2mkK5Er9o3qrbC)
Use -v to see a list of the files searched for.
testsuite/tests/Test.hs:9:18:
Could not find module ‘Text.RDF.RDF4H.TurtleParser_ConformanceTest’
Use -v to see a list of the files searched for.
testsuite/tests/Test.hs:11:18:
Could not find module ‘W3C.RdfXmlTest’
Use -v to see a list of the files searched for.
testsuite/tests/Test.hs:12:18:
Could not find module ‘W3C.NTripleTest’
Use -v to see a list of the files searched for.
testsuite/tests/Test.hs:13:8:
Could not find module ‘Data.RDF.GraphTestUtils’
Use -v to see a list of the files searched for.
testsuite/tests/W3C/TurtleTest.hs:8:8:
Could not find module ‘W3C.Manifest’
Use -v to see a list of the files searched for.
builder for ‘/nix/store/cll0fh0h91di2c75fncw8k6w8gfw93dj-rdf4h-1.3.2.drv’ failed with exit code 1
error: build of ‘/nix/store/cll0fh0h91di2c75fncw8k6w8gfw93dj-rdf4h-1.3.2.drv’ failed
I tried with 1.3.3 too, and hit the same problem.
I suggest we discuss implementing the W3C test suite in this thread.
It is currently developed in the "w3tests" branch of my repository: https://github.com/cordawyn/rdf4h/tree/w3tests
ping @wismill
Where are IRIError and SchemaError used? I would have thought that the Left value for the following functions:
mkIRI :: Text -> Either String IRI
parseIRI :: Text -> Either String IRIRef
parseRelIRI :: Text -> Either String IRIRef
validateIRI :: Text -> Either String Text
resolveIRI :: Text -> Text -> Either String Text
would be IRIError or SchemaError rather than String?
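For illustration, a dedicated error type could look like the sketch below. The IRIError constructors, the local IRI newtype, and the toy validation rule are all assumptions for the example, not rdf4h's actual definitions:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical error type replacing String in the Left value.
data IRIError
  = InvalidIRI Text   -- not a well-formed IRI
  | SchemaError Text  -- missing or malformed scheme
  deriving (Show, Eq)

-- Local stand-in for the library's IRI type.
newtype IRI = IRI Text deriving (Show, Eq)

-- Toy validator: an absolute IRI must have a non-empty scheme before ':'.
mkIRI :: Text -> Either IRIError IRI
mkIRI t =
  case T.breakOn ":" t of
    (_, rest)   | T.null rest   -> Left (SchemaError t)  -- no ':' at all
    (scheme, _) | T.null scheme -> Left (InvalidIRI t)   -- empty scheme
    _                           -> Right (IRI t)
```

Callers could then pattern match on the error constructors instead of parsing a String message.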
Running cabal -O2 new-build and cabal -O2 new-test --enable-coverage results in the following:
rdf4h-3.1.1-test-rdf4h.log
@wismill has completely re-implemented the RDF/XML parser, using the xmlbf library. Not only is it a correct implementation (passing w3c unit tests), it is also faster. See #67 (comment) .
It isn't useful to expose both XML parsers to users. The options are:
1. Keep the XmlParserHXT module in this repository for reference, but don't expose it in the cabal file.
2. Remove the XmlParserHXT module, i.e. the src/Text/RDF/RDF4H/XmlParserHXT.hs file.
@wismill thoughts?
From #67 (comment) .
From xmlbf, we could get RDF/XML serialisation for free from a ToXml instance in rdf4h? I.e.
instance RdfSerializer XmlSerializer where ...
in a new file, src/Text/RDF/RDF4H/XmlSerializer.hs.
I don't understand some of the same definitions for the property test cases for select*. E.g. two triples t1 and t2 are apparently the same for p_select_match_spo in:
same t1 t2 = subjectOf t1 == subjectOf t2 && predicateOf t1 == predicateOf t2 &&
objectOf t1 /= objectOf t2
Why objectOf t1 /= objectOf t2? I'd have thought objectOf t1 == objectOf t2.
This oddity is seen in p_select_match_sp, p_select_match_so and p_select_match_spo.
https://github.com/robstewart57/rdf4h/blob/master/testsuite/tests/Data/RDF/GraphTestUtils.hs
In an application I produce and read triples (mostly for error tracking) and found that Triple automatically derives Show but not Read. Is this intentional?
Changing the code to also derive Read seems to work (at least it compiles and my code does not show any problems). I would appreciate it if this change could be incorporated in the hackage version.
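For what it's worth, the change amounts to adding Read to the deriving clause. With simplified stand-in types (not rdf4h's actual Node/Triple definitions), the show/read round trip looks like:

```haskell
-- Simplified stand-ins for rdf4h's types, to illustrate deriving Read:
data Node
  = UNode String
  | BNode String
  deriving (Show, Read, Eq)

data Triple = Triple Node Node Node
  deriving (Show, Read, Eq)

-- For derived Show/Read instances, read after show is the identity.
roundTrip :: Triple -> Bool
roundTrip t = read (show t) == t
```

The derived Read parses exactly the syntax the derived Show produces, so existing logged output stays readable.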
thank you!
andrew
Alga is a library for algebraic construction and manipulation of graphs in Haskell. See the Haskell 2017 paper Algebraic Graphs with Class (Functional Pearl).
The idea would be to implement a new module Data.RDF.Graph.Alga, with an implementation of all methods in the Rdf class, i.e.
instance Rdf Alga where ...
TravisCI shows that the versions in the cabal and/or stack yaml file are incompatible with:
See https://travis-ci.org/robstewart57/rdf4h/builds/540783687 .
It is not always clear what the "base URI" means in various places of this library (see this comment). Therefore we should be more precise in naming and documentation to ensure consistency.
References:
What's the syntax for constructing a query matching against a language literal like "apple"@en? The docs don't show examples for language literals.
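A sketch of how such a pattern would be built, using local stand-ins that mirror (as an assumption) the PlainLL and LNode constructors from rdf4h's Data.RDF.Types:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Local stand-ins mirroring rdf4h's constructors (assumption, not the real types):
data LValue
  = PlainL Text         -- plain literal
  | PlainLL Text Text   -- literal with a language tag
  deriving (Show, Eq)

data Node = UNode Text | LNode LValue
  deriving (Show, Eq)

-- The object pattern for the literal "apple"@en:
applePattern :: Node
applePattern = LNode (PlainLL "apple" "en")
```

With rdf4h itself, the call would plausibly be query g Nothing Nothing (Just (lnode (PlainLL "apple" "en"))) — check the Data.RDF docs for the exact query signature.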
I posted a question on stackoverflow about this recently. Basically, if you have an XML data structure like this:
<pgterms:bookshelf>
<rdf:Description rdf:nodeID="N8d8ab517be5d4d24a574a79c302445fc">
<dcam:memberOf rdf:resource="2009/pgterms/Bookshelf"/>
<rdf:value>Napoleonic(Bookshelf)</rdf:value>
</rdf:Description>
</pgterms:bookshelf>
It seems to be completely ignored by RDF4H. The blank nodes show up if I first convert the XML file to a Turtle file.
When parsing an RDF collection, the Turtle parser does not restore the subject context. Hence, all further predicate-object tuples are added to the last list node of the created collection:
Example:
@prefix : <http://example.org/foo#> .
:subject
:predicate1 ( :a ) ;
:predicate2 :b .
The parser will add :predicate2 :b to the last list blank node instead of to :subject.
Is there any thought to adding support for JSON-LD? If no, any interest in pull requests that attempt to add rudimentary support?
Question here.
In the rdf4h tutorial, adding triples to the graph is done the following way:
main :: IO ()
main = do
-- empty list based RDF graph
let myEmptyGraph = empty :: RDF TList
triple1 = triple (unode "...") (unode "...") (unode "...")
graph1 = addTriple myEmptyGraph triple1
triple2 = triple (unode "...") (unode "...") (unode "...")
graph2 = addTriple graph1 triple2
graph3 = removeTriple graph2 triple1
putStrLn (showGraph graph3)
Is it currently possible to combine those operations (addTriple, removeTriple) without creating temporary variables (graph1, graph2)?
Something in monadic style would be nice. For example:
createGraph = do
let triple1 = triple (unode "...") (unode "...") (unode "...")
addTriple $ triple1
addTriple $ triple (unode "...") (unode "...") (unode "...")
removeTriple $ triple1
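One way to avoid the temporaries today, without any library changes, is plain function composition. The sketch below uses a toy list-based graph as a stand-in for RDF TList; note that rdf4h's real addTriple takes the graph as its first argument, so you would flip it first:

```haskell
-- Toy list-based graph, standing in for rdf4h's RDF TList:
type Triple = (String, String, String)
type Graph  = [Triple]

addTriple :: Triple -> Graph -> Graph
addTriple = (:)

removeTriple :: Triple -> Graph -> Graph
removeTriple t = filter (/= t)

-- The tutorial's graph1/graph2/graph3 pipeline as one composition,
-- applied right to left: add t1, add t2, remove t1.
build :: Graph
build = (removeTriple t1 . addTriple t2 . addTriple t1) []
  where
    t1 = ("s1", "p1", "o1")
    t2 = ("s2", "p2", "o2")
```

A State-monad wrapper over the graph would give the monadic style proposed above, but composition already removes the intermediate names.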
I worked on various optimizations of the NTriplesParser. It's not ready for a PR yet, but if you're interested, you can check it out from the optimize1 branch of my fork (https://github.com/mgmeier/rdf4h/tree/optimize1).
There are two major optimizations:
Blank node label parsing
oneOf / noneOf parsers with large sets of characters are always a red flag with parsec. In particular, one oneOf parser had a range of thousands of characters combined. The problem with these is that all those values have to be constructed (by enumeration) for the parser to run; in the worst case multiple times, if the corresponding closure gets GC'ed. I replaced them with simple range checks: if (c >= x && c <= y) || (c >= x' && c <= y') ...
This brought down parsing an .nt file (~600,000 triples) with many blank node labels from completely unusable to 45 seconds.
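The range-check idea amounts to a predicate like the following; the character ranges here are illustrative only, not the actual PN_CHARS ranges from the N-Triples grammar. With parsec, satisfy isLabelChar would then replace the large oneOf:

```haskell
-- Illustrative range-check predicate: cheap comparisons, nothing to enumerate,
-- no character-set value to construct (or reconstruct after GC).
isLabelChar :: Char -> Bool
isLabelChar c =
     (c >= 'A' && c <= 'Z')
  || (c >= 'a' && c <= 'z')
  || (c >= '0' && c <= '9')
  || (c >= '\x00C0' && c <= '\x00D6')  -- example Unicode range
```

Each call does at most a handful of comparisons, versus a membership test over a materialised list of thousands of characters.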
UNode parsing
I used a combination of different optimizations:
1. Don't construct Text values prematurely, especially not thousands of singletons; at the same time, avoid conversions to and from String for running several validations. I had to duplicate some of the functionality in Types.hs for that.
2. The Triple type is replaced by an intermediate type containing hash values; the triples are filled in only at the end of a parse.
These optimizations gave a speed-up of ~35% parsing an .nt file with loads of UNodes/URIs.
If you're interested in my approach and can verify that it does improve things, we could definitely discuss how to proceed from here (maybe IRC or mail), since what I suggested here is a proof of concept, which may need more cleanup and streamlining of the design with the rest of the library. Also, the optimizations imply some complex changes and are more than simple one-liners. However, I'd be happy to contribute to rdf4h, so let me know what you think.
BTW I have not touched any other parsers, the library's core types or any RDF representations.
This bug is identified by the conformance-xml-example09 test.
comparing-graphs: FAIL
Exception: user error (Graph xml-example09 not equivalent to expected:
Expected:
Triple (UNode "http://example.org/item01") (UNode "http://example.org/stuff/1.0/prop") (LNode (TypedL "<a:Box xmlns:a=\"http://example.org/a#\" required=\"true\">\n <a:widget size=\"10\"></a:widget>\n <a:grommit id=\"23\"></a:grommit></a:Box>\n " "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"))
Found:
Triple (UNode "http://example.org/item01") (UNode "http://example.org/stuff/1.0/prop") (LNode (TypedL "<a:Box required=\"true\">\n <a:widget size=\"10\"></a:widget>\n <a:grommit id=\"23\"></a:grommit></a:Box>\n " "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"))
)
See https://travis-ci.org/robstewart57/rdf4h/jobs/540783693#L1739 .
It's quite likely that xmlbf will proceed with the removal of monad transformers support.
https://gitlab.com/k0001/xmlbf/issues/25
Currently, the XML parser relies on a commit in the xmlbf repository that has transformers support:
https://github.com/robstewart57/rdf4h/blob/master/src/Text/RDF/RDF4H/XmlParser.hs
Could the implementation of XmlParser.hs be adapted, as suggested in https://gitlab.com/k0001/xmlbf/issues/25#note_178094971 , removing the need to rely on transformers but preserving the stateful nature of the RDF/XML parser with StateT?
CC @wismill
Unless I'm missing something, the ParseURLs example casts the result as a TriplesList, which doesn't seem to be exposed anywhere.
Parsing the following:
<#datatypes-intensional-xsd-integer-decimal-compatible> a mf:NegativeEntailmentTest;
mf:name "datatypes-intensional-xsd-integer-decimal-compatible";
rdfs:comment """
The claim that xsd:integer is a subClassOF xsd:decimal is not
incompatible with using the intensional semantics for
datatypes.
""";
rdfs:approval rdft:Approved;
mf:entailmentRegime "RDFS" ;
mf:recognizedDatatypes ( xsd:decimal xsd:integer ) ;
mf:unrecognizedDatatypes ( ) ;
mf:action <datatypes-intensional/test001.nt>;
mf:result false .
(from http://www.w3.org/2013/rdf-mt-tests/manifest.ttl) produces invalid output:
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#NegativeEntailmentTest"))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#name"),LNode(PlainL(datatypes-intensional-xsd-integer-decimal-compatible)))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2000/01/rdf-schema#comment"),LNode(PlainL(
The claim that xsd:integer is a subClassOF xsd:decimal is not
incompatible with using the intensional semantics for
datatypes.
)))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2000/01/rdf-schema#approval"),UNode("http://www.w3.org/ns/rdftest#Approved"))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#entailmentRegime"),LNode(PlainL(RDFS)))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#recognizedDatatypes"),BNodeGen(40))
Triple(BNodeGen(40),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#first"),UNode("http://www.w3.org/2001/XMLSchema#decimal"))
Triple(BNodeGen(40),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"),BNodeGen(41))
Triple(BNodeGen(41),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#first"),UNode("http://www.w3.org/2001/XMLSchema#integer"))
Triple(BNodeGen(41),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"))
Triple(BNodeGen(41),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#unrecognizedDatatypes"),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"))
Triple(BNodeGen(41),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#action"),UNode("xxxdatatypes-intensional/test001.nt"))
Triple(BNodeGen(41),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#result"),LNode(TypedL(false,"http://www.w3.org/2001/XMLSchema#boolean")))
Note that it breaks on Triple(BNodeGen(41),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#unrecognizedDatatypes"),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#nil")), where the subject must be UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"), not BNodeGen(41).
Also, the last two triples (mf:action and mf:result) must likewise have UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible") as their subject.
However, parsing of a list with mf:entries (earlier in this file) is correct.
So the parser either stumbled on the "inline" presentation of a list, the trailing ";", or something in between ;-)
Migrate UNode Text to:
data Node =
UNode Network.URI
| BNode !T.Text
| BNodeGen !Int
| LNode !LValue
deriving Generic
The primary benefit is the URI validation that network-uri implements according to the RFC 3986 standard. Two current blockers are:
1. The Generic instance for Node. URI has no Generic instance in network-uri; a pull request has been opened: haskell/network-uri#12.
2. The new XML parser depends on monad transformer additions added by @wismill to the xmlbf library. This dependency will be reflected in release 0.6 of the xmlbf library.
Once xmlbf-0.6 has been uploaded to hackage, the stack.yaml file for rdf4h should add xmlbf-0.6 as an extra dependency, removing the information about the xmlbf git repo and commit ID, since stack will find xmlbf-0.6 on hackage.
TurtleParser fails to parse literals with single quotes, e.g.:
<#literal_with_dquote> rdfs:comment 'literal with dquote "x\"y"' .
(see line 418 in http://www.w3.org/2013/N-TriplesTests/manifest.ttl)
Error message:
Left (ParseFailure "(line 418, column 17):\nunexpected \"'\"\nexpecting whitespace-or-comment or object")
nb: Also applies to http://www.w3.org/2013/N-QuadsTests/manifest.ttl
We don't know what the construction or query performance is for MGraph or TriplesGraph. We should know this so that we can spot which is faster for each use case of the rdf4h API. We should also have this in place so that any new implementation of an RDF instance can be measured against existing ones, which relates to #19.
Criterion would give us a robust benchmarking platform to understand the performance of the rdf4h API, and its performance limitations. https://hackage.haskell.org/package/criterion
It might be good to have a different common benchmark. There are some datasets here: https://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets
For some w3c tests, the base IRI is supposed to be the retrieval IRI for that file, e.g. turtle-subm-01.ttl:
@prefix : <#> .
[] :x :y .
should be parsed into (turtle-subm-01.nt):
_:genid1 <http://www.w3.org/2013/TurtleTests/turtle-subm-01.ttl#x> <http://www.w3.org/2013/TurtleTests/turtle-subm-01.ttl#y> .
This is explicit at http://www.w3.org/2013/TurtleTests/ :
Relative IRI Resolution: The home of the test suite is the URL of this page. Per RFC 3986 section 5.1.3, the base IRI for parsing each file is the retrieval IRI for that file. For example, the tests turtle-subm-01 and turtle-subm-27 require relative IRI resolution against a base of http://www.w3.org/2013/TurtleTests/turtle-subm-01.ttl and http://www.w3.org/2013/TurtleTests/turtle-subm-27.ttl respectively.
How should the w3c rdf4h tests deal with this?
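A toy resolver for the fragment-only case in turtle-subm-01 shows the intended behaviour. This is a sketch, not a full RFC 3986 implementation; the real tests would need proper relative-reference resolution:

```haskell
-- Toy resolution: a fragment-only reference is resolved against the
-- retrieval IRI by replacing the base's fragment (if any).
resolveAgainst :: String -> String -> String
resolveAgainst base rel
  | take 1 rel == "#" = takeWhile (/= '#') base ++ rel
  | otherwise         = rel  -- other cases need full RFC 3986 handling
```

So "#x" against http://www.w3.org/2013/TurtleTests/turtle-subm-01.ttl yields the full IRI the expected .nt output contains.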
See title, basically a follow-up to robstewart57/hsparql#13.
See #67 (comment) .
The function typedL applied to a value of 0 and a schema "http://www.w3.org/2001/XMLSchema#integer" produces an empty string, not "0"; the empty string is not an acceptable value for an integer according to https://www.w3.org/TR/xmlschema-2/#integer.
The problem originates with the lines
_integerStr, _decimalStr, _doubleStr :: T.Text -> T.Text
_integerStr = T.dropWhile (== '0')
where dropping leading zeros consumes all of "0". I think a conversion using printf would be simpler?
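A possible fix, as a sketch rather than the library's actual code: strip leading zeros but fall back to "0" when nothing remains (a leading sign would still need separate handling):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Drop leading zeros, but "0" (or "000") must canonicalise to "0", not "".
canonicalIntegerStr :: T.Text -> T.Text
canonicalIntegerStr t
  | T.null stripped = "0"
  | otherwise       = stripped
  where
    stripped = T.dropWhile (== '0') t
```

This keeps the lexical canonicalisation without going through a numeric type and printf.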
Discussion moved from #67 (comment).
serialize . parse == id
Something like this would be a good property check for all three formats: NTriples, Turtle and RDF/XML. Ideally we'd have:
parse . serialize == id
since we can use the generator instances we already have for RDF graphs in rdf4h, i.e.:
1. Use the Arbitrary instances in testsuite/tests/Data/RDF/PropertyTests.hs to generate RDF graphs.
2. Serialise the graph to the NTriples, Turtle and RDF/XML formats.
3. Parse that data back into RDF graphs in Haskell.
4. Check the graphs are equivalent.
As you say @wismill, it comes down to how the equivalence check is performed. We do have isIsomorphic and isGraphIsomorphic in https://github.com/robstewart57/rdf4h/blob/master/src/Data/RDF/Query.hs .
The problem we have with property-based testing of serialize . parse == id is that we'd need predefined NTriples, Turtle and RDF/XML inputs to parse.
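The round-trip idea can be illustrated with a toy line-based serialiser/parser pair. This is a stand-in for the real NTriples machinery only; real triples with spaces or escapes need the actual parsers:

```haskell
-- Toy triple format: "s p o ." per line (no spaces allowed inside terms).
type Triple = (String, String, String)

serialize :: [Triple] -> String
serialize = unlines . map (\(s, p, o) -> unwords [s, p, o, "."])

parse :: String -> [Triple]
parse = map (toTriple . words) . lines
  where
    toTriple [s, p, o, "."] = (s, p, o)
    toTriple ws             = error ("unparseable line: " ++ unwords ws)

-- The parse-after-serialize property from the discussion:
prop_roundTrip :: [Triple] -> Bool
prop_roundTrip ts = parse (serialize ts) == ts
```

In the real test suite the == on the right-hand side would be the isomorphism check, since serialisation may rename blank nodes and reorder triples.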
The documentation gives the impression that the conversion from triples to an RDF graph (e.g. TriplesGraph) will handle the prefixes which are defined in the namespace. In my tests (and my perusal of the code) this seems not to be the case.
I suggest updating the documentation accordingly (or implementing the mapping of prefixes).
Thank you for the very useful code!
andrew frank
When building for Stackage LTS 8:
/tmp/stackage-build13/rdf4h-3.0.1$ dist/build/test-rdf4h/test-rdf4h
test-rdf4h: rdf-tests/turtle/manifest.ttl: openFile: does not exist (No such file or directory)
If using rdf4h with GHC 7.4.1, the dependency on ghc-prim should be noted; it is required for the import of GHC.Generics in Types.
I would suggest separating the rdf4h library from the code used for testing and moving that code to a separate package; some of the libraries, e.g. QuickCheck, are not available on the currently popular ARM platforms (e.g. Raspberry Pi, Cubieboard, etc.). The requirements for the rdf4h library are much smaller than for the testing harness.
(It is not a big problem to edit the cabal file for such installations, but it is again one step away from a standardized procedure.)
Thank you for your consideration!
andrew
query_match_spo: FAIL
*** Failed! Falsifiable (after 29 tests):
Triple (UNode "ex:o1") (UNode "ex:o1") (LNode (PlainLL "earth" "fr"))
Triple (UNode "ex:o1") (UNode "http://www.example.org/foo1") (LNode (TypedL "earth" "http://www.w3.org/2001/XMLSchema#string"))
Triple (UNode "ex:p1") (UNode "ex:s1") (BNode ":_genid3")
Triple (UNode "ex:s1") (UNode "http://www.example.org/bar1") (UNode "ex:s1")
Triple (UNode "ex:s1") (UNode "http://www.example.org/bar1") (LNode (TypedL "world" "http://www.w3.org/2001/XMLSchema#token"))
Triple (UNode "http://www.example.org/foo1") (UNode "ex:o1") (UNode "ex:s2")
Triple (BNode ":_genid1") (UNode "http://www.example.org/bar1") (LNode (TypedL "hello" "http://www.w3.org/2001/XMLSchema#int"))
Triple (BNode ":_genid5") (UNode "http://www.example.org/foo0") (UNode "http://www.example.org/bar0")
Just (Triple (UNode "ex:s1") (UNode "http://www.example.org/bar1") (UNode "ex:s1"))
Use --quickcheck-replay=566405 to reproduce.
I'm seeing two tests failing. It's the same test, and both the attoparsec and the parsec instances of Parser fail.
turtle-subm-27: FAIL
Exception: HUnitFailure (Just (SrcLoc {srcLocPackage = "main", srcLocModule = "W3C.W3CAssertions", srcLocFile = "testsuite/tests/W3C/W3CAssertions.hs", srcLocStartLine = 24, srcLocStartCol = 3, srcLocEndLine = 24, srcLocEndCol = 99})) (Reason "not isomorphic: Triple (UNode \"http://w3c.github.io/rdf-tests/turtle/a1\") (UNode \"http://w3c.github.io/rdf-tests/turtle/b1\") (UNode \"http://w3c.github.io/rdf-tests/turtle/c1\")\nTriple (UNode \"http://example.org/ns/a2\") (UNode \"http://example.org/ns/b2\") (UNode \"http://example.org/ns/c2\")\nTriple (UNode \"http://example.org/ns/foo/a3\") (UNode \"http://example.org/ns/foo/b3\") (UNode \"http://example.org/ns/foo/c3\")\nTriple (UNode \"http://example.org/ns/foo/bar#a4\") (UNode \"http://example.org/ns/foo/bar#b4\") (UNode \"http://example.org/ns/foo/bar#c4\")\nTriple (UNode \"http://example.org/ns2#a5\") (UNode \"http://example.org/ns2#b5\") (UNode \"http://example.org/ns2#c5\")\n compared with Triple (UNode \"http://www.w3.org/2013/TurtleTests/a1\") (UNode \"http://www.w3.org/2013/TurtleTests/b1\") (UNode \"http://www.w3.org/2013/TurtleTests/c1\")\nTriple (UNode \"http://example.org/ns/a2\") (UNode \"http://example.org/ns/b2\") (UNode \"http://example.org/ns/c2\")\nTriple (UNode \"http://example.org/ns/foo/a3\") (UNode \"http://example.org/ns/foo/b3\") (UNode \"http://example.org/ns/foo/c3\")\nTriple (UNode \"http://example.org/ns/foo/bar#a4\") (UNode \"http://example.org/ns/foo/bar#b4\") (UNode \"http://example.org/ns/foo/bar#c4\")\nTriple (UNode \"http://example.org/ns2#a5\") (UNode \"http://example.org/ns2#b5\") (UNode \"http://example.org/ns2#c5\")\n")
Inspired by the rapper executable, which is built on top of the Raptor library:
http://librdf.org/raptor/rapper.html
E.g.
rapper -o ntriples http://planetrdf.com/guide/rss.rdf
rapper -i rss-tag-soup -o rss-1.0 pile-of-rss.xml http://example.org/base/
rapper --count http://example.org/index.rdf
This functionality is already partially supported by the rdf4h executable.
In Data.RDF.Types there is an import of instances:
import Data.Text.Lazy.Binary ()
However, I cannot find a package that has or ever had such a module. I changed it to import Data.Text.Binary () from text-binary and that seems to work, but I am puzzled.
URIs and abbreviated forms are reversed. Literals are written correctly.
I see commit messages about optimizing comparison by storing URIs reversed. Is this some part of that?
Here is my simple Turtle file :
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://example.org> .
<http://example.org> rdf:type owl:Ontology ;
owl:versionIRI <http://example.org/0.1> .
The following code tries to read the input file and its base URI:
main :: IO ()
main = do
graphOpt <- (parseFile (TurtleParser Nothing Nothing) "test.owl" :: IO (Either ParseFailure (RDF TList)))
case graphOpt of
Left _ -> putStrLn "Error..."
Right graph -> do
let myBaseUri = unBaseUrl $ fromJust $ baseUrl graph
putStrLn myBaseUri
However, I'm getting a Maybe.fromJust: Nothing error. Is there something I'm doing wrong?
Hi there,
I am working on parsing a large Turtle file; ideally I would like to turn it into an equivalent Haskell program. I have been profiling the read function and see growth over time in memory, among other things.
For 30k lines of the file I got these stats from the rdf4h-3.0.1 release from stack:
total alloc = 29,235,026,136 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
>>= Text.Parsec.Prim Text/Parsec/Prim.hs:202:5-29 17.4 7.1
satisfy Text.Parsec.Char Text/Parsec/Char.hs:(140,1)-(142,71) 16.2 32.7
noneOf.\ Text.Parsec.Char Text/Parsec/Char.hs:40:38-52 14.3 0.0
We can see that a large amount of memory and time is spent in parsec. I am wondering the following:
Examples of the files are here:
https://gist.github.com/h4ck3rm1k3/e1b4cfa58c4dcdcfc18cecab013cc6c9
Now that rdf4h has complete support for NTriples & Turtle, it may be a good time to focus on performance:
As mentioned in #35 and #44, there are several places where we could improve the parsers. I think it would be a good idea to keep only one modern parser library (attoparsec or megaparsec) to keep the implementation simple and make it more efficient.
I think the handling of prefixes in UNode is not satisfying. For instance, several important operations require expandTriples, which is very expensive. I propose that we remove expandTriples and make use of a smart constructor unode :: Text -> Either IRIError UNode for IRIs (currently merely a constructor synonym) that ensures the IRI is a valid absolute IRI. Then have a function mkIRI that accepts a namespace (or a prefix mapping using a new type class) to create IRIs from a relative IRI or a prefixed IRI (see expandURI).
Edit: changed the proposed signature of unode to use Either rather than Maybe.
Hello,
First of all, thank you for implementing this library. I understood how it works pretty quickly, and it helped me solve a lot of implementation details.
Currently I am working on a project to generate code for schema.org schemas, which are published in RDF Schema format. Here is an example:
https://raw.githubusercontent.com/schemaorg/schemaorg/master/data/releases/3.7/schema.nt
I want to parse this format into a schema object that represents the class structure of the RDF schema. After that, I am planning to generate code for different programming languages.
Unfortunately, I am stuck on parsing the schema. I can parse rdf schema as an rdf document like this:
https://github.com/huseyinyilmaz/schemaorg/blob/master/library/Download.hs#L30
This format does not represent the schema itself; instead I just have triples that I need to parse into a schema structure. It also turns out that RDF Schema documents can have references to other schema files by their internet address, so those references should also be downloaded and parsed. Here is an example:
<http://schema.org/spatial> <http://www.w3.org/2002/07/owl#equivalentProperty> <http://purl.org/dc/terms/spatial> .
So my question is: does the rdf4h library follow rdf-schema links to validate the documents? If not, is there a plan to support RDF Schema validation?
Dear Robert,
The bug which led to the long discussion in the other bug report hit me again today. The error is very simple, in the Turtle serializer code: the code expects a map from URL to prefix (not prefix to URL; it is reversed in a function a bit above, for reasons I do not understand), so the test for the match of the prefix must be against the second element of the pair (not the first). I changed (k, _) to (_, k) and it works.
Can you check my fix and put it into the repository? Thank you!
-- Expects a map from uri to prefix, and returns the (prefix, uri_expansion)
-- from the mappings such that uri_expansion is a prefix of uri, or Nothing if
-- there is no such mapping. This function does a linear-time search over the
-- map, but the prefix mappings should always be very small, so it's okay for now.
findMapping :: Map T.Text T.Text -> T.Text -> Maybe (T.Text, T.Text)
findMapping pms uri =
case mapping of
Nothing -> Nothing
Just (u, p) -> Just (p, T.drop (T.length u) uri) -- empty localName is permitted
where
mapping = find (\(_, k) -> T.isPrefixOf k uri) (Map.toList pms)
-- exchanged _ and k: the map is from uri to prefix, check for k match as prefix to uri
-- it was reversed in writeTriples
Data.RDF.Types.isAbsoluteURI assumes that the URI will be valid (and uses fromJust), which is not guaranteed to be the case. The functions that use it directly also don't seem to allow for failure. Perhaps everything affected should be wrapped in Maybe, or something that would report the cause of a failure, in order to save potential debugging effort when it happens.
let Right (g::RDF TList) =
parseString
NTriplesParser
(Data.Text.pack "<http://a.example/s> <http://a.example/p> \"\\r\" .")
Evaluating g:
Triple (UNode "http://a.example/s") (UNode "http://a.example/p") (LNode (PlainL "r"))
So the \r character is being parsed as just r.
This is the reason for a number of TurtleParser tests failing, since their results are compared against NTriple golden references parsed with the NTriples parser, e.g.
stack test --test-arguments="--pattern literal_with_escaped_CARRIAGE_RETURN"
stack test --test-arguments="--pattern literal_with_CHARACTER_TABULATION"
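The expected behaviour for string escapes can be sketched as follows; this is a simplified decoder for illustration, not the library's parser, and it ignores \u escapes entirely:

```haskell
-- Decode the common single-character escapes: "\\r" must become the
-- carriage-return character '\r', not the letter 'r'.
unescape :: String -> String
unescape ('\\' : c : rest) = decode c : unescape rest
  where
    decode 'r'  = '\r'
    decode 'n'  = '\n'
    decode 't'  = '\t'
    decode '\\' = '\\'
    decode '"'  = '"'
    decode x    = x  -- a real parser would reject unknown escapes
unescape (c : rest) = c : unescape rest
unescape []         = []
```

The bug amounts to the parser taking the "decode x = x" path for 'r' instead of producing '\r'.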
I stumbled upon the isAbsoluteUrl function, which detects whether a given string represents an absolute URL. It does so by merely detecting a ":" within the given string.
Now, since we already import Network.URI, I thought we could reuse its isAbsoluteURI to rely on "industry-proven" code instead of reinventing the wheel. Replacing our custom isAbsoluteUrl with isAbsoluteURI resulted in 33 more failing test cases. This should be investigated and fixed, I believe.
There are, perhaps, more cases where we could replace our functions with those from Network.URI. I suggest that we do so at some point.
In the following code, both query results are expected to be equal, but they are not. Only the first result, with TList, is correct.
rdf = "PREFIX ex: <ex:> ex:s1 ex:p1 ex:o1 ; ex:p2 ex:o2 ."
query' g = query g Nothing (Just $ unode "ex:p1") Nothing
parser = TurtleParser Nothing Nothing
g1 = parseString parser rdf :: Either ParseFailure (RDF TList)
query' <$> g1
-- Right [Triple (UNode "ex:s1") (UNode "ex:p1") (UNode "ex:o1")]
g2 = parseString parser rdf :: Either ParseFailure (RDF AdjHashMap)
query' <$> g2
-- Right [Triple (UNode "ex:s1") (UNode "ex:p1") (UNode "ex:o1"),Triple (UNode "ex:s1") (UNode "ex:p1") (UNode "ex:o2")]