lisp / de.setf.wilbur
a fork of net.sourceforge.wilbur updated for mcl and sbcl
The project is installable with Quicklisp:
(ql:quickload "wilbur")
This should be mentioned in the README so that new users know there is an easy way to try the library.
It looks like Wilbur has a problem with certain Unicode characters when the input stream is opened with an explicit :utf-8 external format.
Code to reproduce:
wget http://dbpedia.org/data/Semantic_Web.rdf
(defvar stream (open #P"Semantic_Web.rdf"
                     :direction :input
                     :external-format :utf-8))
(setf wilbur:*db*
      (wilbur:parse-db-from-stream stream "http://dbpedia.org/page/Semantic_Web"))
This produces an error on both CCL and SBCL:
> Error: Cannot decode this: (#\U+30BB #\U+30DE #\U+30F3 #\U+30C6 #\U+30A3 #\U+30C3 #\U+30AF #\U+30FB #\U+30A6 #\U+30A7 #\U+30D6)
> While executing: (:INTERNAL WILBUR::COLLAPSE WILBUR:COLLAPSE-WHITESPACE), in process listener(1).
debugger invoked on a SIMPLE-ERROR in thread
#<THREAD "main thread" RUNNING {AB2F861}>:
Cannot decode this: (#\HANGUL_SYLLABLE_U #\HANGUL_SYLLABLE_KEU
#\HANGUL_SYLLABLE_RA #\HANGUL_SYLLABLE_I
#\HANGUL_SYLLABLE_NA)
(WILBUR:COLLAPSE-WHITESPACE "우크라이나")
But everything works fine if the external format is not specified:
(defvar stream (open #P"Semantic_Web.rdf"
                     :direction :input))
(setf wilbur:*db*
      (wilbur:parse-db-from-stream stream "http://dbpedia.org/page/Semantic_Web"))
Produces:
#<TEMPORARY-PARSER-DB size 157 #x1862A5C6>
The resulting database can then be queried successfully.
The problem is even more evident when using flexi-streams.
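For comparison, here is a minimal sketch of a whitespace collapser that does not inspect non-whitespace characters at all, so Hangul or Katakana input passes through unchanged. It is not Wilbur's implementation: the names whitespace-char-p and collapse-whitespace-sketch are hypothetical, and the trimming behaviour (leading and trailing whitespace dropped, inner runs reduced to one space) is an assumption about what is wanted.

```lisp
;; Hypothetical helper, not part of Wilbur.
(defun whitespace-char-p (char)
  "True for the XML whitespace characters: space, tab, newline, return."
  (member char '(#\Space #\Tab #\Newline #\Return) :test #'char=))

;; Hypothetical stand-in for a Unicode-tolerant collapse function.
(defun collapse-whitespace-sketch (string)
  "Trim STRING and replace each internal whitespace run with one space.
Every non-whitespace character, including non-ASCII ones, is copied as is."
  (with-output-to-string (out)
    (let ((pending-space nil)   ; a whitespace run was seen after some content
          (seen-content nil))   ; at least one non-whitespace char was written
      (loop for char across string
            do (cond ((whitespace-char-p char)
                      (when seen-content (setf pending-space t)))
                     (t
                      (when pending-space
                        (write-char #\Space out)
                        (setf pending-space nil))
                      (write-char char out)
                      (setf seen-content t)))))))
```

Calling (collapse-whitespace-sketch "  우크라이나   セマンティック ") should return the two words separated by a single space, with no decoding step that could reject a character.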
Based on: http://www.lassila.org/blog/archive/2012/04/time_to_revisit_1.html
Replace the custom-made HTTP client with Drakma.
I got the following error:
XML -- missing NAMESPACE definition "doac:LanguageSkill"
[Condition of type WILBUR:MISSING-NAMESPACE-DEFINITION]
when running:
(defvar stream1 (open #P"0675365413696898.rdf"
                      :direction :input))
(setf wilbur:*db*
      (wilbur:parse-db-from-stream stream1 "0675365413696898.rdf"))
The file is actually correct; it declares the doac namespace on the root element:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bio="http://purl.org/vocab/bio/0.1/" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:doac="http://ramonantonio.net/doac/0.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:fgvterms="http://www.fgv.br/terms/" xmlns:event="http://purl.org/NET/c4dm/event.owl#" xmlns:gn="http://www.geonames.org/ontology#" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:lattes="http://www.cnpq.br/2001/XSL/Lattes" xml:base="http://www.fgv.br/lattes/0675365413696898">
....
There is some documentation for the original project: http://wilbur-rdf.sourceforge.net/docs/
but it is not included in this version and there are no links to it.
For somebody new to this project it is very difficult to understand it without helpful documentation.
Please add documentation (docs) to this project.
Same task, similar code. To use RDFLib I had to convert the file to N-Triples first, but apart from that, Wilbur took ~1 hour while RDFLib did the same work in ~1 minute.
Any ideas? How can I investigate this difference?
$ rapper -c opennlp/Dissertation.pdf.rdf
rapper: Parsing URI file:///Users/ar/work/papers/opennlp/Dissertation.pdf.rdf with parser rdfxml
rapper: Parsing returned 865058 triples
$ rapper -o ntriples -i rdfxml opennlp/Dissertation.pdf.rdf > lixo.ntriples
$ time python3.7 rdf-to-json.py lixo.ntriples lixo.json
real 0m59.568s
user 0m58.309s
sys 0m0.830s
$ sbcl --noinform --noprint --eval "(load \"rdf-to-json.lisp\")" --eval "(main (nth 1 sb-ext:*posix-argv*) (nth 2 sb-ext:*posix-argv*))" --eval "(sb-ext:quit)" opennlp/Dissertation.pdf.rdf lixo.json
real 54m37.053s
user 54m18.341s
sys 0m7.938s
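One way to start investigating on SBCL is the bundled statistical profiler, sb-sprof. The sketch below wraps an arbitrary call in sb-sprof:with-profiling and prints a flat report of where the samples landed. The wrapper name profile-call is hypothetical, the sample limit is an arbitrary choice, and main is assumed to be the entry point from rdf-to-json.lisp used in the command line above.

```lisp
;; SBCL only: load the statistical profiler contrib.
(require :sb-sprof)

;; Hypothetical wrapper: run THUNK under the profiler and print a flat report.
(defun profile-call (thunk)
  (sb-sprof:with-profiling (:max-samples 40000 :report :flat)
    (funcall thunk)))

;; Assumed usage, after loading rdf-to-json.lisp:
;;   (profile-call
;;    (lambda () (main "opennlp/Dissertation.pdf.rdf" "lixo.json")))
```

Running it first on a much smaller input keeps the turnaround short; if a single function dominates the flat report, that is the place to look.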
Is twinql (http://mirror.informatimago.com/lisp/www.holygoat.co.uk/projects/twinql/index.html) included in this repo or not? I could not find the functions sparql and parse-sparql.
With two serializers available, one would expect that if Wilbur correctly reads or generates a set of triples (or a DB), it serializes them correctly with both.
However, there seem to be some inconsistencies between the N-Triples and the RDF/XML serializers. Some data that serializes with the first one fails with the other.
For instance, I have some triples:
10: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:ID #1 {10052D3BC3}>
11: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:FORM #"The" {10052D4313}>
12: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:LEMMA #"the" {10052D4523}>
13: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:UPOSTAG #"DET" {10052D4743}>
14: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:XPOSTAG #"DT" {10052D4963}>
15: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:FEATS !conll:pronTypeArt {10052D4B73}>
16: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:FEATS !conll:definiteDef {10052D4D33}>
17: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:HEAD #3 {10052D4F43}>
18: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:DEPREL #"det" {10052D5163}>
19: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:DEPS #"_" {10052D5373}>
They were generated by the following code (I've removed non-relevant parts):
(let* ((token-node (node (format nil "NAMESPACE:~a-~a" sentence-id (token-id token))))
       (slots '(id form lemma upostag xpostag feats head deprel deps))
       (slot-nodes
         (list
          'id `(,(wilbur:literal (slot-value token 'id)))
          'form `(,(wilbur:literal (slot-value token 'form)))
          'lemma `(,(wilbur:literal (slot-value token 'lemma)))
          'upostag `(,(wilbur:literal (slot-value token 'upostag)))
          'xpostag `(,(wilbur:literal (slot-value token 'xpostag)))
          'feats (convert-features-to-rdf (slot-value token 'feats))
          'head `(,(wilbur:literal (slot-value token 'head)))
          'deprel `(,(wilbur:literal (slot-value token 'deprel)))
          'deps `(,(wilbur:literal (slot-value token 'deps))))))
  `(,@(mappend
       #'(lambda (slot)
           (mapcar
            #'(lambda (value-node)
                (wilbur:triple
                 token-node
                 (node (format nil "conll:~a" (string-upcase slot)))
                 value-node))
            (getf slot-nodes slot)))
       slots)))
The head field is an integer. Serialization as N-Triples works correctly and exports it as a number, but serialization as RDF/XML signals an error:
The value
3
is not of type
SEQUENCE
[Condition of type TYPE-ERROR]
Restarts:
0: [RETRY] Retry SLIME REPL evaluation request.
1: [*ABORT] Return to SLIME's top level.
2: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {1001FAFFA3}>)
Backtrace:
0: (SB-IMPL::SEQUENCE-TO-LIST 3) [tl,external]
1: (WILBUR::EXTENDED-STRING->CHAR-CODES 3)
2: (WILBUR:ESCAPE-XML-STRING 3 T)
3: ((LABELS WILBUR::DUMP :IN WILBUR::DUMP-AS-RDF/XML) !NAMESPACE:test-1 ((!#1=conll:DEPS . #"_") (!#1#:DEPREL . #"det") (!#1#:HEAD . #3) (#2=!#1#:FEATS . !#1#:definiteDef) (#2# . !#1#:pronTypeArt) (!#1#:..
4: (WILBUR::DUMP-AS-RDF/XML (#<WILBUR:TRIPLE #1=!#2=NAMESPACE:c6DC441D0-76F3-460E-A332-DC3F66422077 #3=!rdf:type !#4=conll:Corpus {10052D03E3}> #<WILBUR:TRIPLE #1# #5=!rdfs:label #"my-corpus" {10052D0693..
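The backtrace shows WILBUR:ESCAPE-XML-STRING being called on the integer 3, so a possible workaround on the caller's side is to hand the literal a string instead of a number. This is a minimal sketch, not a fix in Wilbur itself: literal-value-string is a hypothetical helper, and the trade-off is that the N-Triples output would then carry a plain string rather than a number.

```lisp
;; Hypothetical helper, not part of Wilbur.
(defun literal-value-string (value)
  "Return VALUE unchanged if it is a string, else its printed representation."
  (if (stringp value)
      value
      (princ-to-string value)))

;; Assumed use in the code above (requires Wilbur to be loaded):
;;   'head `(,(wilbur:literal (literal-value-string (slot-value token 'head))))
```

The same wrapper could be applied to the id slot, which is also an integer in the triples listed above.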
( @arademaker )