delphin-rdf's Introduction

Open Portuguese WordNet (OWN-PT)

This repository hosts Portuguese WordNet data in textual format. It is an experimental branch of http://openwordnet-pt.org, and it is linked to (but independent from) the Open English WordNet.

You can also get the data in JSON and RDF format.

See the Wiki for how the data was generated, how it compares to Princeton WordNet, and what the syntax of the text files is. This data is validated and exported by the mill tool; see its repository for more information about validation, export formats, etc.

delphin-rdf's People

Contributors

arademaker, fredsonerd, yfaria

delphin-rdf's Issues

DMRS transformation to RDF is broken

SENT: A person is riding the bicycle on one wheel

We have 3 analyses, and each analysis should be a separate graph, but nodes are being reused across them. BTW, <http://ibm.com/sick-b/22/nodes#node10002#predicate> is not a valid URI (a URI cannot contain two #).
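The point about the two `#` characters can be checked mechanically. The following is only a sketch: `has_valid_fragment` and `node_predicate_uri` are hypothetical helpers, not part of delphin.rdf, and the `#predicate-<id>` naming is borrowed from URIs that appear in later issues.

```python
from urllib.parse import urldefrag


def has_valid_fragment(uri: str) -> bool:
    """Return True when the URI has at most one '#'.

    RFC 3986 does not allow '#' inside a fragment, so a second '#'
    makes the URI invalid.
    """
    return uri.count("#") <= 1


def node_predicate_uri(prefix: str, node_id: int) -> str:
    """Hypothetical helper: build a predicate URI with a single fragment."""
    base, _fragment = urldefrag(prefix)  # drop any fragment already in the prefix
    return f"{base}#predicate-{node_id}"


bad = "http://ibm.com/sick-b/22/nodes#node10002#predicate"
good = node_predicate_uri("http://ibm.com/sick-b/22", 10002)

print(has_valid_fragment(bad))   # False
print(has_valid_fragment(good))  # True
```

Giving each analysis its own base URI (here `.../22`) and putting only the node-level identifier in the fragment would also keep nodes of different analyses from colliding.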

error installation

@yfaria can you help?

(venv) ar@tenis delphin.rdf % pip install .
Processing /Users/ar/hpsg/delphin.rdf
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
    ERROR: Command errored out with exit status 1:
     command: /Users/ar/venv/bin/python3 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-req-build-3dxddjin/setup.py'"'"'; __file__='"'"'/private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-req-build-3dxddjin/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-pip-egg-info-qqyop0sk
         cwd: /private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-req-build-3dxddjin/
    Complete output (10 lines):
    running egg_info
    creating /private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-pip-egg-info-qqyop0sk/Delphin_RDF.egg-info
    writing /private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-pip-egg-info-qqyop0sk/Delphin_RDF.egg-info/PKG-INFO
    writing dependency_links to /private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-pip-egg-info-qqyop0sk/Delphin_RDF.egg-info/dependency_links.txt
    writing requirements to /private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-pip-egg-info-qqyop0sk/Delphin_RDF.egg-info/requires.txt
    writing top-level names to /private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-pip-egg-info-qqyop0sk/Delphin_RDF.egg-info/top_level.txt
    writing manifest file '/private/var/folders/b_/7nbv248s2nq019mcx58xrrb80000gn/T/pip-pip-egg-info-qqyop0sk/Delphin_RDF.egg-info/SOURCES.txt'
    package init file 'delphin/__init__.py' not found (or not a regular file)
    package init file 'delphin/cli/__init__.py' not found (or not a regular file)
    error: package directory 'delphin/codecs' does not exist
    ----------------------------------------
WARNING: Discarding file:///Users/ar/hpsg/delphin.rdf. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

interfaces

% delphin -h
usage: delphin [-h] [-V]  ...

PyDelphin command-line interface

optional arguments:
  -h, --help           show this help message and exit
  -V, --version        show program's version number and exit

available subcommands:

    compare            Compare MRS results across test suites
    convert            Convert DELPH-IN Semantics representations
    mkprof             Create [incr tsdb()] test suites
    process            Process [incr tsdb()] test suites using ACE
    profile-to-dmrs-rdf
                       delphin profile to dmrs-rdf
    profile-to-eds-rdf
                       delphin profile to eds-rdf
    profile-to-mrs-rdf
                       delphin profile to mrs-rdf
    profile-to-rdf     delphin profile to rdf
    repp               Tokenize sentences using REPP
    select             Select data from [incr tsdb()] test suites

Let us keep only the profile-to-rdf subcommand.

% delphin profile-to-rdf -h
usage: delphin profile-to-rdf [-h] [-v] [-q] [-p PREFIX] [-o OUTPUT] [-f FORMAT] [--to SEMREP] profile

Transcribes a profile intro a RDF graph.

For more details, see: {https://github.com/arademaker/delph-in-rdf}.

positional arguments:
  profile        profile path

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  increase verbosity
  -q, --quiet    suppress output on <stdout> and <stderr>
  -p PREFIX      URI prefix (default: http://example.com/example)
  -o OUTPUT      output file name (default: output.ttl)
  -f FORMAT      output file format (default: turtle)
  --to SEMREP    modeled semantic representation (default: mrs)

1. Update the link to the repository.
2. The default format should be rdfxml or ntriples.

DMRS representation: sortinfo

Related to #21.

For many tasks, we may not need the variable properties. These are the extra (partly morphological) pieces of information attached to variables in MRS and projected onto nodes in DMRS (example).

In the DMRS XML DTD (see https://github.com/delph-in/docs/wiki/RmrsDmrs) these are grouped into a tag that is a descendant of the node tag. I believe we should do the same for RDF.

<sortinfo SF="prop" TENSE="untensed" MOOD="indicative" PROG="bool" PERF="-" cvarsort="e" />

So, instead of

http://example.com/a/nodes/10003 http://www.delph-in.net/schema/erg#perf -
http://example.com/a/nodes/10003 http://www.delph-in.net/schema/erg#mood indicative
http://example.com/a/nodes/10003 http://www.delph-in.net/schema/erg#prog +
http://example.com/a/nodes/10003 http://www.delph-in.net/schema/erg#sf prop

we can have the following (note that the URIs follow #21 (comment)). The new sortinfo resource gets a URI that reuses the node id because a sortinfo is always attached to a Node.

http://example.com/a/1#node-10003 dmrs:sortInfo http://example.com/a/1#sortinfo-10003


http://example.com/a/1#sortinfo-10003 rdf:type dmrs:SortInfo
http://example.com/a/1#sortinfo-10003 http://www.delph-in.net/schema/erg#perf "-"
http://example.com/a/1#sortinfo-10003 http://www.delph-in.net/schema/erg#mood "indicative"
http://example.com/a/1#sortinfo-10003 http://www.delph-in.net/schema/erg#prog "+"
http://example.com/a/1#sortinfo-10003 http://www.delph-in.net/schema/erg#sf "prop"
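The proposed grouping can be sketched in plain Python, without RDFLib. This is an illustration only: `sortinfo_triples` is a hypothetical function, and the expansion of the `dmrs:` prefix to `http://www.delph-in.net/schema/dmrs#` is an assumption taken from URIs quoted in other issues.

```python
ERG = "http://www.delph-in.net/schema/erg#"
DMRS = "http://www.delph-in.net/schema/dmrs#"  # assumed expansion of the dmrs: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    """Format a plain N-Triples literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sortinfo_triples(graph_uri: str, node_id: int, properties: dict) -> list:
    """Hypothetical sketch: attach all variable properties to one sortinfo resource."""
    node = f"<{graph_uri}#node-{node_id}>"
    sortinfo = f"<{graph_uri}#sortinfo-{node_id}>"
    lines = [
        f"{node} <{DMRS}sortInfo> {sortinfo} .",
        f"{sortinfo} <{RDF_TYPE}> <{DMRS}SortInfo> .",
    ]
    for name, value in properties.items():
        # Property names are lower-cased to match the erg# URIs shown above.
        lines.append(f"{sortinfo} <{ERG}{name.lower()}> {literal(value)} .")
    return lines


props = {"PERF": "-", "MOOD": "indicative", "PROG": "+", "SF": "prop"}
for line in sortinfo_triples("http://example.com/a/1", 10003, props):
    print(line)
```

A consumer that does not need the variable properties can then ignore every triple whose subject is a sortinfo resource.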

highlevel functions

Related to #21, we need some high-level functions over the transformations. For instance, in #21 I discussed the possible demand for a simplified RDF. In #24 we discussed redundancy that we may also want to remove (with an extra parameter?). We also have to consider that the CLI use case, the batch transformation of a profile to RDF, is not the only use case for this plugin. Consider the programmatic transformation of sentences in Python code that calls ACE interactively: how should the input sentence text be encoded? Currently, the solution for that would be outside this library.

verbosity

% delphin profile-to-mrs-rdf -v -p http://ibm.com/sick/a/ -o ~/hpsg/sick/sentences-a.ttl ~/hpsg/sick/sentences-ap

does not give me any output...

different graphs definition

Hello. While working in https://github.ibm.com/alexrad/extended-glosstag we had a problem finding a DMRS by type. After that, I noticed that the DMRS type triple is asserted in a different context from the rest of the DMRS analysis.

In the following example we got a piece of an output in n-quads of a single DMRS annotation, for the sentence 'capable of reproducing; '. Notice the first line is defined in the context _:Nd959b4e2979f439abb3775916655aaae and the rest in the context <http://wordnet.princeton.edu/pwn30/01001689-a-1>.

<http://wordnet.princeton.edu/pwn30/01001689-a-1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.delph-in.net/schema/dmrs#DMRS> _:Nd959b4e2979f439abb3775916655aaae .
<http://wordnet.princeton.edu/pwn30/01001689-a-1#node-10004> <http://www.delph-in.net/schema/hasSortInfo> <http://wordnet.princeton.edu/pwn30/01001689-a-1#sortinfo-10004> <http://wordnet.princeton.edu/pwn30/01001689-a-1> .
<http://wordnet.princeton.edu/pwn30/01001689-a-1#node-10003> <http://www.delph-in.net/schema/hasPredicate> <http://wordnet.princeton.edu/pwn30/01001689-a-1#predicate-10003> <http://wordnet.princeton.edu/pwn30/01001689-a-1> .
<http://wordnet.princeton.edu/pwn30/01001689-a-1#annotation-10001> <http://www.delph-in.net/schema/lemma> "capable%3" <http://wordnet.princeton.edu/pwn30/01001689-a-1> .
...

It looks like the type triple is being defined in a BlankNode context, and this occurs in all cases:

...
<http://wordnet.princeton.edu/pwn30/01000442-a-1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.delph-in.net/schema/dmrs#DMRS> _:Nee6cb2e565ae4abe9406881e8504e405 .
<http://wordnet.princeton.edu/pwn30/01000442-a-3> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.delph-in.net/schema/dmrs#DMRS> _:Nee6cb2e565ae4abe9406881e8504e405 .
<http://wordnet.princeton.edu/pwn30/01000737-a-1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.delph-in.net/schema/dmrs#DMRS> _:Nee6cb2e565ae4abe9406881e8504e405 .
...

Did you guys get this problem?
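To locate the affected statements in an N-Quads dump, one option is to list every quad whose context is a blank node. The sketch below uses only the standard library and a simplified regular expression that handles IRIs, blank nodes and plain or typed literals; it is not a full N-Quads parser, and `blank_context_quads` is a hypothetical name.

```python
import re

# One N-Quads term: an IRI, a blank node, or a literal (optionally typed or language-tagged).
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[\w-]+)?)'
QUAD = re.compile(r"^\s*" + r"\s+".join([TERM] * 4) + r"\s*\.\s*$")


def blank_context_quads(nquads: str) -> list:
    """Return (subject, predicate, object, context) for quads in a blank-node context."""
    found = []
    for line in nquads.splitlines():
        match = QUAD.match(line)
        if match and match.group(4).startswith("_:"):
            found.append(match.groups())
    return found


SAMPLE = "\n".join([
    '<http://wordnet.princeton.edu/pwn30/01001689-a-1> '
    '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://www.delph-in.net/schema/dmrs#DMRS> '
    '_:Nd959b4e2979f439abb3775916655aaae .',
    '<http://wordnet.princeton.edu/pwn30/01001689-a-1#annotation-10001> '
    '<http://www.delph-in.net/schema/lemma> "capable%3" '
    '<http://wordnet.princeton.edu/pwn30/01001689-a-1> .',
])

for subject, predicate, obj, context in blank_context_quads(SAMPLE):
    print(subject, "is typed in context", context)
```

Run against the full output, this should list exactly the `rdf:type dmrs:DMRS` statements if the diagnosis above is right.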

terminology declarations in the DMRS RDF output?

dmrs:h  rdf:type  dmrs:ScopalRelation .
dmrs:neq  rdf:type  dmrs:ScopalRelation .
dmrs:arg1  rdf:type  dmrs:Role .
dmrs:rstr  rdf:type  dmrs:Role .

Those triples should be in the vocabularies, right? Why are we producing them in the graph of the DMRS?

sortinfo in DMRS

we should not produce a node that doesn't have any info besides its type...

<http://wordnet.princeton.edu/pwn30/01000442-a-1#node-10005>
    <http://www.delph-in.net/schema/cfrom> 22 ;
    <http://www.delph-in.net/schema/cto> 25 ;
    <http://www.delph-in.net/schema/dmrs#hasId> 10005 ;
    <http://www.delph-in.net/schema/hasPredicate> <http://wordnet.princeton.edu/pwn30/01000442-a-1#predicate-10005> ;
    <http://www.delph-in.net/schema/hasSortInfo> <http://wordnet.princeton.edu/pwn30/01000442-a-1#sortinfo-10005> ;
    a <http://www.delph-in.net/schema/dmrs#Node> ;
    <http://www.w3.org/2000/01/rdf-schema#label> "_the_q<22,25>" .

...

<http://wordnet.princeton.edu/pwn30/01000442-a-1#sortinfo-10000>
    a <http://www.delph-in.net/schema/SortInfo> .

The quantifiers seem to not carry sortinfo information. See here
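One way to express the requested behaviour is a post-processing filter that drops any sortinfo resource whose only statement is its type, together with the link pointing to it. This is a sketch over plain `(subject, predicate, object)` tuples, not the plugin's actual code; `drop_empty_sortinfo` is a hypothetical name, and the URIs are copied from the output above.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SORTINFO = "http://www.delph-in.net/schema/SortInfo"
HAS_SORTINFO = "http://www.delph-in.net/schema/hasSortInfo"


def drop_empty_sortinfo(triples: list) -> list:
    """Remove sortinfo resources that carry nothing but their rdf:type."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    # A sortinfo is "empty" when its single statement is the type declaration.
    empty = {s for s, po in by_subject.items() if po == [(RDF_TYPE, SORTINFO)]}
    # Drop both the empty resource and any triple that points to it.
    return [(s, p, o) for s, p, o in triples if s not in empty and o not in empty]


base = "http://wordnet.princeton.edu/pwn30/01000442-a-1"
triples = [
    (f"{base}#node-10005", HAS_SORTINFO, f"{base}#sortinfo-10005"),
    (f"{base}#sortinfo-10005", RDF_TYPE, SORTINFO),
]
print(drop_empty_sortinfo(triples))  # []
```

Doing the check at generation time (emit the sortinfo only when the node has properties) would avoid the extra pass, at the cost of touching the converter itself.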

README

We need a step-by-step example of how to use the code in a Python program. So far we have focused on the CLI, but a Python programmer may need to use the library to convert one specific sentence to RDF.

Please add it in the README

version control

We version this plugin (for example, Delphin-RDF-1.0.4), but the produced RDF does not record which version generated it. We need the version in the ttl vocabulary and also, somehow, in the produced RDF. That may help with debugging.
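As a rough sketch of the second half of this request, the installed version could be looked up at run time and attached to each generated graph. Everything here is an assumption: the distribution name `Delphin-RDF` is inferred from the version string above, `owl:versionInfo` is just one existing predicate that could carry the value, and both function names are hypothetical.

```python
from importlib import metadata

OWL_VERSION_INFO = "http://www.w3.org/2002/07/owl#versionInfo"


def installed_version(dist_name: str, default: str = "unknown") -> str:
    """Return the installed version of a distribution, or a default if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return default


def version_triple(graph_uri: str, version: str) -> str:
    """Format one N-Triples line recording the generator version for a graph."""
    return f'<{graph_uri}> <{OWL_VERSION_INFO}> "{version}" .'


# "Delphin-RDF" is an assumed distribution name; adjust to the real one.
print(version_triple("http://example.com/a/1", installed_version("Delphin-RDF")))
```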

DMRS representation

Considering the graph

image

We want the RDF to be as close as possible to its graphical representation

image

DMRS representation: named graphs

Related to #21

If a user asks for a representation that supports named graphs, we should be able to produce it. In the CLI, the representations are limited to the ones that RDFLib supports. See https://en.wikipedia.org/wiki/N-Triples#N-Quads as one format.

But a user may want to save in JSON-LD, with or without one named graph per semantic representation. Moreover, inside Python code, the user may need to specify whether a named graph should be used or all triples should go in the single default graph (more about these concepts). What alternatives do we have?

Regarding triple stores, AllegroGraph supports N-Quads and JSON-LD, both formats compatible with named graphs. More about N-Quads.
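The choice between the default graph and a named graph can be reduced to one optional argument. The sketch below is format-level only and does not use RDFLib: terms are assumed to be already written in N-Triples syntax, and `to_lines` is a hypothetical helper. With no graph URI it emits N-Triples lines; with one it emits N-Quads lines.

```python
from typing import Optional


def to_lines(triples: list, graph_uri: Optional[str] = None) -> list:
    """Serialize triples as N-Triples, or as N-Quads when a graph URI is given."""
    # N-Quads is N-Triples plus an optional fourth term naming the graph.
    suffix = f" <{graph_uri}>" if graph_uri else ""
    return [f"{s} {p} {o}{suffix} ." for s, p, o in triples]


triples = [
    (
        "<http://example.com/a/1#node-10003>",
        "<http://www.delph-in.net/schema/hasSortInfo>",
        "<http://example.com/a/1#sortinfo-10003>",
    ),
]

print("\n".join(to_lines(triples)))                            # default graph
print("\n".join(to_lines(triples, "http://example.com/a/1")))  # one named graph
```

In a Python API the same idea would be a keyword argument on the conversion function; in the CLI it could be implied by the chosen output format.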

Code reuse

I noticed the delphin.cli.profile_to_*_rdf modules have almost the same code. We should, eventually, invest in code reuse. An auxiliary function/subcommand such as delphin.cli.profile_to_rdf, receiving an additional parameter, may be enough to solve this issue.
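A single subcommand with a dispatch table is one way to do this. The sketch below uses only argparse; the three converter functions are hypothetical stand-ins that return a string, not the real delphin.rdf converters, and the `--to` option mirrors the one shown in the `profile-to-rdf -h` output.

```python
import argparse


# Hypothetical stand-ins for the three real conversion routines.
def mrs_to_rdf(profile: str) -> str:
    return f"mrs:{profile}"


def eds_to_rdf(profile: str) -> str:
    return f"eds:{profile}"


def dmrs_to_rdf(profile: str) -> str:
    return f"dmrs:{profile}"


CONVERTERS = {"mrs": mrs_to_rdf, "eds": eds_to_rdf, "dmrs": dmrs_to_rdf}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="profile-to-rdf")
    parser.add_argument("profile", help="profile path")
    parser.add_argument("--to", dest="semrep", choices=sorted(CONVERTERS),
                        default="mrs", help="modeled semantic representation")
    return parser


def run(argv: list) -> str:
    """Parse the arguments and dispatch to the selected converter."""
    args = build_parser().parse_args(argv)
    return CONVERTERS[args.semrep](args.profile)


print(run(["my-profile", "--to", "dmrs"]))  # dmrs:my-profile
```

All shared option handling (prefix, output file, format, verbosity) then lives in one place, and adding a representation means adding one entry to the table.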

plugin error

% python test.py
Connecting to AllegroGraph server -- host:'127.0.0.1' port:10035
Available catalogs:
  - <root catalog>
  - system
Available repositories in catalog 'None':
  - own
  - te
Repository te is up!
It contains 0 statement(s).
graph 1
NOTE: parsed 1 / 1 sentences, avg 238028k, time 10.03111s
Traceback (most recent call last):
  File "/Users/ar/venv/lib/python3.9/site-packages/rdflib/plugin.py", line 103, in get
    p = _plugins[(name, kind)]
KeyError: ('a', <class 'rdflib.store.Store'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ar/hpsg/sick/test.py", line 96, in <module>
    g1 = simplify(filter1(process(sent1,"a")))
  File "/Users/ar/hpsg/sick/test.py", line 75, in process
    triples = rdf.dmrs_to_rdf(d, "http://example.com",id)
  File "/Users/ar/venv/lib/python3.9/site-packages/delphin/rdf/_dmrs_parser.py", line 39, in dmrs_to_rdf
    defaultGraph = Graph(store, identifier=BNode())
  File "/Users/ar/venv/lib/python3.9/site-packages/rdflib/graph.py", line 306, in __init__
    self.__store = store = plugin.get(store, Store)()
  File "/Users/ar/venv/lib/python3.9/site-packages/rdflib/plugin.py", line 105, in get
    raise PluginException(
rdflib.plugin.PluginException: No plugin registered for (a, <class 'rdflib.store.Store'>)

error in the graph

% python mapping.py ~/work/wn/glosstag/data/annotation-aaa.jl
01001689-a def capable of reproducing;
[{'form': 'capable', 'kind': ['wf'], 'lemmas': ['capable%3'], 'meta': {'pos': 'JJ'}, 'senses': ['capable%3:00:00::'], 'tag': 'man', 'begin': 0, 'end': 7}, {'form': 'of', 'kind': ['wf'], 'lemmas': ['of'], 'meta': {'pos': 'IN'}, 'tag': 'ignore', 'begin': 8, 'end': 10}, {'form': 'reproducing', 'kind': ['wf'], 'lemmas': ['reproduce%2'], 'meta': {'pos': 'VBG', 'sep': ''}, 'tag': 'un', 'begin': 11, 'end': 22}, {'form': ';', 'kind': ['wf'], 'meta': {'pos': ':', 'type': 'punc'}, 'tag': 'ignore', 'begin': 22, 'end': 23}]
NOTE: parsed 1 / 1 sentences, avg 3110k, time 0.58293s
Traceback (most recent call last):
  File "/Users/ar/work/wn/glosstag-evaluation/mapping.py", line 169, in <module>
    for t in process(s,s._id):
  File "/Users/ar/work/wn/glosstag-evaluation/mapping.py", line 137, in process
    graph1 = rdf.dmrs_to_rdf(d, "http://wordnet.princeton.edu/pwn30",v)
  File "/Users/ar/venv/lib/python3.9/site-packages/delphin/rdf/_dmrs_parser.py", line 39, in dmrs_to_rdf
    dmrsGraph = Graph(store=defaultGraph.store, identifier=DMRSI)
AttributeError: 'str' object has no attribute 'store'
