We want to reconcile to a property, lexeme, anything else!


Enable reconciliation for other namespaces (openrefine-wikibase, closed)

wetneb commented on June 14, 2024

Enable reconciliation for other namespaces

from openrefine-wikibase.

Comments (10)

thadguidry commented on June 14, 2024

Sure! The real data is proprietary with many columns of metadata, but for my use case, I can shrink to a sample.

LEXEME_FORM, SENSE
"archer", "ninth sign of zodiac in astrology"
"artist", "skilled in art"
"drowned", "died by drowning"
"saving", "rescuing, preserving"

After reconciling LEXEME_FORM against Forms in Wikidata, RECONCILED_TO_FORM would hold the expected exact-match form in the "en" language. The SENSE column is one of those metadata columns. It would not be reconciled, but compared to the Senses retrieved from the parent Lexeme. For example, for the first row I would compare against the Senses of L29863 and then choose the matching Sense or, if no matching Sense is found, choose to create a new one.

I put the reconciled ID into a new column so it's easier for you to build a test case against it or reformat it.
"saving" and "archer" are the outlier cases, as you'll see when querying, and are part of the NLP discovery mechanisms for "missing Sense in Wikidata". :-)

LEXEME_FORM, SENSE, RECONCILED_TO_FORM
"archer", "ninth sign of zodiac in astrology", "L29863-F1"
"artist", "skilled in art", "L6357-F1"
"drowned", "died by drowning", "L12156-F3"
"saving", "rescuing, preserving", "L42004-F1"

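For anyone turning the sample above into a test case, here is a minimal Python sketch that loads the rows and derives the parent Lexeme ID from each reconciled Form ID. The helper names `load_rows` and `parent_lexeme` are hypothetical, and the sketch assumes Form IDs always follow the `L<number>-F<number>` shape seen in the sample.

```python
import csv
import io
import re

# Sample rows copied from the comment above.
SAMPLE = '''LEXEME_FORM, SENSE, RECONCILED_TO_FORM
"archer", "ninth sign of zodiac in astrology", "L29863-F1"
"artist", "skilled in art", "L6357-F1"
"drowned", "died by drowning", "L12156-F3"
"saving", "rescuing, preserving", "L42004-F1"
'''

# Assumed ID shape: a Lexeme ID, a hyphen, then a Form suffix.
FORM_ID = re.compile(r"^(L\d+)-(F\d+)$")


def load_rows(text):
    """Parse the sample; skipinitialspace handles the space after each comma."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)


def parent_lexeme(form_id):
    """Return the Lexeme ID part of a Form ID."""
    match = FORM_ID.match(form_id)
    if match is None:
        raise ValueError(f"not a Form ID: {form_id!r}")
    return match.group(1)


for row in load_rows(SAMPLE):
    print(row["LEXEME_FORM"], "->", parent_lexeme(row["RECONCILED_TO_FORM"]))
```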

belett commented on June 14, 2024

@thadguidry's example is very good, but here is a different and maybe simpler example (could be useful for a first degraded prototype?).

Reconcile on lemmata alone.
Every lexeme has at least one lemma (a few have several), stored in RDF with wikibase:lemma. So a simple first step is to reconcile, for instance:

first -> L2, L46028, L333590
magic -> L3, L338238, L587694
(I deliberately picked examples with multiple homographs, i.e. lexemes with an identical lemma string, as possible values.)

If we can then use these reconciled values to gather more information (mainly the language of the lexeme, dct:language, and the lexical category, wikibase:lexicalCategory, each of which is always present exactly once on a Lexeme), we could distinguish homographs. That alone would already be very useful, for instance to add a dictionary identifier or a grammatical gender (P5185); we know, for example, that all French nouns ending in -ment are masculine.

Having the senses (ontolex:sense) and the forms (ontolex:lexicalForm) would be ideal but it could wait for a next step on the roadmap.
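
As a rough illustration of the lemma-only step, the Python sketch below builds a SPARQL query that looks up lexemes by exact lemma and returns the language and lexical category needed to tell homographs apart. The predicates are the ones named in this comment. The function name `lemma_query`, the choice to match the lemma as a language-tagged literal, and the idea of sending the result to a SPARQL endpoint are assumptions, not part of openrefine-wikibase.

```python
import json
import re

PREFIXES = (
    "PREFIX wikibase: <http://wikiba.se/ontology#>\n"
    "PREFIX dct: <http://purl.org/dc/terms/>\n"
)

# Conservative check so the language tag cannot inject SPARQL syntax.
LANG_TAG = re.compile(r"^[A-Za-z]+(-[A-Za-z0-9]+)*$")


def lemma_query(lemma, lang):
    """Build a query for lexemes whose lemma equals `lemma` in `lang`."""
    if not LANG_TAG.match(lang):
        raise ValueError(f"unexpected language tag: {lang!r}")
    # json.dumps adds the double quotes and escapes quotes and backslashes.
    literal = json.dumps(lemma, ensure_ascii=False)
    return (
        PREFIXES
        + "SELECT ?lexeme ?language ?category WHERE {\n"
        + f"  ?lexeme wikibase:lemma {literal}@{lang} ;\n"
        + "          dct:language ?language ;\n"
        + "          wikibase:lexicalCategory ?category .\n"
        + "}\n"
    )


print(lemma_query("first", "en"))
```

Each result row would be one homograph candidate, with the language and category columns serving as the disambiguating information.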

wetneb commented on June 14, 2024

Related: #42

thadguidry commented on June 14, 2024

Hi @wetneb

I'm not currently interested in writing new lexemes to Wikidata, but only reconciling for now.
I have a column of words and want to constrain reconciling to only the Lexeme namespace.

How quickly could Lexeme lookup be added for this issue? Days, Weeks, Months?
If it's Months, what's involved in changing it quickly for my use case? Could the Wikidata reconcile endpoint be changed for this? Or could I just get this quickly, by running a Docker instance with the customized reconcile endpoint I need, tweaking the Python code in necessary places?
Would I also be able to constrain the Suggest Flyout to the Lexeme namespace by tweaking only suggest.py, or is anything else needed?

wetneb commented on June 14, 2024

I'd say a few days of work should be enough for the implementation itself, but I haven't really thought about how this should work from a user perspective. Should other namespaces be supported by the current reconciliation endpoint, or should it be a different endpoint altogether? Does it make sense to have fuzzy-matching for lexemes at all? How to deal with forms and senses?

I'd only be convinced that we got this right if I start thinking about doing a data import in lexemes and understand the needs from this perspective. Otherwise it's easy to have something that "supports" lexemes but doesn't actually address the needs of the community.

thadguidry commented on June 14, 2024

> I'd say a few days of work should be enough for the implementation itself, but I haven't really thought about how this should work from a user perspective.

Great!

> Should other namespaces be supported by the current reconciliation endpoint, or should it be a different endpoint altogether?

> Does it make sense to have fuzzy-matching for lexemes at all? How to deal with forms and senses?

The UI needs to be improved so that users can choose the right Sense when it's available, with the description of each Sense shown.
Dealing with Lexemes requires exposing more options to the user.

> I'd only be convinced that we got this right if I start thinking about doing a data import in lexemes and understand the needs from this perspective. Otherwise it's easy to have something that "supports" lexemes but doesn't actually address the needs of the community.

Agreed. For my use case, I want to perform "exact matches" against the Form in a given language. But there is more to it in terms of options: some will care about reconciling against the Sense, while others (like me) will want to reconcile against the Form regardless of the Sense, which might not be known depending on the dataset being reconciled.
I think it makes sense to put an "[EPIC] -" prefix on this issue, since it's a lot of work and raises questions well beyond my simple use case.
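
To make the "exact match in a given language" option concrete, here is a small Python sketch with no fuzzy scoring at all. The candidate structure (dicts with `id`, `representation` and `language` keys) and the function name `exact_matches` are hypothetical, and the second candidate is a made-up placeholder, not a real Wikidata Form.

```python
# Hypothetical candidate Forms, as some lookup step might return them.
CANDIDATES = [
    {"id": "L29863-F1", "representation": "archer", "language": "en"},
    # Placeholder entry for illustration only, not a real Form ID.
    {"id": "PLACEHOLDER-F2", "representation": "archers", "language": "en"},
]


def exact_matches(value, candidates, lang):
    """Return the IDs of Forms whose representation equals `value` in `lang`.

    The comparison is case-sensitive and does no normalization, so anything
    short of an identical string is treated as a non-match.
    """
    return [
        form["id"]
        for form in candidates
        if form["language"] == lang and form["representation"] == value
    ]


print(exact_matches("archer", CANDIDATES, "en"))
```

Whether matching should ignore case or Unicode normalization differences is one of the open options such a service would need to expose.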

wetneb commented on June 14, 2024

Feel free to describe your own use case in more detail, so that we have at least one data point!

thadguidry commented on June 14, 2024

My use case is:

As a user, I want to reconcile a column of Lexeme Forms against Wikidata's Lexeme Forms.
- I would then like to augment the Lexeme Forms by using the Lexeme Form IDs to query Wikidata for all of the Lexeme's Sense IDs and descriptions.
- I will then use NLP tools to help determine the right Sense ID for my rows (by comparing against data in other columns, etc.), since this is harder to do directly in OpenRefine.
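
The augmentation step in the list above could be sketched as a SPARQL query built from a reconciled Form ID. This is only an illustration: `ontolex:sense` is the predicate mentioned elsewhere in this thread, while the use of `skos:definition` for Sense descriptions, the `L<number>-F<number>` ID shape, and the function name `senses_query` are assumptions.

```python
import re

PREFIXES = (
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
    "PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>\n"
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
)

FORM_ID = re.compile(r"(L\d+)-F\d+")
LANG_TAG = re.compile(r"[A-Za-z]+(-[A-Za-z0-9]+)*")


def senses_query(form_id, lang):
    """Build a query listing the Senses of the Lexeme that owns `form_id`."""
    match = FORM_ID.fullmatch(form_id)
    if match is None:
        raise ValueError(f"not a Form ID: {form_id!r}")
    if LANG_TAG.fullmatch(lang) is None:
        raise ValueError(f"unexpected language tag: {lang!r}")
    lexeme_id = match.group(1)
    return (
        PREFIXES
        + "SELECT ?sense ?gloss WHERE {\n"
        + f"  wd:{lexeme_id} ontolex:sense ?sense .\n"
        # Assumption: Sense descriptions are exposed via skos:definition.
        + "  ?sense skos:definition ?gloss .\n"
        + f'  FILTER(LANG(?gloss) = "{lang}")\n'
        + "}\n"
    )


print(senses_query("L29863-F1", "en"))
```

The returned Sense IDs and descriptions could then be written to new columns and handed to the NLP comparison step.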

wetneb commented on June 14, 2024

Thanks! Could you perhaps give a sample of the data you want to reconcile, and the expected results (corresponding lexemes / forms / senses)?

wetneb commented on June 14, 2024

OpenRefine has adopted the architecture of one reconciliation service per entity type, so it makes sense to keep this web service for items only. Other reconciliation services can be implemented separately.
