circse / lt4hala

Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA): https://circse.github.io/LT4HALA/

Python 100.00%

lt4hala's People

andresmlondon chenbingxiayu xy-liao ianliyi1996 sunwen1906 wangjiaqiys mumu77 deng-ge governance-foundation rachelesprugnoli helpfirecode pseudoszechwaniens 209-tongji

lt4hala's Issues

Typo in BellumCivile?

In the file "Caesar_BellumCivile_LiberII.conllu", I found a surprising "benefficio" with a double f. At "sent_id 222", the file gives "non proditi per illum Caesaris benefficio estis conseruati".
I checked the text online at http://agoraclass.fltr.ucl.ac.be/concordances/caesar_dbcII/lecture/4.htm, where [2,32] reads: "nonne proditi per illum Caesaris beneficio estis conseruati?"
This will have almost no consequences, since the sentence is in the training set, but the data is supposed to be a "gold standard".
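
A check like the one described above can be sketched in a few lines of Python. This is only an illustration, not a tool from the repository: the function name `find_form` and the helper `row` are hypothetical, and the sample rows are a toy fragment built here for demonstration (only the ID and FORM columns are filled in; the token IDs are invented). It assumes the standard 10-column, tab-separated CoNLL-U layout with `# sent_id = ...` comment lines.

```python
def find_form(conllu_lines, target):
    """Yield (sent_id, token_id) for every token whose FORM column equals target."""
    sent_id = None
    for line in conllu_lines:
        line = line.rstrip("\n")
        if line.startswith("# sent_id"):
            # Comment lines look like "# sent_id = 222"
            sent_id = line.split("=", 1)[1].strip()
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            # Column 0 is ID, column 1 is FORM
            if len(cols) >= 2 and cols[1] == target:
                yield sent_id, cols[0]


def row(tok_id, form):
    """Build a toy CoNLL-U row with only ID and FORM filled in (hypothetical helper)."""
    return "\t".join([str(tok_id), form] + ["_"] * 8)


# Toy fragment, not the real file content
sample = [
    "# sent_id = 222",
    row(1, "Caesaris"),
    row(2, "benefficio"),
    row(3, "estis"),
    "",
]

hits = list(find_form(sample, "benefficio"))
print(hits)
```

To run it on the real file, one would pass an open file handle instead of `sample`, for example `find_form(open(path, encoding="utf-8"), "benefficio")`, where `path` points to a local copy of the `.conllu` file.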

EvaLatin 2024

One of the challenges of the Shared Task 2024 is "to understand which treebank (or combination of treebanks) is the most suitable to deal with new test data". What decision criteria are participants expected to use when developing a model, given that no training data is provided and the only information disclosed is that the test data will contain "prose and poetic texts from different time periods"? This is too generic to guide informed choices: the UD Latin treebanks are quite unbalanced in genre and period, and they also differ in annotation scheme (e.g., "iobj" is mentioned in the Shared Task guidelines, but it only appears in the LLCT treebank). The guidelines also give an example of the test data (from Caesar, De Bello Gallico, 4.1), but that same sentence is available as training data in the PROIEL treebank. The sentence also shows another issue left unspecified, namely tokenization, as in ne que vs neque. Without a better definition, the outcome of the Shared Task is going to be largely random.
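
The annotation-scheme point (a relation such as "iobj" appearing in one treebank but not in others) can be checked empirically by counting DEPREL labels per treebank. The sketch below is a minimal illustration under stated assumptions: `deprel_counts` and `tok` are hypothetical names, the two toy "treebanks" are invented rows rather than data from any UD Latin treebank, and the input is assumed to follow the standard 10-column CoNLL-U layout.

```python
from collections import Counter


def deprel_counts(conllu_lines):
    """Count DEPREL labels (column 8) over the syntactic words of a CoNLL-U stream."""
    counts = Counter()
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # skip malformed rows
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        counts[cols[7]] += 1
    return counts


def tok(i, form, head, deprel):
    """Build a toy 10-column row (hypothetical helper; most columns left as '_')."""
    return "\t".join([str(i), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


# Invented toy data: the same three words annotated under two different conventions
toy_a = [tok(1, "dat", 0, "root"), tok(2, "librum", 1, "obj"), tok(3, "puero", 1, "iobj")]
toy_b = [tok(1, "dat", 0, "root"), tok(2, "librum", 1, "obj"), tok(3, "puero", 1, "obl:arg")]

counts_a = deprel_counts(toy_a)
counts_b = deprel_counts(toy_b)

# Labels that occur in one toy treebank but not the other
only_in_a = sorted(set(counts_a) - set(counts_b))
only_in_b = sorted(set(counts_b) - set(counts_a))
print(only_in_a, only_in_b)
```

Run over the real treebank files, a comparison of this kind would show which labels are specific to a single treebank, which is one concrete input to the choice of training data.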
