The current proposal says that for surprise languages there will be "no training/dev d

Surprise languages vs. trainable parser task about conll2017 HOT 4 CLOSED

ufal commented on June 3, 2024

Surprise languages vs. trainable parser task

from conll2017.

Comments (4)

martinpopel commented on June 3, 2024

Option E: automatic PoS tags

We could also provide automatic upos tags for the test sets of surprise languages, so that participants can use some kind of delexicalized parsing. We plan to provide automatic upos tags (as well as tokenization, lemmas, xpos and features) for all the test sets anyway. For "standard" languages, the upos tags will be assigned by UDPipe trained on the training data. For surprise languages there will be no training data (unless we opt for option D), so we will have to use e.g. 10-fold cross-validation, tagging each part of the data with a model trained on the remaining parts. The only difference is that for the standard languages the UDPipe annotation is an additional, optional resource (participants could create it themselves by running UDPipe on the training data), while for surprise languages the UDPipe upos tags will be a crucial resource (one that participants cannot replicate, as they won't have access to the training labels).
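To make the cross-validation idea concrete, here is a minimal sketch of how every sentence could receive automatic tags from a model that never saw it. It assumes the data being split is the gold-annotated test set of the surprise language. The names `train_tagger` and `jackknife_tags` are hypothetical, and the most-frequent-tag lexicon is only a toy stand-in for UDPipe, not its actual training procedure.

```python
from collections import Counter, defaultdict


def train_tagger(sentences):
    """Toy stand-in for a real tagger (hypothetical, NOT UDPipe):
    remembers the most frequent upos seen for each word form."""
    per_form = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for form, upos in sentence:
            per_form[form][upos] += 1
            all_tags[upos] += 1
    # Unknown word forms fall back to the overall most frequent tag.
    fallback = all_tags.most_common(1)[0][0] if all_tags else "X"
    lexicon = {form: c.most_common(1)[0][0] for form, c in per_form.items()}
    return lambda form: lexicon.get(form, fallback)


def jackknife_tags(sentences, k=10):
    """Tag each sentence with a model trained on the other k-1 folds.

    `sentences` is a list of sentences, each a list of (form, gold_upos)
    pairs. Returns the same structure with predicted upos tags.
    """
    predicted = [None] * len(sentences)
    for fold in range(k):
        # Sentence i belongs to fold i % k; everything else is training data.
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        tagger = train_tagger(train)
        for i in range(fold, len(sentences), k):
            predicted[i] = [(form, tagger(form)) for form, _ in sentences[i]]
    return predicted
```

The point of the split is that the released tags have a realistic error rate: no sentence is tagged by a model that has seen its gold labels.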

dan-zeman commented on June 3, 2024

I have assumed that option E will be available to enable delex parsing. Participants should indicate if their system is also able to work without it, and we should credit their bravery in our summary paper, yet only the best score per system+dataset will be used in rankings.
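The ranking rule could be sketched roughly as follows. The tuple layout, the function name `best_scores` and the use of a single numeric score per run are assumptions made for illustration; the example scores are made up.

```python
def best_scores(runs):
    """Keep only the best score per (system, dataset) pair.

    `runs` is an iterable of (system, dataset, uses_auto_upos, score)
    tuples; this layout is hypothetical. The flag is kept so that
    systems able to work without the automatic tags can still be
    credited in a summary.
    """
    best = {}
    for system, dataset, uses_auto_upos, score in runs:
        key = (system, dataset)
        if key not in best or score > best[key][1]:
            best[key] = (uses_auto_upos, score)
    return best


# Illustrative, invented numbers only.
runs = [
    ("sysA", "surprise1", True, 61.2),
    ("sysA", "surprise1", False, 55.0),
    ("sysB", "surprise1", False, 58.3),
]
ranking = best_scores(runs)
```

Here `sysA` would be ranked by its run with automatic tags, while its run without them would only be mentioned descriptively.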

fginter commented on June 3, 2024

I think we should provide UDPipe annotation, and I think we should also aim to provide other resources for the surprise language, namely an unannotated corpus and, if at all possible, some parallel data or a dictionary. I think this is a more realistic setting: no one is going to be building a parser if they don't have any text to parse (i.e. an unannotated corpus is a pretty reasonable prior expectation, methinks), and I would expect at least some sort of dictionary to exist for a language someone is interested enough to parse. I think it would be good to keep the shared task in a realistic, real-world setting, so that the results would be indicative of our ability to deliver parses to actual applications. Just my $.02

dan-zeman commented on June 3, 2024

An unannotated corpus should be possible to get, although it may be several orders of magnitude smaller than for the better-resourced languages. Parallel data might not be easy to get, though. If we get the task, I will be happy to discuss this with those of you who want to help, but I am not going to discuss it in a public place like this; I don't want to spoil the surprise :-).

As for the timing, my suggestion (which I also briefly mention in the proposal) is that whatever we reveal about the surprise language will become available at the same time as we make the test data accessible, i.e. May 2017. Note that this is mostly motivated by the term "surprise language": it would be less of a surprise if all this were publicly known two months earlier. I admit that in real-world applications you would quite likely have more time (unless it's something like the famous Haiti quake response project). But I'd prefer not to give people too much time to develop their own tools like, say, morphological analyzers. I would like to see approaches that are in theory prepared for any language, so that you only have time to quickly decide on the configuration, select cross-language data, and run the trainer.
