Comments (4)

martinpopel avatar martinpopel commented on June 3, 2024

Option E: automatic PoS tags

We could also provide automatic upos tags for the test sets of the surprise languages, so that participants can use some kind of delexicalized parsing. We plan to provide automatic upos tags (as well as tokenization, lemmas, xpos and features) for all the test sets anyway. For "standard" languages, the upos tags will be assigned by UDPipe trained on the training data. For surprise languages there will be no training data (unless we decide for option D), so we will have to tag the test data by, e.g., 10-fold cross-validation. The only difference is that for the standard languages the UDPipe annotation is an additional, optional resource (participants could create it themselves by training UDPipe on the training data and running it on the test sets), while for surprise languages the UDPipe upos tags will be a crucial resource, one that participants cannot replicate because they won't have access to the labels used for training.
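To make the cross-validation idea concrete, here is a minimal sketch of how such tagging could work. It is not the organizers' pipeline: `train_tagger` is a hypothetical toy stand-in (most frequent tag per word form) for a real tagger such as UDPipe, and the round-robin fold assignment and the four-sentence sample are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def train_tagger(sentences):
    """Hypothetical toy tagger: most frequent upos per word form.

    Stands in for a real tagger such as UDPipe; not the real UDPipe API.
    """
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in sentences:
        for form, upos in sent:
            counts[form][upos] += 1
            all_tags[upos] += 1
    # Unknown word forms receive the overall most frequent tag ("X" if no data).
    fallback = all_tags.most_common(1)[0][0] if all_tags else "X"
    lexicon = {form: c.most_common(1)[0][0] for form, c in counts.items()}
    return lambda forms: [lexicon.get(f, fallback) for f in forms]


def jackknife_tags(sentences, k=10):
    """Tag each sentence with a model that never saw that sentence's gold tags."""
    predicted = [None] * len(sentences)
    for fold in range(k):
        # Round-robin split: sentence i belongs to fold i % k.
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        tagger = train_tagger(train)
        for i in range(fold, len(sentences), k):
            predicted[i] = tagger([form for form, _ in sentences[i]])
    return predicted


# Illustrative gold data (made up for this sketch).
gold = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
]
auto = jackknife_tags(gold, k=2)
```

Every sentence ends up with predicted tags from a model trained on the other folds only, which is why participants without the gold labels could not reproduce this annotation themselves.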

from conll2017.

dan-zeman avatar dan-zeman commented on June 3, 2024

I have assumed that option E will be available to enable delex parsing. Participants should indicate if their system is also able to work without it, and we should credit their bravery in our summary paper, yet only the best score per system+dataset will be used in rankings.
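As a rough illustration of the "best score per system+dataset" rule, the sketch below keeps only the highest score for each (system, dataset) pair. The system names, dataset name, and scores are hypothetical placeholders, not real results, and `best_scores` is an illustrative helper rather than part of any official evaluation script.

```python
def best_scores(runs):
    """Keep the highest score seen for each (system, dataset) pair."""
    best = {}
    for system, dataset, score in runs:
        key = (system, dataset)
        if key not in best or score > best[key]:
            best[key] = score
    return best


# Hypothetical runs: (system, dataset, score). All values are made up.
runs = [
    ("sysA", "surprise1", 61.2),  # e.g. a run using the provided upos tags
    ("sysA", "surprise1", 58.7),  # e.g. a run without them
    ("sysB", "surprise1", 60.0),
]
ranking = best_scores(runs)
```

Under this rule a system that also submits a run without the provided tags is never penalized for it in the ranking; the weaker run is simply ignored.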

fginter avatar fginter commented on June 3, 2024

I think we should provide UDPipe annotation, and I think we should also aim to provide other resources for the surprise language, namely an unannotated corpus and, if at all possible, also some parallel data or a dictionary. I think this is a more realistic setting: no one is going to be building a parser if they don't have any text to parse (i.e. an unannotated corpus is a pretty reasonable prior expectation, methinks), and I would expect at least some sort of dictionary to exist for a language someone is interested enough to parse. I think it would be good to keep the shared task in a realistic, real-world setting, so that the results would be indicative of our ability to deliver parses to actual applications. Just my $.02

dan-zeman avatar dan-zeman commented on June 3, 2024

An unannotated corpus should be possible to get, although it may be several orders of magnitude smaller than for the better-resourced languages. Parallel data might not be easy to get either. If we get the task, I will be happy to discuss this with those of you who want to help, but I am not going to discuss it in a public place like this; I don't want to spoil the surprise :-).

As for the timing, my suggestion (which I also briefly mention in the proposal) is that whatever we reveal about the surprise language will become available at the same time we make the test data accessible, i.e. May 2017. Note that this is mostly motivated by the term "surprise language": it would be less of a surprise if all this were publicly known two months earlier. I admit that from the perspective of real-world applications, it is quite likely that you would have more time (unless it's something like the famous Haiti quake response project). But I'd prefer not to give people too much time to develop their own tools like, say, morphological analyzers. I would like to see approaches that are in theory prepared for any language, so that you only have time to quickly decide on the configuration, select cross-language data, and run the trainer.

