Universal Dependencies for Croatian

The Croatian UD treebank is built on top of the SETimes.HR dependency treebank of Croatian. It comprises roughly 4,000 sentences of newspaper text from the Southeast European Times news website taken from the SETimes parallel corpus.

The training set has 3,557 sentences (78,817 words) of newspaper text, while the development set contains 200 sentences (4,823 words) from the same source. The test set has 200 sentences (4,125 words): the first 100 sentences are newspaper text, while the other 100 sentences come from the Croatian Wikipedia.
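The sentence and word counts above can be re-checked against the data files. The following is a minimal sketch, assuming the files are in CoNLL-U format (blank line between sentences, `#` comment lines, tab-separated token lines with the token ID in the first column). The function name and the sample text are hypothetical and are not taken from the treebank.

```python
from typing import Iterable, Tuple


def count_sentences_and_words(lines: Iterable[str]) -> Tuple[int, int]:
    """Count sentences and syntactic words in CoNLL-U formatted lines."""
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata, not tokens.
            continue
        token_id = line.split("\t", 1)[0]
        # Plain integer IDs are words; ranges such as "1-2" (multiword
        # tokens) and decimals such as "1.1" (empty nodes) are skipped.
        if token_id.isdigit():
            words += 1
            in_sentence = True
    if in_sentence:
        # The last sentence may not be followed by a blank line.
        sentences += 1
    return sentences, words


# Invented two-sentence sample (NOT from the treebank); only ID and FORM are filled.
SAMPLE = (
    "# hypothetical sample sentence 1\n"
    "1\tOvo\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tje\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "3\ttest\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# hypothetical sample sentence 2\n"
    "1\tDobar\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tdan\t_\t_\t_\t_\t_\t_\t_\t_\n"
)

sample_counts = count_sentences_and_words(SAMPLE.splitlines())
print(sample_counts)
```

To check a real file, pass an open file handle instead of the sample, for example `count_sentences_and_words(open(path, encoding="utf-8"))`, where `path` points to a local copy of the training, development or test file.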

Sentence and word segmentation was manually checked. The treebank does not include multiword tokens. No language-specific features and relations were used. The POS tags and features were converted from Multext East v4 and manually checked. The syntactic annotation was done manually.

When using the Croatian UD treebank, please cite the UD handle and the following paper:

See file LICENSE.txt for further licensing information.

Changelog

No change since UD v1.1.

=== Machine-readable metadata =================================================
Documentation status: stub
Data source: semi-automatic
Data available since: UD v1.1
License: CC BY-SA 4.0
Genre: news wiki
Contributors: Agić, Željko; Ljubešić, Nikola
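The metadata block is meant to be read by scripts. The following is a minimal sketch of one way to do that, assuming each field sits on its own line as `Key: value` below the `=== Machine-readable metadata` marker. The function name `parse_metadata` is hypothetical and is not part of any UD tooling.

```python
from typing import Dict


def parse_metadata(readme_text: str) -> Dict[str, str]:
    """Collect 'Key: value' pairs that follow the metadata marker line."""
    meta: Dict[str, str] = {}
    in_block = False
    for line in readme_text.splitlines():
        if line.startswith("=== Machine-readable metadata"):
            in_block = True
            continue
        if in_block and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta


# The block as given in this README, one field per line.
README_TAIL = """\
=== Machine-readable metadata =================================================
Documentation status: stub
Data source: semi-automatic
Data available since: UD v1.1
License: CC BY-SA 4.0
Genre: news wiki
Contributors: Agić, Željko; Ljubešić, Nikola
"""

meta = parse_metadata(README_TAIL)
genres = meta["Genre"].split()
contributors = [name.strip() for name in meta["Contributors"].split(";")]
print(meta["License"], genres, contributors)
```

Splitting `Genre` on whitespace and `Contributors` on semicolons follows the way the values are written in this block. Other README files may use different separators.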
