kurdnet's Introduction

KurdNet - the Kurdish WordNet

KurdNet is the Kurdish WordNet. WordNet has been used with considerable success in numerous natural language processing tasks, such as word sense disambiguation and information extraction. Motivated by this success, many projects have been undertaken to build similar lexical databases for other languages.

The current version of KurdNet was created following the Expand Approach, which is carried out semi-automatically and centred on building a Kurdish alignment for Base Concepts, i.e. the core subset of major meanings in the Princeton WordNet. We use a bilingual dictionary and simple set theory operations to translate and align synsets, and a corpus to extract usage examples. The effectiveness of the prototype database is evaluated by measuring its impact on a Kurdish information retrieval task.
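The dictionary-plus-set-operations step can be illustrated with a minimal sketch. It assumes that a "maximal" alignment is the union of the dictionary translations of a synset's literals and a "minimal" alignment is their intersection; this README does not spell out the exact operations, so treat that reading as an assumption. The function name `align_synset`, the toy dictionary, and the `ku_*` tokens are hypothetical placeholders, not KurdNet data.

```python
from typing import Dict, Iterable, Set, Tuple


def align_synset(
    literals: Iterable[str],
    dictionary: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (maximal, minimal) candidate translations for one synset.

    Assumption: maximal = union, minimal = intersection of the
    dictionary translations of the literals that have an entry.
    """
    translations = [set(dictionary[lit]) for lit in literals if lit in dictionary]
    if not translations:
        # No literal of this synset is covered by the dictionary.
        return set(), set()
    maximal = set().union(*translations)
    minimal = set.intersection(*translations)
    return maximal, minimal


# Hypothetical toy dictionary: English literal -> placeholder Kurdish tokens.
toy_dictionary = {
    "car": {"ku_1", "ku_2"},
    "auto": {"ku_1", "ku_3"},
    "automobile": {"ku_1"},
}

max_alignment, min_alignment = align_synset(
    ["car", "auto", "automobile"], toy_dictionary
)
print(sorted(max_alignment))
print(sorted(min_alignment))
```

Under this reading, the intersection keeps only translations that every covered literal agrees on, trading coverage for precision.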

[Figure: WordNet schema for KurdNet. Base Concepts' Entity-Relationship schema in WordNet]

Features

The following table shows the main statistical properties of Base Concepts and its alignment in KurdNet:

               Base Concepts   KurdNet (Max)   KurdNet (Min)
Synset No.     4689            3801            2145
Literal No.    11171           17990           6248
Usage No.      2645            89950           31240

The ckb folder contains OMW-compatible files of KurdNet. Unlike the Sorani Kurdish definitions, which were translated manually by experts, the lemmata were collected semi-automatically, so some of the provided lemmata may be incorrect. Improving them is a priority for the next version of the resource. According to our experiments, the semi-automatic minimal alignment (referred to as min) outperforms the maximal alignment (max). Therefore, only lemmata that were semi-automatically aligned with the min technique are provided in the Open Multilingual WordNet version.
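For readers who want to inspect the lemmata, here is a minimal sketch of reading an OMW-style tab file. It assumes the usual OMW layout: a header line starting with `#`, then tab-separated rows of synset ID, a type such as `ckb:lemma`, and the value in the last column. The function name `read_omw_tab` and the sample rows (including the placeholder lemmas) are hypothetical; check the actual files in the ckb folder before relying on this layout.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def read_omw_tab(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect lemmas per synset ID from OMW-style tab-separated lines."""
    lemmas = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header/comment line
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a data row in the assumed layout
        synset_id, kind, value = fields[0], fields[1], fields[-1]
        if kind.endswith(":lemma"):
            lemmas[synset_id].append(value)
    return dict(lemmas)


# Hypothetical sample in the assumed layout; values are placeholders.
sample = io.StringIO(
    "# KurdNet\tckb\thttps://example.org\tlicense-placeholder\n"
    "00001740-n\tckb:lemma\tlemma_a\n"
    "00001740-n\tckb:lemma\tlemma_b\n"
    "00001740-n\tckb:def\t0\tdefinition_placeholder\n"
)
lemmas_by_synset = read_omw_tab(sample)
print(lemmas_by_synset)
```

To read a real file, pass an open file handle (opened with `encoding="utf-8"`) instead of the in-memory sample.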

Contribute to this project ✨

The current version of KurdNet only contains synsets and translations in Sorani. Since 2014 when we initiated this project, we have been looking for people interested in translating the current definitions and synsets into other variants of Kurdish, particularly Kurmanji. If you can contribute to this project, please get in touch. You can take a look at the glosses to be translated in the Translated_Glosses.tsv file.

Note that the .tsv files were exported from the original Excel files, and some of them have been cleaned (e.g. ZWNJ characters removed). We recommend working on the .tsv files for future development.
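As an illustration of the kind of cleaning mentioned above, the following sketch strips the zero-width non-joiner (U+200C) from every cell of a tab-separated file. It is not the script used to produce the files in this repository; the function names and the sample row are hypothetical, and the real column layout of the .tsv files is not assumed.

```python
import csv
import io
from typing import Iterable, List

ZWNJ = "\u200c"  # zero-width non-joiner


def strip_zwnj(text: str) -> str:
    """Remove every ZWNJ character from a string."""
    return text.replace(ZWNJ, "")


def clean_tsv_rows(rows: Iterable[List[str]]) -> List[List[str]]:
    """Apply strip_zwnj to every cell of every row."""
    return [[strip_zwnj(cell) for cell in row] for row in rows]


# Hypothetical two-column sample; the real files may have other columns.
sample = io.StringIO("id\tgloss\n1\tab\u200ccd\n")
rows = list(csv.reader(sample, delimiter="\t", quoting=csv.QUOTE_NONE))
cleaned = clean_tsv_rows(rows)
print(cleaned)
```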

Cite this project

If you use this resource, please cite our paper:

@inproceedings{W14-0101,
  title = "Towards Building KurdNet, the Kurdish WordNet",
  author = "Aliabadi, Purya  and
    Ahmadi, Sina  and
    Salavati, Shahin  and
    Esmaili, Kyumars Sheykh",
  booktitle = "Proceedings of the Seventh Global Wordnet Conference",
  year = "2014",
  address = "Tartu, Estonia",
  url = "https://www.aclweb.org/anthology/W14-0101",
  pages = "1--6",
}

kurdnet's Issues

Preparing for inclusion in Wn's index

I'm the maintainer of the Wn project and I'm looking to include KurdNet in the project index for the next release. Before I do, I have some suggestions. Feel free to ignore them, as they are not necessary for inclusion in the index, but they can make things more useful and/or convenient.

  1. Change the lexicon ID to kurdnet (or similar). The ckbwn ID is less memorable and less "branded" to this project, and it follows the ID scheme of the previous release of OMW despite it not being distributed by OMW. This way, when a user of Wn wants to download KurdNet, they can do:

    >>> wn.download('kurdnet')

    instead of

    >>> wn.download('ckbwn')
  2. Build a WN-LMF package for the release. This is basically a directory containing the XML file along with (optionally) a README, LICENSE, or citation.bib file. This is useful for distributing the full texts of these additional files along with the wordnet itself. If you don't wish to distribute these files, just the XML file is fine.

  3. Compress the package or XML file and attach it to a release on GitHub (examples: OMW v1.4, OdeNet v1.4). I strongly recommend this one, as currently the ckbwn.xml file is checked into the repository directly, so it's not clear which commit corresponds to the released version. In other projects we have used GitHub Actions scripts to automate this process for us, and I can help you get this set up.

  4. (Unrelated to packaging for Wn) The link at the bottom of https://sinaahmadi.github.io/resources/kurdnet.html says "Download KurdNet at https://github.com/klpp/kurdnet." but I think that link should refer to this repository instead.
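Suggestions 2 and 3 above could be scripted roughly as follows. This is only a sketch using the Python standard library, not the GitHub Actions setup mentioned in the issue: it places the XML file and any optional files that exist into a single top-level directory inside a `.tar.xz` archive. The function name `build_package`, the package name, and the exact optional file names are assumptions.

```python
import tarfile
from pathlib import Path

# Assumed names for the optional files; adjust to what the repository uses.
OPTIONAL_FILES = ("README.md", "LICENSE", "citation.bib")


def build_package(source_dir, xml_name, package_name, out_dir):
    """Bundle the XML file and any optional files into <package_name>.tar.xz."""
    source_dir, out_dir = Path(source_dir), Path(out_dir)
    xml_path = source_dir / xml_name
    if not xml_path.is_file():
        raise FileNotFoundError(xml_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive_path = out_dir / f"{package_name}.tar.xz"
    with tarfile.open(archive_path, "w:xz") as archive:
        # Everything goes under one top-level directory named after the package.
        archive.add(xml_path, arcname=f"{package_name}/{xml_name}")
        for name in OPTIONAL_FILES:
            candidate = source_dir / name
            if candidate.is_file():
                archive.add(candidate, arcname=f"{package_name}/{name}")
    return archive_path


# Example call (paths are hypothetical):
# build_package(".", "ckbwn.xml", "kurdnet", "dist")
```

The resulting archive can then be attached to a GitHub release, so that a tagged release corresponds to one specific version of the XML file.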

Let me know if you have questions.
