rosimlex-999's Introduction

Paper

The approach is described in the following paper:

Barbu, Eduard; Barbu Mititelu, Verginica (2022). Evaluating Computational Models of Similarity against a Human Rated Dataset. Baltic Journal of Modern Computing, 10 (3), 295–306. DOI: 10.22364/bjmc.2022.10.3.03.
For convenience, the paper is also included in this GitHub distribution as "Evaluating Computational Models of Similarity against a Human Rated Dataset.pdf".

The RoSimLex-999 data and code

This is the data and the code for the paper "Evaluating computational models of similarity against a human-rated dataset."
The code has been tested on Mac OS X and Linux.

Prerequisites

  1. Python 3.6.10 or higher. We recommend installing the Anaconda distribution. In either case, NumPy and SciPy must be installed.
  2. To reproduce the corpus-based results, you need to install additional dependencies and download word embeddings (see below).

Data

  1. If you prefer the data in spreadsheet format, you can download it from here: RoSimLex-999 Data
  2. You can find the data in text format under the "Data" directory.
  • RoSimLex-Maximal.txt contains the original SimLex-999 set, the Romanian mappings, the human scores, and the scores computed from the corpus embeddings and the semantic networks.
  • RoSimLex-Common.txt is the common set described in the paper.

Compute the correlation coefficients

After running the following script, the results are in the "results_corr.txt" file.

  • python correlations.py --scores Data/RoSimLex-Final.txt --results results_corr.txt
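
As a rough illustration of what such a correlation step involves, the sketch below compares a list of human ratings with a list of model scores using SciPy. It is not the code of correlations.py: the function name `correlate`, the toy scores, and the choice to report both Spearman and Pearson coefficients are assumptions made for the example, and the layout of the score files is not modeled here.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def correlate(human, model):
    """Return (Spearman rho, Pearson r) for two parallel score lists.

    Pairs where the model has no score (None) are skipped, which is one
    plausible way to handle out-of-vocabulary words (an assumption).
    """
    pairs = [(h, m) for h, m in zip(human, model) if m is not None]
    h = np.array([p[0] for p in pairs], dtype=float)
    m = np.array([p[1] for p in pairs], dtype=float)
    rho = spearmanr(h, m)[0]  # rank correlation
    r = pearsonr(h, m)[0]     # linear correlation
    return float(rho), float(r)


# Hypothetical scores for five word pairs (not taken from the dataset).
human_scores = [9.2, 7.5, 3.1, 1.0, 5.4]
model_scores = [0.81, 0.66, 0.30, 0.05, 0.52]

rho, r = correlate(human_scores, model_scores)
print(f"Spearman: {rho:.3f}  Pearson: {r:.3f}")
```

Because the toy model scores rank the five pairs in the same order as the human ratings, the Spearman coefficient here is 1; real model scores will generally give lower values.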

Reproduce the similarity scores for the Semantic Networks

  1. For Princeton WordNet, you have to install NLTK
  2. For the Romanian Wordnet, you have to install RoWordNet
  3. Run the script "wordnet_correlations.py". The results are already stored under "Results/Wordnet_Similarities" for your convenience.
  • python wordnet_correlations.py

Reproduce the similarity scores for Corpora

  1. Install the Gensim library
  2. Download the following word embeddings:
    • The CoRoLa embeddings with configurations (300_20 and 400_5) from here
    • The Romanian CoNLL_2017 embeddings from here, position 64.
    • The FastText Romanian embeddings from here
  3. Place all the models in a directory called "Embeddings" with the following subdirectories:
    • CONLL2017_Word2vec. Inside this directory, place "model.bin" representing the Romanian CoNLL_2017 embeddings
    • Corola. Inside this directory, place "corola.300.20.vec" and "corola.400.5.vec" representing the CoRoLa embeddings with configurations (300_20 and 400_5)
    • Facebook. Inside this directory, place "cc.ro.300.bin" representing the FastText Romanian embeddings
  4. Run the script "corpus-cosine_embeddings.py". The results are already stored under "Results/Corpus_Similarities" for your convenience.
    • python corpus-cosine_embeddings.py
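
The script name suggests that the corpus-based scores are cosine similarities between word vectors. The sketch below shows that measure on its own with NumPy. The function name `cosine_similarity`, the three-dimensional toy vectors, and the word keys are hypothetical and stand in for vectors that would normally be looked up in one of the embedding models above; it is not the code of corpus-cosine_embeddings.py.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is all zeros)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0
    return float(np.dot(u, v) / denom)


# Hypothetical 3-dimensional vectors; real embeddings have hundreds of dimensions.
toy_vectors = {
    "casa": [1.0, 0.0, 1.0],
    "locuinta": [2.0, 0.0, 2.0],
    "mar": [0.0, 3.0, 0.0],
}

sim_close = cosine_similarity(toy_vectors["casa"], toy_vectors["locuinta"])
sim_far = cosine_similarity(toy_vectors["casa"], toy_vectors["mar"])
print(f"casa/locuinta: {sim_close:.2f}  casa/mar: {sim_far:.2f}")
```

Cosine similarity ignores vector length, so the two parallel toy vectors score 1 even though one is twice as long as the other, while the orthogonal pair scores 0.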

Contact

  1. If you have questions or comments regarding the code, write to Eduard Barbu (eduard dot barbu at yahoo dot com)
  2. If you have questions or comments regarding the data, write to Verginica Barbu Mititelu (vergi at racai dot ro)
