Siamese Feed-Forward Network for Cognate Detection

About

This repository contains a Siamese feed-forward network implementation for the NLP task of true cognate detection. It can be extended to other similarity-measurement tasks on textual input. The code is implemented in PyTorch and uses fastText word embeddings (P. Bojanowski et al., Enriching Word Vectors with Subword Information) for word vector representation. The paper can be found here and here.

The work on this paper was carried out by Sravan Munukutla and Sayali Ghodekar.

Siamese Feed-Forward Architecture

Architecture Diagram

A Siamese neural network learns a common subspace by tying the trainable parameters of its two branches. It applies the same weights to two different input vectors in tandem to compute comparable output vectors. The output vectors are then compared using a similarity score such as cosine similarity, or a distance such as Manhattan or Euclidean distance. The above architecture is designed to detect true cognates among various Indian language pairs.
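The weight tying described above can be illustrated with a minimal, framework-free sketch. This is not the repository's PyTorch code: the layer sizes, the tanh activation, the random 8-dimensional inputs standing in for word embeddings, and all function names (`make_layer`, `encode`, `siamese_scores`) are assumptions made for illustration. The point it shows is that both inputs pass through the same `shared_layers` object before their outputs are compared.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def encode(vector, layers):
    """Feed-forward pass: each layer is an affine map followed by tanh."""
    activation = vector
    for weights, biases in layers:
        activation = [
            math.tanh(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def siamese_scores(left, right, layers):
    """Encode both inputs with the SAME layers (tied weights), then compare."""
    out_left = encode(left, layers)
    out_right = encode(right, layers)
    return {
        "cosine": cosine_similarity(out_left, out_right),
        "manhattan": manhattan_distance(out_left, out_right),
        "euclidean": euclidean_distance(out_left, out_right),
    }


rng = random.Random(0)
# Assumed sizes: 8-dimensional inputs, two shared layers (8 -> 6 -> 4).
shared_layers = [make_layer(8, 6, rng), make_layer(6, 4, rng)]
word_a = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for one word's embedding
word_b = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for the other word's embedding
scores = siamese_scores(word_a, word_b, shared_layers)
print(scores)
```

Because both branches share one set of parameters, swapping the two inputs leaves every score unchanged, and feeding the same vector to both branches gives a cosine similarity of 1 and distances of 0. In a trained model, a threshold or a loss on one of these scores would decide whether a pair counts as a cognate; that step is omitted here.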

Usage

    git clone https://github.com/cfiltnlp/siamese-cognate-detection.git

For result reproducibility, please use the code provided here with the data which can be obtained from here.

Maintainer(s)

Sravan Munukutla
Sayali Ghodekar
Diptesh Kanojia

Citation

Kanojia, D., Munukutla, S., Ghodekar, S., Bhattacharyya, P. and Kulkarni, M., 2020. Keep your dimensions on a leash: True cognate detection using siamese deep neural networks. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (pp. 324-325).

BiBTeX Citation

@inproceedings{10.1145/3371158.3371207,
    author = {Kanojia, Diptesh and Munukutla, Sravan and Ghodekar, Sayali and Bhattacharyya, Pushpak and Kulkarni, Malhar},
    title = {Keep Your Dimensions on a Leash: True Cognate Detection Using Siamese Deep Neural Networks},
    year = {2020},
    isbn = {9781450377386},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3371158.3371207},
    doi = {10.1145/3371158.3371207},
    abstract = {Automatic Cognate Detection helps NLP tasks of Machine Translation, Information Retrieval, and Phylogenetics. Cognate words are defined as word pairs across languages which exhibit partial or full lexical similarity and mean the same (e.g., hund-hound in German-English). In this paper, we use a Siamese Feed-forward neural network with word-embeddings to detect such word pairs. Our experiments with various embedding dimensions show larger embedding dimensions can only be used for large corpora sizes for this task. On a dataset built using linked Indian Wordnets, our approach beats the baseline approach with a significant margin (up to 71\%) with the best F-score of 0.85\% on the Hindi-Gujarati language pair.},
    booktitle = {Proceedings of the 7th ACM IKDD CoDS and 25th COMAD},
    pages = {324--325},
    numpages = {2},
    keywords = {Cognate Detection, Word Embeddings, Indian Languages},
    location = {Hyderabad, India},
    series = {CoDS COMAD 2020}
}
