Language and Society Project

Group:

  • Pratyaksh Gautam (2020114002)
  • Nukit Tailor (2020114012)

The original code is under the directory code_release/

Data

The Facebook, Twitter, and WhatsApp data were all downloaded from: http://amitavadas.com/Code-Mixing.html

Resources

  1. The English word list "resources/EN.words.txt" was downloaded from: http://wordlist.aspell.net/
  2. The Hindi transliteration word list "resources/HI.trans.fire2013.txt" was downloaded from: https://web.archive.org/web/20160312153954/http://cse.iitkgp.ac.in/resgrp/cnerg/qa/fire13translit/
  3. The Hindi word list was compiled by Gupta et al. (2012): http://www.lrec-conf.org/proceedings/lrec2012/pdf/365_Paper.pdf

Running the code

The main annotation script is "process.py". It should be run as follows:

    python3 process.py <src_file> [-top_n int] -out <out_file>

where <src_file> is the input text file in CoNLL format (one token per line) and <out_file> is the name of the output file that will be generated. The -top_n flag controls how much of the manually created word list is used to classify tokens. By default, the whole word list is used.
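As an illustration of the one-token-per-line input format, here is a minimal Python sketch of reading such a file. It is not taken from process.py: the function name read_conll_tokens, the treatment of blank lines as sentence boundaries, and the choice of the first whitespace-separated column as the token are all assumptions, and the sample tokens are made up.

```python
from typing import Iterable, List


def read_conll_tokens(lines: Iterable[str]) -> List[List[str]]:
    """Group a one-token-per-line stream into sentences.

    Assumptions (not confirmed by the project): blank lines separate
    sentences, and the token is the first whitespace-separated column
    on each non-blank line.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # A blank line closes the sentence collected so far, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(stripped.split()[0])
    # Flush the last sentence when the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample input: two short sentences separated by a blank line.
sample = ["yeh\n", "movie\n", "awesome\n", "thi\n", "\n", "see\n", "you\n"]
print(read_conll_tokens(sample))
```

The same function accepts an open file object, since iterating over a text file yields its lines.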

To check the scores, run the following command:

    python3 scorer.py -hyp <out_file> -ref <ref_file> [-v]

where <out_file> is the output file generated by process.py and <ref_file> is the reference file. The -v flag is optional and prints the scores.
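For repeated runs, the two commands can be assembled programmatically. The sketch below only builds the argument lists from the flags documented above; the helper names process_cmd and scorer_cmd, the file names, and the -top_n value of 1000 are hypothetical placeholders.

```python
from typing import List, Optional


def process_cmd(src_file: str, out_file: str, top_n: Optional[int] = None) -> List[str]:
    """Build the process.py invocation; -top_n is added only when given."""
    cmd = ["python3", "process.py", src_file]
    if top_n is not None:
        cmd += ["-top_n", str(top_n)]
    cmd += ["-out", out_file]
    return cmd


def scorer_cmd(hyp_file: str, ref_file: str, verbose: bool = False) -> List[str]:
    """Build the scorer.py invocation; -v is appended when verbose is True."""
    cmd = ["python3", "scorer.py", "-hyp", hyp_file, "-ref", ref_file]
    if verbose:
        cmd.append("-v")
    return cmd


# Hypothetical file names, used only to show the resulting command lines.
print(" ".join(process_cmd("input.conll", "output.txt", top_n=1000)))
print(" ".join(scorer_cmd("output.txt", "reference.txt", verbose=True)))
```

Each returned list can be passed directly to subprocess.run from the directory that contains the scripts.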

Results

With our modifications to the source, the en and hi F-scores improved on all three datasets compared to the original code, while the univ F-score improved on Facebook and dropped slightly on WhatsApp and Twitter:

--------
WHATSAPP                          
--------
        en      hi      univ
en      294     420     30        
hi      32      1988    37  
univ    37      131     249         

        Old-scores                              New-scores

CLASS   P       R       F1              CLASS   P       R       F1
en      39.516  80.992  53.117          en      39.783  80.992  53.358
hi      96.646  78.299  86.51           hi      96.65   78.417  86.584
univ    59.712  78.797  67.94           univ    59.427  78.797  67.75                       

--------
FACEBOOK
--------
        en      hi      univ
en      12997   397     530
hi      127     2446    173
univ    90      14      3841

        Old-scores                              New-scores

CLASS   P       R       F1              CLASS   P       R       F1
en      93.335  98.35   95.777          en      93.342  98.358  95.785
hi      89.043  85.614  87.295          hi      89.075  85.614  87.31
univ    97.363  84.507  90.481          univ    97.364  84.529  90.494   

--------
TWITTER
--------
        en      hi      univ
en      3038    1047    227
hi      575     8034    243
univ    119     698     3330

        Old-scores                              New-scores

CLASS   P       R       F1              CLASS   P       R       F1
en      70.255  81.324  75.385          en      70.455  81.404  75.535
hi      90.721  82.084  86.187          hi      90.759  82.156  86.243
univ    80.352  87.605  83.822          univ    80.299  87.632  83.805 
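
The per-class scores can be recomputed from a confusion matrix. The sketch below is not the project's scorer.py; it assumes that rows are predicted labels and columns are reference labels, which is the reading under which the Facebook matrix reproduces the reported Facebook New-scores up to rounding. The function name per_class_scores is hypothetical.

```python
from typing import Dict, List

LABELS = ["en", "hi", "univ"]

# Facebook confusion matrix as reported above.
# Assumed orientation: rows = predicted label, columns = reference label.
FACEBOOK = [
    [12997, 397, 530],
    [127, 2446, 173],
    [90, 14, 3841],
]


def per_class_scores(matrix: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Return precision, recall and F1 (as percentages) for each label."""
    scores: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted_total = sum(matrix[i])                 # row sum
        reference_total = sum(row[i] for row in matrix)  # column sum
        precision = 100.0 * true_pos / predicted_total if predicted_total else 0.0
        recall = 100.0 * true_pos / reference_total if reference_total else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        scores[label] = {"P": precision, "R": recall, "F1": f1}
    return scores


for label, s in per_class_scores(FACEBOOK, LABELS).items():
    print(f"{label}\t{s['P']:.3f}\t{s['R']:.3f}\t{s['F1']:.3f}")
```

Small differences in the last decimal place are possible, since the rounding used by the original scorer is not known.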
