lang-detection's Introduction

Detectors

Tools used for this purpose:

*: Supports the Guarani language.

Installation

Pre-requisites:

Install polyglot dependencies.

Install requirements: pip install -r requirements.txt

Download fastText lib.

Download the crubadan corpus.

# Commented out due to the low precision of textcat; use gcld3 instead.
"""
import nltk
nltk.download('crubadan')
nltk.download('punkt')
"""

Command Line Interface

All commands must be run from the src directory.

Detect language of tweets

python run.py [data_dir] [file_name_of_tweets] [language_lexicon] --detect_language --guarani

data_dir: Path to the data directory; must be relative to the src directory. Required.
file_name_of_tweets: Name of the file containing the tweets in CSV format. Required.
language_lexicon: Name of the file containing the word lexicon of the language to identify. Optional. The lexicon can be for any low-resource language, not only Guarani.
guarani: Flag indicating that the language to identify is Guarani (or another low-resource language). Optional. Needed when language_lexicon is used.
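
As a rough illustration of the arguments described above, the sketch below mirrors the documented interface with argparse and shows one simple way a word lexicon could be used to score a tweet. It is not the project's run.py: the function names build_parser and lexicon_ratio, the whitespace tokenization, the sample file names and the toy lexicon are all assumptions made for this example.

```python
import argparse
import string


def build_parser():
    """Mirror the documented run.py arguments (illustrative, not the real script)."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("data_dir", help="data directory, relative to src")
    parser.add_argument("file_name_of_tweets", help="CSV file with the tweets")
    # Optional positional argument: lexicon of the language to identify.
    parser.add_argument("language_lexicon", nargs="?", default=None)
    parser.add_argument("--detect_language", action="store_true")
    parser.add_argument("--guarani", action="store_true")
    return parser


def lexicon_ratio(text, lexicon):
    """Return the fraction of tokens in `text` that appear in `lexicon`.

    Assumption: tokens are split on whitespace and stripped of leading and
    trailing ASCII punctuation, so word-internal apostrophes are preserved.
    """
    tokens = [word.strip(string.punctuation) for word in text.lower().split()]
    tokens = [token for token in tokens if token]
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the parser.
    args = build_parser().parse_args(
        ["../data", "tweets.csv", "lexicon.txt", "--detect_language", "--guarani"]
    )
    toy_lexicon = {"che", "nde", "mba'e"}  # toy entries, not a real lexicon file
    print(args.language_lexicon, lexicon_ratio("Mba'e la porte, che amigo?", toy_lexicon))
```

A tweet whose ratio exceeds some chosen threshold could then be flagged as a candidate for the low-resource language; the threshold would need to be tuned on real data.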

See also: lang, lang_2.


Note: Partially forked from https://github.com/social-link-analytics-group-bsc/tw_coronavirus in v1.0.
