language_ident's Introduction

Language identification with Python

A reimplementation of automatic language identification for demonstration purposes, with minimal use of existing libraries (numpy).

Testing the code

The code consists of the Naive Bayes model class (nb_classifier.py), the feature extraction utilities (features.py), and the analysis notebook (test_li.ipynb). The notebook walks through the usage of the code step by step, with some explanations. It also trains the model on a dataset imported from the HuggingFace repository and evaluates it against the langid and langdetect libraries.
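To give a rough idea of the technique, here is a minimal sketch of a multinomial Naive Bayes classifier over character bigrams. It is not the code from nb_classifier.py or features.py: the names `char_ngrams` and `TinyNaiveBayes`, the choice of bigrams, the add-one smoothing, and the toy training sentences are all assumptions made for illustration. It also uses only the standard library to stay self-contained, whereas the repository relies on numpy.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class TinyNaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter(labels)   # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        # Vocabulary: every n-gram seen in any training text.
        self.vocab = {gram for c in self.counts.values() for gram in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for gram in char_ngrams(text):
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
train_texts = [
    "the quick brown fox jumps over the lazy dog",
    "this is a short sentence written in english",
    "der schnelle braune fuchs springt über den faulen hund",
    "dies ist ein kurzer satz auf deutsch",
]
train_labels = ["en", "en", "de", "de"]

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(model.predict("where is the dog"))
print(model.predict("wo ist der hund"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps an n-gram that is unseen for one language from driving that language's probability to zero.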

To try out the code, follow these steps:

Set up a Python 3 virtual environment:

python3 -m venv language_ident_env
source language_ident_env/bin/activate

Install the packages required to view the notebook:

python3 -m pip install -r requirements_analysis.txt

Or, if you just want to test the Naive Bayes model without the analysis notebook, install only the two packages it requires:

python3 -m pip install -r requirements_method.txt

Run the notebook:

jupyter notebook test_li.ipynb

Or test the code by importing nb_classifier.py and features.py.
