cmartin009 / machine-learning-for-language-detection


In natural language processing, language identification (or language guessing) is the problem of determining which natural language a given piece of content is written in. We will focus only on French, Spanish, Portuguese, and Italian.

Python 100.00%

machine-learning-for-language-detection's Introduction

Machine-learning-for-language-detection: text classification using Python, scikit-learn, and NLTK, with a neural network (MLP)

Introduction:

Abstract:

1-Download the data set from http://www.statmt.org/europarl/ (download ==> "source release" ==> choose: fr, es, pt, it)

2-Extract features from the text files (cleaning.py + feature_preparation.py)

3-Run the ML algorithm: we will use an MLP (libraries => scikit-learn and NLTK)
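
As a rough illustration of the cleanup in step 2, here is a minimal sketch of what a cleaning pass over Europarl lines might look like. It is not the repository's cleaning.py: the names `clean_lines`, `TAG_LINE`, and `NON_LETTER` are hypothetical, and the sketch assumes that markup lines such as `<CHAPTER ID=1>` or `<SPEAKER ...>` should be dropped and that digits and punctuation carry no useful signal for language detection.

```python
import re

# Assumption: markup lines (e.g. <CHAPTER ID=1>, <SPEAKER ID=1 NAME="...">)
# consist of a single tag and should be skipped entirely.
TAG_LINE = re.compile(r"^\s*<[^>]*>\s*$")
# Punctuation, digits, and underscores; accented letters are kept.
NON_LETTER = re.compile(r"[^\w\s]|[\d_]")


def clean_lines(lines):
    """Yield lowercased, letters-only text for each non-markup, non-empty line."""
    for line in lines:
        if TAG_LINE.match(line):
            continue
        text = NON_LETTER.sub(" ", line.lower())
        text = " ".join(text.split())  # collapse runs of whitespace
        if text:
            yield text


# Hypothetical sample lines in the style of a Europarl file.
sample = [
    "<CHAPTER ID=1>",
    "Reprise de la session",
    '<SPEAKER ID=1 NAME="Le Président">',
    "Je déclare reprise la session du Parlement européen, interrompue le 17 décembre 1999.",
]
cleaned = list(clean_lines(sample))
print(cleaned)
```

Keeping accented letters matters here, since characters such as "é", "ñ", or "ã" are strong hints for telling the four languages apart.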

More details:

##First, don't forget to install NLTK :)

1-Text files are ordered sequences of words. To run machine learning algorithms, we need to convert the text files into numerical feature vectors. We will use the bag-of-words model for our example, and our bag-of-words will contain n-grams. We then compute the frequency distribution of the words across all files (TF) and use it to convert the text corpus into feature vectors.

2-Building a classifier: after cleanup, it is time to build the classifier that identifies the language of each file. There are many algorithms to choose from; we will use a neural network (specifically an MLP classifier) and train the model on the training set.
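
The two steps above can be sketched end to end with scikit-learn. This is an illustrative sketch under stated assumptions, not the repository's feature_preparation.py or training script: it uses `CountVectorizer` in place of NLTK for the n-gram counts, it assumes character n-grams of length 1 to 3 (switch to `analyzer="word"` for word n-grams), and the tiny training set is made up rather than taken from Europarl. The hidden layer size and iteration count are arbitrary choices.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration (two sentences per language).
train_texts = [
    "je déclare reprise la session du parlement",
    "nous devons voter sur cette proposition aujourd'hui",
    "declaro reanudado el período de sesiones del parlamento",
    "tenemos que votar sobre esta propuesta hoy",
    "declaro reaberta a sessão do parlamento europeu",
    "temos de votar esta proposta hoje",
    "dichiaro ripresa la sessione del parlamento europeo",
    "dobbiamo votare su questa proposta oggi",
]
train_labels = ["fr", "fr", "es", "es", "pt", "pt", "it", "it"]

model = make_pipeline(
    # Bag of character n-grams: each text becomes a vector of n-gram counts (TF).
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    # A small multi-layer perceptron trained on those count vectors.
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["bonjour tout le monde", "buongiorno a tutti"])
print(predictions)
```

With so few training sentences the predictions are not reliable; on the real corpus you would hold out part of the data as a test set and measure accuracy on it.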

###########################If there are any omissions or mistakes, please do not hesitate to let me know. ###########################Email : [email protected]

<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3 MAKE CODE NOT WAR <3<3<3<3<3<3<3<3<3<3<3<3<3<3<3
