manuel-silvan / simple-language-recognition-in-european-parliament-proceedings-parallel-corpus

The European Parliament Proceedings Parallel Corpus (1996-2011) (https://www.statmt.org/europarl/) is a well-known dataset for Natural Language Processing tasks. It contains proceedings of the European Parliament in 21 European languages. This project uses data from only 6 of them (German, French, Spanish, Italian, Polish and English): the data is extracted, preprocessed, cleaned and normalized, and then used to train some fairly simple classifiers that identify the language in which a sentence is written. This was originally a project I did at university.
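The description does not say which classifiers the project trains, so the following is only a minimal from-scratch sketch of the idea, assuming a multinomial Naive Bayes model over character trigrams. The names `normalize`, `char_ngrams` and `NaiveBayesLanguageClassifier`, the normalization rules, and the six toy sentences are all hypothetical and stand in for the real Europarl data and the repository's own code.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict


def normalize(sentence):
    """Lowercase, replace digits and punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFC", sentence.lower())
    text = re.sub(r"[\d_]+|[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padding with one space per side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageClassifier:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n                          # n-gram length
        self.alpha = alpha                  # Laplace smoothing constant
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total n-grams seen
        self.docs = Counter()               # label -> number of sentences
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            grams = char_ngrams(normalize(sentence), self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, sentence):
        grams = char_ngrams(normalize(sentence), self.n)
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.docs:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * vocab_size
            for gram in grams:
                score += math.log((self.counts[label][gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stand-in for the six Europarl languages (not real corpus data).
train = [
    ("The committee will vote on the report tomorrow", "en"),
    ("Der Ausschuss wird morgen über den Bericht abstimmen", "de"),
    ("La commission votera demain sur le rapport", "fr"),
    ("La comisión votará mañana sobre el informe", "es"),
    ("La commissione voterà domani sulla relazione", "it"),
    ("Komisja będzie jutro głosować nad sprawozdaniem", "pl"),
]
clf = NaiveBayesLanguageClassifier().fit(*zip(*train))
print(clf.predict("Der Ausschuss wird morgen abstimmen"))
```

Character n-grams are a common choice for this task because they need no tokenizer or dictionary and pick up language-specific letter sequences and diacritics. On the real corpus the training set would be far larger, and a held-out split would be needed to measure accuracy.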

Python 100.00%

classification-algorithm language-recognition natural-language-processing from-scratch
