Git Product home page Git Product logo

polyglot's Introduction

polyglot

Downloads Latest Version Supported Python versions Development Status Download Format Build Status Documentation Status

Polyglot is a natural language pipeline that supports massive multilingual applications.

Features

  • Tokenization (165 Languages)
  • Language detection (196 Languages)
  • Named Entity Recognition (40 Languages)
  • Sentiment Analysis (136 Languages)
  • Word Embeddings (137 Languages)
  • Morphological analysis (135 Languages)
  • Transliteration (69 Languages)

Developer

  • Rami Al-Rfou @ rmyeid gmail com

Qiuck Tutorial

import polyglot
from polyglot.text import Text

Language Detection

text = Text("Bonjour, Mesdames.")
print("Language Detected: Code={}, Name={}\n".format(text.language.code, text.language.name))
Language Detected: Code=fr, Name=French

Tokenization

zen = Text("Beautiful is better than ugly. "
           "Explicit is better than implicit. "
           "Simple is better than complex.")
print(zen.words)
[u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']
print(zen.sentences)
[Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]

Part of Speech Tagging

text = Text(u"O primeiro uso de desobediência civil em massa ocorreu em setembro de 1906.")

print("{:<16}{}".format("Word", "POS Tag")+"\n"+"-"*30)
for word, tag in text.pos_tags:
    print(u"{:<16}{:>2}".format(word, tag))
Word            POS Tag
------------------------------
O               PART
primeiro        SCONJ
uso             PART
de              ADP
desobediência   PART
civil           SCONJ
em              ADP
massa           PART
ocorreu         SCONJ
em              ADP
setembro        PART
de              ADP
1906            DET
.               ADV

Named Entity Recognition

text = Text(u"In Großbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden")
print(text.entities)
[I-LOC([u'Groxdfbritannien']), I-PER([u'Gandhi'])]

Polarity

print("{:<16}{}".format("Word", "Polarity")+"\n"+"-"*30)
for w in zen.words[:6]:
    print("{:<16}{:>2}".format(w, w.polarity))
Word            Polarity
------------------------------
Beautiful        0
is               0
better           1
than             0
ugly            -1
.                0

Embeddings

word = zen.words[0]
print(word.vector)
[-0.08001513 -0.35475096  0.27702546 -0.20423636  0.36313248  0.06376412
  0.0444247  -0.30489922  0.014972    0.13951094  0.07515849 -0.2703914
  0.04650182  0.58747977  0.5101701  -0.04114699  0.37434807 -0.27707747
 -0.06124159  0.21493433 -0.23498166  0.07404013 -0.23953673 -0.15044802
  0.21210277 -0.58776855  0.12014424  0.30591646  0.07079886  0.44168213
  0.2473582  -0.43409103 -0.25516582  0.45812422  0.33660468  0.61951864
  0.16038296 -0.12069689 -0.59378242 -0.47525382 -0.03109539  0.28781402
 -0.51556301 -0.26363477 -0.0820123   0.31425434 -0.10971891  0.53333962
  0.3446033  -0.62146574 -0.15398794  0.11720303  0.50415224 -0.79616308
 -0.25548786  0.36809164 -0.26254281  0.11736908 -0.30717522 -0.18103991
 -0.03320931 -0.15692121 -0.22654058  0.56092978]

Morphology

word = Text("Preprocessing is an essential step.").words[0]
print(word.morphemes)
[u'Pre', u'process', u'ing']

Transliteration

from polyglot.transliteration import Transliterator
transliterator = Transliterator(source_lang="en", target_lang="ru")
print(transliterator.transliterate(u"preprocessing"))
препрокессинг

polyglot's People

Contributors

abosamoor avatar svisser avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.