EOParser

Demo: EOParser parsing an article from Vikipedio, the Esperanto Wikipedia.

What is it?

EOParser is a Python library that decomposes Esperanto words into their particles. This is possible solely because Esperanto's grammar is regular! For example, the word "malsanulo" (sick person) is decomposed into mal (opposite of), san (health), ul (suffix for a person) and o (marking it as a noun).
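Because every Esperanto word follows the same pattern (prefixes, a root, suffixes, an ending), that pattern can be written as a regular expression. The sketch below only illustrates this regularity and is not how EOParser works internally; the particle lists are hypothetical and tiny compared to EOParser's dictionaries.

```python
import re

# Hypothetical toy particle lists, NOT EOParser's dictionaries.
PREFIXES = ["mal", "ge"]
ROOTS = ["san", "part"]
SUFFIXES = ["ul", "in"]
ENDINGS = ["o", "a", "e", "i"]


def alt(options):
    # Build "a|b|c", longest alternatives first, so a longer particle is
    # tried before a shorter one that starts the same way.
    return "|".join(sorted(map(re.escape, options), key=len, reverse=True))


# prefix* root suffix* ending: a regular language.
WORD = re.compile(
    rf"(?P<prefixes>(?:{alt(PREFIXES)})*)"
    rf"(?P<root>{alt(ROOTS)})"
    rf"(?P<suffixes>(?:{alt(SUFFIXES)})*)"
    rf"(?P<ending>{alt(ENDINGS)})"
)


def word_shape(word):
    """Return the matched slots of `word`, or None if it does not fit the pattern."""
    match = WORD.fullmatch(word)
    return match.groupdict() if match else None


print(word_shape("malsanulo"))
# {'prefixes': 'mal', 'root': 'san', 'suffixes': 'ul', 'ending': 'o'}
```

Each named group captures a whole slot, so two stacked prefixes would come back as one string, and compounds with several roots are not covered at all. Splitting slots into individual particles and choosing between competing splits is what a real parser adds.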

Really, what is it?

Technically, EOParser is a handwritten GLL (generalized LL) parser and disambiguator with a big list of possible prefixes, suffixes, roots, etc. It uses this parser to split words into their particles.
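The two stages (parse, then disambiguate) can be imitated with a much simpler technique. The sketch below uses a brute-force backtracking search, not a GLL parser: like a generalized parser, it returns every split matching prefix* root suffix* ending instead of stopping at the first one, and then a toy rule (fewest particles wins) picks one. The lexicon and the rule are assumptions for illustration, not EOParser's dictionaries or its actual disambiguation logic. The word kolego shows why a disambiguator is needed: it can be read as koleg-o (colleague) or kol-eg-o (big neck).

```python
from typing import List, Optional

Parse = List[List[str]]  # [[particle, type], ...], same shape as EOParser's output

# Hypothetical toy lexicon; EOParser's real dictionaries are far larger.
PREFIXES = ("mal", "ge")
ROOTS = ("san", "part", "kol", "koleg")
SUFFIXES = ("ul", "eg", "in")
ENDINGS = ("o", "a", "e", "i")


def all_parses(word: str) -> List[Parse]:
    """Return every split of `word` of the form prefix* root suffix* ending."""
    results: List[Parse] = []

    def after_root(rest: str, acc: Parse) -> None:
        # What follows the root is either a bare ending or a suffix plus more.
        if rest in ENDINGS:
            results.append(acc + [[rest, "pos_marker"]])
        for suffix in SUFFIXES:
            if rest.startswith(suffix):
                after_root(rest[len(suffix):], acc + [[suffix, "suffix"]])

    def before_root(rest: str, acc: Parse) -> None:
        # Either a root starts here, or another prefix does.
        for root in ROOTS:
            if rest.startswith(root):
                after_root(rest[len(root):], acc + [[root, "root"]])
        for prefix in PREFIXES:
            if rest.startswith(prefix):
                before_root(rest[len(prefix):], acc + [[prefix, "prefix"]])

    before_root(word, [])
    return results


def disambiguate(parses: List[Parse]) -> Optional[Parse]:
    """Toy rule: prefer the parse with the fewest particles (None if no parse)."""
    return min(parses, key=len) if parses else None


print(disambiguate(all_parses("kolego")))  # [['koleg', 'root'], ['o', 'pos_marker']]
```

A fewest-particles rule happens to pick "colleague" here, but no word-level rule can be right every time; choosing "big neck" would need the sentence's context.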

Limitations

EOParser is a word-level analyser. It does not take the grammar or context of the surrounding sentence into account, so it may not disambiguate words correctly in some cases.

How to use

>>> import eoparser
>>> parser = eoparser.eoparser()
>>> parser.parse('geparto') # English: parent
[['ge', 'prefix'], ['part', 'root']]

The eoparser.parse method returns a list of lists, each holding a particle and its type, in the order the particles appear in the word. In the example, geparto is decomposed into ge (the prefix for both sexes) and part (the word root for parent); the part-of-speech marker is omitted. Together they mean a parent of either sex, i.e. father or mother.
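Since the result is plain nested lists, it is easy to post-process. A minimal sketch of two hypothetical helpers (not part of EOParser) that turn the geparto result shown above into readable strings:

```python
from typing import List


def hyphenate(parsed: List[List[str]]) -> str:
    """Join the particles with hyphens to show how the word is built."""
    return "-".join(particle for particle, _ptype in parsed)


def describe(parsed: List[List[str]]) -> str:
    """List each particle together with its type."""
    return " + ".join(f"{particle} ({ptype})" for particle, ptype in parsed)


# Output of parser.parse('geparto'), copied from the example above
result = [['ge', 'prefix'], ['part', 'root']]
print(hyphenate(result))  # ge-part
print(describe(result))   # ge (prefix) + part (root)
```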

Pass keep_ending_marker=True if you wish to keep the -o part-of-speech marker.

>>> parser.parse('geparto', keep_ending_marker=True)
[['ge', 'prefix'], ['part', 'root'], ['o', 'pos_marker']]
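With the ending kept, the pos_marker particle tells you the word class, because Esperanto endings map directly to parts of speech. A minimal sketch that works on the list format shown above; the helper is hypothetical, and which verb endings EOParser actually reports as a pos_marker is an assumption.

```python
from typing import List, Optional

# Standard Esperanto endings. Whether EOParser reports verb endings such
# as "as" as a single pos_marker particle is an assumption.
POS_BY_ENDING = {
    "o": "noun",
    "a": "adjective",
    "e": "adverb",
    "i": "verb (infinitive)",
    "as": "verb (present)",
    "is": "verb (past)",
    "os": "verb (future)",
    "us": "verb (conditional)",
    "u": "verb (volitive)",
}


def part_of_speech(parsed: List[List[str]]) -> Optional[str]:
    """Return the word class from the pos_marker particle, or None if absent/unknown."""
    for particle, ptype in parsed:
        if ptype == "pos_marker":
            return POS_BY_ENDING.get(particle)
    return None


print(part_of_speech([['ge', 'prefix'], ['part', 'root'], ['o', 'pos_marker']]))  # noun
```

Without keep_ending_marker=True there is no pos_marker in the result, so the helper returns None.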

These are the possible particle types:

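Types
  • root: a word root, e.g. part or san
  • pos_marker: the part-of-speech ending, e.g. o
  • suffix: e.g. ul
  • prefix: e.g. ge or mal
  • word: a special single word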
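To see which kinds of particles a word contains, the list can be grouped by type. A minimal sketch with a hypothetical helper (not part of EOParser) that pre-fills the five types listed above:

```python
from typing import Dict, List

# The five particle types listed above.
PARTICLE_TYPES = ("root", "pos_marker", "suffix", "prefix", "word")


def group_by_type(parsed: List[List[str]]) -> Dict[str, List[str]]:
    """Collect the particles of a parse result under their type, in word order."""
    groups: Dict[str, List[str]] = {ptype: [] for ptype in PARTICLE_TYPES}
    for particle, ptype in parsed:
        # setdefault keeps any unexpected type instead of raising KeyError
        groups.setdefault(ptype, []).append(particle)
    return groups


groups = group_by_type([['ge', 'prefix'], ['part', 'root'], ['o', 'pos_marker']])
print(groups["prefix"], groups["root"], groups["suffix"])  # ['ge'] ['part'] []
```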

Using a different dictionary

EOParser uses several dictionaries to track which particles are which. By default, it loads them from the directory where the library is installed. You can supply your own instead, either as a file path or as a list:

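import eoparser

# Load the number dictionary from a custom file
parser = eoparser.eoparser(numbers="my_list_of_numbers.txt")
# Or pass the entries as a list (load_numbers() stands for your own loading code)
my_list_of_numbers = load_numbers()
parser = eoparser.eoparser(numbers=list(my_list_of_numbers))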

Parameters that select a dictionary:
  • numbers
  • prefixes
  • suffixes
  • roots
  • full_words
  • correlatives
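If you build the list yourself, read the file with an explicit encoding: Esperanto letters such as ĉ, ĝ and ŭ are not ASCII, and the Windows cp1252 default reported in the issue further down would garble them. A minimal sketch of a hypothetical loader; it assumes one entry per line, which is an assumption about the format, so check the dictionary files shipped with the library first.

```python
from pathlib import Path
from typing import List


def load_entries(path: str) -> List[str]:
    """Read one dictionary entry per line (assumed format), skipping blank lines.

    encoding="utf-8" is explicit so the result does not depend on the
    platform's default encoding (e.g. cp1252 on some Windows setups).
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The result can then be passed as a parameter, e.g. eoparser.eoparser(numbers=load_entries("my_list_of_numbers.txt")).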

Contribution

EOParser is by no means perfect! Feel free to submit an issue, open a PR, or contribute in any other way!

Contributors

marty1885

Issues

bugs

Hi! I've been trying to use your software, but I came across several errors while running it. The first is a problem reading the file: I determined it's because under Windows 10 (at least in my configuration) with Python 3.11, the open() function defaults to decoding the file as cp1252 instead of UTF-8. Changing line 53 of eoparser.py to

`f = open(path, encoding="utf-8")`

fixes the problem and makes the program independent of the locale's default encoding. Unfortunately, the demo then outputs:

[screenshot of the garbled demo output]

I suspect this is a problem with termcolor: I'm using version 2.2.0, and its colored function produces this gibberish.
