filyp / autocorrect
Spelling corrector in Python
License: GNU Lesser General Public License v3.0
Hi!
I tried to follow the explanation on adding new languages.
count_words('hiwiki-latest-pages-articles.xml', 'hi')
does not work for me. It says
~\Miniconda3\lib\site-packages\autocorrect\word_count.py in count_words(src_filename, lang, encd, out_filename)
17 def count_words(src_filename, lang, encd=None, out_filename='word_count.json'):
18 words = get_words(src_filename, lang, encd)
---> 19 counts = Counter(words)
20 # make output file human readable
21 counts_list = list(counts.items())
~\Miniconda3\lib\collections\__init__.py in __init__(*args, **kwds)
566 raise TypeError('expected at most 1 arguments, got %d' % len(args))
567 super(Counter, self).__init__()
--> 568 self.update(*args, **kwds)
569
570 def __missing__(self, key):
~\Miniconda3\lib\collections\__init__.py in update(*args, **kwds)
653 super(Counter, self).update(iterable) # fast path when counter is empty
654 else:
--> 655 _count_elements(self, iterable)
656 if kwds:
657 self.update(kwds)
~\Miniconda3\lib\site-packages\autocorrect\word_count.py in get_words(filename, lang, encd)
7
8 def get_words(filename, lang, encd):
----> 9 word_regex = word_regexes[lang]
10 capitalized_regex = r'(\.|^|<|"|\'|\(|\[|\{)\s*' + word_regexes[lang]
11 with open(filename, encoding=encd) as file:
KeyError: 'hi'
Best regards
Robert
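For what it's worth, the KeyError just means 'hi' was never registered in word_regexes before count_words looked it up. A minimal stand-in with plain dicts (the real dict lives inside the library; the Devanagari pattern below is an illustrative guess, not the library's official one) shows the mechanism and the fix:

```python
import re

# stand-in for autocorrect's word_regexes dict (the real one is in the library)
word_regexes = {'en': r'[A-Za-z]+'}

def get_words(text, lang):
    # raises KeyError for any language that was never registered, as above
    return re.findall(word_regexes[lang], text)

# registering the language first avoids the KeyError
word_regexes['hi'] = r'[\u0900-\u097F]+'  # Devanagari range, illustrative only
print(get_words('नमस्ते world', 'hi'))  # → ['नमस्ते']
```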
Thanks for this wonderful lib!
Can you add some functionality to detect accidentally merged words, for example, when the whitespace separating two words was omitted?
from autocorrect import Speller
spellEn = Speller('en')
[spellEn.get_candidates(lemma) for lemma in ['test','project','testproject']]
>>>[[(495684, 'test')], [(1628175, 'project')], [(0, 'testproject')]]
It would be cool if 'testproject' could produce correct candidates: 'test' and 'project'
How hard is it to add such a feature?
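Not the maintainer, but here is a rough sketch of how such a split could work on top of the existing word counts; the counts dict below stands in for the speller's internal frequency data:

```python
# toy frequency data; the real speller keeps a much larger dict like this
counts = {'test': 495684, 'project': 1628175}

def split_candidates(word):
    """Return (left, right) pairs where both halves are known words."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in counts and word[i:] in counts
    ]

print(split_candidates('testproject'))  # → [('test', 'project')]
```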
Hi, I have some problems with downloading the Polish package. It looks like Dropbox blocks the link.
couldn't download https://dl.dropboxusercontent.com/s/40orabi1l3dfqpp/pl.tar.gz?dl=0, trying next url...
Traceback (most recent call last):
  File "C:\Users\lisek\Desktop\test.py", line 15, in <module>
    spell = Speller('pl')
  File "D:\Python\lib\site-packages\autocorrect\__init__.py", line 78, in __init__
    self.nlp_data = load_from_tar(lang) if nlp_data is None else nlp_data
  File "D:\Python\lib\site-packages\autocorrect\__init__.py", line 61, in load_from_tar
    raise ConnectionError(
ConnectionError: HTTP Error 429: Too Many Requests
Fix your network connection, or manually download
Is there any chance to get language pack another way?
There is currently an issue which causes .exe files created using pyinstaller to crash.
If the source Python file used the autocorrect library, the program would crash if the user tried to launch it from the .exe file. "dictionary for this language not found, downloading..." will appear in the terminal, then "couldn't download https://drive.google.com/uc?export=download&id=19xqFyk9d8aFR7LR43oy6cExk8Pk9wVwV, trying next url...", followed by a ConnectionError: [Errno 2]. When I visit the link in my browser, it seems to work. This is for the English language.
Is it possible to fix this, or is the library not meant to be used from a stand-alone executable?
Let's add instructions to guide contributors,
e.g. specifying the tools and coding style to be used when making contributions.
Before everything, it is a great tool. Thank you for your work. Although Turkish NLP is a difficult task, your model handles it successfully. However, there are some mistakes in some specific sentences which I tried for testing. It doesn't fix some specific words. At this point, I want to improve this model. What can I do for it? If I collect a huge corpus for training from scratch, would that be useful?
Hi, since the code, or at least the algorithm, is based on Peter Norvig's spelling corrector https://norvig.com/spell-correct.html, you should mention it in the README. This way the advanced reader will quickly get an idea of the implementation.
Right now it's only mentioned here https://github.com/filyp/autocorrect/blob/0fa2a7cab20a44b9d8393246d9553629bb35e077/autocorrect/typos.py
The word "saree" exists, but it gets autocorrected to "spree". How can I avoid this?
def double_typos(self):
"""letter combinations two typos away from word"""
return chain.from_iterable(
Word(e1, only_replacements=self.only_replacements).typos()
for e1 in self.typos()
)
When fixing the second typo, a Word object is created without specifying a language, so the second typo correction always uses the English alphabet. As a result, double-typoing a word like здрйхствуйте returns corrections like здраmствуйте instead of здравствуйте: one of the letters is replaced properly, and another with an English letter.
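A minimal sketch of the fix, with tiny stand-in alphabets (the real Word class has more parameters than shown here): threading self.lang into the inner Word keeps the second pass in the right alphabet.

```python
from itertools import chain

alphabets = {'en': 'abc', 'ru': 'абв'}  # tiny stand-in alphabets

class Word:
    def __init__(self, word, lang='en'):
        self.word = word
        self.lang = lang

    def typos(self):
        # replace each position with each letter of this word's own alphabet
        a = alphabets[self.lang]
        return [self.word[:i] + c + self.word[i + 1:]
                for i in range(len(self.word)) for c in a]

    def double_typos(self):
        # the fix: pass self.lang down so the second pass keeps the alphabet
        return chain.from_iterable(
            Word(e1, self.lang).typos() for e1 in self.typos()
        )

# every double-typo candidate of a Russian word stays Cyrillic
candidates = list(Word('аб', 'ru').double_typos())
print(all(set(w) <= set('абв') for w in candidates))  # → True
```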
Hi.
From the commit messages it appears that the project changed its license twice in the past month (GPL and LGPL).
And the original project was MIT.
Can you tell us why the changes were made?
While following the tutorial for the Hindi language (but using the Italian wiki dump instead), this error appeared:
'charmap' codec can't decode byte 0x9d in position 5050: character maps to <undefined>
This is the code used:
from autocorrect.word_count import count_words
count_words('itwiki-latest-pages-articles.xml', 'it')
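The 'charmap' codec is Windows' default cp1252 encoding. Since the count_words signature quoted elsewhere in these issues has an encd parameter, passing count_words('itwiki-latest-pages-articles.xml', 'it', encd='utf-8') should help. A stdlib-only reproduction of the same failure and fix:

```python
import os
import tempfile

# write UTF-8 text containing a right double quote (U+201D); its UTF-8 bytes
# include 0x9d, the byte from the error above, which cp1252 cannot decode
path = os.path.join(tempfile.gettempdir(), 'demo_utf8.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('città ”test”')

try:
    open(path, encoding='cp1252').read()
except UnicodeDecodeError as e:
    print('cp1252 fails:', e.reason)  # character maps to <undefined>

print(open(path, encoding='utf-8').read())  # decodes cleanly
```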
When trying to download the Polish language pack with spell = Speller(lang='pl'),
I get: couldn't download https://siasky.net/AAC52kUAAmfZF_DUZVMQWWO-tJ3g3_3FDEhR2BUXr4oq8Q, trying next url...
Hi,
First of all - this looks great. Thanks a lot.
I compared 3 different packages (yours, pyspellchecker, textblob) and yours does the best.
How can I improve performance? Is there a way to finetune this to a specific data set?
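Not sure about official fine-tuning support, but since these issues show the frequency data being regenerated from a corpus with count_words, one low-tech way to adapt to a specific data set is to rebuild the word counts from your own text. A stdlib sketch of the idea (the exact JSON layout autocorrect expects is assumed, so check word_count.py before relying on it):

```python
import re
from collections import Counter

# build domain-specific word counts from your own corpus
corpus = 'domain terms appear often in a domain corpus so the domain wins'
counts = Counter(re.findall(r'[a-z]+', corpus.lower()))
top = sorted(counts.items(), key=lambda kv: -kv[1])
print(top[0])  # → ('domain', 3)
```

Words that dominate your own data then outrank generic dictionary words when candidates are scored by frequency.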
I did everything according to the instructions in the description, but I could not find an XML file for my language. Could anybody please tell me how to find one and add a new language?
Thanks a lot in advance!
Hi @filyp
Thanks for providing the option to change the dict to one's own text file. Yours is the only package I found which replaces words within sentences; every other one works on one word at a time.
However, I am facing an error while doing tar to the output. Attaching screenshot.
Can you please help?
Also, I see that it works fine up to changes = 2; how can I increase this?
Eg: spell('NissSSan') returns 'Nissan' and
spell('NissSSSan') returns 'NissSSSan'
Help will be really appreciated.
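Not the author, but since the search depth looks hard-wired at two edits, one workaround is applying the speller repeatedly until the word stops changing. A sketch, with a toy corrector standing in for a real Speller instance:

```python
def fixed_point(correct, word, max_rounds=5):
    """Apply a one-shot corrector until its output stabilises."""
    for _ in range(max_rounds):
        new = correct(word)
        if new == word:
            return new
        word = new
    return word

# toy corrector standing in for Speller: strips one trailing '!' per call
toy = lambda w: w[:-1] if w.endswith('!') else w
print(fixed_point(toy, 'hello!!!'))  # → hello
```

Note that each pass still only searches two edits, so this is not equivalent to a true three-edit search; results may differ.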
from autocorrect import Speller
This line is throwing the error as mentioned in the title. Has it been deprecated or is there any workaround?
Hello.
I can't find a function to ignore a word.
In some cases we need it.
An example in the English dictionary:
srai -> sry
but I have to ignore it.
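Until the library grows such an option, a wrapper that skips an ignore list is a workable stopgap (the toy dict corrector below stands in for a real Speller instance):

```python
IGNORE = {'srai'}

def correct_with_ignore(correct, text):
    # leave ignored words alone, correct everything else word by word
    return ' '.join(
        w if w.lower() in IGNORE else correct(w) for w in text.split()
    )

toy = lambda w: {'helo': 'hello'}.get(w, w)  # stand-in for Speller('en')
print(correct_with_ignore(toy, 'helo srai'))  # → hello srai
```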
How can I add a language to this package when I'm using it in Google Colab?
Can we focus on Python3 and drop support for Python2 since it has reached end of life?
It would allow the project to use the newer features.
And maybe we should also choose which Python3 version we support? How about >=3.6?
May this fork rejuvante the package. :-)
Edit: I noticed the typo in the last line afterwards. I will leave it there. :D
Hiya,
First of all, big thanks for all the work on this project, it's awesome and super-lightweight, making it perfect for my project, which is going to be used to support hundreds of children with their education.
My project crashed today though, and in debugging I found it was because of autocorrect. Here's a minimally reproducible example of the error:
from autocorrect import Speller
autocorrector = Speller()
part_given = ""
part_given = autocorrector.autocorrect_word(part_given)
I was expecting this to keep part_given as "", but instead it raises an IndexError.
With best wishes,
JimmyCarlos.
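Until the empty-string case is handled inside the library itself, a small guard avoids the crash (autocorrect_word is the method named in the report above; the guard itself is plain Python):

```python
def safe_correct(correct_word, word):
    # empty input trips an IndexError inside the library, so short-circuit it
    return word if not word else correct_word(word)

# str.upper stands in for autocorrector.autocorrect_word here
print(safe_correct(str.upper, ''))    # → '' (empty stays empty)
print(safe_correct(str.upper, 'ok'))  # → OK
```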
I added the Portuguese language. For the pt.tar.gz file, should I push it, or create a Dropbox link for it and add it to the constants?
I tried pip install autocorrect
It didn't work, so I downloaded the code from GitHub and ran
pip install autocorrect-master
which gave me the same log and problems.
I tried all the versions; none worked.
I have no idea what I'm missing.
Hi, I was wondering if autocorrect can utilize multiple languages simultaneously in a multi-lingual setup.
For example,
from autocorrect import Speller
spellcheck = Speller(lang='en', 'es', 'fr')
corrected_text = spellcheck(text)
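As far as I can tell, Speller takes a single lang (and the call sketched above is not valid Python syntax anyway). One workaround is chaining one speller per language and keeping the first correction that actually changes the word; the toy dict correctors below stand in for real Speller instances:

```python
def multi_spell(spellers, word):
    # try each language's speller in turn; keep the first actual correction
    for s in spellers:
        fixed = s(word)
        if fixed != word:
            return fixed
    return word

en = lambda w: {'helo': 'hello'}.get(w, w)  # toy stand-in for Speller('en')
es = lambda w: {'ola': 'hola'}.get(w, w)    # toy stand-in for Speller('es')
print(multi_spell([en, es], 'ola'))  # → hola
```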
Is there a way to measure or evaluate how much the system has done the spelling correction correctly?
Hi.
(from Wikipedia) The idea is to only use one hand (preferably the left one) and type the right-hand letters by holding a key that acts as a modifier key. The layout is mirrored, so the use of the muscle memory of the other hand is possible, which greatly reduces the amount of time needed to learn the layout if the person previously used both hands to type. This was first proposed by Randall Munroe on the xkcd-blog.
I would like it so that when I type a word such as "cwwdee", it takes the mirrored QWERTY layout into consideration and autocorrects it to "cookie".
Hi, I have a proper name 'SO' that is getting corrected to 'S' when using the Speller class. Is there any way around this?
Q1: How can I check if a word exists in the dictionary?
Q2: What does the method 'existing' do?
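A partial answer from reading the tracebacks in these issues: the speller is built on a word-frequency dict, so an existence check is a plain lookup. The attribute name nlp_data is taken from the __init__ traceback quoted above, and the reading of 'existing' as "filter down to known words" is a guess, so verify against the source:

```python
# stand-in for the speller's internal frequency dict (nlp_data is assumed)
nlp_data = {'test': 495684, 'project': 1628175}

def exists(word):
    # Q1: membership test against the frequency dict
    return word in nlp_data

def existing(words):
    # Q2 (plausible reading): keep only the words the dictionary knows
    return {w for w in words if w in nlp_data}

print(exists('test'), existing(['test', 'tset', 'project']))
```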
Hi, I'm trying to load and run the spell checker with the Italian language (supported) using the following script:
from autocorrect import Speller
spell = Speller(lang='it')
result = spell('Ciaa da uma perzona itaviana')
print(result)
...but I receive the following error.
Can you help us? We're building a big project upon this library.
With best regards,
Sebastian
Hello, I am trying to add the Azerbaijani language, but I can't manage to do it.
I added "az": r"[AaBbCcÇçDdEeƏəFfGgĞğHhXxIıİiJjKkQqLlMmNnOoÖöPpRrSsŞşTtUuÜüVvYyZz]+", to word_regexes
and "az": "abcdefghijklmnopqrstuvxyzəüöğşçı", to alphabets.
Then downloaded azwiki-latest-pages-articles.xml
When I run this code, it gives me an error:
from autocorrect.word_count import count_words
count_words('azwiki-latest-pages-articles.xml', 'salam')
PS: "salam" means "hello" in my language
The following error appears:
Traceback (most recent call last):
  File "C:/Users/User/AppData/Local/Programs/Python/Python39/spell.py", line 5, in <module>
    count_words('azwiki-latest-pages-articles.xml', 'salam')
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\autocorrect\word_count.py", line 19, in count_words
    counts = Counter(words)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\collections\__init__.py", line 593, in __init__
    self.update(iterable, **kwds)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\collections\__init__.py", line 679, in update
    _count_elements(self, iterable)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\autocorrect\word_count.py", line 9, in get_words
    word_regex = word_regexes[lang]
KeyError: 'salam'
Is there any way to fix this?
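It looks like the immediate problem is the call itself: the second argument of count_words is the language code, so it must be the same key added to word_regexes, i.e. 'az', not an arbitrary word like 'salam'. A stdlib stand-in for the lookup that fails in the traceback:

```python
import re

# stand-in for the dict this issue added 'az' to (pattern shortened)
word_regexes = {'az': r'[a-zəüöğşçı]+'}

def get_words(text, lang):
    return re.findall(word_regexes[lang], text)

# count_words('azwiki-latest-pages-articles.xml', 'az') would use this key;
# passing 'salam' raises KeyError because it is not a registered language
print(get_words('salam dünya', 'az'))  # → ['salam', 'dünya']
```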
I want to add new words for autocorrection. For example, I want strings like 'tive' and 'fiue' to autocorrect to the string 'five'. How can I achieve that?
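One approach that works without touching the library's data files: apply a hand-made replacement table before falling back to the speller (the default speller below is a stand-in for a real Speller instance):

```python
custom = {'tive': 'five', 'fiue': 'five'}  # your hand-made corrections

def correct(word, speller=lambda w: w):
    # check the custom table first, fall back to the normal speller
    return custom.get(word, speller(word))

print(correct('tive'), correct('fiue'), correct('six'))  # → five five six
```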
I followed your instructions, but I got an error when trying to count words from the language corpus.
I attach a screenshot here, but the error log is the following
~/miniconda3/envs/cs/lib/python3.8/site-packages/autocorrect/word_count.py in get_words(filename, lang, encd)
7
8 def get_words(filename, lang, encd):
----> 9 word_regex = word_regexes[lang]
10 capitalized_regex = r'(\.|^|<|"|\'|\(|\[|\{)\s*' + word_regexes[lang]
11 with open(filename, encoding=encd) as file:
KeyError: 'fr'
My guess is that this is related to the language keys present in the word_regexes dict.
I have followed your instructions and tried to create a Hindi spell checker, but it does not seem to work. Please share your email ID with me.