Home Page: https://lesp.gitbook.io/lesp

License: BSD 3-Clause "New" or "Revised" License

LESP - Lightweight Efficient Spelling Proofreader



Welcome to the LESP repository! 👋

LESP is a lightweight, efficient spelling proofreader written in Python. It's designed to be easy to use and lightweight, while still providing a decent result when checking for spelling errors. Resource consumption is kept to a minimum, and the program is designed to be as fast as possible.

Features ✨

  • Lightweight and efficient
  • Easy to use
  • Fast
  • Cross-platform
  • No dependencies
  • (Kind of) Customizable

Installation 📥

Simply clone the repository and run the demo.py file to check it out. You don't need to install any additional libraries, so it works plug-and-play. LESP requires Python 3.6 or newer, and it uses concurrent.futures from the standard library to speed up the process.

...or install it with pip 📥

pip install lesp

Detailed installation instructions for Git

  1. Clone the repository
git clone https://github.com/LyubomirT/lesp.git
  2. Open the folder
cd lesp
  3. Run the demo
python demo.py

Usage 📖

LESP is pretty easy to set up, and a basic demo configuration is already pre-built. You can find it in demo_config (this is a file, not a folder!) and edit it to your liking. The demo needs this file to run, so don't delete, move, or rename it. The file is not required if you installed LESP with pip.

If you want a closer look at how to use LESP, check out our documentation, which has a detailed explanation along with some examples. If you're still not sure, the examples folder contains some simple examples that should give you an idea of how to use LESP in your projects.

Basic usage

To use LESP, you need to import the Proofreader class from the lesp.autocorrect module. The class has a decent number of methods, but the most important ones are is_correct and get_similar. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")
is_correct = proofreader.is_correct("apgle") # False

if not is_correct:
    similar_words = proofreader.get_similar("apgle") # A list of similar words, or None if nothing matches
    if similar_words:
        print("Did you mean: " + ", ".join(similar_words)) # e.g. Did you mean: apple

Simple as that!

Advanced usage

By default, Proofreader uses the lesp-wordlist.txt file as the wordlist.

You can use a different wordlist by passing its path in the wordlist_path argument when initializing the Proofreader class.

A wordlist must be structured with each word on a new line, like this:

apple
banana
orange

When finished with writing your wordlist, save it as a .txt file. Then, you can use it like this:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

You can customize the process of getting similar words as well. Configuration will be provided as arguments to the get_similar function. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

similar_words = proofreader.get_similar("apgle", similarity_rate=0.5, chunks=4, upto=3)

print(similar_words)

In the code above, we're getting similar words to apgle with a similarity rate of 0.5, splitting the wordlist into 4 chunks, and returning up to 3 similar words.

A similarity rate of 0.5 means that the words returned will be at least 50% similar to the word we're checking. The higher the similarity rate, the more precise the results will be, but generally there will be fewer of them. I recommend keeping the similarity rate at 0.5, but you can experiment with it and see what works best for you.

The chunks argument specifies how many chunks the wordlist will be split into. This is useful if you have a large wordlist and you want to speed up the process. The higher the number, the faster the process will be, but the more memory and CPU it will consume. For example, when scanning wordlist.txt with 1500 chunks, the process takes about 0.5 seconds on my machine, but it consumes about 1.5 GB of RAM and 44% of one CPU core.

The upto argument specifies how many similar words will be returned. If you set it to 3, then the function will return up to 3 similar words. If you set it to 1, then it will return up to 1 similar word. But, whatever amount you select, the output will still be a list. If you set it to 0, then the function will raise a ValueError.
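To make the three arguments more concrete, here is a minimal sketch of a chunked similarity search. It is not LESP's actual implementation: the find_similar and _scan_chunk names are hypothetical, and difflib.SequenceMatcher is used as a stand-in similarity measure, so scores may differ from what get_similar computes.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def _scan_chunk(word, chunk, similarity_rate):
    """Return (score, candidate) pairs from one chunk that meet the threshold."""
    matches = []
    for candidate in chunk:
        score = SequenceMatcher(None, word, candidate).ratio()
        if score >= similarity_rate:
            matches.append((score, candidate))
    return matches


def find_similar(word, wordlist, similarity_rate=0.5, chunks=4, upto=3):
    """Hypothetical stand-in for get_similar: returns a list of words, or None."""
    if upto < 1:
        raise ValueError("upto must be at least 1")
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    # Split the wordlist into roughly equal slices, one per worker.
    size = max(1, -(-len(wordlist) // chunks))  # ceiling division
    pieces = [wordlist[i:i + size] for i in range(0, len(wordlist), size)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        results = pool.map(lambda piece: _scan_chunk(word, piece, similarity_rate), pieces)
        scored = [pair for chunk_result in results for pair in chunk_result]
    if not scored:
        return None
    # Best matches first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:upto]]


words = ["apple", "banana", "orange"]
print(find_similar("apgle", words, similarity_rate=0.7, chunks=2, upto=3))  # ['apple']
```

Each chunk is scanned by its own worker, which is why a larger chunks value trades memory and CPU for speed. The result is always a list (or None), even when upto is 1.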

Get similarity score

Even if this function isn't really supposed to be a feature, you can still use it if you want to. It's pretty simple to use, just use the get_similarity_score function of the Proofreader class and pass the two words you want to compare as arguments. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

score = proofreader.get_similarity_score("apple", "apgle") # 0.8

print(score)

The function will return a float between 0 and 1, where 0 means that the words are completely different, and 1 means that the words are exactly the same.
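For intuition about what a score between 0 and 1 can mean, here is a small sketch based on normalized Levenshtein (edit) distance. This is an assumption chosen for illustration, not necessarily the formula LESP uses; the levenshtein and similarity_score names are hypothetical.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_score(a, b):
    """Return 1.0 for identical words and 0.0 for completely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(similarity_score("apple", "apgle"))
```

With this measure, "apple" and "apgle" differ by one substitution out of five characters, so the score is about 0.8.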

Backup

If you're concerned about losing your wordlist, you can use the backup function to back it up. It creates a file at the path you specify and writes the wordlist to it. Note that the file will be overwritten if it already exists. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

proofreader.backup("my_wordlist_backup.txt") # Leave empty to use default path

Restore

If you've backed up your wordlist, you can restore it using the restore function. It will read the file you specify and it will overwrite the current wordlist with the one in the file. Note that the file must exist, otherwise the function will raise a FileNotFoundError. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

proofreader.restore(True, "my_wordlist_backup.txt") # Leave empty to use default path

True here stands for overridecurrent, which lets you choose whether you want the wordlist file to be overwritten or not. If you set it to False, then the function will leave your current wordlist file untouched, and will just modify the wordlist variable in the current session. If you set it to True, then the function will overwrite the wordlist file with the one in the backup file along with the wordlist variable in the current session.
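The difference between the two overridecurrent settings can be sketched with plain files. This is a simplified illustration under the assumption that a wordlist is one word per line; backup_wordlist and restore_wordlist are hypothetical helpers, not LESP's own methods.

```python
from pathlib import Path


def backup_wordlist(words, backup_path):
    """Write one word per line, overwriting any existing backup file."""
    Path(backup_path).write_text("\n".join(words) + "\n", encoding="utf-8")


def restore_wordlist(backup_path, wordlist_path, override_current):
    """Return the words from the backup; optionally rewrite the wordlist file too."""
    backup = Path(backup_path)
    if not backup.is_file():
        raise FileNotFoundError(f"No backup found at {backup_path}")
    words = backup.read_text(encoding="utf-8").split()
    if override_current:
        # Overwrite the wordlist file on disk with the backup contents.
        Path(wordlist_path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return words
```

With override_current set to False, only the returned in-memory list changes and the wordlist file on disk stays as it was.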

Extend wordlist

This is useful if the user usually writes about a specific, non-general topic. For example, if the user is a programmer, you can extend the wordlist with programming-related words that it doesn't already contain. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

if not proofreader.is_correct("reactjs") and proofreader.get_similar("reactjs") is None:
    confirm = input("reactjs is not in the wordlist. Would you like to add it? (y/n) ")
    if confirm.lower() == "y":
        proofreader.backup() # Save the current wordlist before changing it
        proofreader.extend_wordlist("reactjs")
        print("reactjs added to wordlist.")

You can also extend the wordlist with multiple words at once by passing a list or a tuple to the function. Like this:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

words = ["reactjs", "vuejs", "angularjs"]

proofreader.extend_wordlist(words)
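Accepting either a single word or a list/tuple comes down to a type check before the words are added. The sketch below shows one way to do it; extend_words is a hypothetical helper, and both the alphabetic-only validation and the silent skipping of duplicates are assumptions rather than confirmed LESP behavior.

```python
def extend_words(wordlist, new_words):
    """Return a new list with new_words added; accepts a str, list, or tuple."""
    if isinstance(new_words, str):
        new_words = [new_words]  # single-word expansion
    elif not isinstance(new_words, (list, tuple)):
        raise TypeError("new_words must be a str, list, or tuple")
    extended = list(wordlist)
    known = set(extended)
    for word in new_words:
        if not isinstance(word, str) or not word.isalpha():
            raise ValueError(f"Invalid word: {word!r}")
        if word not in known:  # assumption: duplicates are skipped
            extended.append(word)
            known.add(word)
    return extended


print(extend_words(["apple"], ["reactjs", "vuejs", "angularjs"]))
```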

Remove from wordlist

The opposite of the extend_wordlist function, this function removes a word from the wordlist. Note that it raises a ValueError if the word is not in the wordlist. Also note that the word is not removed permanently; it is removed only for the current session. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

word = "reactjs"
proofreader.remove_from_wordlist(word)

If you want to remove multiple words at once, you can pass a list or a tuple to the function. Like this:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

words = ["reactjs", "vuejs", "angularjs"]

proofreader.remove_from_wordlist(words)
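As a rough illustration of session-only removal, the sketch below works purely on an in-memory list and never touches a file. remove_words is a hypothetical helper; checking all words before removing any is a design choice made here, not something the original states.

```python
def remove_words(wordlist, words_to_remove):
    """Return a new list without the given word(s); raise ValueError if any is missing."""
    if isinstance(words_to_remove, str):
        words_to_remove = [words_to_remove]
    missing = [word for word in words_to_remove if word not in wordlist]
    if missing:
        # Fail before changing anything, so the list is never half-updated.
        raise ValueError(f"Not in wordlist: {', '.join(missing)}")
    unwanted = set(words_to_remove)
    return [word for word in wordlist if word not in unwanted]


print(remove_words(["apple", "reactjs", "vuejs"], ["reactjs", "vuejs"]))  # ['apple']
```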

Stacking

This function lets you stack two wordlist files together, so you can have a bigger wordlist out of two combined. The function will take two arguments, the source file and the destination file. The source file is the file that will be stacked on top of the destination file. Here's an example:

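from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

proofreader.stack("wordlist.txt", "my_wordlist.txt") # Source file first, then destination file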
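Conceptually, stacking reads both files and writes the combined words back to the destination. Here is a minimal sketch under the assumption of one word per line; read_words and stack_wordlists are hypothetical names, and dropping words the destination already has is an assumption, not confirmed LESP behavior.

```python
from pathlib import Path


def read_words(path):
    """Read a one-word-per-line file into a list."""
    return Path(path).read_text(encoding="utf-8").split()


def stack_wordlists(source, destination):
    """Append words from source that destination lacks, then rewrite destination."""
    combined = read_words(destination)
    known = set(combined)
    for word in read_words(source):
        if word not in known:  # assumption: duplicates are not stacked twice
            combined.append(word)
            known.add(word)
    Path(destination).write_text("\n".join(combined) + "\n", encoding="utf-8")
    return combined
```

The source file is only read, so it stays unchanged; only the destination grows.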

Merge delete

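This function deletes from the destination file every word that also appears in the source file. In the example below, words that appear only in the source file also end up in the destination file. For example, if you have a wordlist with the following words: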

apple
banana
orange

And you have another wordlist with the following words:

apple
banana
raspberry

Then, if you use the merge_delete function, the destination file will be modified to look like this:

orange
raspberry

Here's an example of how you can use it:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

proofreader.merge_delete("wordlist.txt", "my_wordlist.txt")

with open("my_wordlist.txt", "r") as f:
    print(f.read())
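The example output above (orange and raspberry) matches keeping only the words that appear in exactly one of the two lists. The sketch below reproduces that result on in-memory lists; merge_delete_words is a hypothetical helper, and treating the first list shown as the destination is an assumption.

```python
def merge_delete_words(source_words, destination_words):
    """Keep words found in exactly one of the two lists, destination words first."""
    source = set(source_words)
    destination = set(destination_words)
    kept = [word for word in destination_words if word not in source]
    added = [word for word in source_words if word not in destination]
    return kept + added


destination_words = ["apple", "banana", "orange"]
source_words = ["apple", "banana", "raspberry"]
print(merge_delete_words(source_words, destination_words))  # ['orange', 'raspberry']
```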

Caching

To improve the performance of LESP, get_similar uses a cache file to store similar words. This way, if you check the same word multiple times, it will be much faster. The default cache file is lesp_cache/lesp.cache, but you can change it by specifying the cache_file argument when initializing the Proofreader class. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt", cache_file="my_cache.cache")

Cache works only for mistakes that have been made at least once. For example, if you check the word apgle and it returns apple, then the next time you check apgle, it will be much faster. This can save a lot of time and resources, especially if the user makes a lot of mistakes.

If you want to clear the cache, you can use the clear_cache function. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt", cache_file="my_cache.cache")

proofreader.clear_cache()

This will delete the cache file and clear the cache variable in the current session. Note that the file will be deleted permanently, so make sure you have a backup if you want to keep it.

To use the cache, pass the use_cache argument (and set_cache if you want the cache to be updated) when calling the get_similar function. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt", cache_file="my_cache.cache")

similar_words = proofreader.get_similar("apgle", similarity_rate=0.5, chunks=4, upto=3, use_cache=True, set_cache=True) # Takes about 0.18 seconds on my machine
similar_words2 = proofreader.get_similar("apgle", similarity_rate=0.5, chunks=4, upto=3, use_cache=True, set_cache=True) # Works almost instantly thanks to cache

Here, use_cache is responsible for using the loaded cache file (if it exists) and set_cache helps you to add a new mistake to cache. If you set set_cache to True, then the function will add the mistake to cache, so the next time you check the same word, it will be much faster with use_cache enabled.
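The interplay of use_cache and set_cache can be sketched with a small file-backed cache. This is an illustration only: SuggestionCache is a hypothetical class, and storing the cache as JSON is an assumption, since the original does not describe the cache file format.

```python
import json
from pathlib import Path


class SuggestionCache:
    """Hypothetical file-backed cache mapping a misspelling to its suggestions."""

    def __init__(self, cache_file):
        self.path = Path(cache_file)
        self.data = {}
        if self.path.is_file():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))

    def lookup(self, word, compute, use_cache=False, set_cache=False):
        """Return cached suggestions when allowed, otherwise call compute(word)."""
        if use_cache and word in self.data:
            return self.data[word]
        result = compute(word)
        if set_cache:
            # Remember the result and persist it for later sessions.
            self.data[word] = result
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.path.write_text(json.dumps(self.data), encoding="utf-8")
        return result

    def clear(self):
        """Forget everything in memory and delete the cache file."""
        self.data = {}
        if self.path.is_file():
            self.path.unlink()
```

Here compute stands for the slow wordlist scan. A second lookup of the same misspelling with use_cache enabled skips that scan entirely.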

Removing Special Characters

Sometimes, a string may contain special characters, such as !, ?, @, etc. These characters can be removed using the remove_special method. It covers most of the special characters out there, but not all of them. So if you find a special character that is not covered, please open an issue and I'll add it. Here's an example:

from lesp.autocorrect import Proofreader

proofreader = Proofreader(wordlist_path="my_wordlist.txt")

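word = "apgle!"
word = proofreader.remove_special(word) # apgle

if not proofreader.is_correct(word): # Not correct, of course
    similar_words = proofreader.get_similar(word) # A list of similar words, or None if nothing matches
    if similar_words:
        print("Did you mean: " + ", ".join(similar_words)) # e.g. Did you mean: apple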
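For comparison, stripping special characters can be done with the standard library alone. The sketch below is not how remove_special is implemented; strip_special is a hypothetical helper that only removes the ASCII punctuation listed in string.punctuation.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_TABLE = str.maketrans("", "", string.punctuation)


def strip_special(text):
    """Return text with ASCII punctuation removed."""
    return text.translate(_STRIP_TABLE)


print(strip_special("apgle!"))  # apgle
```

Characters outside string.punctuation, such as typographic quotes, are left untouched by this sketch.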

Examples 📝

If you're still not sure where to use LESP, check out the examples folder. It contains some simple examples that should give you an idea of how you can use LESP in your projects.

How to run an example?

Simply open the folder of the example you want to run, then copy the main.py file to the root of the directory (same as demo.py, for instance). After that, run the main.py file and voila! The application is running!

Contributing 🤝

Contributions, issues and feature requests are welcome! Feel free to check out the issues page.

How to contribute?

Thank you for your interest in contributing to LESP! Here's a quick guide on how to contribute:

  1. Fork the repository
git clone https://github.com/LyubomirT/lesp.git
  2. Make your changes

  3. Test your changes to make sure everything works as expected

  4. Commit your changes

git commit -m "Your changes"
  5. Push your changes
git push
  6. Open a pull request

  7. Wait for your pull request to be reviewed

Once again, thank you for your support!

Reach out to the developer 👨‍💻

You can contact me on Discord either in my Discord Server or in my DMs (@lyubomirt). Creating a discussion might also work, but I'm a bit faster to respond on Discord.

License 📜

This project is licensed under the BSD 3-Clause License. For more information, please refer to the LICENSE file.

Acknowledgements 🙏

Many thanks to the following Open-Source projects:

Our Amazing Contributors ✨

Thanks to these awesome people for contributing! I appreciate your support a lot! ❤️

deepsource-io[bot], halzorg, lyubomirt, mahhheshh, parakrant, y9rabbito

lesp's Issues

Use the same verification for other wordlist-related commands

Currently, when loading a wordlist, a verification process first checks if it's applicable to be a wordlist. However, these functions:

  • restore()
  • stack()
  • merge_delete()

don't support such validation yet. This can be easily implemented by adapting the algorithm from load_wordlist() to work with the methods mentioned above.

🖥 GUI Demo

While a CLI demonstration is good, a GUI demonstration will be more user-friendly. Whether it's a Web app hosted with Flask or a Native desktop app, it will surely simplify the process of understanding how LESP works.

💽 Add Docstrings

Docstrings can be added for the module itself, for the class, and the functions. This will help users understand the syntax of LESP without opening the documentation all the time.

📜 Hosted Documentation on a Dedicated Website / Documentation Service

We might want to use things like GitBook or Sphinx to create better-looking documentation for LESP, not limiting ourselves to the README. This will be one of the most important steps when publishing the package to PyPI, so it must be created as soon as possible. We could either use one of the services mentioned above, or host a custom solution.

Make sure the wordlist follows the appropriate format

For now, there is no error handling that validates the structure of a wordlist or a backup file when loading. This makes the program more error-prone and less user-friendly. I think it would be nice to have an error handler for such situations.

Hint:

  • Use if blocks to check if a selected file follows the specified structure
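The hint above could look roughly like the following. It is a sketch of one possible check, not the validation LESP ships: validate_wordlist is a hypothetical function, and both the alphabetic-only rule and the tolerance for blank lines are assumptions.

```python
from pathlib import Path


def validate_wordlist(path):
    """Return the words in the file, or raise ValueError on the first bad line."""
    words = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        word = line.strip()
        if not word:
            continue  # assumption: blank lines are tolerated
        if not word.isalpha():
            raise ValueError(f"Line {number} is not a single alphabetic word: {line!r}")
        words.append(word)
    if not words:
        raise ValueError("The wordlist is empty")
    return words
```

Reporting the line number makes it easier for users to find and fix a malformed entry in a long wordlist.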

💾 Optimize Memory Usage

Optimize memory usage, especially in the chunked processing, to reduce the program's overall footprint. CPU usage optimization is optional but is greatly appreciated if implemented.

Hint:

  • psutil might work really well

💾 Use a class for initialization instead of a config file

Currently, this library uses a config file to initialize. Allowing users to initialize with a class instead will decrease the amount of files and will make the library more versatile. This will also make the library look easier to use.

Hint:

  • Create a class Proofreader for the library
  • Put the functions in the class as well, so these will now be methods
  • Instead of using config files, allow passing the configuration options into a class instance

📑 Add more examples

Personally, I think that adding more examples will improve the ease of use of LESP by explaining how to use the project better and in more detail.

Hint:

  • GUI options could be a very nice addition
  • Simple examples are good, but we shouldn't overuse these
  • In total, at least 5 examples will be fine. Not more than 8 though.

🗃 Allow Wordlist Stacking

This feature will allow the user to "stack" two valid wordlists into one by putting all the words of one into another. Instead of overwriting the destination, the target will combine both the selected files. This will be useful for people who have a couple of separate files and want to create a bigger wordlist but don't want to put everything in a list.

Hint:
This introduces two new functions: stack() and merge_delete(). The stack function is responsible for the stacking technique, while merge_delete removes all words from the source file that match the destination file. It would also be good to add a validation method to make sure both the source and destination files follow the appropriate format, with each word on a separate line and containing nothing other than alphabetic characters.

✒ Add "Custom Dictionaries"

A Custom Dictionary is a feature that will allow you to modify the wordlist if there is an unrecognized word you'd like to recognize. It is helpful for users who use specialized language, or who write about non-general topics.

Hint:
This introduces the following commands:

  • backup() - Save the current state of the wordlist
  • restore() - Restore the last backup of the wordlist
  • extend() - Add a new word to the wordlist
  • remove() - Remove a word from the wordlist

🔗 Caching

Caching saves time during the process of seeking similarities between words. When a word is used for scanning, data for it will be cached. Before seeking similarities of the word, the cache will be scanned first. This will save a lot of resources and reduce loading times if the user makes similar mistakes multiple times.

🎒 Support Bulk Expansion

Bulk Expansion enables users to extend their wordlist using Lists and Tuples, when they have a lot of words to add. For example, you have a list of words, which contains 60 words. Instead of adding each of the words manually, you can simply select the list and it will be used to expand the wordlist. To implement this, we can accept both strings and lists / tuples in the extend function. When it's a string, it will do a single word expansion, and vice versa.

Hint:

  • Use an if statement to handle different data types
  • Validate whether the list only contains appropriate items (alphabetic characters)

Add Type Hinting

Adding type hints could help improve the user and development experience. It can be done using the built-in typing module.
