javasymspell's Introduction

Blog

Find my blog on blog.londogard.com!

And make sure not to miss github.com/londogard, where I do most of my coding today!

Recent entries:

  1. Timeseries (predicting stocks/crypto) 3-part series (#1, #2, #3)
  2. Deep Learning in the browser using ONNX with KotlinJS
  3. Knowledge Distillation

Interesting Presentations

Almedalsveckan 2021 - Vikten av att verkligen förstå AI-modeller (The Importance of Really Understanding AI-models). I gave a presentation on the importance of understanding models and their decisions. The talk touched on topics such as Ethical AI, Explainable AI (XAI) and Data Bias. Its aim was to make a societal contribution.

Several people came up afterwards to thank me for a great, "easily digestible" presentation, one of them from Hallands Län.
youtube.com

Industry Days 2021
Based on my expertise I was invited as an AI Expert on a panel to discuss the industry of tomorrow and how to speed up the digitalization during Industry Days 2021 (by IEEE) in Västerås.
vimeo/ldlmedia (starts at ~2h13m)

Google DevFest by GDG - Managing the ML lifecycle without a headache (MLOps)
I gave this presentation in person, but unfortunately it was not recorded because the camera battery died. I did a reshoot, but it did not feel as natural without an audience. Either way, I spoke about why the ML lifecycle matters and how to manage it without a headache. I presented some great tech-agnostic MLOps tools (DVC/CML) and showed how to work with them.
youtube.com

Interesting Repositories

What | Where | Summary
londogard-nlp-toolkit | Kotlin, JVM | One of the, if not the, best Natural Language Processing toolkits on the JVM, written in Kotlin. Includes simple usage of Tokenizers (including HuggingFace), BPEmb, classifiers, stopwords, and much more!
Summarize | Kotlin, JVM | A summarizer with two variants, both of which were state-of-the-art for extractive summarisation a few years ago when I implemented them. One is built on top of TFIDF, the other combines TFIDF, embeddings and more! Runnable on the Raspberry Pi thanks to some clever optimisations for Word Embeddings (try it here)
Text Gen | Kotlin, JVM | Text Generation without Machine Learning. Built purely on statistics; clever optimisations and smart lookups allow this to run on a Raspberry Pi
Java SymSpell | Java | A port of SymSpell to Java & the JVM. SymSpell is an incredibly efficient spell-correction tool
Fuzzy Match | Kotlin, JVM | A copy-and-insert, ready-to-use library for fuzzy searching that works incredibly well (try it here)

Make sure to take a look at github.com/londogard, londogard.com & blog.londogard.com.

javasymspell's People

Contributors

lundez, trans, tyruiop

javasymspell's Issues

SuggestionStage

I don't get the whole SuggestionStage thing. The comments say it's there to speed things up and be more memory efficient, but looking at the code I can't see how it would do anything but the opposite. Creating and populating the staging structure might be faster, but in the end the data still has to go into the final data structure. So it's just an intermediate data object. I must be missing something.

P.S. While I am not sure, because I don't quite get it, by using a Trie I think staging becomes a moot device anyway. Would like to be sure before I rip it out.
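
For readers with the same question, here is a minimal sketch of the general idea behind a staging structure. It is an illustration under stated assumptions, not this port's actual code: the class `StagingSketch` and its methods are hypothetical. The assumption is that the saving comes from appending every entry to shared, append-only lists during the build, and only allocating one exactly sized array per key at commit time, instead of repeatedly growing many small per-key arrays.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of staging; not the library's SuggestionStage.
final class StagingSketch {
    // Shared append-only storage: node i holds a suggestion and a link
    // to the previously added node for the same key (-1 if none).
    private final List<String> suggestions = new ArrayList<>();
    private final List<Integer> previous = new ArrayList<>();
    // Per key: {number of suggestions, index of the most recently added node}.
    private final Map<String, int[]> heads = new HashMap<>();

    void add(String deleteKey, String suggestion) {
        int[] head = heads.computeIfAbsent(deleteKey, k -> new int[] {0, -1});
        suggestions.add(suggestion);
        previous.add(head[1]);
        head[0]++;
        head[1] = suggestions.size() - 1;
    }

    // Builds the final structure, allocating each array once at its exact size.
    Map<String, String[]> commit() {
        Map<String, String[]> result = new HashMap<>();
        for (Map.Entry<String, int[]> entry : heads.entrySet()) {
            int count = entry.getValue()[0];
            int node = entry.getValue()[1];
            String[] exact = new String[count];
            // Walk the chain backwards so the array keeps insertion order.
            for (int i = count - 1; i >= 0; i--) {
                exact[i] = suggestions.get(node);
                node = previous.get(node);
            }
            result.put(entry.getKey(), exact);
        }
        return result;
    }
}
```

Whether this pays off in practice depends on the final data structure; with a Trie the trade-off may well be different, as suggested above.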

Suggestions with wrong distance

Hi!

I am trying your java version of SymSpell.
It works really well, however I have noticed a very strange behaviour.

I am using the provided demo, with frequency_dictionary_en_500_000.txt as the dictionary (you can find it on the SymSpell page).

If I type as input "the", only this word, without spaces, the suggestions are the following:

Enter input:   
the   
Lookup suggestion: the 1 6801236995 
Lookup suggestion: he 1 588605295
Lookup suggestion: they 1 326802098
Lookup suggestion: them 1 178377546
Lookup suggestion: she 1 140368041
Lookup suggestion: then 1 108733059
Lookup suggestion: thy 1 11355619
Lookup suggestion: thee 1 7761536
Lookup suggestion: tho 1 4200344
Lookup suggestion: tie 1 1819888
LookupCompound: the

The first suggestion is correctly the word itself, but it is reported with a distance of 1 from the input word. It should be 0!

This is an isolated case, as with other correct words, I'm correctly getting a 0 distance.
Do you have any idea why this happens?
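
As a reference point for the expected values, here is a minimal sketch of the optimal string alignment distance (restricted Damerau-Levenshtein). It is independent of the library's own distance code, and the class name `EditDistanceSketch` is hypothetical. It only illustrates that an input identical to a dictionary word should score 0, while the other suggestions listed above score 1.

```java
// Hypothetical reference implementation; not the library's distance code.
final class EditDistanceSketch {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        // Distance from or to the empty prefix is the prefix length.
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1); // delete, insert
                best = Math.min(best, d[i - 1][j - 1] + cost);          // match or substitute
                // Transposition of two adjacent characters counts as one edit.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }
}
```

Comparing the demo output against a function like this for the input and each returned term is one way to narrow down whether the wrong value comes from the distance computation or from how the result is assembled.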

Completions

I would also like to add a completions method -- return all the words that start with a given prefix. It would be cool if it could also handle spelling errors in the prefix, but one step at a time.

Would this be an acceptable pull request?
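
To make the proposal concrete, here is a minimal sketch of exact-prefix completion. It uses a sorted set rather than a Trie, and `PrefixCompletionSketch` with its methods is a hypothetical name, not part of the library. It does not handle spelling errors in the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of prefix completion; not part of the library.
final class PrefixCompletionSketch {
    private final TreeSet<String> words = new TreeSet<>();

    void add(String word) {
        words.add(word);
    }

    // Returns all stored words that start with the prefix, in sorted order.
    List<String> completions(String prefix) {
        List<String> result = new ArrayList<>();
        // In a sorted set, all words sharing a prefix form one contiguous run
        // that begins at the first word >= prefix.
        for (String word : words.tailSet(prefix, true)) {
            if (!word.startsWith(prefix)) {
                break; // left the run, no later word can match
            }
            result.add(word);
        }
        return result;
    }
}
```

A Trie would serve the same purpose and might fit better if the dictionary is already stored in one.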

Update to SymSpell 6.3

Are there any plans to update this port to the latest SymSpell version?
v6.4 introduced important changes, most of them related to the new WordSegmentation, along with a minor change to maxEditDistance that speeds up lookup.

Thank you.

Good for Android App?

Would this library be a good choice for use in an Android App? Or is it too memory intensive?

And if yes, any guidance? One problem that occurs to me is that it has to be possible to load the dictionary from an Android asset file. So a function that takes an InputStream instead of a file name might be necessary.
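
A minimal sketch of what such a function could look like, assuming the dictionary uses the "term count" line format seen in the SymSpell frequency dictionaries. The class `DictionaryStreamSketch` and its `load` method are hypothetical and not part of the library; the result is a plain map that would still need to be fed into the spell checker.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader; not part of the library.
final class DictionaryStreamSketch {
    // Reads whitespace-separated "term count" lines and sums counts per term.
    static Map<String, Long> load(InputStream in) {
        Map<String, Long> counts = new HashMap<>();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length < 2) {
                    continue; // skip blank or malformed lines
                }
                try {
                    // Counts can exceed the int range, so parse them as long.
                    counts.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
                } catch (NumberFormatException e) {
                    // skip lines whose second column is not a number
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

On Android, `AssetManager.open` returns an `InputStream`, so a stream obtained from `context.getAssets().open(...)` could be passed straight to a method of this shape.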

Word Segmentation

Hi,

I couldn't manage to access the correctedString and segmentedString fields of Word Segmentation, so I locally changed them to public fields and now it works.

Not sure if that's the correct way of doing it though?

Thanks for the port!
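
An alternative to making the fields public is to keep them private and expose read-only accessors. The sketch below shows the pattern on a hypothetical `SegmentationResultSketch` class; only the field names `segmentedString` and `correctedString` come from the report above, everything else is an assumption and not the library's actual class.

```java
// Hypothetical result holder; not the library's actual class.
final class SegmentationResultSketch {
    private final String segmentedString;
    private final String correctedString;

    SegmentationResultSketch(String segmentedString, String correctedString) {
        this.segmentedString = segmentedString;
        this.correctedString = correctedString;
    }

    // Read-only access keeps the fields immutable from the caller's side.
    String getSegmentedString() {
        return segmentedString;
    }

    String getCorrectedString() {
        return correctedString;
    }
}
```

Getters keep callers from modifying a result after it has been returned, which public fields would allow.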
