Comments (15)
I have been thinking about this issue more recently and, as mentioned in #43, I have some ideas for at least handing more control over the timeout to the caller. A performance improvement may also be possible (see #33), but I think an API addition for options such as a cancellation token should be the first step. Work is difficult right now, but when I get some free time I may spend a weekend hacking on this.
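A caller-controlled limit of the kind described above could, for example, take the form of a `CancellationToken` passed into the suggest call. The sketch below is hypothetical: `SuggestCancellable`, its parameters, and the candidate list are illustrative names, not WeCantSpell.Hunspell API. Only `CancellationToken` and `CancellationTokenSource` are standard .NET types.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper, not part of WeCantSpell.Hunspell: gathers valid candidates
// until the caller-supplied token is cancelled.
List<string> SuggestCancellable(
    IEnumerable<string> candidates,
    Func<string, bool> isValid,
    CancellationToken cancellationToken)
{
    var results = new List<string>();
    foreach (var candidate in candidates)
    {
        // Stop cooperatively and keep whatever was found so far.
        if (cancellationToken.IsCancellationRequested) break;
        if (isValid(candidate)) results.Add(candidate);
    }
    return results;
}

// The caller, not the library, picks the time limit (500 ms here).
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(500));
var known = new HashSet<string> { "Systèmes" };
var found = SuggestCancellable(new[] { "Sistemes", "Systèmes" }, known.Contains, cts.Token);
Console.WriteLine(found.Count);
```

With this shape, each caller can trade completeness for latency instead of relying on a limit hard-coded in the library.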
from wecantspell.hunspell.
Probably abysmal, but I plan to run an en-US performance comparison soon, once I have made a few more attempts at improving affix performance.
That is pretty strange for sure! There is some logic within the suggest algorithm that reads the system time and stops producing suggestions after a certain amount of time. I would be curious to see whether you consistently get 3 or more suggestions on a faster machine, or whether you see more 0 counts if you slow your machine down enough. That would at least give more weight to the theory that the timer-based short circuit in the code is the cause. I have been thinking about rewriting how that short circuit works, and if it is the cause, I may be more inclined to act on it. Where can I get a copy of that dictionary?
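A minimal sketch of how a timer-based short circuit like the one described above can make results vary between runs, assuming a simple loop over candidate spellings. `SuggestWithBudget`, the candidate list, and the budget values are hypothetical and not taken from the library's code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch of a timer-based short circuit: keep accepting valid
// candidates until maxSuggestions is reached or the time budget is spent.
List<string> SuggestWithBudget(
    IEnumerable<string> candidates,
    Func<string, bool> isValid,
    TimeSpan budget,
    int maxSuggestions)
{
    var results = new List<string>();
    var clock = Stopwatch.StartNew();
    foreach (var candidate in candidates)
    {
        // Once the budget is used up, return what was found so far. On a slow or
        // busy machine this can mean fewer suggestions, or none at all.
        if (clock.Elapsed >= budget) break;
        if (isValid(candidate))
        {
            results.Add(candidate);
            if (results.Count >= maxSuggestions) break;
        }
    }
    return results;
}

var known = new HashSet<string> { "Systèmes", "Système" };
var words = new[] { "Sistemes", "Systèmes", "Système" };
var generous = SuggestWithBudget(words, known.Contains, TimeSpan.FromSeconds(1), 5);
var exhausted = SuggestWithBudget(words, known.Contains, TimeSpan.Zero, 5);
Console.WriteLine($"{generous.Count} vs {exhausted.Count}"); // prints "2 vs 0"
```

Because the cut-off depends on wall-clock time rather than on the input, the same word can yield different counts from one call to the next.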
Thank you for your help!
I tried timing the calls to the Suggest() method to test your idea. If you're right, the calls returning no suggestions should take longer than the calls returning some suggestions.
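using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using WeCantSpell.Hunspell;

// "ressources" is the folder that holds the French dictionary files.
var dictionaryFr = WordList.CreateFromFiles(ressources + "\\fr-toutesvariantes.dic", ressources + "\\fr-toutesvariantes.aff");
var watch = new Stopwatch();
for (int i = 0; i < 100; i++)
{
    // Time each Suggest() call for the same misspelled word.
    watch.Restart();
    List<string> suggestList = dictionaryFr.Suggest("Systemes").ToList();
    watch.Stop();
    Debug.WriteLine(suggestList.Count + " ( " + watch.ElapsedMilliseconds + "ms )");
}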
I got something like this:
3 ( 338ms )
3 ( 271ms )
0 ( 262ms )
3 ( 345ms )
0 ( 293ms )
3 ( 314ms )
3 ( 294ms )
3 ( 314ms )
0 ( 277ms )
3 ( 296ms )
3 ( 332ms )
3 ( 295ms )
3 ( 300ms )
3 ( 328ms )
3 ( 267ms )
3 ( 321ms )
3 ( 310ms )
3 ( 275ms )
0 ( 288ms )
3 ( 280ms )
3 ( 263ms )
3 ( 273ms )
0 ( 267ms )
0 ( 263ms )
3 ( 300ms )
3 ( 278ms )
3 ( 267ms )
3 ( 300ms )
3 ( 269ms )
0 ( 285ms )
3 ( 273ms )
0 ( 286ms )
3 ( 281ms )
3 ( 278ms )
0 ( 257ms )
3 ( 288ms )
3 ( 294ms )
3 ( 285ms )
3 ( 292ms )
0 ( 295ms )
3 ( 291ms )
0 ( 298ms )
3 ( 296ms )
0 ( 265ms )
3 ( 294ms )
3 ( 304ms )
3 ( 284ms )
3 ( 270ms )
0 ( 275ms )
3 ( 296ms )
0 ( 272ms )
3 ( 326ms )
3 ( 303ms )
3 ( 269ms )
3 ( 296ms )
0 ( 277ms )
3 ( 285ms )
3 ( 300ms )
0 ( 280ms )
3 ( 293ms )
3 ( 272ms )
3 ( 296ms )
0 ( 277ms )
3 ( 314ms )
0 ( 261ms )
0 ( 280ms )
3 ( 288ms )
0 ( 283ms )
0 ( 262ms )
3 ( 275ms )
3 ( 260ms )
3 ( 312ms )
3 ( 283ms )
3 ( 271ms )
3 ( 274ms )
3 ( 270ms )
0 ( 269ms )
0 ( 267ms )
0 ( 295ms )
3 ( 311ms )
0 ( 289ms )
0 ( 372ms )
0 ( 291ms )
3 ( 275ms )
0 ( 277ms )
0 ( 280ms )
3 ( 289ms )
3 ( 283ms )
3 ( 276ms )
3 ( 316ms )
3 ( 297ms )
3 ( 264ms )
3 ( 283ms )
0 ( 264ms )
3 ( 293ms )
3 ( 273ms )
3 ( 265ms )
But I didn't get the expected results 😢: the mean time for calls returning zero suggestions was ~280 ms, while the mean time for calls returning suggestions was ~292 ms, so the calls with no suggestions were not the slower ones.
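The comparison above comes down to grouping the timings by suggestion count and averaging each group. A minimal C# sketch, using only the first five rows of the output as sample data (so its averages differ from the full-run figures); a real analysis would record every iteration inside the timing loop instead of printing it.

```csharp
using System;
using System.Linq;

// Sample (suggestion count, elapsed ms) pairs: the first five rows of the run above.
var runs = new (int Count, long Ms)[]
{
    (3, 338), (3, 271), (0, 262), (3, 345), (0, 293),
};

// Average the elapsed time separately for each suggestion count.
var meanByCount = runs
    .GroupBy(r => r.Count)
    .ToDictionary(g => g.Key, g => g.Average(r => r.Ms));

foreach (var entry in meanByCount.OrderBy(p => p.Key))
    Console.WriteLine($"{entry.Key} suggestions: {entry.Value:F1} ms on average");
```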
Oh, and here is a copy of the dictionary I use:
Dictionnary.zip
Hi all!
Any updates on this issue?
No, sorry, I gave up.
Current status:
- The release-4 branch offers better control over suggest limitations and timeouts
- The release-4 branch should also offer better performance, but not enough for this case 😬
Stuff still to try:
- A sorted suffix/prefix index for the affix code could greatly improve performance here, or do nothing, but it is worth a try 🤷‍♂️
- Reducing temporary string allocations with a "mutable string" or a "value string builder" is another option, but most experiments there haven't shown promising improvements
Hi! It's great to see that you're still active and pushing WeCantSpell forward. How do you think Suggest currently compares to the old NHunspell in terms of performance?
@Tigrisor I think I found the performance issue. It turns out I'm lazy, so I ported the affix search code the lazy way, using just an array and brute force. I've updated all of that to use a binary search tree, and it is now much faster for this dictionary. The bad news, though, is that I always get 1 suggestion, "Systèmes", instead of the 3 suggestions from your initial post, and I don't know why.
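The difference between a brute-force scan and an indexed lookup can be sketched as follows. This is an illustration under simplifying assumptions, not the library's code: the hypothetical rules carry only the appended text and a flag (real affix entries also have strip strings and conditions), and the index is a .NET `SortedDictionary`, which the .NET documentation describes as a binary search tree with O(log n) retrieval.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical suffix rules: (text the rule appends, rule flag).
var rules = new List<(string Append, char Flag)>
{
    ("s", 'S'), ("es", 'E'), ("mes", 'M'), ("tion", 'T'),
};

// Brute force: every lookup compares the word against every rule.
List<char> FindBruteForce(string word) =>
    rules.Where(r => word.EndsWith(r.Append, StringComparison.Ordinal))
         .Select(r => r.Flag)
         .ToList();

// Index: rules grouped by appended text in a SortedDictionary (a binary search tree).
var index = new SortedDictionary<string, List<char>>(StringComparer.Ordinal);
foreach (var (append, flag) in rules)
{
    if (!index.TryGetValue(append, out var bucket))
    {
        bucket = new List<char>();
        index[append] = bucket;
    }
    bucket.Add(flag);
}

// A word has only word.Length + 1 tails (including the empty one), so a lookup
// costs that many O(log n) tree searches instead of one comparison per rule.
List<char> FindIndexed(string word)
{
    var result = new List<char>();
    for (int start = 0; start <= word.Length; start++)
    {
        if (index.TryGetValue(word.Substring(start), out var matches))
            result.AddRange(matches);
    }
    return result;
}

Console.WriteLine(string.Join(",", FindIndexed("Systemes"))); // prints "M,E,S"
```

Both functions find the same rules; only the cost of finding them changes, which matters when a dictionary defines many affix entries.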
The new code is now 4.4x faster than the original library, which makes it dramatically faster than the C++ library. That is surprising enough that I now wonder whether I did something wrong 🤷‍♂️.
I can reproduce the 3 suggestions with the old 3.x version of the library, so I'll have to debug into it to see what went wrong there.
Good news: this is not a bug. The old v3 code was so slow that it failed to generate any suggestions from the MapRelated code path when it should have. If no suggestions have been generated by that point in the code, it falls back to n-gram suggest operations. That is probably why performance was so bad: the code paid for the slow MapRelated pass and then for the slow NGram pass on top of it. Now that MapRelated is nice and fast, that isn't an issue any more. Just to be extra sure, I also ran this dictionary through the old NHunspell library, and it produced the same single suggestion. I think this should be enough to make a release candidate.
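The fallback described above (run n-gram suggestions only when the earlier passes, including MapRelated, found nothing) can be sketched as follows. The pass functions, their names, and the placeholder n-gram output are hypothetical; this is not the library's implementation, just an illustration of how a MapRelated pass that runs out of time could turn 1 suggestion into 3.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pipeline: cheaper passes (standing in for MapRelated and other
// edit-based passes) run first; the n-gram pass is only a fallback.
List<string> Suggest(
    string word,
    IReadOnlyList<Func<string, IEnumerable<string>>> cheapPasses,
    Func<string, IEnumerable<string>> ngramPass)
{
    var results = new List<string>();
    foreach (var pass in cheapPasses)
        foreach (var suggestion in pass(word))
            if (!results.Contains(suggestion)) results.Add(suggestion);

    // Fallback: only reached when nothing was found above.
    if (results.Count == 0)
        results.AddRange(ngramPass(word));
    return results;
}

// Placeholder passes: a map pass that finds the accented form, the same pass
// after it ran out of time, and an n-gram pass returning placeholder strings.
Func<string, IEnumerable<string>> mapPass =
    w => w == "Systemes" ? new[] { "Systèmes" } : Array.Empty<string>();
Func<string, IEnumerable<string>> timedOutMapPass = _ => Array.Empty<string>();
Func<string, IEnumerable<string>> ngram = _ => new[] { "ngram-1", "ngram-2", "ngram-3" };

Console.WriteLine(Suggest("Systemes", new[] { mapPass }, ngram).Count);         // prints 1
Console.WriteLine(Suggest("Systemes", new[] { timedOutMapPass }, ngram).Count); // prints 3
```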
Let me know if this release candidate works for you all: https://www.nuget.org/packages/WeCantSpell.Hunspell/4.0.0-rc01
@Tigrisor I published a new release that should resolve this issue: https://www.nuget.org/packages/WeCantSpell.Hunspell/4.0.0 If this isn't enough, feel free to reopen.
@funex Honestly, I'm pretty surprised, but Suggest in this new release seems to be much faster than the older release of the C++ library. I posted my results in the readme: https://github.com/aarondandy/WeCantSpell.Hunspell/#performance ... I'm pretty sure this isn't because it's broken 🤣🤷‍♂️.
Wow, looks very promising.