Suggest() method result inconsistent about wecantspell.hunspell CLOSED

Tigrisor commented on September 25, 2024

Suggest() method result inconsistent

from wecantspell.hunspell.

Comments (15)

aarondandy commented on September 25, 2024

I have been thinking about this issue more recently and, as mentioned in #43, have some ideas for at least handing more control over the timeout to the caller. A performance improvement may also be possible (see #33), but I think an API addition for options such as a cancellation token should be the first step. I have some difficult work stuff going on, but when I get some free time I may spend a weekend hacking on this stuff.
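To illustrate what handing control to the caller could look like, here is a minimal sketch. It is not the library's API: `CancellableSuggestSketch` and its `Suggest` method are hypothetical names, and the candidate list stands in for whatever the real suggest algorithm would generate. Only `CancellationToken` itself is a real .NET type.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class CancellableSuggestSketch
{
    // Hypothetical suggest-like routine: it keeps collecting candidates
    // until the caller's token signals cancellation, then returns what it has.
    public static List<string> Suggest(IEnumerable<string> candidates, CancellationToken token)
    {
        var results = new List<string>();
        foreach (var candidate in candidates)
        {
            if (token.IsCancellationRequested)
            {
                break; // the caller decided to stop, not an internal timer
            }
            results.Add(candidate);
        }
        return results;
    }
}
```

A caller could then pick its own limit, for example by passing the token of `new CancellationTokenSource(TimeSpan.FromSeconds(2))`, or pass `CancellationToken.None` to let the search run to completion.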

aarondandy commented on September 25, 2024

Probably abysmal but I plan to get an en-US performance comparison for that soon after I make a few more attempts at this affix performance.

aarondandy commented on September 25, 2024

That is pretty strange for sure! There is some logic within the suggest algorithm that will read the system time and stop producing suggestions after a certain amount of time. I would be curious to see if you get a consistent set of 3 or a larger count even on a faster machine. Or perhaps if you slow your machine down enough you may see more 0 counts. That would at least give more weight to the cause being the timer-based short circuit in the code. I have been thinking about rewriting how that short circuit works. If that is the cause, I may be more inclined to act on that. Where can I get a copy of that dictionary from?
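As a rough illustration of how a timer-based short circuit can make identical calls return different counts, here is a minimal sketch. It is not the library's code: `TimeBudgetDemo`, `Collect`, `FakeClock`, and the budget value are all hypothetical, and the clock is injected so that a fast or slow machine can be simulated deterministically.

```csharp
using System;
using System.Collections.Generic;

public static class TimeBudgetDemo
{
    // Collects candidates until the elapsed time reported by `clock`
    // exceeds `budgetMs`, then returns whatever was found so far.
    public static List<string> Collect(IEnumerable<string> candidates, Func<long> clock, long budgetMs)
    {
        var results = new List<string>();
        long start = clock();
        foreach (var candidate in candidates)
        {
            if (clock() - start > budgetMs)
            {
                break; // out of time: stop producing suggestions
            }
            results.Add(candidate);
        }
        return results;
    }

    // Builds a fake clock that advances by `stepMs` on every read,
    // standing in for a faster (small step) or slower (large step) machine.
    public static Func<long> FakeClock(long stepMs)
    {
        long now = 0;
        return () =>
        {
            long current = now;
            now += stepMs;
            return current;
        };
    }
}
```

With three candidates and a budget of 100, `FakeClock(10)` lets all three through while `FakeClock(200)` returns none, even though the input is the same. On a real machine the step size varies from call to call, which is one way the count could flip between runs.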

Tigrisor commented on September 25, 2024

Thank you for your help!

I tried to time the calls to the Suggest() method to test your idea. If you're right, the calls returning no suggestion should take more time than the calls returning some suggestions.

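using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using WeCantSpell.Hunspell;

// `ressources` is the folder that contains the dictionary files.
var dictionaryFr = WordList.CreateFromFiles(
    Path.Combine(ressources, "fr-toutesvariantes.dic"),
    Path.Combine(ressources, "fr-toutesvariantes.aff"));

var watch = new Stopwatch();

// Time 100 identical Suggest() calls and log the suggestion count of each one.
for (int i = 0; i < 100; i++)
{
    watch.Restart();
    List<string> suggestList = dictionaryFr.Suggest("Systemes").ToList();
    watch.Stop();
    Debug.WriteLine($"{suggestList.Count} ( {watch.ElapsedMilliseconds}ms )");
}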

I got something like this:

3 ( 338ms )
3 ( 271ms )
0 ( 262ms )
3 ( 345ms )
0 ( 293ms )
3 ( 314ms )
3 ( 294ms )
3 ( 314ms )
0 ( 277ms )
3 ( 296ms )
3 ( 332ms )
3 ( 295ms )
3 ( 300ms )
3 ( 328ms )
3 ( 267ms )
3 ( 321ms )
3 ( 310ms )
3 ( 275ms )
0 ( 288ms )
3 ( 280ms )
3 ( 263ms )
3 ( 273ms )
0 ( 267ms )
0 ( 263ms )
3 ( 300ms )
3 ( 278ms )
3 ( 267ms )
3 ( 300ms )
3 ( 269ms )
0 ( 285ms )
3 ( 273ms )
0 ( 286ms )
3 ( 281ms )
3 ( 278ms )
0 ( 257ms )
3 ( 288ms )
3 ( 294ms )
3 ( 285ms )
3 ( 292ms )
0 ( 295ms )
3 ( 291ms )
0 ( 298ms )
3 ( 296ms )
0 ( 265ms )
3 ( 294ms )
3 ( 304ms )
3 ( 284ms )
3 ( 270ms )
0 ( 275ms )
3 ( 296ms )
0 ( 272ms )
3 ( 326ms )
3 ( 303ms )
3 ( 269ms )
3 ( 296ms )
0 ( 277ms )
3 ( 285ms )
3 ( 300ms )
0 ( 280ms )
3 ( 293ms )
3 ( 272ms )
3 ( 296ms )
0 ( 277ms )
3 ( 314ms )
0 ( 261ms )
0 ( 280ms )
3 ( 288ms )
0 ( 283ms )
0 ( 262ms )
3 ( 275ms )
3 ( 260ms )
3 ( 312ms )
3 ( 283ms )
3 ( 271ms )
3 ( 274ms )
3 ( 270ms )
0 ( 269ms )
0 ( 267ms )
0 ( 295ms )
3 ( 311ms )
0 ( 289ms )
0 ( 372ms )
0 ( 291ms )
3 ( 275ms )
0 ( 277ms )
0 ( 280ms )
3 ( 289ms )
3 ( 283ms )
3 ( 276ms )
3 ( 316ms )
3 ( 297ms )
3 ( 264ms )
3 ( 283ms )
0 ( 264ms )
3 ( 293ms )
3 ( 273ms )
3 ( 265ms )

But I didn't get the expected results 😢. The mean time for calls returning zero results was ~280ms, while the mean time for calls returning suggestions was ~292ms.
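For anyone who wants to repeat the comparison, here is a small sketch of how the per-group means can be computed from (count, elapsed) pairs. `TimingSummary` and `MeanByCount` are hypothetical helper names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TimingSummary
{
    // Groups (suggestion count, elapsed ms) samples by suggestion count
    // and returns the mean elapsed time for each group.
    public static Dictionary<int, double> MeanByCount(IEnumerable<(int Count, long ElapsedMs)> samples)
    {
        return samples
            .GroupBy(sample => sample.Count)
            .ToDictionary(group => group.Key, group => group.Average(sample => sample.ElapsedMs));
    }
}
```

Feeding it only the first five lines of the output above, `(3, 338), (3, 271), (0, 262), (3, 345), (0, 293)`, gives a mean of 318 ms for the calls with 3 suggestions and 277.5 ms for the calls with 0. The full run would need all 100 samples.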

Oh, and here is a copy of the dictionary I use:
Dictionnary.zip

funex commented on September 25, 2024

Hi all!
Any updates on this issue?

Tigrisor commented on September 25, 2024

No, sorry, I gave up 😅

aarondandy commented on September 25, 2024

Current status:

- The release-4 branch offers better control over suggest limitations and timeouts.
- The release-4 branch should also offer better performance, but not good enough for this case 😬

Stuff still to try:

- A sorted suffix/prefix index for the affix code could greatly improve performance here, or do nothing, but it is worth a try 🤷‍♂️
- I can try reducing temporary string allocations with a "mutable string" or a "value string builder", but most experimentation there hasn't shown promising improvements.

funex commented on September 25, 2024

Hi! It's great to see that you're still active and trying to push WeCantSpell forward. How do you reckon suggest is currently, compared to the old NHunspell in terms of performance?

aarondandy commented on September 25, 2024

@Tigrisor I think I found the performance issue. It turns out I'm a lazy person, so I ported the affix search code in a lazy way using just an array and brute force. I updated all that to use a binary search tree, and it is much faster for this dictionary now 🎉. The bad news, though, is that I always get 1 suggestion of "Systèmes" instead of 3 suggestions like in your initial post, and I don't know why that is.
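To show the general idea of replacing a brute-force scan with a search over sorted keys, here is a minimal sketch. It is not the library's implementation: the comment above mentions a binary search tree, while this sketch swaps in a sorted array with `Array.BinarySearch` as a simpler stand-in. `PrefixIndexSketch` and its methods are hypothetical names, and the keys are assumed to be unique and non-empty.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixIndexSketch
{
    // Brute force: test every key against the word, O(number of keys).
    public static List<string> FindBruteForce(string[] keys, string word)
    {
        var hits = new List<string>();
        foreach (var key in keys)
        {
            if (word.StartsWith(key, StringComparison.Ordinal))
            {
                hits.Add(key);
            }
        }
        return hits;
    }

    // Indexed: `sortedKeys` must already be sorted with StringComparer.Ordinal.
    // Each leading substring of the word is looked up with a binary search,
    // so the cost is O(word length * log(number of keys)).
    public static List<string> FindIndexed(string[] sortedKeys, string word)
    {
        var hits = new List<string>();
        for (int length = 1; length <= word.Length; length++)
        {
            string prefix = word.Substring(0, length);
            if (Array.BinarySearch(sortedKeys, prefix, StringComparer.Ordinal) >= 0)
            {
                hits.Add(prefix);
            }
        }
        return hits;
    }
}
```

Both methods return the same keys for the same input. The difference only shows up when the key array is large, which is the case for a big affix file.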

aarondandy commented on September 25, 2024

The new code is now 4.4x faster than the original library, which makes it dramatically faster than the C++ library. That is surprising enough that I now wonder if I did something wrong 🤷‍♂️.

aarondandy commented on September 25, 2024

I can reproduce the 3 suggestions on the old 3.x version of the library so I'm going to have to debug into that to see what went wrong there.

aarondandy commented on September 25, 2024

Good news, this is not a bug 🎉. The old v3 code was so slow it was unable to generate any suggestions from the MapRelated code path when it should have done so. If there are no suggestions generated before this point in the code, then it will perform n-gram suggest operations. This is probably why the performance was so bad: the code went through the horrible performance of MapRelated and then into the horrible performance of NGram. Now that MapRelated is nice and fast, that isn't an issue any more. I put this dictionary through the old NHunspell library too just to be extra sure, and it also generated the same single suggestion. I think this should be enough to make a release candidate.
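The fallback described above can be pictured with a minimal sketch. This is only an illustration of the control flow as explained in this comment, not the library's code: `SuggestFlowSketch` is a hypothetical name, and the two stages are passed in as delegates so their behavior can be simulated.

```csharp
using System;
using System.Collections.Generic;

public static class SuggestFlowSketch
{
    // Runs the earlier stage first. The n-gram stage only runs when the
    // earlier stage produced no suggestions at all.
    public static List<string> Suggest(Func<List<string>> mapRelatedStage, Func<List<string>> ngramStage)
    {
        List<string> results = mapRelatedStage();
        if (results.Count == 0)
        {
            results = ngramStage();
        }
        return results;
    }
}
```

If the first stage returns "Systèmes", the result is that single suggestion and the n-gram stage is never called. If the first stage comes back empty, for example because it ran out of time, the result is whatever the n-gram stage produces, which would explain a different and larger set.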

aarondandy commented on September 25, 2024

Let me know if this release candidate works for you all: https://www.nuget.org/packages/WeCantSpell.Hunspell/4.0.0-rc01

aarondandy commented on September 25, 2024

@Tigrisor I published a new release which should resolve this issue, https://www.nuget.org/packages/WeCantSpell.Hunspell/4.0.0 . If this isn't enough, feel free to re-open.

@funex I'm pretty surprised honestly but the Suggest performance in this new release seems to be much faster than the older release of the C++ library. I posted my results in the readme: https://github.com/aarondandy/WeCantSpell.Hunspell/#performance ... I'm pretty sure this isn't because it's broken 🤣🤷‍♂️.

funex commented on September 25, 2024

Wow, looks very promising.
