
Comments (15)

aarondandy commented on September 25, 2024

I have been thinking about this issue more recently and, as mentioned in #43, have some ideas for at least handing more control over the timeout to the caller. A performance improvement may also be possible (see #33), but I think an API addition for options such as a cancellation token should be the first step. I have some difficult work stuff going on, but when I get some free time I may spend a weekend hacking on this.
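
As a rough sketch of what caller-controlled cancellation could look like, a suggestion loop can check a `CancellationToken` between candidates and return whatever it has collected so far. The `SuggestWithCancellation` helper and its candidate list are hypothetical and are not the library's API:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: not part of WeCantSpell.Hunspell.
// Collects candidates until the caller's token is cancelled.
static List<string> SuggestWithCancellation(
    IEnumerable<string> candidates, CancellationToken token)
{
    var results = new List<string>();
    foreach (var candidate in candidates)
    {
        if (token.IsCancellationRequested)
        {
            break; // return partial results instead of throwing
        }
        results.Add(candidate);
    }
    return results;
}

// Usage: the caller decides the budget, here 50 ms.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));
var suggestions = SuggestWithCancellation(new[] { "Systèmes", "Système" }, cts.Token);
Console.WriteLine(suggestions.Count);
```

Returning partial results rather than throwing is a design choice made for this sketch; a real API might prefer `ThrowIfCancellationRequested`.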

from wecantspell.hunspell.

aarondandy commented on September 25, 2024

Probably abysmal but I plan to get an en-US performance comparison for that soon after I make a few more attempts at this affix performance.


aarondandy commented on September 25, 2024

That is pretty strange for sure! There is some logic within the suggest algorithm that will read the system time and stop producing suggestions after a certain amount of time. I would be curious to see if you get a consistent set of 3 or a larger count even on a faster machine. Or perhaps if you slow your machine down enough you may see more 0 counts. That would at least give more weight to the cause being the timer based short circuit in the code. I have been thinking about rewriting how that short circuit works. If that is the cause I may be more inclined to act on that. Where can I get a copy of that dictionary from?
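
To illustrate how a clock-based cutoff can make the result count vary from run to run, here is a minimal sketch. `SuggestWithTimeLimit`, the injected `elapsedMs` clock and the simulated per-read costs are all hypothetical and only stand in for the real suggest code:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a time-based short circuit; not the library's code.
// elapsedMs is injected so the behaviour can be reproduced without a real clock.
static List<string> SuggestWithTimeLimit(
    IEnumerable<string> candidates, Func<long> elapsedMs, long limitMs)
{
    var results = new List<string>();
    foreach (var candidate in candidates)
    {
        if (elapsedMs() > limitMs)
        {
            break; // time budget spent: stop producing suggestions
        }
        results.Add(candidate);
    }
    return results;
}

// Simulate a fast and a slow run: each clock read advances by a fixed cost.
var words = new[] { "one", "two", "three" };
long fastClock = 0, slowClock = 0;
var fast = SuggestWithTimeLimit(words, () => fastClock += 10, limitMs: 100);
var slow = SuggestWithTimeLimit(words, () => slowClock += 60, limitMs: 100);
Console.WriteLine($"{fast.Count} vs {slow.Count}"); // prints "3 vs 1"
```

The same input yields a different number of suggestions depending only on how quickly the work happens to proceed.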


Tigrisor commented on September 25, 2024

Thank you for your help!

I tried to time the calls to the Suggest() method to test your idea. If you're right, the calls returning no suggestion should take more time than the calls returning some suggestions.

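using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using WeCantSpell.Hunspell;

// ressources is the folder that contains the dictionary files.
var dictionaryFr = WordList.CreateFromFiles(
    Path.Combine(ressources, "fr-toutesvariantes.dic"),
    Path.Combine(ressources, "fr-toutesvariantes.aff"));

var watch = new Stopwatch();

for (int i = 0; i < 100; i++)
{
    // Time a single Suggest() call, including materializing the results.
    watch.Restart();
    List<string> suggestList = dictionaryFr.Suggest("Systemes").ToList();
    watch.Stop();
    Debug.WriteLine($"{suggestList.Count} ( {watch.ElapsedMilliseconds}ms )");
}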

I got something like this:

3 ( 338ms )
3 ( 271ms )
0 ( 262ms )
3 ( 345ms )
0 ( 293ms )
3 ( 314ms )
3 ( 294ms )
3 ( 314ms )
0 ( 277ms )
3 ( 296ms )
3 ( 332ms )
3 ( 295ms )
3 ( 300ms )
3 ( 328ms )
3 ( 267ms )
3 ( 321ms )
3 ( 310ms )
3 ( 275ms )
0 ( 288ms )
3 ( 280ms )
3 ( 263ms )
3 ( 273ms )
0 ( 267ms )
0 ( 263ms )
3 ( 300ms )
3 ( 278ms )
3 ( 267ms )
3 ( 300ms )
3 ( 269ms )
0 ( 285ms )
3 ( 273ms )
0 ( 286ms )
3 ( 281ms )
3 ( 278ms )
0 ( 257ms )
3 ( 288ms )
3 ( 294ms )
3 ( 285ms )
3 ( 292ms )
0 ( 295ms )
3 ( 291ms )
0 ( 298ms )
3 ( 296ms )
0 ( 265ms )
3 ( 294ms )
3 ( 304ms )
3 ( 284ms )
3 ( 270ms )
0 ( 275ms )
3 ( 296ms )
0 ( 272ms )
3 ( 326ms )
3 ( 303ms )
3 ( 269ms )
3 ( 296ms )
0 ( 277ms )
3 ( 285ms )
3 ( 300ms )
0 ( 280ms )
3 ( 293ms )
3 ( 272ms )
3 ( 296ms )
0 ( 277ms )
3 ( 314ms )
0 ( 261ms )
0 ( 280ms )
3 ( 288ms )
0 ( 283ms )
0 ( 262ms )
3 ( 275ms )
3 ( 260ms )
3 ( 312ms )
3 ( 283ms )
3 ( 271ms )
3 ( 274ms )
3 ( 270ms )
0 ( 269ms )
0 ( 267ms )
0 ( 295ms )
3 ( 311ms )
0 ( 289ms )
0 ( 372ms )
0 ( 291ms )
3 ( 275ms )
0 ( 277ms )
0 ( 280ms )
3 ( 289ms )
3 ( 283ms )
3 ( 276ms )
3 ( 316ms )
3 ( 297ms )
3 ( 264ms )
3 ( 283ms )
0 ( 264ms )
3 ( 293ms )
3 ( 273ms )
3 ( 265ms )

But I didn't get the expected results 😒: the mean time for calls returning zero results was ~280ms, while the mean time for calls returning suggestions was ~292ms.

Oh, and here is a copy of the dictionary I use:
Dictionnary.zip


funex commented on September 25, 2024

Hi all!
Any updates on this issue?


Tigrisor commented on September 25, 2024

No, sorry, I gave up 😅


aarondandy commented on September 25, 2024

Current status:

  • The release-4 branch offers better control over suggest limitations and timeouts
  • The release-4 branch should also offer better performance, but not good enough for this case 😬

Stuff still to try:

  • A sorted suffix/prefix index for the affix code could greatly improve performance here, or do nothing, but it is worth a try 🤷‍♂️
  • Can try reducing temporary string allocations with a "mutable string" or a "value string builder" but most experimentation there hasn't shown promising improvements


funex commented on September 25, 2024

Hi! It's great to see that you're still active and trying to push WeCantSpell forward. How do you reckon Suggest currently compares to the old NHunspell in terms of performance?


aarondandy commented on September 25, 2024

@Tigrisor I think I found the performance issue. It turns out I'm a lazy person, so I ported the affix search code in a lazy way using just an array and brute force. I have now updated all of that to use a binary search tree, and it is much faster for this dictionary 🎉. The bad news, though, is that I always get 1 suggestion, "Systèmes", instead of 3 suggestions like in your initial post, and I don't know why that is.
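
The kind of change described above can be sketched as follows. This is only an illustration: it uses a sorted array with `Array.BinarySearch` rather than the binary search tree mentioned, it assumes affix keys are non-empty, and the `FindPrefixesBruteForce` and `FindPrefixesSorted` helpers and the sample affixes are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Brute force: test every known affix against the word.
static List<string> FindPrefixesBruteForce(string[] affixes, string word)
{
    var matches = new List<string>();
    foreach (var affix in affixes)
    {
        if (word.StartsWith(affix, StringComparison.Ordinal))
        {
            matches.Add(affix);
        }
    }
    return matches;
}

// Sorted lookup: binary-search each leading substring of the word.
// sortedAffixes must be sorted with StringComparer.Ordinal.
static List<string> FindPrefixesSorted(string[] sortedAffixes, string word)
{
    var matches = new List<string>();
    for (var length = 1; length <= word.Length; length++)
    {
        var key = word.Substring(0, length);
        if (Array.BinarySearch(sortedAffixes, key, StringComparer.Ordinal) >= 0)
        {
            matches.Add(key);
        }
    }
    return matches;
}

var affixes = new[] { "dé", "in", "re", "ré", "sur" };
Array.Sort(affixes, StringComparer.Ordinal);
Console.WriteLine(string.Join(",", FindPrefixesSorted(affixes, "refaire"))); // prints "re"
```

The brute-force version does work proportional to the number of affixes for every word, while the sorted version does one logarithmic lookup per leading substring of the word, which matters when a dictionary defines a very large number of affixes.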


aarondandy commented on September 25, 2024

The new code is now 4.4x faster than the original library, which makes it dramatically faster than the C++ library. That is surprising enough that I now wonder if I did something wrong 🤷‍♂️.


aarondandy commented on September 25, 2024

I can reproduce the 3 suggestions on the old 3.x version of the library, so I'm going to have to debug into that to see what went wrong there.


aarondandy commented on September 25, 2024

Good news, this is not a bug 🎉. The old v3 code was so slow that it was unable to generate any suggestions from the MapRelated code path when it should have done so. If no suggestions have been generated by that point in the code, it performs n-gram suggest operations. This is probably why the performance was so bad: the code went through the horrible performance of MapRelated and then into the horrible performance of NGram. Now that MapRelated is nice and fast, that isn't an issue any more. I put this dictionary through the old NHunspell library too, just to be extra sure, and it also generated the same single suggestion. I think this should be enough to make a release candidate.
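
The control flow described here can be sketched roughly as follows. The names (`Suggest`, `mapRelated`, `ngram`), the `mapRelatedTimedOut` flag and the sample results are placeholders for the example, not the library's actual members or output:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: the n-gram stage runs only when earlier stages found nothing.
static List<string> Suggest(
    Func<IEnumerable<string>> mapRelated,
    Func<IEnumerable<string>> ngram,
    bool mapRelatedTimedOut)
{
    // A stage that runs out of time contributes no suggestions.
    var results = mapRelatedTimedOut ? new List<string>() : mapRelated().ToList();
    if (results.Count == 0)
    {
        results.AddRange(ngram()); // fallback: approximate n-gram matches
    }
    return results;
}

// Placeholder stages: the n-gram words are invented for the illustration.
Func<IEnumerable<string>> fastStage = () => new[] { "Systèmes" };
Func<IEnumerable<string>> ngramStage = () => new[] { "ngram-1", "ngram-2", "ngram-3" };

Console.WriteLine(Suggest(fastStage, ngramStage, mapRelatedTimedOut: true).Count);  // prints 3
Console.WriteLine(Suggest(fastStage, ngramStage, mapRelatedTimedOut: false).Count); // prints 1
```

Under these assumptions, a slow first stage produces the three fallback results, while a fast first stage produces its single result and never reaches the fallback.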


aarondandy commented on September 25, 2024

Let me know if this release candidate works for you all: https://www.nuget.org/packages/WeCantSpell.Hunspell/4.0.0-rc01


aarondandy commented on September 25, 2024

@Tigrisor I published a new release which should resolve this issue, https://www.nuget.org/packages/WeCantSpell.Hunspell/4.0.0 . If this isn't enough, feel free to re-open.

@funex I'm pretty surprised, honestly, but the Suggest performance in this new release seems to be much faster than the older release of the C++ library. I posted my results in the readme: https://github.com/aarondandy/WeCantSpell.Hunspell/#performance ... I'm pretty sure this isn't because it's broken 🤣🤷‍♂️.


funex commented on September 25, 2024

Wow, looks very promising.

