
Comments (9)

lord-re commented on May 22, 2024

Ok, I didn't realize that was the same setting as in the other issue.
I increased it to 1024, and the results aren't perfect but are much better!

The wasm is much bigger too (1.1 MB), but it compresses very well (to 290 KB), so even with a very large number it's still usable.
I'll keep increasing it to see if the results improve.
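
For anyone who wants to repeat the raw-versus-compressed comparison, here is a minimal sketch using Python's standard `gzip` module. The function name `raw_and_gzipped_size` and the file path in the comment are hypothetical; nothing here is part of tinysearch, and a web server's gzip settings may give slightly different numbers.

```python
import gzip


def raw_and_gzipped_size(data: bytes) -> tuple[int, int]:
    """Return (raw size, gzip-compressed size) in bytes."""
    compressed = gzip.compress(data, compresslevel=9)
    return len(data), len(compressed)


# Hypothetical usage with a generated wasm file (the path is an assumption):
#
#     with open("path/to/module.wasm", "rb") as f:
#         raw, gz = raw_and_gzipped_size(f.read())
#     print(f"raw: {raw / 1024:.0f} KiB, gzipped: {gz / 1024:.0f} KiB")
```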

from tinysearch.

lord-re commented on May 22, 2024

I can confirm that it works well and the index is smaller, whereas my JSON grew a little bit.
It went from 1.2M to 392K uncompressed, or from 319K gzipped to 281K gzipped.

Nice work ;-)


mre commented on May 22, 2024

Thanks for the report and for working on the integration.

It finds results from words which doesn't exist at all ( try "xugn.ui").

This can happen because the data structure we use to store words is a Bloom filter, which allows for a certain number of false positives. The number should be quite low in general, though. I hope that is the case for you.
Keep in mind that tinysearch tries to match each word in a query independently, so a false positive can also occur because a different word in your query matched.
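
To illustrate the behavior described above, here is a minimal textbook Bloom filter in Python. It is a sketch only and not tinysearch's actual implementation: the class `BloomFilter`, the helpers `build_filter` and `score`, the SHA-256 based hashing, and the per-word scoring rule are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, word: str):
        # Derive num_hashes positions by salting one hash function (an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos] = 1

    def __contains__(self, word: str) -> bool:
        # True if every position is set; other words may have set them.
        return all(self.bits[pos] for pos in self._positions(word))


def build_filter(words, num_bits: int) -> BloomFilter:
    """Insert every word into a fresh filter of the given size."""
    bloom = BloomFilter(num_bits)
    for word in words:
        bloom.add(word)
    return bloom


def score(bloom: BloomFilter, query: str) -> int:
    """Count query words the filter reports as present, each checked independently."""
    return sum(word in bloom for word in query.lower().split())
```

With a filter that is far too small for the number of stored words, nearly every bit ends up set, so even a nonsense word such as "xugn.ui" is reported as present. A larger filter leaves more bits unset and rejects most unknown words, at the cost of a bigger index.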

It finds pages without the term (try "morse").

Also expected for the same reasons as above. The number of false positives should generally be low, though. Maybe you can test that on your end?

it can't find pages containing the term (try "midsommar")

Yup, same reason: fuzzy search. Checking the index you linked, the number of false positives is indeed quite high:

Hereditary 🚫 
Les trackers 1st-party : quelles solutions ? ✅
Récap 03 : Octobre 2019 ✅
Sophie Nélisse 🚫 
Jack Reynor ✅
Jack Nicholson 🚫 
Nourrir ses gentils ptits chats 🚫 

Would it help if the false-positive threshold were configurable? The drawback of fewer false positives is an increased index size.
Can you check with a bigger "magic number" to see if that changes the number of false positives?
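
The trade-off between false positives and index size can be estimated with the standard sizing formula for a textbook Bloom filter, m = -n ln(p) / (ln 2)^2, which assumes an optimal number of hash functions. The sketch below uses a hypothetical helper name, `bloom_bits_needed`, and says nothing about how the "magic number" itself maps to size in tinysearch.

```python
import math


def bloom_bits_needed(num_items: int, false_positive_rate: float) -> int:
    """Bits needed by a textbook Bloom filter with an optimal hash count."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    bits = -num_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits)


# Compare the size needed for a few target false-positive rates.
for rate in (0.1, 0.01, 0.001):
    print(rate, bloom_bits_needed(1000, rate))
```

Under this formula, every tenfold reduction in the false-positive rate costs roughly 4.8 extra bits per stored word.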


lord-re commented on May 22, 2024

Yes, I can try tweaking the magic number. Where is it?


johnmathews commented on May 22, 2024

Perhaps changing the default value of TINY_MAGIC to 1024 would be an improvement for most people. I found the results to be much more intuitive, and the compressed wasm is not too large.


mre commented on May 22, 2024

@johnmathews I'd accept a PR for that.


mre commented on May 22, 2024

If someone finds the time, I'd really love to see how https://github.com/ayazhafiz/xorf compares to the cuckoo filter. Perhaps we won't need to deal with the magic number quirk in the future when switching to that.


mre commented on May 22, 2024

At long last I found some time to port the project over to xorf.

The results are great!

  • 20-25% smaller wasm size: The test index is 99kB now, 49kB gzipped, 40kB brotli
  • Faster initial compilation time thanks to fewer dependencies.
  • No more TINY_MAGIC. The Xor filter doesn't run into this problem.

The changes have been merged into master if anyone wants to give it another try.
I'm going to close this, but feel free to add a comment about your own experiences.


mre commented on May 22, 2024

Thanks for the feedback @lord-re. 😊

