Comments (9)
OK, I didn't realize that was the same setting as in the other issue.
I increased it to 1024 and the results aren't perfect, but they are way better!
The wasm is way bigger too (1.1MB), but it compresses very well (to 290KB), so even with a very big number it's still usable.
I'll continue increasing it to see if it gets better.
from tinysearch.
I can confirm that it works well and the index is smaller, whereas my JSON grew a little bit.
It went from 1.2M to 392K uncompressed, or from 319K to 281K gzipped.
Nice work ;-)
from tinysearch.
Thanks for the report and for working on the integration.
It finds results from words which don't exist at all (try "xugn.ui").
This can happen because the data structure that we use to store words is a Bloom filter, which allows for a number of false positives. The number should be quite low in general, though. I hope that is the case for you.
Keep in mind that tinysearch tries to match each word in a query independently, so a false positive can also occur because a different word in your query matched.
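To make the two effects above concrete, here is a minimal sketch of a Bloom filter with per-word query matching. It is not tinysearch's actual code: the `Bloom` type, the `score` function, and the filter sizes are hypothetical names and values chosen for illustration, and it only uses the Rust standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A deliberately tiny Bloom filter (illustrative, not tinysearch's type).
struct Bloom {
    bits: Vec<bool>,
    hashes: u64,
}

impl Bloom {
    fn new(num_bits: usize, hashes: u64) -> Self {
        Bloom { bits: vec![false; num_bits], hashes }
    }

    /// Slot index for `word` under the `seed`-th hash function.
    fn slot(&self, word: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        word.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, word: &str) {
        for seed in 0..self.hashes {
            let i = self.slot(word, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means the word was never inserted; `true` only means
    /// "possibly present", because other words may have set the same slots.
    fn might_contain(&self, word: &str) -> bool {
        (0..self.hashes).all(|seed| self.bits[self.slot(word, seed)])
    }
}

/// Hypothetical scoring: every query word is checked independently, so one
/// (possibly false) match is enough for a page to get a non-zero score.
fn score(page: &Bloom, query: &str) -> usize {
    query
        .split_whitespace()
        .filter(|w| page.might_contain(w))
        .count()
}

fn main() {
    // A roomy filter: few collisions, so false positives are unlikely.
    let mut roomy = Bloom::new(1024, 3);
    // An overfull filter: almost every slot ends up set.
    let mut crowded = Bloom::new(8, 2);
    for i in 0..200 {
        let word = format!("word{}", i);
        roomy.insert(&word);
        crowded.insert(&word);
    }

    // Words that were inserted are always reported as present.
    assert!(roomy.might_contain("word7"));

    // "xugn.ui" was never inserted; the crowded filter will likely claim it anyway.
    println!("roomy:   {}", roomy.might_contain("xugn.ui"));
    println!("crowded: {}", crowded.might_contain("xugn.ui"));
    println!("score:   {}", score(&crowded, "xugn.ui morse"));
}
```

The fuller the filter is relative to its size, the more often an absent word finds all of its slots already set, which is the kind of false positive described above.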
It finds pages without the term (try "morse").
Also expected, for the same reasons as above. The number of false positives should generally be low, though. Maybe you can test that on your end?
It can't find pages containing the term (try "midsommar").
Yup, same reason: fuzzy search. Checking the index you linked, the number of false positives is indeed quite high:
Hereditary 🚫
Les trackers 1st-party : quelles solutions ? ✅
Récap 03 : Octobre 2019 ✅
Sophie Nélisse 🚫
Jack Reynor ✅
Jack Nicholson 🚫
Nourrir ses gentils ptits chats 🚫
Would it help if the false-positive threshold were configurable? The drawback of fewer false positives is a larger index.
Can you check with a bigger "magic number" to see if that changes the number of false positives?
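As a rough guide to that trade-off, the textbook sizing formula for an optimally tuned Bloom filter can be sketched as follows. This is an assumption-laden illustration, not how tinysearch or its "magic number" computes the index size; the function names and the 10,000-word example are made up.

```rust
/// Bits needed per stored word for an optimally tuned Bloom filter with
/// target false-positive rate `p`: -ln(p) / (ln 2)^2.
fn bits_per_word(p: f64) -> f64 {
    -p.ln() / std::f64::consts::LN_2.powi(2)
}

/// Approximate filter size in bytes for `words` entries at rate `p`.
fn index_bytes(words: usize, p: f64) -> usize {
    (words as f64 * bits_per_word(p) / 8.0).ceil() as usize
}

fn main() {
    // Hypothetical index of 10,000 distinct words.
    for p in [0.1, 0.01, 0.001] {
        println!(
            "p = {}: {:.1} bits/word, {} bytes",
            p,
            bits_per_word(p),
            index_bytes(10_000, p)
        );
    }
}
```

Under this formula, each tenfold reduction in the false-positive rate costs about 4.8 extra bits per stored word, so the index grows linearly while the error rate drops exponentially.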
from tinysearch.
Yes, I can try tweaking the magic number. Where is it?
from tinysearch.
Perhaps changing the default value of TINY_MAGIC to 1024 would be an improvement for most people. I found the results to be much more intuitive, and the compressed wasm is not too large.
from tinysearch.
@johnmathews I'd accept a PR for that.
from tinysearch.
If someone finds the time, I'd really love to see how https://github.com/ayazhafiz/xorf compares to the cuckoo filter. Perhaps we won't need to deal with the magic number quirk in the future when switching to that.
from tinysearch.
At long last I found some time to port the project over to xorf.
The results are great!
- 20-25% smaller wasm size: The test index is 99kB now, 49kB gzipped, 40kB brotli
- Faster initial compilation time thanks to fewer dependencies.
- No more TINY_MAGIC. The Xor filter doesn't run into this problem.
Changes got merged into master if anyone wants to give it another try.
Gonna close this, but feel free to add a comment about your own experiences.
from tinysearch.
Thanks for the feedback @lord-re. 😊
from tinysearch.