Comments (10)
Hey,
Thanks for suggesting this feature.
Why not! Working on it...
from autofaiss.
I am closing the PR that I created because we will implement something similar on our side. Once it is stable, we will consider adding it here. Stay tuned.
from autofaiss.
Thanks a lot for looking into this! The `update_index` branch seems usable already; I might start using it and will report any hiccups/successes here!
Cheers!
from autofaiss.
It does have issues: it won't work if the new `index_key` is different from the one used in the already-built index. Essentially, we need to retrain the index in that case. That's something I missed in the branch, so I don't recommend using this branch right now.
from autofaiss.
if the index_key needs to be changed then it's not possible at all to add items to the index.
so why is that method not suitable?
from autofaiss.
Yes, I have the same thoughts as @rom1504. For many use cases, `index_key` would not be different, and it should just work fine for those cases, right? I can report back about my specific use case soon anyway :) If it does not work, I'll report here and await the revised PR later.
But even if it has issues that you'd like to solve before merging to the main, I think it is already useful in many cases. So thanks for working on this 👍
from autofaiss.
I agree with what both of you said. It is suitable if we don't need to update the clustering. What I had in mind is that the new interface could ideally handle two main use cases:
- keep the clustering unchanged and add more embeddings to it;
- update the clustering with the existing embeddings and the new embeddings.
@kushalkafle I would be happy to see your feedback after using it. Thanks :)
from autofaiss.
@rom1504 Pardon me for replying late. We would retrain both; it is indeed equivalent to rebuilding the index from scratch.
In my opinion, it is possible to provide an interface like the `update_index` branch whose only responsibility is to add more features/embeddings to a built index while keeping the index unchanged. It would be useful for autofaiss's users.
@nateagr is working on incremental indexing right now for our internal use cases; we will revisit this topic soon.
from autofaiss.
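An "add only" interface of the kind described above might look roughly like the sketch below. The function name `add_embeddings_to_index` is hypothetical and is not part of autofaiss. It relies only on attributes that faiss index objects expose (`is_trained`, `d`, `ntotal`, `add`) and assumes `new_embeddings` is a 2-D array with a `.shape` attribute, such as a NumPy array.

```python
def add_embeddings_to_index(index, new_embeddings):
    """Append vectors to an already-trained index without retraining it.

    `index` is assumed to behave like a faiss index (is_trained, d, ntotal, add).
    Returns the number of vectors that were added.
    """
    if not index.is_trained:
        # Adding to an untrained index would require training first,
        # which is the "build from scratch" path, not an update.
        raise ValueError("index is not trained; build it first")

    n_vectors, dim = new_embeddings.shape
    if dim != index.d:
        raise ValueError(f"embedding dimension {dim} does not match index dimension {index.d}")

    before = index.ntotal
    index.add(new_embeddings)  # faiss expects a contiguous float32 array here
    return index.ntotal - before
```

With faiss installed, a trained index created by, for example, `faiss.index_factory(d, "IVF16,Flat")` could be passed as `index`, together with a float32 array of shape `(n, d)`. Nothing in the sketch retrains or changes the existing `index_key`, so it covers only the "keep the clustering unchanged" case.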
I think I want to chime in here as well. Overall, I think I agree with @rom1504.
- If the index is going to be built from scratch, why does this even fall under `update_index`'s job? That is just the regular building of the index; there is no `update` happening.
- Yes, what you described (i.e., only adding more features/embeddings to an already-built index) is exactly what I am suggesting, and I think it will be useful for multiple usages. Perhaps `add_features_to_index` is a better name 😄
- I know this will change the index size and the query time(s) statistics, but that is something that can be re-benchmarked/adjusted without any retraining.
from autofaiss.