Comments (5)
Building an HNSW index is indeed one of the slowest ways to add vectors, especially with random vectors.
This is calling faiss index.add.
If you want to optimize for index build speed, you can force the index key to one that does not use HNSW.
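As a rough sketch of forcing the index key: autofaiss's build_index accepts an index_key argument that is passed to the faiss index factory instead of the automatically chosen key. The key "IVF4096,Flat" and the helper name build_without_hnsw below are assumptions made for illustration, not a recommendation from the maintainers.

```python
# Hypothetical non-HNSW key, chosen for illustration only; tune it for your data.
NON_HNSW_INDEX_KEY = "IVF4096,Flat"


def build_without_hnsw(embeddings, index_key=NON_HNSW_INDEX_KEY):
    """Build an in-memory index with an explicit index_key instead of the automatic choice."""
    # Imported lazily so this helper can be defined without autofaiss installed.
    from autofaiss import build_index

    index, index_infos = build_index(
        embeddings=embeddings,   # float32 array of shape (n_vectors, dim)
        index_key=index_key,     # skips autofaiss's own index selection
        save_on_disk=False,      # keep the index in memory only
    )
    return index, index_infos
```

Calling build_without_hnsw(embeddings) on the same float32 array should then build an IVF index rather than an HNSW one; whether it is faster on a given machine needs to be measured.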
from autofaiss.
autofaiss build_index took 49 min; faiss index.add took 34 min:
from autofaiss import setup_logging, build_index
from autofaiss import Timeit
import numpy as np
import faiss
import logging
import logging.config
import multiprocessing
setup_logging(logging.INFO)
faiss.omp_set_num_threads(multiprocessing.cpu_count())
embeddings = np.float32(np.random.rand(1000000, 512))
with Timeit("build index"):
    index = faiss.index_factory(512, "HNSW32", faiss.METRIC_L2)
    index.add(embeddings)
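To compare this timing against a non-HNSW index on the same data, something like the following could be used with faiss directly. It is only a sketch: the helper timed_build and the key "IVF4096,Flat" are assumptions for illustration, and an IVF index needs a training pass that HNSW32 does not.

```python
import time

# Hypothetical keys for the comparison; the IVF key is an assumption.
HNSW_KEY = "HNSW32"
IVF_KEY = "IVF4096,Flat"


def timed_build(embeddings, index_key):
    """Train (if needed) and fill a faiss index; return (index, elapsed seconds)."""
    import faiss  # imported lazily so the helper can be defined without faiss

    dim = embeddings.shape[1]
    index = faiss.index_factory(dim, index_key, faiss.METRIC_L2)
    start = time.perf_counter()
    if not index.is_trained:
        index.train(embeddings)  # IVF runs k-means here; HNSW32 skips this step
    index.add(embeddings)
    return index, time.perf_counter() - start
```

Running timed_build(embeddings, HNSW_KEY) and timed_build(embeddings, IVF_KEY) on the same array gives two build times that can be compared directly.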
Do the embedding_reader parameters max_piece_size and parallel_pieces need to be tuned?
autofaiss/autofaiss/indices/build.py
Lines 98 to 102 in d5c773f
What kind of local disk do you have?
SSD