machine: cpu-machine: Intel(R) Core(TM) i7-107

build_index is very slow (autofaiss issue, open)

criteo commented on July 20, 2024

build_index is very slow


Comments (5)

rom1504 commented on July 20, 2024

Building an HNSW index is indeed one of the slowest ways to add vectors, especially with random vectors.
This step calls faiss index.add.

If you want to optimize for index build speed, you can force the index key so that it does not use HNSW.


xiongqiangcs commented on July 20, 2024

Building an HNSW index is indeed one of the slowest ways to add vectors, especially with random vectors. This step calls faiss index.add.

If you want to optimize for index build speed, you can force the index key so that it does not use HNSW.

autofaiss build_index took 49 min; faiss index.add alone took 34 min.

from autofaiss import setup_logging, build_index
from autofaiss import Timeit
import numpy as np
import faiss
import logging
import logging.config
import multiprocessing

setup_logging(logging.INFO)
faiss.omp_set_num_threads(multiprocessing.cpu_count())


embeddings = np.float32(np.random.rand(1000000, 512))
with Timeit("build index"):
    index = faiss.index_factory(512, "HNSW32", faiss.METRIC_L2)
    index.add(embeddings)


xiongqiangcs commented on July 20, 2024

Do the embedding_reader parameters max_piece_size and parallel_pieces need to be optimized?

autofaiss/autofaiss/indices/build.py

Lines 98 to 102 in d5c773f

for batch_id, (vec_batch, ids_batch) in enumerate(embedding_reader(batch_size=batch_size)):
    if add_embeddings_with_ids:
        trained_index.add_with_ids(vec_batch, ids_batch["i"].to_numpy())
    else:
        trained_index.add(vec_batch)


rom1504 commented on July 20, 2024

What kind of local disk do you have?


xiongqiangcs commented on July 20, 2024

What kind of local disk do you have?

SSD

