
Comments (5)

rom1504 avatar rom1504 commented on July 20, 2024

Building an HNSW index is indeed one of the slowest ways to add vectors, especially with random vectors.
The time is spent in the faiss index.add call.

If you want to optimize for speed of building an index you can force the index key to not use hnsw
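
As a rough sketch of forcing the index key: `build_index` takes an `index_key` argument, so passing a non-HNSW factory string skips the graph construction. The key `IVF1024,Flat`, the helper name `build_without_hnsw`, and the choice of `save_on_disk=False` are assumptions made for illustration, not settings recommended in this thread. The right key depends on your memory and recall requirements.

```python
# Hypothetical choice for illustration: an inverted-file index with
# uncompressed vectors, which avoids building an HNSW graph.
NON_HNSW_INDEX_KEY = "IVF1024,Flat"


def build_without_hnsw(embeddings, index_key=NON_HNSW_INDEX_KEY):
    """Build an index with an explicit key instead of the auto-selected one."""
    # Imported lazily so this module loads even where autofaiss is not installed.
    from autofaiss import build_index

    index, index_infos = build_index(
        embeddings,
        index_key=index_key,   # overrides autofaiss's automatic key selection
        metric_type="l2",      # matches the METRIC_L2 used elsewhere in this thread
        save_on_disk=False,    # keep the index in memory only
    )
    return index, index_infos
```

It would be called as `index, infos = build_without_hnsw(embeddings)` with a float32 array of shape `(n, dim)`.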

from autofaiss.

xiongqiangcs avatar xiongqiangcs commented on July 20, 2024

Building an HNSW index is indeed one of the slowest ways to add vectors, especially with random vectors. The time is spent in the faiss index.add call.

If you want to optimize for speed of building an index you can force the index key to not use hnsw

autofaiss build_index took 49 min; faiss index.add on its own took 34 min.

from autofaiss import setup_logging, build_index
from autofaiss import Timeit
import numpy as np
import faiss
import logging
import logging.config
import multiprocessing

setup_logging(logging.INFO)
faiss.omp_set_num_threads(multiprocessing.cpu_count())


embeddings = np.float32(np.random.rand(1000000, 512))
with Timeit("build index"):
    index = faiss.index_factory(512, "HNSW32", faiss.METRIC_L2)
    index.add(embeddings)



xiongqiangcs avatar xiongqiangcs commented on July 20, 2024

Do the embedding_reader parameters max_piece_size and parallel_pieces need to be tuned?

for batch_id, (vec_batch, ids_batch) in enumerate(embedding_reader(batch_size=batch_size)):
    if add_embeddings_with_ids:
        trained_index.add_with_ids(vec_batch, ids_batch["i"].to_numpy())
    else:
        trained_index.add(vec_batch)
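
One way to check whether the reader or the add step accounts for the extra time is to time the two separately. The following is a minimal sketch using only the standard library; `time_read_vs_add` and `add_fn` are hypothetical names introduced here, and the function works on any iterable of batches rather than on `embedding_reader` specifically.

```python
import time
from typing import Any, Callable, Iterable, Tuple


def time_read_vs_add(
    batches: Iterable[Any], add_fn: Callable[[Any], None]
) -> Tuple[float, float]:
    """Return (seconds spent waiting for batches, seconds spent inside add_fn)."""
    read_seconds = 0.0
    add_seconds = 0.0
    iterator = iter(batches)
    while True:
        start = time.perf_counter()
        try:
            batch = next(iterator)  # time spent producing/reading the batch
        except StopIteration:
            read_seconds += time.perf_counter() - start
            break
        after_read = time.perf_counter()
        add_fn(batch)               # time spent adding the batch to the index
        after_add = time.perf_counter()
        read_seconds += after_read - start
        add_seconds += after_add - after_read
    return read_seconds, add_seconds
```

With the loop above, a call might look like `time_read_vs_add(embedding_reader(batch_size=batch_size), lambda b: trained_index.add(b[0]))`. If the read time dominates, the reader parameters are worth tuning; if the add time dominates, the index type is the bottleneck.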


rom1504 avatar rom1504 commented on July 20, 2024

What kind of local disk do you have?


xiongqiangcs avatar xiongqiangcs commented on July 20, 2024

What kind of local disk do you have?

SSD

