
autofaiss's People

Contributors

bamine, davnn, dependabot[bot], dobraczka, evaia, hitchhicker, josephcappadona, mbompr, nateagr, quentin-auge, rom1504, victor-paltz

autofaiss's Issues

Distributed training

Hi,
Thanks to all maintainers of this project; it's a great tool for streamlining the building and tuning of a Faiss index.

I have a quick question about training an index in distributed mode. Am I correct that the training is done on the host, i.e. non-distributed, and that only the adding/optimizing part is distributed? After a quick look at the code and docs, I believe that's the case. If so, would it be possible to train the index in a distributed fashion?

Control verbosity of messages

Hi, thanks for this library, it really helps when working with faiss! One minor problem: I would like to control the verbosity of the messages, since I use autofaiss inside my own library. The simplest way to do that would probably be through Python's logging module.

Is there anything planned in that regard?
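For example, if autofaiss routed its messages through Python's logging module, downstream code could tune the verbosity itself. A minimal sketch, assuming a module-level logger named "autofaiss" (an assumption, not a documented API):

import logging

# Assumption: autofaiss logs via logging.getLogger("autofaiss").
# Raise its level so only warnings and errors reach the application's handlers.
logging.getLogger("autofaiss").setLevel(logging.WARNING)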

Torch Tensor support?

I want to ask whether KNN search with torch tensors is supported. Many thanks!
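If torch tensors are not accepted directly, one workaround is to convert them to a contiguous float32 numpy array before calling build_index. A sketch (the tensor here is hypothetical):

import numpy as np
import torch
from autofaiss import build_index

embeddings_t = torch.rand(1000, 512)  # hypothetical torch tensor
# Faiss expects contiguous float32 numpy arrays, so move to CPU and convert.
embeddings = np.ascontiguousarray(embeddings_t.detach().cpu().numpy(), dtype=np.float32)
index, index_infos = build_index(embeddings, save_on_disk=False)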

Make ingestion pipeline require less disk space

Currently the flow is:

  • download a large amount of embeddings
  • convert to numpy
  • run autofaiss to produce an index

It works well but requires a large amount of disk space.

It's possible instead to do download -> convert -> add for each part of the embedding collection (and remove the temporary files before moving on to the next part); a sketch of this loop is below.
One way to do this could be to open-source the pyspark job that does this.
It could also be implemented directly in Python here.

A simpler option could also be better support for remote file systems directly in quantize.
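A rough sketch of the incremental loop using faiss directly (not the autofaiss implementation; the index key, dimension, file layout and number of parts are hypothetical):

import os
import numpy as np
import faiss

d = 512  # hypothetical embedding dimension
index = faiss.index_factory(d, "OPQ32,IVF1024,PQ32")  # hypothetical index key

# Train once on a sample that fits in memory, then add the parts one by one.
training_sample = np.float32(np.random.rand(100_000, d))
index.train(training_sample)

for part in range(10):  # hypothetical number of parts
    path = f"embeddings/part{part}.npy"  # assume this part was downloaded and converted already
    index.add(np.float32(np.load(path)))
    os.remove(path)  # free the disk space before fetching the next part

faiss.write_index(index, "knn.index")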

Vector normalization while building index

Hi!
According to the docs, faiss doesn't natively support cosine similarity as a distance metric. The closest one is inner product, which additionally requires pre-normalizing the embedding vectors. In the FAQ the authors propose doing this manually with their function faiss.normalize_L2.
I have exactly this case and would be glad if autofaiss had an optional flag to pre-normalize the vectors before building the index.
It seems to me that it's not so difficult: one would add faiss.normalize_L2 in each place where we iterate over embedding_reader. If so, I can make a PR.
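For reference, the manual workaround that can already be used before calling build_index (a sketch of the existing workaround, not of the proposed flag):

import faiss
import numpy as np
from autofaiss import build_index

embeddings = np.float32(np.random.rand(10_000, 768))  # hypothetical embeddings
# normalize_L2 works in place on a contiguous float32 array; after this,
# maximum inner product search is equivalent to cosine similarity.
faiss.normalize_L2(embeddings)
index, index_infos = build_index(embeddings, save_on_disk=False, metric_type="ip")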

make autofaiss not use TemporaryDirectory

TemporaryDirectory points to a local folder which may not have enough room.
The user should specify the temporary folder (in fact we already have an option for this).

x8 vs x4fsr

INFO:autofaiss: Computing best hyperparameters for index faiss_titles.faiss 05/05/2022, 07:16:53                                                            
WARNING:autofaiss:The maximum nearest neighbors coverage is 10.65% for this index. It means that when requesting 20 nearest neighbors, the average number of retrieved neighbors will be 2. The program will try to find the best hyperparameters to reach 95% of this max coverage at least, and then will optimize the search time for this target. The index search speed could be higher than the requested max search speed.

What can we do to prevent this?

This happened with "OPQ768_768,IVF262144_HNSW32,PQ768x8" -> bad max coverage
With the index_key "OPQ768_768,IVF262144_HNSW32,PQ768x4fsr", everything was ok. The vectors were just a bit too compressed.

My d is 768.

Thank you

fix estimation of training memory used by autofaiss

I just tried it and the new estimation at https://github.com/criteo/autofaiss/pull/81/files doesn't fully capture the memory needed for training.

When training an index such as OPQ32_224,IVF131072_HNSW32,PQ32x8, faiss trains the index in 2 steps.
The first step does seem to use the memory assumed by the current estimation (for example 21.5GB for 11M vectors of dimension 512), but the second step uses some more RAM.
I am not sure yet what these 2 steps are, but I'd guess something like a primary then a secondary index.

Let's figure it out, then add some more tests for this (these could be scheduled tests instead of tests that run on every commit).

module 'faiss' has no attribute 'swigfaiss'

python 3.8.12
autofaiss                 2.13.2                   pypi_0    pypi
faiss-cpu                 1.7.2                    pypi_0    pypi
libfaiss                  1.7.2            h2bc3f7f_0_cpu    pytorch

First of all, thank you for the great project! I get the error: module 'faiss' has no attribute 'swigfaiss' when running the following command:

import autofaiss

autofaiss.build_index(
    "embeddings.npy",
    "autofaiss.index",
    "autofaiss.json",
    metric_type="ip",
    should_be_memory_mappable=True,
    make_direct_map=True)

The error appears when running with make_direct_map=True.

Tested with conda 4.11.0 and mamba 0.15.3, using either the pytorch or conda-forge channel.

Fix potential out of disk problem when producing N indices

When we produce N indices (with nb_indices_to_keep larger than 1), the function optimize_and_measure_indices downloads the N indices from remote storage in one shot (see here). If the machine running autofaiss has limited disk space, this fails with a "No space left on device" error.

Add a function to load npz vectors

Hi! I have numpy matrices saved as npz files. Unfortunately autofaiss supports only npy. Could you add that functionality?
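In the meantime, an npz archive can be unpacked into npy files that autofaiss already understands. A sketch (file names and directory are hypothetical):

import numpy as np

archive = np.load("embeddings.npz")  # hypothetical npz file
# Write each array stored in the archive out as a standalone .npy file.
for name in archive.files:
    np.save(f"embeddings_dir/{name}.npy", np.float32(archive[name]))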

Make current available memory properly aggregate all the memory needs

  • the final index size should be subtracted from the amount of memory the adding step is allowed to use
  • the untrained index size should be subtracted from the amount of memory the training step is allowed to use

This would make it possible to have stronger guarantees about how much memory autofaiss uses.

[Feature Request:] Add new features to a previously built index

Right now there does not seem to be an easy way to take an already-built index and add more embeddings to it (from the same distribution). This is indirectly supported by autofaiss, since distributed training already does it, and it is also easily supported by the faiss backbone. But I wonder if we can expose a simple interface that takes a built index and adds more embeddings from a new set (using all the bells and whistles provided by autofaiss/embedding-reader for reading embeddings in the numpy/parquet formats). Perhaps an update_index interface?

Thanks!
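Until such an interface exists, one possible fallback is to use faiss directly on the built index, assuming it is an addable, already-trained index type (e.g. IVF). A sketch, not an autofaiss API:

import faiss
import numpy as np

index = faiss.read_index("knn.index")  # index previously built by autofaiss
new_embeddings = np.float32(np.random.rand(10_000, 512))  # hypothetical new vectors
index.add(new_embeddings)  # works because the index is already trained
faiss.write_index(index, "knn_updated.index")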

get_optimal_index_keys_v2 support faiss AutoTune

def get_optimal_index_keys_v2(
    nb_vectors: int,
    dim_vector: int,
    max_index_memory_usage: str,
    flat_threshold: int = 1000,
    quantization_threshold: int = 10000,
    force_pq: Optional[int] = None,
    make_direct_map: bool = False,
    should_be_memory_mappable: bool = False,
    ivf_flat_threshold: int = 1_000_000,
    use_gpu: bool = False,
) -> List[str]:
    """
    Gives a list of interesting indices to try, *the one at the top is the most promising*
    See: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index for
    detailed explanations.
    """
    # Exception cases:

Make embedding iterator faster on high latency file systems

s3 and hdfs are high latency, high bandwidth file systems.
On these file systems, fetching files sequentially is slow.
Today our embedding iterator reads files sequentially.

This could be made faster by reading files in parallel, or even parts of files in parallel, using pyarrow readers, which use threads internally.
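For illustration, a rough sketch of reading several parquet embedding files concurrently with a thread pool and pyarrow (not the autofaiss implementation; paths and column name are hypothetical):

from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pyarrow.parquet as pq

paths = [f"embeddings/part-{i:05d}.parquet" for i in range(16)]  # hypothetical layout
embedding_column = "embedding"  # hypothetical column name

def read_part(path):
    # pyarrow releases the GIL during I/O and decoding, so threads overlap the latency.
    table = pq.read_table(path, columns=[embedding_column])
    return np.float32(np.vstack(table[embedding_column].to_pylist()))

with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(read_part, paths))

embeddings = np.concatenate(parts)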

build_index can't handle empty numpy files

Hello,

I'm currently running a workflow in argo which is generating several embedding files, in parallel, based on a database search.
If no data was found, the workflow returns an empty numpy file:

np.save(os.path.join(output, "features", filename), np.empty(0, np.float32))

Sadly, build_index is not capable of handling those files:

Using 4 omp threads (processes), consider increasing --nb_cores if you have more
Launching the whole pipeline 04/08/2022, 09:54:53
Reading total number of vectors and dimension 04/08/2022, 09:54:53

  0%|          | 0/16 [00:00<?, ?it/s]
 19%|█▉        | 3/16 [00:00<00:00, 29.92it/s]
 56%|█████▋    | 9/16 [00:00<00:00, 87.73it/s]
>>> Finished "Reading total number of vectors and dimension" in 0.1517 secs
>>> Finished "Launching the whole pipeline" in 0.1517 secs
Traceback (most recent call last):
  File "/usr/local/bin/autofaiss", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/autofaiss/external/quantize.py", line 395, in main
    fire.Fire({"build_index": build_index, "tune_index": tune_index, "score_index": score_index})
  File "/usr/local/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/autofaiss/external/quantize.py", line 143, in build_index
    nb_vectors, vec_dim = read_total_nb_vectors_and_dim(
  File "/usr/local/lib/python3.8/site-packages/autofaiss/readers/embeddings_iterators.py", line 258, in read_total_nb_vectors_and_dim
    for c in p.imap_unordered(file_to_line_count, file_paths):
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.8/site-packages/autofaiss/readers/embeddings_iterators.py", line 252, in file_to_line_count
    return matrix_reader.get_row_count()
  File "/usr/local/lib/python3.8/site-packages/autofaiss/readers/embeddings_iterators.py", line 101, in get_row_count
    return self.get_shape()[0]

It would be great if it could handle them, by just showing a warning in the logs, or with a flag to allow it.
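In the meantime, one workaround is to filter out the empty parts before calling build_index by inspecting only the .npy headers. A sketch (the directory layout is hypothetical, and np.save is assumed to write format version 1.0, its default for plain float arrays):

import glob
import numpy as np
from numpy.lib import format as npy_format

def npy_is_empty(path):
    # Read only the .npy header to get the shape, without loading the data.
    with open(path, "rb") as f:
        npy_format.read_magic(f)
        shape, _fortran_order, _dtype = npy_format.read_array_header_1_0(f)
    return shape[0] == 0

files = [f for f in glob.glob("features/*.npy") if not npy_is_empty(f)]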

Windows parallelization

Hi! Thank you for the great project! Unfortunately I'm experiencing some issues, which could be caused by Windows (10 Pro) and I'm not sure how to solve them.

I installed autofaiss with conda into a new env with Python 3.6. First, I had problems with the import:
ImportError: DLL load failed while importing _swigfaiss: The specified module could not be found.

I solved that by first installing openblas, numpy and faiss from conda-forge:
conda create --name faiss_env python=3.6
conda activate faiss_env
conda install conda-forge::blas=*=openblas
conda install -c conda-forge numpy
conda install -c conda-forge faiss
pip install autofaiss

Then I tried to run the example from the README, but I encountered an error in embedding_reader:

~\.conda\envs\faiss_env\lib\site-packages\embedding_reader\get_file_list.py in _get_file_list(path, file_format, sort_result)
     42     path = make_path_absolute(path)
     43     fs, path_in_fs = fsspec.core.url_to_fs(path)
---> 44     prefix = path[: path.index(path_in_fs)]
ValueError: substring not found

I found out that the problem is in the fsspec.core.url_to_fs method, namely in the private method _strip_protocol on line 402 of fsspec\core.py:
urlpath = fs._strip_protocol(url)
This line changes backward slashes to forward slashes and therefore the substring path_in_fs is not found in the string path.

Now comes the incomprehensible part: when I changed the private method _strip_protocol to the public method strip_protocol (I only deleted the leading underscore), the ValueError disappeared and the function preserved the backward slashes in the path... but then another error appeared:
RuntimeError: Error in __cdecl faiss::FileIOWriter::FileIOWriter(const char *) at D:\a\faiss-wheels\faiss-wheels\faiss\faiss\impl\io.cpp:98: Error: 'f' failed: could not open C:\Users\USER\AppData\Local\Temp\tmp2jqscc1t for writing: Permission denied

This seems to me like a parallelization problem and I don't know how to solve it. I suppose that my fix for the ValueError was not the correct one and there is still some problem with the Windows implementation.

Can you give me some advice on how to find a solution to this?

Thanks!

add option to save keys from parquet embeddings into a new parquet collection

To avoid reading the embeddings parquet a second time, we could consider extracting, yielding and saving the keys from the parquet files in the read-embeddings function.
These keys could be saved either as parquet or in some format convenient for fast random access (e.g. arrow or hdf5 for one-way lookup, leveldb for two-way).
That would probably be convenient, but let's keep this for another PR.

(Another option is to do this in a separate utility that would read only the key column; it remains to be seen which is best.)
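The key-column-only utility would be cheap thanks to parquet's columnar layout. A minimal sketch (file paths and column name are hypothetical):

import pyarrow.parquet as pq

# Reading only the key column skips the much larger embedding column entirely.
keys = pq.read_table("embeddings/part-00000.parquet", columns=["key"])
pq.write_table(keys, "keys/part-00000.parquet")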

build_index is very slow

machine:

  • cpu-machine:Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
  • mem: 32G
  • cpu-cores: 16

code:

from autofaiss import build_index
import numpy as np

embeddings = np.float32(np.random.rand(1000000, 512))
index, index_infos = build_index(embeddings, save_on_disk=False)

log:
(screenshot of the build log, not reproduced here)

Add all parameters from doc to readme

    embeddings_path: str
        Local path containing all preprocessed vectors and cached files.
        Files will be added if empty.
    output_path: str
        Destination path of the quantized model on the local machine.
    index_key: Optional(str)
        Optional string to give to the index factory in order to create the index.
        If None, an index is chosen based on a heuristic.
    index_param: Optional(str)
        Optional string with hyperparameters to set on the index.
        If None, the hyperparameters are chosen based on a heuristic.
    max_index_query_time_ms: float
        Bound on the query time for KNN search; this bound is approximate.
    max_index_memory_usage: str
        Maximum size allowed for the index; this bound is strict.
    current_memory_available: str
        Memory available on the machine creating the index; having more memory is a boost
        because it reduces the swapping between RAM and disk.
    use_gpu: bool
        Experimental, gpu training is faster, not tested so far.
    metric_type: str
        Similarity function used for the query:
            - "ip" for inner product
            - "l2" for euclidean distance

use merging strategy in non-pyspark mode as well

The strategy of creating a few small indices reduces the memory usage during adding and (if using the special merge-on-disk function) completely caps the memory used by autofaiss in general, making it possible to create arbitrarily big indices with a fixed amount of RAM.

Let's use that strategy not only in pyspark mode, but also in the normal mode.
Producing N indices in normal mode should also be possible by reusing the code from the distributed path.

multi index ideas

  • building one index or a thousand indices from one embedding set has the same cost if doing one training and grouping at read time (this allows one index per strict category)
  • building N index parts then merging may make it easier to parallelize reading and building. It could also postpone the memory cost to merge time, which might be beneficial (for example it unlocks building in many memory-constrained executors then merging on one big machine afterwards, or maybe even merging with memory mapping so that the merge itself uses no memory)

some info at https://github.com/facebookresearch/faiss/tree/main/benchs/distributed_ondisk and https://github.com/facebookresearch/faiss/wiki/Indexing-1T-vectors and https://github.com/facebookresearch/faiss/blob/151e3d7be54aec844b6328dc3e7dd0b83fcfa5bc/faiss/invlists/OnDiskInvertedLists.cpp

Tests

  • check hnsw size > flat size (a test sketch is below)
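A possible test along these lines, comparing serialized sizes of raw faiss indexes on random data (a sketch, not existing autofaiss test code):

import faiss
import numpy as np

def test_hnsw_index_is_larger_than_flat():
    # HNSW stores graph links on top of the raw vectors, so its serialized
    # size should exceed that of a flat index built on the same data.
    xb = np.float32(np.random.rand(1000, 64))
    flat = faiss.index_factory(64, "Flat")
    hnsw = faiss.index_factory(64, "HNSW32")
    flat.add(xb)
    hnsw.add(xb)
    assert len(faiss.serialize_index(hnsw)) > len(faiss.serialize_index(flat))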

decrease memory used by merging

Currently merging in distributed mode requires storing the whole index in memory.
Possible strategies:

  • improve the faiss merge-into step to avoid putting everything in memory
  • produce N indices instead of one and let the user search all of them at query time

Suspicious constant 1-recall score

I have trained 3 different indexes and every time my 1-recall@20 scores are exactly the same:

INFO:autofaiss: 1-recall@20: 0.802
INFO:autofaiss: 1-recall@40: 0.824

But there is some variation in the 20-recall and 40-recall scores.

Agreement to three decimal places seems too much of a coincidence.

What do you think about it?

add_with_ids is not implemented for Flat indexes

Hello, I'm encountering an issue using autofaiss with flat indexes.
build_index raises an error in distributed mode for flat indexes (in my case the embeddings are an ndarray; I did not test with parquet embeddings). This error could be related to facebookresearch/faiss#1212 (the method index.add_with_ids is not implemented for flat indexes).

import numpy as np
from autofaiss import build_index

build_index(
    embeddings=np.ones((100, 512)),
    distributed="pyspark",
    should_be_memory_mappable=True,
    index_path="hdfs://root/user/foo/knn.index",
    index_key="Flat",
    nb_cores=20,
    max_index_memory_usage="32G",
    current_memory_available="48G",
    ids_path="hdfs://root/user/foo/test_indexing_out/ids",
    temporary_indices_folder="hdfs://root/user/foo/indices/tmp/",
    nb_indices_to_keep=5,
    index_infos_path="hdfs://root/user/r.laby/test_indexing_out/index_infos.json",
)

raises

RuntimeError: Error in virtual void faiss::Index::add_with_ids(faiss::Index::idx_t, const float*, const idx_t*) at /project/faiss/faiss/Index.cpp:39: add_with_ids not implemented for this type of index

Is this expected? Or could it be fixed?
Thanks!
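One possible workaround on the faiss side (a sketch, not necessarily how autofaiss should fix it) is to wrap the flat index in an IDMap, which does implement add_with_ids by storing an explicit id mapping:

import faiss
import numpy as np

d = 512
index = faiss.index_factory(d, "IDMap,Flat")  # plain "Flat" has no add_with_ids
embeddings = np.float32(np.ones((100, d)))
ids = np.arange(100, dtype=np.int64)
index.add_with_ids(embeddings, ids)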

GPU on A100

import numpy as np
from autofaiss import build_index

embeddings = np.float32(np.random.rand(700, 700))


build_index(
    embeddings=embeddings,  # type: ignore
    index_path="knn.index",
    index_infos_path="infos.json",
    should_be_memory_mappable=True,
    use_gpu=True,
)

On my A100, setting use_gpu=True breaks the flow.

get_optimal_index_keys_v2 returns an empty list

I am using autofaiss 2.14.0 and it works for some parts of the data I am working on, but not for others. I keep getting this error and I do not know where to look:

2022-04-21 17:46:40,649 [INFO]: There are 16325691 embeddings of dim 768
2022-04-21 17:46:40,653 [INFO]: >>> Finished "Reading total number of vectors and dimension" in 37.7308 secs
2022-04-21 17:46:40,653 [INFO]:         Compute estimated construction time of the index 04/21/2022, 17:46:40
2022-04-21 17:46:40,659 [INFO]:                 -> Train: 16.7 minutes
2022-04-21 17:46:40,659 [INFO]:                 -> Add: 2.3 minutes
2022-04-21 17:46:40,659 [INFO]:                 Total: 19.0 minutes
2022-04-21 17:46:40,659 [INFO]:         >>> Finished "Compute estimated construction time of the index" in 0.0057 secs
2022-04-21 17:46:40,659 [INFO]:         Checking that your have enough memory available to create the index 04/21/2022, 17:46:40
2022-04-21 17:46:40,802 [INFO]:         >>> Finished "Checking that your have enough memory available to create the index" in 0.1431 secs
2022-04-21 17:46:40,803 [INFO]: >>> Finished "Launching the whole pipeline" in 37.8808 secs
Traceback (most recent call last):
  File "process.py", line 26, in <module>
    chunks_to_precalculated_knn_(
  File "/home/x_ehsdo/.local/lib/python3.8/site-packages/retro_pytorch/retrieval.py", line 373, in chunks_to_precalculated_knn_
    index, embeddings = chunks_to_index_and_embed(
  File "/home/x_ehsdo/.local/lib/python3.8/site-packages/retro_pytorch/retrieval.py", line 334, in chunks_to_index_and_embed
    index = index_embeddings(
  File "/home/x_ehsdo/.local/lib/python3.8/site-packages/retro_pytorch/retrieval.py", line 288, in index_embeddings
    build_index(
  File "/home/x_ehsdo/.local/lib/python3.8/site-packages/autofaiss/external/quantize.py", line 224, in build_index
    necessary_mem, index_key_used = estimate_memory_required_for_index_creation(
  File "/home/x_ehsdo/.local/lib/python3.8/site-packages/autofaiss/external/build.py", line 46, in estimate_memory_required_for_index_creation
    index_key = get_optimal_index_keys_v2(
IndexError: list index out of range

Misunderstanding of the estimated computing time

I am not sure whether I am misunderstanding something or there is an error, but when building my index, autofaiss reports "Train: 16.7 minutes" while the whole pipeline actually finishes in ~11 secs (Finished "Launching the whole pipeline" in 11.1440 secs)?

Using 16 omp threads (processes), consider increasing --nb_cores if you have more
Launching the whole pipeline 01/28/2022, 08:15:47
There are 4269 embeddings of dim 1024
	Compute estimated construction time of the index 01/28/2022, 08:15:47
		-> Train: 16.7 minutes
		-> Add: 0.0 seconds
		Total: 16.7 minutes
	>>> Finished "Compute estimated construction time of the index" in 0.0000 secs
	Checking that your have enough memory available to create the index 01/28/2022, 08:15:47
20.6MB of memory will be needed to build the index (more might be used if you have more)
	>>> Finished "Checking that your have enough memory available to create the index" in 0.0009 secs
	Selecting most promising index types given data characteristics 01/28/2022, 08:15:47
	>>> Finished "Selecting most promising index types given data characteristics" in 0.0000 secs
	Creating the index 01/28/2022, 08:15:47
		-> Instanciate the index HNSW15 01/28/2022, 08:15:47
		>>> Finished "-> Instanciate the index HNSW15" in 0.0036 secs
The index size will be approximately 17.2MB
The memory available for adding the vectors is 7.0GB(total available - used by the index)
Will be using at most 1GB of ram for adding
		-> Adding the vectors to the index 01/28/2022, 08:15:47
Using a batch size of 244140 (memory overhead 953.7MB)
100%|██████████| 1/1 [00:00<00:00, 74.53it/s]		>>> Finished "-> Adding the vectors to the index" in 0.1602 secs
	>>> Finished "Creating the index" in 0.1647 secs
	Computing best hyperparameters 01/28/2022, 08:15:47

	>>> Finished "Computing best hyperparameters" in 3.3091 secs
The best hyperparameters are: efSearch=21
	Compute fast metrics 01/28/2022, 08:15:50
2000
	>>> Finished "Compute fast metrics" in 7.6499 secs
	Saving the index on local disk 01/28/2022, 08:15:58
	>>> Finished "Saving the index on local disk" in 0.0091 secs
Recap:
{'99p_search_speed_ms': 30.39110283832997,
 'avg_search_speed_ms': 3.7983315605670214,
 'compression ratio': 0.9678652870286923,
 'index_key': 'HNSW15',
 'index_param': 'efSearch=21',
 'nb vectors': 4269,
 'reconstruction error %': 0.0,
 'size in bytes': 18066382,
 'vectors dimension': 1024}
>>> Finished "Launching the whole pipeline" in 11.1440 secs
