faiss::gpu::ToGpuClonerMultiple failed about faiss HOT 9 CLOSED

facebookresearch commented on June 7, 2024

faiss::gpu::ToGpuClonerMultiple failed

from faiss.

Comments (9)

mdouze commented on June 7, 2024

Why -ngpu 1 ?

sharthZ23 commented on June 7, 2024

Because I want to run faiss on the first GPU.

mdouze commented on June 7, 2024

Why -R 2 then?

sharthZ23 commented on June 7, 2024

Oops, my bad. Removed that flag and got this:

[email protected]:~/projects/faiss$ python benchs/bench_gpu_1bn.py Deep1B OPQ20_80,IVF262144,PQ20 -nnn 10 -ngpu 4 -altadd -noptables
Preparing dataset Deep1B
sizes: B (1000000000, 96) Q (10000, 96) T (10000000, 96) gt (10000, 1)
cachefiles:
/data/bench_gpu_1bn/preproc_Deep1B_OPQ20_80.vectrans
/data/bench_gpu_1bn/cent_Deep1B_OPQ20_80,IVF262144.npy
/data/bench_gpu_1bn/Deep1B_OPQ20_80,IVF262144,PQ20.index
preparing resources for 4 GPUs
load /data/bench_gpu_1bn/preproc_Deep1B_OPQ20_80.vectrans
load /data/bench_gpu_1bn/Deep1B_OPQ20_80,IVF262144,PQ20.index
CPU index contains 1000000000 vectors, move to GPU
copying loaded index to GPUs
IndexShards shard 0 indices 0:250000000
IndexIVFPQ size 250000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
IndexShards shard 1 indices 250000000:500000000
IndexIVFPQ size 250000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
IndexShards shard 2 indices 500000000:750000000
IndexIVFPQ size 250000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
IndexShards shard 3 indices 750000000:1000000000
IndexIVFPQ size 250000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
move to GPU done in 203.708 s
search...
0/10000 (0.466 s) Faiss assertion listOffset < listIndices.size() failed in void faiss::gpu::ivfOffsetToUserIndex(long int*, int, int, int, const std::vector<std::vector >&) at impl/RemapIndices.cpp:40Aborted (core dumped)

With -ngpu 3 the script works normally:

[email protected]:~/projects/faiss$ python benchs/bench_gpu_1bn.py Deep1B OPQ20_80,IVF262144,PQ20 -nnn 10 -ngpu 3 -altadd -noptables
Preparing dataset Deep1B
sizes: B (1000000000, 96) Q (10000, 96) T (10000000, 96) gt (10000, 1)
cachefiles:
/data/bench_gpu_1bn/preproc_Deep1B_OPQ20_80.vectrans
/data/bench_gpu_1bn/cent_Deep1B_OPQ20_80,IVF262144.npy
/data/bench_gpu_1bn/Deep1B_OPQ20_80,IVF262144,PQ20.index
preparing resources for 3 GPUs
load /data/bench_gpu_1bn/preproc_Deep1B_OPQ20_80.vectrans
load /data/bench_gpu_1bn/Deep1B_OPQ20_80,IVF262144,PQ20.index
CPU index contains 1000000000 vectors, move to GPU
copying loaded index to GPUs
IndexShards shard 0 indices 0:333333333
IndexIVFPQ size 333333333 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
IndexShards shard 1 indices 333333333:666666666
IndexIVFPQ size 333333333 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
IndexShards shard 2 indices 666666666:1000000000
IndexIVFPQ size 333333334 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
move to GPU done in 130.713 s
search...
0/10000 (0.561 s) probe=1 : 1.075 s 1-R@1: 0.2334 1-R@10: 0.3384
0/10000 (0.008 s) probe=2 : 0.513 s 1-R@1: 0.3054 1-R@10: 0.4666
0/10000 (0.015 s) probe=4 : 0.523 s 1-R@1: 0.3704 1-R@10: 0.5907
0/10000 (0.015 s) probe=8 : 0.558 s 1-R@1: 0.4193 1-R@10: 0.6998
0/10000 (0.015 s) probe=16 : 0.639 s 1-R@1: 0.4506 1-R@10: 0.7785
0/10000 (0.012 s) probe=32 : 0.780 s 1-R@1: 0.4708 1-R@10: 0.8337
0/10000 (0.018 s) probe=64 : 1.076 s 1-R@1: 0.4810 1-R@10: 0.8693
0/10000 (0.016 s) probe=128: 1.608 s 1-R@1: 0.4858 1-R@10: 0.8863
0/10000 (0.020 s) probe=256: 2.718 s 1-R@1: 0.4895 1-R@10: 0.8962
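
The shard boundaries printed in the two logs above (0:250000000, 250000000:500000000, ... for 4 GPUs and 0:333333333, 333333333:666666666, 666666666:1000000000 for 3 GPUs) are consistent with an even integer split of the database. A minimal sketch of such a split follows; `shard_ranges` is a hypothetical helper written for illustration, not a faiss function, and the actual splitting code inside faiss may differ.

```python
def shard_ranges(n_vectors: int, n_shards: int) -> list[tuple[int, int]]:
    """Split [0, n_vectors) into n_shards contiguous half-open ranges.

    Assumption: boundaries are computed with integer division, which
    reproduces the ranges shown in the logs. Any remainder ends up in
    the later shards.
    """
    return [
        (i * n_vectors // n_shards, (i + 1) * n_vectors // n_shards)
        for i in range(n_shards)
    ]


if __name__ == "__main__":
    for n_gpus in (3, 4):
        for shard, (start, end) in enumerate(shard_ranges(1_000_000_000, n_gpus)):
            print(f"ngpu={n_gpus} shard {shard} indices {start}:{end} size {end - start}")
```

With 3 shards the last range holds 333333334 vectors, matching the `IndexIVFPQ size 333333334` line for shard 2.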

One more question: do I understand correctly that the tempmem flag limits the maximum memory that can be used on a single GPU?

wickedfoo commented on June 7, 2024

tempmem controls the size of the temporary scratch memory space used on the GPU. Ideally it should be at least 1 GB at all times.

https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU

The database in question here uses 20 bytes * 1 billion = 20 GB of memory and thus cannot fit on a single GPU, or likely not 2 GPUs either.
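
As a rough check of the estimate above, the sketch below computes the memory taken by the PQ codes alone when the database is split evenly across GPUs. `pq_code_bytes` is a hypothetical helper for illustration; it assumes 20 bytes per vector (PQ20) and 1 GB = 1e9 bytes, and it ignores vector ids, centroids, and the temporary scratch space, so real usage is higher.

```python
def pq_code_bytes(n_vectors: int, code_size: int, n_shards: int = 1) -> float:
    """Bytes of PQ codes held by each shard under an even split.

    Assumption: only the encoded vectors are counted; ids, coarse
    centroids and scratch memory are not included.
    """
    return n_vectors * code_size / n_shards


if __name__ == "__main__":
    GB = 1e9  # decimal gigabyte, as in the 20 GB figure above
    for n_gpus in (1, 2, 3, 4):
        per_gpu = pq_code_bytes(1_000_000_000, 20, n_gpus) / GB
        print(f"{n_gpus} GPU(s): {per_gpu:.2f} GB of codes per GPU")
```

Whether a given shard fits then depends on the memory of the specific GPU model and on how much is reserved for temporary memory.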

As to why it is working on 3 but not 4, this could be a problem with the sharding copy? @mdouze

mdouze commented on June 7, 2024

I can't reproduce the issue; I tested on 4x K40 and 4x TitanX. Please provide more context (ldd output, nvidia-smi output, gdb stack trace).

sharthZ23 commented on June 7, 2024

I will try to reproduce the issue this weekend with 4x K80.

mdouze commented on June 7, 2024

Could you try with the current version? It has better support for low-memory GPUs.

sharthZ23 commented on June 7, 2024

I can't reproduce the bug anymore. Thank you for the update.
