Comments (37)
conda install -c conda-forge faiss-gpu
This fixed it for me.
from faiss.
Possibility that you ran out of GPU memory?
from faiss.
What were you trying to run?
from faiss.
train data shape: (2000000, 1000)
base data shape: (20000000, 1000)
query data shape: (1000000, 1000)
data type: float32
my code:
index = faiss.index_factory(d, "OPQ16_512,IVF4096,PQ16")
co = faiss.GpuClonerOptions()
co.usePrecomputed = False
index = faiss.index_cpu_to_gpu(res, 0, index, co)
index.train(xt)
del xt
index.add(xb) # error happens here
My GPU has 8 GB of memory. I also tried the benchmark bench_gpu_sift1m.py and got the same error.
from faiss.
index.add(xb) # error happens here
instead of giving all of the (20000000, 1000) at once, try giving it in chunks of (10000, 1000) or so.
This is an issue that will be fixed at some point. For now the GPU side is less friendly unless you chunk the input yourself beforehand, but eventually we'll handle that automatically.
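A minimal sketch of the chunked add suggested above. `iter_chunks` and `add_in_chunks` are hypothetical helper names, not part of Faiss; the only index method assumed is `index.add`, and `xb` can be anything that supports `len()` and row slicing, such as a NumPy array or a `numpy.memmap`.

```python
def iter_chunks(n_rows, chunk_size):
    """Yield (start, stop) row ranges that cover [0, n_rows) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def add_in_chunks(index, xb, chunk_size=10000):
    """Call index.add() once per consecutive slice of at most chunk_size rows."""
    for start, stop in iter_chunks(len(xb), chunk_size):
        # Only this slice is handed to the index at a time, so the full
        # array never has to be passed to the GPU index in one piece.
        index.add(xb[start:stop])
```

With the index from this thread, the call would be `add_in_chunks(index, xb, chunk_size=10000)` in place of `index.add(xb)`.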
from faiss.
Only GpuIndexFlat* handles passing large amounts of data all at once for add or search at present.
from faiss.
I see. Actually, I used numpy.memmap to load the data. Could you give me some guidance on how to chunk the input data so that it can be passed to index.add?
from faiss.
Also, I notice that my GPU memory occupation in training is always about 20%. That's strange.
from faiss.
I made some changes to the bench code bench_gpu_sift1m.py and still get the same error. Adding only the first 10000 vectors does not work either, so it does not seem to be a memory issue. Maybe there is something wrong with cuBLAS. By the way, do you plan to publish an official Docker image to avoid problems caused by installation?
#################################################################
# Approximate search experiment
#################################################################
print "============ Approximate search"
index = faiss.index_factory(d, "IVF4096,PQ64")
# faster, uses more memory
# index = faiss.index_factory(d, "IVF16384,Flat")
co = faiss.GpuClonerOptions()
# here we are using a 64-byte PQ, so we must set the lookup tables to
# 16 bit float (this is due to the limited temporary memory).
co.useFloat16 = True
index = faiss.index_cpu_to_gpu(res, 0, index, co)
print "train"
index.train(xt)
print "add vectors to index"
index.add(xb[:10000])
from faiss.
Hi
Note that the code above will not work for 1000-dim data (because 1000 is not a multiple of 64).
We do not have plans for a Docker image.
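On the first remark: a product quantizer splits each vector into M equal sub-vectors, so the dimension has to be divisible by M. A small check, with `pq_compatible` as a hypothetical helper name (it is not a Faiss function):

```python
def pq_compatible(d, m):
    """Return True if a d-dimensional vector splits evenly into m sub-vectors."""
    return d % m == 0


print(pq_compatible(128, 64))   # SIFT1M vectors are 128-dimensional -> True
print(pq_compatible(1000, 64))  # 1000 = 15 * 64 + 40 -> False
print(pq_compatible(512, 16))   # 512-dim output of OPQ16_512 feeding PQ16 -> True
```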
from faiss.
Hi, mdouze. The code above is from bench_gpu_sift1m.py. I used the data from http://corpus-texmex.irisa.fr/, following the instructions in https://github.com/facebookresearch/faiss/tree/master/benchs. I just wanted to check whether the bench code works. It turns out to give the same error as my own code.
from faiss.
Ok, so this is the exact script bench_gpu_sift1m.py applied to the SIFT1M dataset and not your 20M*1000-dim dataset, correct?
On which type of GPU are you running this?
from faiss.
Yes for your first question.
My GPU is GeForce GTX 1080
from faiss.
It could be the same bug as issue #8. Unfortunately we do not have the hardware to reproduce it, so we would be grateful if you could narrow down the error for us:
- Does it still crash in the add?
- If yes, could you add fewer vectors until it does not crash any more?
- could you set co.usePrecomputed = False and test again?
- could you reduce the 2 numbers in "IVF4096,PQ64" by powers of two until it does not crash any more?
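One way to organise the last step of this checklist is to generate the candidate factory strings up front and try them one by one. This is only a sketch: `halving_configs` is a hypothetical helper, and the lower bounds (64 lists, 8 sub-quantizers) are arbitrary assumptions rather than values from this thread.

```python
def halving_configs(nlist=4096, m=64, min_nlist=64, min_m=8):
    """List "IVF<nlist>,PQ<m>" strings for every combination of halved values."""
    configs = []
    n = nlist
    while n >= min_nlist:
        sub = m
        while sub >= min_m:
            configs.append("IVF%d,PQ%d" % (n, sub))
            sub //= 2
        n //= 2
    return configs


# Starts at "IVF4096,PQ64" and ends at "IVF64,PQ8".
for factory_string in halving_configs():
    print(factory_string)
```

Each string can then be passed to `faiss.index_factory(d, factory_string)` in the benchmark in place of the original one.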
from faiss.
You can also try running cuda-memcheck on the bench_gpu_sift1m.py to see if anything gets printed out that does not look like the following:
========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaPointerGetAttributes.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/nvidia/libcuda.so.1 [0x2eea03]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x126239]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x16e44]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x1d066]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x1d1e2]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x1889f]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x194e5]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0xb504c]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x2332f]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x260d0]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0xf8cb]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b35]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0xf415]
=========
========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaGetLastError.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/nvidia/libcuda.so.1 [0x2eea03]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x11de53]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x16e65]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x1d066]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x1d1e2]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x1889f]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x194e5]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0xb504c]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x2332f]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0x260d0]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0xf8cb]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b35]
========= Host Frame:test/demo_ivfpq_indexing_gpu [0xf415]
Another thing is to try resetting the GPU via nvidia-smi and trying again.
Also, you could try and investigate which CUDA shared libraries it is trying to load, to see if there is a mismatch if you have multiple CUDA SDK versions installed.
from faiss.
Also, I notice that my GPU memory occupation in training is always about 20%. That's strange.
Faiss GPU reserves about 18% of available GPU memory up front for scratch space. This amount is controllable via StandardGpuResources, but it will run slower if you decrease it by a lot (due to cudaMalloc/cudaFree overhead). 1-2 GB of scratch space seems to be appropriate for most workloads.
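A sketch of setting the scratch allocation explicitly. It assumes a Faiss build with GPU support whose Python bindings expose `StandardGpuResources.setTempMemory`; the available methods have changed between releases, so check your installed version. `make_resources` and the 1.5 GiB figure are illustrative choices, not values taken from Faiss.

```python
GIB = 1024 ** 3
TEMP_BYTES = int(1.5 * GIB)  # illustrative value inside the 1-2 GB range mentioned above


def make_resources(temp_bytes=TEMP_BYTES):
    """Create GPU resources with an explicit scratch-space size in bytes."""
    import faiss  # imported lazily so this module loads without a GPU build

    res = faiss.StandardGpuResources()
    res.setTempMemory(temp_bytes)  # assumed to exist in the installed version
    return res
```

The returned object would be passed as `res` to `faiss.index_cpu_to_gpu(res, 0, index, co)` as in the code earlier in this thread.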
from faiss.
For your questions:
- could you add fewer vectors until it does not crash any more?
It will always crash no matter how small the number of vectors is.
- could you set co.usePrecomputed = False and test again?
It works. But it doesn't work for my own code. I will keep trying.
- could you reduce the 2 numbers in "IVF4096,PQ64" by powers of two until it does not crash any more?
It will fail if co.usePrecomputed = True is set.
Some other info:
ldd gpu/test/demo_ivfpq_indexing_gpu ==>
linux-vdso.so.1 => (0x00007ffcc0066000)
libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007fa709dfd000)
liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007fa709661000)
libcublas.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0 (0x00007fa706cb1000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa706aa9000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa70688b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa706687000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa706383000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa70607d000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fa705e6e000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa705c58000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa705893000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa70b606000)
libblas.so.3 => /usr/lib/libblas.so.3 (0x00007fa70408a000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fa703d70000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fa703b34000)
from faiss.
- here is cuda-memcheck result (setting co.usePrecomputed = True):
============ Approximate search
train
add vectors to index
Faiss assertion err == CUBLAS_STATUS_SUCCESS failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with T = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at utils/MatrixMult.cu:141Aborted========= Error: process didn't terminate successfully
========= Internal error (7)
========= No CUDA-MEMCHECK results found
- resetting the GPU via nvidia-smi doesn't work
- There is only one CUDA SDK: V8.0.44
from faiss.
Are you compiling with clang or gcc?
from faiss.
gcc
from faiss.
I believe this is related to the GPU, which is similar to issue #8
from faiss.
I hit the same problem. My GPU is a TITAN X. I want to index 1000000 512-dimensional vectors using faiss.GpuIndexFlatL2, and that triggers this issue. If I cut the number from 1000000 to 500000, it works normally. The maximum number of vectors seems to be 500000, because 600000 vectors also cause this problem. The following is my code:
d = 1000000 # dimension
nb = 512 # database size
nq = 1000 # nb of queries
np.random.seed(1234) # make reproducible
xb = np.random.random((nb, d)).astype('float32')
xb[:, 0] += np.arange(nb) / 1000.
xq = np.random.random((nq, d)).astype('float32')
xq[:, 0] += np.arange(nq) / 1000.
xc = xb[0:1000, :].copy()
xc[:, 0] += 0.02
res = faiss.StandardGpuResources()
index = faiss.GpuIndexFlatL2(res, 0, d, False) # build the GPU index
# index = faiss.IndexFlatL2(d) # build the index
print index.is_trained
index.add(xb)
print index.ntotal
print (' build the index time %f ms' % ((time.time() - time_1) * 1000))
time_1 = time.time()
k = 1 # we want to see 4 nearest neighbors
D, I = index.search(xc[:1], k)
print (' search time %f ms' % ((time.time() - time_1) * 1000))
from faiss.
@yhpku, thanks. I tried a GTX 1080 and a Titan X; both failed. Yours seems to be caused by OOM. IndexFlatL2 loads all the data at once for add or search, so 500000 may be the upper limit for a Titan X. You can try IndexIVFPQ, which stores the vectors with lossy compression.
from faiss.
Hi @yhpku, in the code above you use 512 vectors in 1M dimensions. Is this what you want?
from faiss.
@mdouze, no, that's not what I want. I mean 1M vectors in 512 dimensions.
from faiss.
@hellolovetiger, Titan X should work. Does bench_gpu_sift1m.py crash on Titan X? What error?
from faiss.
@yhpku, please fix your code then.
from faiss.
On Titan X,
For demo_ivfpq_indexing_gpu, the error is:
Adding the vectors to the index
Segmentation fault (core dumped)
For bench_gpu_sift1m.py,
============ Approximate search
train
WARNING clustering 100000 points to 4096 centroids: please provide at least 159744 training points
add vectors to index
Segmentation fault (core dumped)
The error will be gone if setting co.usePrecomputed = False
For my own code:
#train data shape: (2000000, 1000)
#base data shape: (20000000, 1000)
#query data shape: (1000000, 1000)
#data type: float32
index = faiss.index_factory(d, "OPQ16_512,IVF1024,PQ16")
co = faiss.GpuClonerOptions()
co.useFloat16 = False
co.usePrecomputed = False
co.indicesOptions = faiss.INDICES_CPU
index = faiss.index_cpu_to_gpu(res, 0, index, co)
index.train(xt)
del xt
index.add(xb) # error happens here
The error is:
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 5669326848 B, highwater 5669326848 B)
Faiss assertion err == CUBLAS_STATUS_SUCCESS failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with T = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at utils/MatrixMult.cu:141Aborted (core dumped)
When I cut the base data from 20M to 3M, the error becomes:
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 6144000000 B, highwater 6144000000 B)
Faiss assertion err == cudaSuccess failed in char* faiss::gpu::StackDeviceMemory::Stack::getAlloc(size_t, cudaStream_t) at utils/StackDeviceMemory.cpp:71Aborted (core dumped)
It seems to have become a memory issue.
By the way, will GpuIndexIVFPQ still run into memory issues if the set of base vectors is too big?
from faiss.
You are running out of GPU memory. Do not try and add so many vectors at once. 3M * 1000 * sizeof(float) is 12 GB.
Try adding the vectors in chunks of 10000 to 50000 instead.
from faiss.
After adding to the index, the vectors will then be compressed via PQ, and then you can add more. But, before compression, each vector takes 4000 bytes of memory ( = 1000 * sizeof(float)), not 16 bytes (PQ16).
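The arithmetic from the last few comments, written out. `raw_bytes` and `pq_code_bytes` are hypothetical helpers; the PQ figure counts only the codes (one byte per sub-quantizer, assuming 8-bit codes) and ignores ids and inverted-list overhead.

```python
FLOAT32_BYTES = 4


def raw_bytes(n, d):
    """Bytes needed to hold n uncompressed float32 vectors of dimension d."""
    return n * d * FLOAT32_BYTES


def pq_code_bytes(n, m):
    """Bytes needed for the PQ codes of n vectors with m one-byte sub-quantizers."""
    return n * m


print(raw_bytes(1, 1000))        # 4000 bytes per uncompressed vector
print(pq_code_bytes(1, 16))      # 16 bytes per vector with PQ16
print(raw_bytes(3000000, 1000))  # 12000000000 bytes, i.e. 12 GB
print(raw_bytes(50000, 1000))    # 200000000 bytes for one 50000-row chunk
```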
from faiss.
Problems with attempting to add large CPU resident vectors all at once will be fixed internally at some point. But in the meantime you will have to incrementally add them.
from faiss.
Got it. Thanks, @wickedfoo. It would be good to add this information to the wiki.
from faiss.
@mdouze, I am sorry, that was a typing error. The actual code is as follows, and the error output is "Faiss assertion err == cudaSuccess failed in faiss::gpu::StackDeviceMemory::Stack::~Stack() at utils/StackDeviceMemory.cpp:54Aborted (core dumped)".
import time
import faiss
import numpy as np
time_1 = time.time()
d = 512 # dimension
nb = 700000 # database size
nq = 1000 # nb of queries
np.random.seed(1234) # make reproducible
xb = np.random.random((nb, d)).astype('float32')
xb[:, 0] += np.arange(nb) / 1000.
xq = np.random.random((nq, d)).astype('float32')
xq[:, 0] += np.arange(nq) / 1000.
xc = xb[0:1000, :].copy() # queries: slightly shifted copies of the first 1000 database vectors
xc[:, 0] += 0.02
res = faiss.StandardGpuResources()
index = faiss.GpuIndexFlatL2(res, 0, d, False) # build the GPU index
print(index.is_trained)
index.add(xb)
print(index.ntotal)
print(' build the index time %f ms' % ((time.time() - time_1) * 1000))
time_1 = time.time()
D, I = index.search(xc[:2], 1) # nearest neighbor of the first two queries
print(D)
print(I)
print(' search time %f ms' % ((time.time() - time_1) * 1000))
from faiss.
Closing this issue now, because the discussion has drifted away from the original problem. Please open a new one if it is blocking.
from faiss.
Recently, I started to use faiss and hit the same problem. I went through many issues and tried almost all the solutions mentioned above, but none of them worked.
Eventually I found that nvcc and nvidia-smi reported different CUDA versions, so I changed the nvcc version to match nvidia-smi, and it finally worked. So note that the nvcc version must be consistent with the version shown by nvidia-smi.
[image: my mismatched nvcc and nvidia-smi versions]
If you hit the same problem while compiling faiss, this may help you.
A guide to choosing the best CUDA Toolkit version is here.
The difference between nvcc and nvidia-smi is explained here.
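A rough way to compare the two versions from Python. This sketch assumes `nvcc --version` prints a "release X.Y" string and that `nvidia-smi` prints a "CUDA Version: X.Y" field in its header; older drivers may not print that field, in which case the parser returns None. All function names here are hypothetical helpers. Note that nvidia-smi reports the highest CUDA version the installed driver supports, not the toolkit that is installed.

```python
import re
import subprocess


def nvcc_release(text):
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def smi_cuda_version(text):
    """Extract 'X.Y' from the 'CUDA Version: X.Y' header of `nvidia-smi` output."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", text)
    return match.group(1) if match else None


def run(cmd):
    """Return the command's output as text, or '' if it cannot be run."""
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return ""
    return out.decode("utf-8", "replace")


if __name__ == "__main__":
    print("nvcc:      ", nvcc_release(run(["nvcc", "--version"])))
    print("nvidia-smi:", smi_cuda_version(run(["nvidia-smi"])))
```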
from faiss.
You are lucky. Unfortunately, it did not work when I tried to use faiss-gpu with CUDA 11.1.
from faiss.
conda install -c conda-forge faiss-gpu
Hi,
I tried the same command.
Thank you, it resolved my problem.
from faiss.