Comments (12)
One thing you can try is to index a smaller subset of the data first. If there is any issue with the installation, for example, it will show up here.
For this, you can copy the first 10K lines of the collection_all.tsv file, e.g. head -n 10000 collection_all.tsv > collection_10k.tsv, then run make index-wiki split=10k.
from wikichat.
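To make the smoke-test step above concrete, here is a minimal sketch. The seq/awk line just fabricates a stand-in collection file so the example is self-contained; in a real run, collection_all.tsv already exists from the WikiChat preprocessing step.

```shell
# Fabricate a stand-in collection file (id<TAB>passage per line) so this
# sketch runs anywhere; in practice collection_all.tsv already exists.
seq 1 50000 | awk '{printf "%d\tpassage %d\n", $1, $1}' > collection_all.tsv

# Keep only the first 10K passages for a quick smoke-test index.
head -n 10000 collection_all.tsv > collection_10k.tsv
wc -l < collection_10k.tsv   # -> 10000
```

If this small split indexes cleanly with make index-wiki split=10k, installation problems can be ruled out before committing ~50 hrs to the full dump.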
I'm working with an i7-12700F, 128GB RAM, and an Nvidia A2000 12GB.
I attempted to build the index with the latest Wikipedia dump (as of Jan 15, 2024). It ran for ~50 hrs to reach "Sorting Index" but then got stuck. The files with ids 0~1342 are still in \experiments\wikipedia_all\indexes\wikipedia.all.1bits.
128GB RAM is not enough; I thought it would be, but it turned out I had to use 256GB RAM on EC2.
Thanks for sharing. Did you build the index with the CPU? I thought building with the GPU only uses VRAM?
I did not build the wiki index; I am just using their index directly. 128GB RAM is not enough to get the wikichat app up and running.
I did build an index, which is 220GB in size for 4 bits, on a machine with 250GB of RAM. The actual text file is about 18GB in pickle format. Saving the index chunk by chunk took a much longer time.
from wikichat.
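For rough capacity planning, index size should scale close to linearly with nbits, since each compressed residual dimension is stored with nbits. Taking the 220GB / 4-bit figure reported above as the only data point, a hypothetical back-of-envelope:

```python
# Back-of-envelope only: assumes index size scales linearly with nbits and
# ignores fixed overhead (centroids, metadata). 220GB at 4 bits is the
# figure reported in the thread; the other values are extrapolations.
size_4bit_gb = 220
for nbits in (1, 2, 4):
    print(nbits, "bits ->", size_4bit_gb * nbits / 4, "GB")  # 1 -> 55.0, 2 -> 110.0, 4 -> 220.0
```

So a 1-bit index of the same collection would be expected to land somewhere near 55GB, before overhead.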
Hi,
This step should be much faster; I have not seen it take this long.
Can you please provide more details about the size of the corpus you are indexing, and the specifications of the machine you are using (operating system, GPU, RAM, disk etc.)?
from wikichat.
I'm working with an i7-12700F, 128GB RAM, and an Nvidia A2000 12GB.
I attempted to build the index with the latest Wikipedia dump (as of Jan 15, 2024).
It ran for ~50 hrs to reach "Sorting Index" but then got stuck.
The files with ids 0~1342 are still in \experiments\wikipedia_all\indexes\wikipedia.all.1bits
from wikichat.
How much disk space do you have left before indexing? The size of the index itself is quite large.
from wikichat.
The drive had 250+ GB left when sorting the index.
I'll make another attempt with 2 x A2000 over the weekend and see if this can be resolved.
from wikichat.
Got this error with make index-wiki split=10k
#> Starting...
nranks = 2 num_gpus = 2 device=0
nranks = 2 num_gpus = 2 device=1
{
"query_token_id": "[unused0]",
"doc_token_id": "[unused1]",
"query_token": "[Q]",
"doc_token": "[D]",
"ncells": null,
"centroid_score_threshold": null,
"ndocs": null,
"load_index_with_mmap": false,
"index_path": null,
"nbits": 1,
"kmeans_niters": 4,
"resume": false,
"similarity": "cosine",
"bsize": 256,
"accumsteps": 1,
"lr": 3e-6,
"maxsteps": 500000,
"save_every": null,
"warmup": null,
"warmup_bert": null,
"relu": false,
"nway": 2,
"use_ib_negatives": false,
"reranker": false,
"distillation_alpha": 1.0,
"ignore_scores": false,
"model_name": null,
"query_maxlen": 32,
"attend_to_mask_tokens": false,
"interaction": "colbert",
"dim": 128,
"doc_maxlen": 140,
"mask_punctuation": true,
"checkpoint": "colbert-ir\/colbertv2.0",
"triples": null,
"collection": ".\/workdir\/en\/wikipedia_16_01_2024\/collection_10k.tsv",
"queries": null,
"index_name": "wikipedia.10k.1bits",
"overwrite": false,
"root": "\/mnt\/d\/linux\/Colbert\/experiments",
"experiment": "wikipedia_10k",
"index_root": null,
"name": "2024-01\/24\/12.31.01",
"rank": 0,
"nranks": 2,
"amp": true,
"gpus": 2
}
...
File "/home/user/anaconda3/envs/wikichat/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 12145, in index_cpu_to_gpu_multiple
return _swigfaiss_avx2.index_cpu_to_gpu_multiple(provider, devices, index, options)
RuntimeError: Error in virtual void faiss::gpu::StandardGpuResourcesImpl::initializeForDevice(int) at /home/circleci/miniconda/conda-bld/faiss-pkg_1681998300314/work/faiss/gpu/StandardGpuResources.cpp:280: Error: 'err == cudaSuccess' failed: failed to cudaHostAlloc 268435456 bytes for CPU <-> GPU async copy buffer (error 2 out of memory)
So is this a hardware limit then?
from wikichat.
Probably. One more thing to try is to reduce the batch size here: https://github.com/stanford-oval/WikiChat/blob/main/ColBERT/colbert/indexer.py#L62.
The original ColBERT code sets this to 64.
from wikichat.
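The effect of that change can be sketched generically (this is not the actual ColBERT code; encode below is a stand-in for the model's embedding call): a smaller batch size bounds how many passages are in flight on the GPU at once, trading throughput for peak memory.

```python
def encode_in_batches(passages, encode, batch_size=64):
    """Encode passages batch_size at a time to bound peak memory."""
    embeddings = []
    for start in range(0, len(passages), batch_size):
        embeddings.extend(encode(passages[start:start + batch_size]))
    return embeddings

# Toy stand-in encoder; with batch_size=64 instead of the 256 shown in the
# run config above, at most 64 items are processed per step, roughly a 4x
# cut in peak activation memory.
result = encode_in_batches(list(range(300)), lambda batch: [x * 2 for x in batch])
assert result == [x * 2 for x in range(300)]
```

The final output is identical either way; only the peak working-set size changes.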
This is weird though; it's not asking for that much VRAM, and I can see there are still ~15GB left.
The moment before the out-of-memory error: [screenshot]
from wikichat.
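One detail worth noting about the traceback above: cudaHostAlloc allocates pinned host (CPU) RAM for faiss's CPU <-> GPU copy buffer, not VRAM, so free GPU memory does not necessarily help here; error 2 is CUDA's generic out-of-memory code. If this run is under WSL (an assumption based on the /mnt/d path in the config), pinned allocations are typically more constrained than on native Linux. The failed request itself is modest:

```python
# The allocation that failed in the faiss traceback, converted to MiB.
failed_bytes = 268435456
print(failed_bytes / 2**20, "MiB")  # -> 256.0 MiB
```

That a 256 MiB pinned buffer could not be allocated despite ~15GB of free VRAM points at host-side pinned-memory limits rather than GPU capacity.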
Hi, I would like to know your system specification for running index-wiki.
Apart from running with an A100 GPU for 20 hrs, what other specs can you share that are required or confirmed to work?
from wikichat.
I'm working with an i7-12700F, 128GB RAM, and an Nvidia A2000 12GB.
I attempted to build the index with the latest Wikipedia dump (as of Jan 15, 2024). It ran for ~50 hrs to reach "Sorting Index" but then got stuck.
The files with ids 0~1342 are still in \experiments\wikipedia_all\indexes\wikipedia.all.1bits
128GB RAM is not enough; I thought it would be, but it turned out I had to use 256GB RAM on EC2.
from wikichat.