Well, I first use python download_evalsets.py $download


Tried to evaluate the model on a local-network-only machine (datacomp issue, open)

mlfoundations commented on August 28, 2024

Tried to evaluate the model on a local-network-only machine

from datacomp.

Comments (4)

zwsjink commented on August 28, 2024

Sorry to get back to you late, but I was able to bypass this issue by modifying the datacomp source code as follows:

diff --git a/eval_utils/retr_eval.py b/eval_utils/retr_eval.py
index 3c19917..647edf7 100644
--- a/eval_utils/retr_eval.py
+++ b/eval_utils/retr_eval.py
@@ -37,7 +37,7 @@ def evaluate_retrieval_dataset(
 
     dataset = RetrievalDataset(
         datasets.load_dataset(
-            f"nlphuji/{task.replace('retrieval/', '')}",
+            f"/mnt/data/datacomp2023/evaluate_datasets/{task.replace('retrieval/', '')}.py",
             split="test",
             cache_dir=os.path.join(data_root, "hf_cache")
             if data_root is not None

This forces HF datasets to use my local dataset repository instead of checking for online updates.
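A hardcoded path like the one in the diff has to be re-patched on every machine. As a rough sketch, the same idea could be parameterized so that the hub name is used unless a local loading script exists. The helper name resolve_dataset_source and the local_root argument are hypothetical and not part of datacomp, and loading from a local .py script assumes the installed datasets version still supports loading scripts.

```python
import os


def resolve_dataset_source(task, local_root=None):
    """Return the value to pass as the first argument of datasets.load_dataset.

    Hypothetical helper (not part of datacomp): prefers <local_root>/<name>.py
    when that file exists, otherwise falls back to the hub repository name
    used in the original code.
    """
    name = task.replace("retrieval/", "")
    if local_root is not None:
        script = os.path.join(local_root, f"{name}.py")
        if os.path.isfile(script):
            return script
    # No local copy found: keep the original behaviour.
    return f"nlphuji/{name}"
```

The result would replace the f-string in the load_dataset call, so machines with Internet access keep the original behaviour.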


gabrielilharco commented on August 28, 2024

@djghosh13


djghosh13 commented on August 28, 2024

Hi, thanks for bringing this up! I assumed that the HF datasets would work properly without an Internet connection because the download_evalsets.py script loads them once to put them in the cache already. I'll look into potential solutions to this issue.


djghosh13 commented on August 28, 2024

Can you try setting the environment variable HF_DATASETS_OFFLINE to 1? (from https://huggingface.co/docs/datasets/v2.14.5/en/loading#offline)
It seems like even if the dataset is cached, HF will by default check the online version. So hopefully this should fix things.

If that doesn't work, could you check to make sure the files are indeed in the hf_cache folder?
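Both suggestions can be tried from Python with something like the sketch below. It assumes the variable has to be set before datasets is imported (as far as I know the library reads it at import time), and the helper name list_cached_files is hypothetical. The hf_cache folder is taken to be under data_root, matching the cache_dir argument in retr_eval.py.

```python
import os
from pathlib import Path

# Set this before importing `datasets`; assumption: the library reads the
# variable once at import time, so setting it later may have no effect.
os.environ["HF_DATASETS_OFFLINE"] = "1"


def list_cached_files(data_root):
    """Return sorted relative paths of all files under <data_root>/hf_cache.

    Hypothetical helper for a quick sanity check; returns an empty list
    when the hf_cache folder does not exist.
    """
    cache_dir = Path(data_root) / "hf_cache"
    if not cache_dir.is_dir():
        return []
    return sorted(
        str(p.relative_to(cache_dir)) for p in cache_dir.rglob("*") if p.is_file()
    )
```

An empty list would suggest the cache was never populated on this machine, or that it lives under a different data_root.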

