Thank you for the amazing repo. I am curious why are some titles mis

Why are some document titles missing? about learning_to_retrieve_reasoning_paths HOT 2 OPEN

akariasai commented on June 14, 2024

Why are some document titles missing?

from learning_to_retrieve_reasoning_paths.

Comments (2)

AkariAsai commented on June 14, 2024

Hi, sorry for my late response! Could you share the command you are running and the dataset on which you see that issue?
I think I have seen the same issue when a Wikipedia title (id) cannot be matched with any of the ids in the database. In particular:

- the code does not handle some Unicode characters well
- some Wikipedia entity titles have been changed or redirected to a new title
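
To illustrate the first cause, here is a minimal sketch (not code from this repository) of how two visually identical titles can fail an exact-match lookup when they use different Unicode normalization forms. The `normalize_title` helper and the in-memory `db_ids` set are hypothetical stand-ins for the real database lookup, and NFC is an arbitrary choice for the example; what matters is that both sides of the comparison use the same form.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Hypothetical helper: map a title to NFC so equivalent strings compare equal."""
    return unicodedata.normalize("NFC", title)


# Same visible title, two encodings:
# precomposed "é" versus "e" followed by a combining acute accent.
composed = "Beyonc\u00e9"
decomposed = "Beyonce\u0301"

# Stand-in for the ids stored in the database (assumed to be in composed form).
db_ids = {composed}

print(decomposed in db_ids)                   # raw lookup misses: False
print(normalize_title(decomposed) in db_ids)  # normalized lookup hits: True
```

The second cause (renamed or redirected Wikipedia titles) cannot be fixed by normalization alone, since the stored id and the retrieved title are then genuinely different strings.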


mukhal commented on June 14, 2024

Thanks for the response. This happens with HotpotQA when I run the following command or similar commands.

python run_graph_retriever.py \
        --task hotpot_open \
        --bert_model bert-base-uncased --do_lower_case \
        --dev_file_path path/to/hotpotqa/dev \
        --output_dir path/to/output \
        --model_suffix 3 \
        --max_para_num 10 \
        --tfidf_limit 50 \
        --beam 4 \
        --eval_chunk 200 \
        --eval_batch_size 64 \
        --split_chunk 1000 \
        --pruning_by_links \
        --example_limit 128

I think the main issue is that some titles are returned by the TF-IDF retriever, but when their content is fetched with tfidf_retriever.load_abstract_para_text(), this warning is printed for some documents. I am not sure whether I should worry about it, though, since I was able to reproduce your results even with the warning appearing many times.
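
One way to judge whether the warning matters is to count how many retrieved titles come back without text. The sketch below is only an assumption about how that could be done: the signature and return value of `load_abstract_para_text()` are not shown in this thread, so a generic `lookup` callable and a toy dictionary stand in for it, and the titles are made up.

```python
from typing import Callable, Iterable, List, Optional


def find_missing_titles(
    titles: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> List[str]:
    """Return the titles for which `lookup` yields no text (None or empty)."""
    return [title for title in titles if not lookup(title)]


# Toy stand-in for the paragraph database (hypothetical titles and text).
toy_db = {"Title A_0": "text a", "Title B_0": "text b"}
retrieved = ["Title A_0", "Title C_0", "Title B_0"]

missing = find_missing_titles(retrieved, toy_db.get)
print(f"{len(missing)}/{len(retrieved)} titles missing: {missing}")
```

If the missing fraction is small and the final scores match the reported ones, the warning is probably harmless for reproduction purposes.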

