Comments (2)
Hi, @dsl-light
Thank you for going through our code!
TF-IDF documents first, then the hyperlink negative ones?
This is totally correct, based on the logic of our code.
My question is: hyperlink negative docs are added by appending the docs in all_linked_paras_dic, but the keys of all_linked_paras_dic are all TF-IDF-retrieved titles, so the most important part, the hyperlink negative docs of the gold path, may not be included for training?
For this, let us explain the logic in detail.
- Appending gold paragraph titles (only during the training phase)
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L495
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L502
- Appending TF-IDF-based negative examples
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L502
  We can control how many TF-IDF-based negative examples we use for the model training, and also please refer to https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L502 for the use of the --tfidf_limit option.
- Appending hyperlink-based negative examples
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L526
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L540
  Here we add hyperlink-based negative examples, and we can see that the hyperlinked titles are used. l is a hyperlinked paragraph's title from a paragraph p_ (example.all_linked_paras_dic[p_]).
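The three steps above can be sketched roughly as follows. This is a simplified illustration and not the code in utils.py: the function name build_candidate_titles, its parameters, and the de-duplication helper are hypothetical, all_linked_paras_dic is assumed to map a paragraph title to a dict keyed by its hyperlinked titles, and the sketch assumes that hyperlinks are followed from every title collected in the first two steps (gold titles included).

```python
from typing import Dict, List, Optional


def build_candidate_titles(
    gold_titles: List[str],
    tfidf_titles: List[str],
    all_linked_paras_dic: Dict[str, Dict[str, str]],
    tfidf_limit: Optional[int] = None,
    train: bool = True,
) -> List[str]:
    """Hypothetical sketch of the ordering: gold, TF-IDF, then hyperlinks."""
    titles: List[str] = []
    seen = set()

    def add(title: str) -> None:
        # Keep the first occurrence of each title, preserving order.
        if title not in seen:
            seen.add(title)
            titles.append(title)

    # Step 1: gold paragraph titles, only during the training phase.
    if train:
        for title in gold_titles:
            add(title)

    # Step 2: TF-IDF-based negatives, capped by tfidf_limit.
    # Assumption: the cap counts only newly added TF-IDF titles.
    kept = 0
    for title in tfidf_titles:
        if tfidf_limit is not None and kept >= tfidf_limit:
            break
        if title not in seen:
            add(title)
            kept += 1

    # Step 3: hyperlink-based negatives. For each paragraph p_ collected so
    # far, add every hyperlinked title l found in all_linked_paras_dic[p_].
    for p_ in list(titles):
        for l in all_linked_paras_dic.get(p_, {}):
            add(l)

    return titles


if __name__ == "__main__":
    links = {"A": {"E": "text of E"}, "C": {"B": "text of B", "F": "text of F"}}
    print(build_candidate_titles(["A", "B"], ["A", "C", "D"], links, tfidf_limit=1))
    # prints ['A', 'B', 'C', 'E', 'F']
```

In this toy run, E is reached only through the gold paragraph A, which shows how hyperlinked negatives of a gold paragraph can enter the list under the assumptions stated above.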
Let us know if you have further questions.
Thank you!
from learning_to_retrieve_reasoning_paths.
@hassyGo Very clear explanation! Thank you!