Comments (10)
Oh, sorry! I hadn't synced your commits on my server. It works normally now. Good job, thanks!
@hitercs No I cannot reproduce this. Here is what I have running
python scripts_genre/evaluate_kilt_dataset.py \
models/fairseq_entity_disambiguation_aidayago \
datasets/msnbc-test-kilt.jsonl \
datasets/msnbc-test-kilt-out.jsonl \
--candidates \
--batch_size 16 \
-d -v
INFO:root:Loading model
INFO:fairseq.file_utils:loading archive file models/fairseq_entity_disambiguation_aidayago
INFO:fairseq.tasks.translation:[source] dictionary: 50264 types
INFO:fairseq.tasks.translation:[target] dictionary: 50264 types
INFO:root:Loading datasets/msnbc-test-kilt.jsonl
Evaluating: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 656/656 [01:03<00:00, 10.35it/s, f1=0.941, prec=0.945, rec=0.938]
INFO:root:Saving dataset in datasets/msnbc-test-kilt-out.jsonl
+-----------------+-------+-----------+--------+-------------+----------+
| Dataset | F1 | Precision | Recall | R-precision | Recall@5 |
+-----------------+-------+-----------+--------+-------------+----------+
| msnbc-test-kilt | 94.26 | 94.62 | 93.90 | 93.90 | 97.41 |
+-----------------+-------+-----------+--------+-------------+----------+
@hitercs yes, I'll update it in an hour since I'm fixing other stuff as well
I managed to reproduce your bug. Working on it.
@hitercs Now batches work again! Regarding the mismatch in scores: I am trying other datasets (e.g. on MSNBC I got the same score as in the paper, but not for ACE2004, as you reported). One very likely explanation is the following:
- I used a version of fairseq from 6 months ago. There were some breaking changes, and it might be that the code of the BART model or the beam search changed slightly, leading to different results.
- For the results in the paper I used an internal (private) Facebook AI version of fairseq, so the code might also be slightly different there.
I hope this helps, and thanks for reporting the bug! I really appreciate it 😊
@nicola-decao Thanks. However, when I run with --candidates --batch_size 16, the performance is still very low. It is normal without --candidates.
@hitercs are you using the fairseq version I indicated in the example?
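A quick way to settle the version question is to print what is actually installed in the environment. This is a generic sketch (not part of the GENRE scripts): it queries pip's metadata for any installed distribution, fairseq included.

```python
# Report the installed version of a package so both sides can confirm
# they are running the same fairseq release/commit. Works for any
# pip-installed distribution; fairseq is just the package of interest here.
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None  # package not installed in this environment

print(installed_version("fairseq"))  # e.g. a version string, or None if not installed
```

Note that an editable install from a git checkout may report the same version string across different commits, so comparing `git rev-parse HEAD` in the fairseq checkout is the safer check in that case.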
Sorry, I meant that when specifying the argument --candidates, the bug seems to still be there. Can you get normal performance using --candidates --batch_size 16? Yes, I use the fairseq version in your provided repo.
By the way, one more minor bug report:
GENRE/scripts_genre/evaluate_kilt_dataset.py, line 257 in 130703b.
Should it be:
trie = Trie.load_from_dict(pickle.load(f))
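To illustrate why the wrapping matters: a minimal sketch of a prefix trie with a `load_from_dict` classmethod, mirroring my assumption of genre's Trie API. `pickle.load(f)` returns the plain nested dict that was dumped, so it has to be wrapped back into a `Trie` object rather than used directly, hence the proposed fix.

```python
import io
import pickle

class Trie:
    """Minimal prefix trie over token-id sequences (illustrative, not genre's code)."""

    def __init__(self, sequences=()):
        self.trie_dict = {}
        for seq in sequences:
            node = self.trie_dict
            for tok in seq:
                node = node.setdefault(tok, {})

    @classmethod
    def load_from_dict(cls, trie_dict):
        # Wrap a raw nested dict (as produced by pickling trie_dict) back into a Trie.
        trie = cls()
        trie.trie_dict = trie_dict
        return trie

    def get(self, prefix):
        # Tokens allowed after `prefix` (what constrained beam search queries).
        node = self.trie_dict
        for tok in prefix:
            node = node.get(tok, {})
        return list(node.keys())

# Round-trip through pickle: only the raw dict comes back, so it must be
# re-wrapped with load_from_dict before calling .get() on it.
buf = io.BytesIO()
trie = Trie([[2, 3, 4], [2, 5]])
pickle.dump(trie.trie_dict, buf)
buf.seek(0)
restored = Trie.load_from_dict(pickle.load(buf))
print(restored.get([2]))  # [3, 5]
```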
Why would Precision and Recall differ in the table above? If I remember correctly, the ED task has only a single gold target and you always take the top-1 prediction to evaluate. In that case, wouldn't we always have recall = precision = F1 = accuracy?
Or is there some detail that I'm getting wrong?
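One plausible explanation (a sketch with illustrative names, not the actual evaluation code): if the model emits no valid entity for some mentions, e.g. a generated string that matches no candidate, then micro-averaged precision divides by the number of non-empty predictions while recall divides by the number of gold mentions, and the two can diverge even with one gold target and top-1 predictions.

```python
# Toy micro-averaged P/R/F1 where `None` marks a mention with no valid prediction.
def prf(golds, preds):
    correct = sum(1 for g, p in zip(golds, preds) if p is not None and g == p)
    n_pred = sum(1 for p in preds if p is not None)  # precision denominator
    n_gold = len(golds)                              # recall denominator
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

golds = ["A", "B", "C", "D"]
preds = ["A", "B", None, "X"]  # one empty prediction, one wrong one
p, r, f = prf(golds, preds)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```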