
Comments (10)

hitercs commented on June 4, 2024

Oh! Sorry, I hadn't synced your commits on my server. It works normally now. Good job! Thanks.

from genre.

nicola-decao commented on June 4, 2024

@hitercs No, I cannot reproduce this. Here is what I have running:

python scripts_genre/evaluate_kilt_dataset.py \
   models/fairseq_entity_disambiguation_aidayago \
   datasets/msnbc-test-kilt.jsonl \
   datasets/msnbc-test-kilt-out.jsonl \
   --candidates \
   --batch_size 16 \
   -d -v
INFO:root:Loading model
INFO:fairseq.file_utils:loading archive file models/fairseq_entity_disambiguation_aidayago
INFO:fairseq.tasks.translation:[source] dictionary: 50264 types
INFO:fairseq.tasks.translation:[target] dictionary: 50264 types
INFO:root:Loading datasets/msnbc-test-kilt.jsonl
Evaluating: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 656/656 [01:03<00:00, 10.35it/s, f1=0.941, prec=0.945, rec=0.938]
INFO:root:Saving dataset in datasets/msnbc-test-kilt-out.jsonl
+-----------------+-------+-----------+--------+-------------+----------+
|     Dataset     |   F1  | Precision | Recall | R-precision | Recall@5 |
+-----------------+-------+-----------+--------+-------------+----------+
| msnbc-test-kilt | 94.26 |   94.62   | 93.90  |    93.90    |  97.41   |
+-----------------+-------+-----------+--------+-------------+----------+


nicola-decao commented on June 4, 2024

@hitercs Yes, I'll update it in an hour since I'm fixing other stuff as well.


nicola-decao commented on June 4, 2024

I managed to reproduce your bug. Working on it.


nicola-decao commented on June 4, 2024

@hitercs Now batches work again! Regarding the mismatch in scores: I am trying other datasets (e.g. on MSNBC I got the same score as in the paper, but not on ACE2004, as you reported). One very likely possibility is the following:

  1. I used a version of fairseq from 6 months ago. There were some breaking changes, and it might be that the code of the BART model or the beam search changed slightly, leading to different results.
  2. For the results in the paper I used an internal (private) Facebook AI version of fairseq, so the code might also be slightly different there.

I hope this helps, and thanks for reporting the bug! I really appreciate it 😊


hitercs commented on June 4, 2024

@nicola-decao Thanks.

However, when I run with --candidates --batch_size 16, the performance is still very low. It is normal without --candidates.


nicola-decao commented on June 4, 2024

@hitercs Are you using the fairseq version I indicated in the example?


hitercs commented on June 4, 2024

Sorry, I meant that when specifying the argument --candidates, the bug seems to still be there. Can you get normal performance using --candidates --batch_size 16? Yes, I use the fairseq version in the repo you provided.


hitercs commented on June 4, 2024

By the way, one more minor bug report:

trie = pickle.load(f)

Should it be the following instead?

trie = Trie.load_from_dict(pickle.load(f))
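The point of the fix above is that the pickle file stores only the trie's raw dictionary, so `pickle.load(f)` alone returns a plain dict without any trie methods. Below is a minimal, self-contained sketch of that distinction — the `Trie` class here is a hypothetical stand-in, not GENRE's actual implementation:

```python
import io
import pickle

# Hypothetical minimal Trie -- a sketch to illustrate the bug report
# above, not the genre library's real Trie class.
class Trie:
    def __init__(self, sequences=()):
        self.trie_dict = {}
        for seq in sequences:
            node = self.trie_dict
            for token in seq:
                node = node.setdefault(token, {})

    @staticmethod
    def load_from_dict(trie_dict):
        # Rebuild a Trie object around an already-constructed dict.
        trie = Trie()
        trie.trie_dict = trie_dict
        return trie

    def get(self, prefix):
        # Return the tokens that may follow `prefix`.
        node = self.trie_dict
        for token in prefix:
            node = node.get(token, {})
        return list(node)

# Only the raw dict is pickled, so loading yields a dict, not a Trie...
buf = io.BytesIO(pickle.dumps(Trie([[2, 5, 9], [2, 7]]).trie_dict))
raw = pickle.load(buf)           # plain dict: no trie methods
trie = Trie.load_from_dict(raw)  # wrapping restores the Trie interface
print(trie.get([2]))             # -> [5, 7]
```

Calling `raw.get([2])` directly would fail (a dict's `get` expects a hashable key, not a prefix), which is exactly the kind of breakage the wrapper avoids.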


Saibo-creator commented on June 4, 2024

> @hitercs No, I cannot reproduce this. Here is what I have running:

python scripts_genre/evaluate_kilt_dataset.py \
   models/fairseq_entity_disambiguation_aidayago \
   datasets/msnbc-test-kilt.jsonl \
   datasets/msnbc-test-kilt-out.jsonl \
   --candidates \
   --batch_size 16 \
   -d -v
INFO:root:Loading model
INFO:fairseq.file_utils:loading archive file models/fairseq_entity_disambiguation_aidayago
INFO:fairseq.tasks.translation:[source] dictionary: 50264 types
INFO:fairseq.tasks.translation:[target] dictionary: 50264 types
INFO:root:Loading datasets/msnbc-test-kilt.jsonl
Evaluating: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 656/656 [01:03<00:00, 10.35it/s, f1=0.941, prec=0.945, rec=0.938]
INFO:root:Saving dataset in datasets/msnbc-test-kilt-out.jsonl
+-----------------+-------+-----------+--------+-------------+----------+
|     Dataset     |   F1  | Precision | Recall | R-precision | Recall@5 |
+-----------------+-------+-----------+--------+-------------+----------+
| msnbc-test-kilt | 94.26 |   94.62   | 93.90  |    93.90    |  97.41   |
+-----------------+-------+-----------+--------+-------------+----------+

Why would Precision and Recall differ here? If I remember correctly, the ED task has only a single gold target and you always take the top-1 prediction for evaluation. In that case, wouldn't we always have recall = precision = F1 = accuracy?
Or is there a detail I'm getting wrong?
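One way the two can diverge even with a single gold target and top-1 predictions: if some queries produce no prediction at all (e.g. an empty candidate set), micro precision (correct over predicted) and micro recall (correct over gold) use different denominators. This is a hedged sketch of the standard micro-averaged definitions, not necessarily the exact logic of evaluate_kilt_dataset.py:

```python
# Micro precision/recall over top-1 entity disambiguation outputs.
# `None` marks a query for which no prediction was produced -- these
# entries are hypothetical illustration data, not real results.
gold  = ["A", "B", "C", "D"]   # one gold entity per query
preds = ["A", "B", None, "X"]  # third query yields no prediction

correct   = sum(p == g for p, g in zip(preds, gold) if p is not None)
predicted = sum(p is not None for p in preds)

precision = correct / predicted   # 2 / 3
recall    = correct / len(gold)   # 2 / 4
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)      # precision > recall here
```

If every query produced exactly one prediction, `predicted == len(gold)` and precision, recall, and F1 would indeed all collapse to accuracy.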

