I ran the suggested example in the README to try to reproduce the results shown in the …

How to reproduce the paper metrics? (ditto issue, closed)

megagonlabs commented on July 20, 2024

How to reproduce the paper metrics?

Comments (2)

oi02lyl commented on July 20, 2024

For this dataset we used the following settings, with the summarization option off: RoBERTa as the language model, the drop_col data augmentation (da) operator, the general domain knowledge (dk) option, and a batch size of 32. The command should look like this:

CUDA_VISIBLE_DEVICES=0 python train_ditto.py \
  --task Structured/Beer \
  --batch_size 32 \
  --max_len 256 \
  --lr 3e-5 \
  --n_epochs 40 \
  --finetuning \
  --lm roberta \
  --fp16 \
  --da drop_col \
  --dk general
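
The three runs reported in this comment (baseline, dk only, and dk plus da) differ only in the `--da` and `--dk` flags. A minimal sketch of how the three command lines could be generated is shown here. It assumes that leaving out `--da` or `--dk` disables that option, which matches the "no dk or da" label on the baseline but is not spelled out in this thread. The helper `build_command` and the configuration names are hypothetical and are not part of ditto.

```python
import shlex

# Flags copied from the command above; shared by all three runs.
BASE_ARGS = [
    "python", "train_ditto.py",
    "--task", "Structured/Beer",
    "--batch_size", "32",
    "--max_len", "256",
    "--lr", "3e-5",
    "--n_epochs", "40",
    "--finetuning",
    "--lm", "roberta",
    "--fp16",
]


def build_command(da=None, dk=None):
    """Return the argument list for one run (hypothetical helper).

    Assumption: omitting --da / --dk turns the option off.
    """
    cmd = list(BASE_ARGS)
    if da is not None:
        cmd += ["--da", da]
    if dk is not None:
        cmd += ["--dk", dk]
    return cmd


# Illustrative names for the three settings discussed in this comment.
CONFIGS = {
    "baseline": build_command(),
    "dk_only": build_command(dk="general"),
    "ditto": build_command(da="drop_col", dk="general"),
}

if __name__ == "__main__":
    # Only prints the commands; nothing is executed.
    for name, cmd in CONFIGS.items():
        print(f"# {name}")
        print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run on a machine without a GPU; the printed lines can be pasted into a shell.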

I just re-ran the experiment and here are the results:

Baseline (no dk or da)

=========eval at epoch=40=========
Validation:
=============Structured/Beer==================
accuracy=0.967
precision=0.867
recall=0.929
f1=0.897
======================================
Test:
=============Structured/Beer==================
accuracy=0.967
precision=0.824
recall=1.000
f1=0.903
======================================

With DK (general) only
Somehow the test result is the same as the baseline.

=========eval at epoch=40=========
Validation:
=============Structured/Beer==================
accuracy=0.967
precision=0.824
recall=1.000
f1=0.903
======================================
Test:
=============Structured/Beer==================
accuracy=0.967
precision=0.824
recall=1.000
f1=0.903
======================================

Ditto (with both dk and da)

=========eval at epoch=40=========
Validation:
=============Structured/Beer==================
accuracy=0.956
precision=0.857
recall=0.857
f1=0.857
======================================
Test:
=============Structured/Beer==================
accuracy=0.978
precision=0.875
recall=1.000
f1=0.933
======================================
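
As a sanity check on the logs above, each reported f1 can be recomputed from the precision and recall printed next to it, using the standard definition F1 = 2PR / (P + R). The sketch below does this for all six blocks. Because the logged precision and recall are already rounded to three decimals, the recomputed value can differ from the logged f1 in the last digit, so the comparison uses a tolerance of 0.001 (an assumption of this sketch, not something ditto does).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, precision, recall, reported f1), copied from the logs above.
REPORTED = [
    ("baseline / validation", 0.867, 0.929, 0.897),
    ("baseline / test",       0.824, 1.000, 0.903),
    ("dk only / validation",  0.824, 1.000, 0.903),
    ("dk only / test",        0.824, 1.000, 0.903),
    ("ditto / validation",    0.857, 0.857, 0.857),
    ("ditto / test",          0.875, 1.000, 0.933),
]

# Inputs are rounded to three decimals, so allow a last-digit difference.
TOLERANCE = 1e-3


def check_reported(rows, tolerance=TOLERANCE):
    """Return (label, recomputed f1, consistent?) for each row."""
    results = []
    for label, precision, recall, reported in rows:
        recomputed = f1_score(precision, recall)
        results.append((label, recomputed, abs(recomputed - reported) <= tolerance))
    return results


if __name__ == "__main__":
    for label, recomputed, ok in check_reported(REPORTED):
        print(f"{label}: recomputed f1={recomputed:.4f} consistent={ok}")
```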

In the paper's experiments, we reported the results from the epoch with the highest validation F1 score, whereas the numbers here are from the last epoch.
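
The difference between the two reporting rules can be sketched as follows. This is only an illustration: the per-epoch numbers in `toy_history` are invented to show the mechanics and are not ditto results, and the record layout and function names are hypothetical. It assumes the validation and test F1 are logged for every epoch.

```python
def select_best_epoch(history):
    """Return the record with the highest validation F1.

    On ties, max() keeps the first maximal record, i.e. the earliest epoch.
    """
    return max(history, key=lambda record: record["val_f1"])


def select_last_epoch(history):
    """Return the record of the final epoch (the rule used for the logs above)."""
    return max(history, key=lambda record: record["epoch"])


# Invented numbers for illustration only; NOT results from ditto.
toy_history = [
    {"epoch": 1, "val_f1": 0.70, "test_f1": 0.68},
    {"epoch": 2, "val_f1": 0.90, "test_f1": 0.85},
    {"epoch": 3, "val_f1": 0.80, "test_f1": 0.88},
]

if __name__ == "__main__":
    best = select_best_epoch(toy_history)
    last = select_last_epoch(toy_history)
    print("best-validation epoch:", best["epoch"], "test f1:", best["test_f1"])
    print("last epoch:", last["epoch"], "test f1:", last["test_f1"])
```

Selecting on validation F1 keeps the test set out of the model selection step, which is why the two rules can report different test scores for the same training run.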

I did notice something odd: the baseline result is significantly higher than before. I remember that we updated the MixDA code from Snippext, which may have affected the data augmentation results as well.

CristhianBoujon commented on July 20, 2024

Thank you!
