benzakenelad / bitfit

Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

License: MIT License

Python 100.00%

bitfit's Issues

Why not use the original finetuned results of base RoBERTa

Hi,

In Table 2 of your paper, you appear to have reproduced the RoBERTa_base results yourselves. May I ask why, given that the RoBERTa paper already reports them? Your reproduced results are worse than the original ones on every task except MRPC.

In addition, the hyperparameter settings used for your reproduction do not seem to match those in https://github.com/facebookresearch/fairseq/tree/main/examples/roberta/config/finetuning.

question about Table 1 in paper

Hi,
This is interesting work. I have a question about the results in Table 1. Did you reproduce the results of the two baselines (i.e., Adapter and Diff-Pruning) yourselves, or are they taken from the original papers? I checked the original papers and found that neither reports dev-set results, and the test-set results in Table 1 do not match the ones they report.

Thanks in advance!

Reproduce results on MNLI dataset

Hi,

I had trouble reproducing the results you report in the paper for MNLI. I am using the default example you have in the README and the learning rate you mention in the paper for BERT-Base.

python run_glue.py \
        --output-path $1 \
        --task-name mnli \
        --model-name bert-base-cased \
        --fine-tune-type bitfit \
        --learning-rate 1e-4 \
        --gpu-device 0

Is there anything I need to change, or am I doing something wrong?

Thanks
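
When a single run falls short of a reported number, a small learning-rate sweep is a common first check. The sketch below only builds the command lines and does not launch them. It reuses the flags from the command above; the `build_command` helper, the output directory names, and the candidate learning rates are assumptions made for illustration.

```python
import shlex


def build_command(output_path, learning_rate, task_name="mnli",
                  model_name="bert-base-cased", gpu_device=0):
    """Return a run_glue.py invocation as an argument list (hypothetical helper).

    The flag names are copied from the command quoted in this issue.
    """
    return [
        "python", "run_glue.py",
        "--output-path", output_path,
        "--task-name", task_name,
        "--model-name", model_name,
        "--fine-tune-type", "bitfit",
        "--learning-rate", str(learning_rate),
        "--gpu-device", str(gpu_device),
    ]


# Assumed candidate learning rates for the sweep (illustrative values).
CANDIDATE_LRS = ["1e-4", "4e-4", "7e-4", "1e-3"]

# One command per learning rate, each writing to its own output directory.
commands = [build_command(f"out/mnli_lr_{lr}", lr) for lr in CANDIDATE_LRS]

for cmd in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(cmd))
```

Keeping one output directory per learning rate makes it easier to compare the dev-set scores of the runs side by side.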

Generalization to decoder-only models

Hi,

I was wondering if you could give your opinion on how well BitFit would generalize to decoder-only models. If you have already tried some experiments, it would be great to hear any insights from them.

Regards,
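
As a rough illustration of what the question involves, bias-only selection can be written purely in terms of parameter names, so nothing in it is specific to an encoder. The sketch below assumes a naming scheme in which bias parameters end in `bias`; the parameter inventory, its names, and its sizes are made up for illustration and are not read from any real checkpoint or from this repository.

```python
# Hypothetical parameter inventory for one decoder block: name -> entry count.
# Names and shapes are invented for this example.
TOY_DECODER_PARAMS = {
    "layers.0.attn.q_proj.weight": 768 * 768,
    "layers.0.attn.q_proj.bias": 768,
    "layers.0.mlp.fc_in.weight": 768 * 3072,
    "layers.0.mlp.fc_in.bias": 3072,
    "layers.0.mlp.fc_out.weight": 3072 * 768,
    "layers.0.mlp.fc_out.bias": 768,
    "layers.0.ln.weight": 768,
    "layers.0.ln.bias": 768,
}


def select_bias_parameters(param_sizes):
    """Keep only parameters whose name marks them as a bias term."""
    return {
        name: size
        for name, size in param_sizes.items()
        if name == "bias" or name.endswith(".bias")
    }


def trainable_fraction(param_sizes):
    """Fraction of all entries that would be trained under bias-only tuning."""
    total = sum(param_sizes.values())
    trainable = sum(select_bias_parameters(param_sizes).values())
    return trainable / total if total else 0.0


selected = select_bias_parameters(TOY_DECODER_PARAMS)
fraction = trainable_fraction(TOY_DECODER_PARAMS)

print(sorted(selected))
print(f"trainable fraction: {fraction:.4%}")
```

One practical caveat for this kind of selection: if an architecture defines few or no bias terms, the selected set shrinks accordingly, so it is worth checking that the selection is non-empty before training.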
