I have added the hyperparameters for 8 of the GLUE tasks in the bash script.
For epsilon, in the current setting, you can set it to 0 first, which puts no restriction on the maximum norm, and tune the other hyperparameters. In this way, the maximum norm will be restricted by the ascent step size, the number of ascent steps and the initialization.
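The bound described above can be illustrated with a minimal sketch under stated assumptions: NumPy, hypothetical names (`freelb_style_ascent`, `init_mag`, `step_size`, `max_norm`), random arrays standing in for the gradients with respect to the embedding perturbation, and a simplified initialization rescaled to norm `init_mag`. Each normalized ascent step moves the perturbation by exactly the step size, so by the triangle inequality ||δ_K|| ≤ ||δ_0|| + K·α when `max_norm` is 0. Otherwise the projection caps it at `max_norm`.

```python
import numpy as np


def freelb_style_ascent(grads, init_mag, step_size, max_norm, seed=0):
    """Run one ascent step per gradient on a perturbation delta of shape (L, d).

    Hypothetical sketch: init_mag, step_size and max_norm stand in for the
    initialization magnitude, ascent step size and epsilon. max_norm == 0
    means "no projection", i.e. no explicit restriction on the norm.
    """
    rng = np.random.default_rng(seed)
    # Simplified init (assumption): uniform noise rescaled to norm init_mag.
    delta = rng.uniform(-1.0, 1.0, size=grads[0].shape)
    delta *= init_mag / np.linalg.norm(delta)
    norms = [np.linalg.norm(delta)]
    for g in grads:
        # Normalized gradient ascent: each step has Frobenius norm step_size.
        delta = delta + step_size * g / np.linalg.norm(g)
        if max_norm > 0:
            n = np.linalg.norm(delta)
            if n > max_norm:
                delta *= max_norm / n  # project back onto the epsilon-ball
        norms.append(np.linalg.norm(delta))
    return delta, norms


# Usage: 3 ascent steps with random stand-in gradients.
rng = np.random.default_rng(1)
grads = [rng.normal(size=(8, 16)) for _ in range(3)]
_, free_norms = freelb_style_ascent(grads, init_mag=0.1, step_size=0.05, max_norm=0)
_, capped_norms = freelb_style_ascent(grads, init_mag=0.1, step_size=0.05, max_norm=0.12)
print(free_norms[-1], "<=", 0.1 + 3 * 0.05)
print(max(capped_norms), "<=", 0.12)
```

With `max_norm=0`, as suggested, the only knobs left on the perturbation size are `init_mag`, `step_size` and the number of steps.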
In the context of security, epsilon restricts the strength of the adversary so that attacks can be compared fairly. In our case, however, you should first observe the norm of the embeddings and choose a strength/epsilon that is not negligible but also won't outweigh the embeddings.
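One way to "observe the norm of the embeddings first" is sketched below. It assumes NumPy and a (batch, seq_len, dim) array of input embeddings taken from your model, and uses random values as stand-ins here. The helper names and the `ratio` rule (epsilon as a fraction of the median per-example embedding norm) are hypothetical. Treat them as a starting point for tuning, not a recommended value.

```python
import numpy as np


def embedding_norms(embeds):
    """Per-example Frobenius norm of a (batch, seq_len, dim) embedding array."""
    flat = embeds.reshape(embeds.shape[0], -1)
    return np.linalg.norm(flat, axis=1)


def suggest_epsilon(embeds, ratio=0.05):
    """Hypothetical heuristic: epsilon = ratio * median per-example norm.

    ratio is an assumption to tune: large enough that the perturbation is not
    negligible, small enough that it does not outweigh the embeddings.
    """
    return ratio * float(np.median(embedding_norms(embeds)))


# Usage with random stand-in embeddings (replace with real ones).
rng = np.random.default_rng(0)
embeds = rng.normal(scale=0.05, size=(4, 128, 768))
norms = embedding_norms(embeds)
eps = suggest_epsilon(embeds, ratio=0.05)
print("median embedding norm:", np.median(norms), "epsilon:", eps)
```

With padded batches, you would want to exclude padding positions before taking the norms.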
from freelb.
There are hyperparameters for only 4 tasks in this file. Would you please release the others?
Do you have any comments on the scale of norm(
Does it relate directly to the adversarial effect or to general capability?
@PantherYan
In my opinion, selecting epsilon is tricky and depends on your task's dataset: a large epsilon may cause the generated adversarial example to change the gold label, while a small epsilon cannot threaten the model.
I am curious:
- Is there any rule of thumb for selecting a proper epsilon?
- How can we make sure the adversarial perturbation won't change the sentence's semantics or its gold label?
@YasinQiu
Thanks for your reply.
#1. I'll leave this question to @zhuchen03.
#2. In my opinion, the adversarial perturbation is similar to the noise in a denoising autoencoder: adding it makes the language model more robust or improves its general capability.
I will read more of the literature to answer the questions we are both confused about.
The day before yesterday, I trained my implementation of FreeLB in a plugin format without the dropout mask (https://github.com/zhuchen03/FreeLB/issues/8#issuecomment-627669810). It worked well with one set of hyperparameters. But after I added the dropout-mask implementation and changed to another set of hyperparameters, FreeLB adversarial training went the wrong way: accuracy fell as training went on.
It confused me a lot. I will figure out why and post the answer here.
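Here is a minimal NumPy sketch of the dropout-mask variant. It assumes the variant means sampling one dropout mask per batch and reusing it for every adversarial ascent step, rather than resampling the mask at each step. That is our reading of the linked issue and of the FreeLB paper's discussion of dropout. The function names are hypothetical, and `hidden` and `deltas` are random stand-ins for activations and perturbations.

```python
import numpy as np


def make_dropout_mask(shape, p, rng):
    """Inverted-dropout mask: kept units are scaled by 1 / (1 - p)."""
    keep = rng.random(shape) >= p
    return keep.astype(np.float64) / (1.0 - p)


def ascent_with_fixed_mask(hidden, deltas, p=0.1, seed=0):
    """Apply ONE dropout mask to (hidden + delta) for every ascent step.

    Assumption: the mask is sampled once per batch and shared by all
    adversarial ascent steps instead of being resampled inside the loop.
    """
    rng = np.random.default_rng(seed)
    mask = make_dropout_mask(hidden.shape, p, rng)
    return mask, [(hidden + d) * mask for d in deltas]


# Usage: 3 ascent steps with random stand-in activations and perturbations.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(4, 8))
deltas = [0.01 * rng.normal(size=(4, 8)) for _ in range(3)]
mask, outs = ascent_with_fixed_mask(hidden, deltas, p=0.25)
# Every ascent step sees exactly the same dropped positions.
print([int((o == 0).sum()) for o in outs])
```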
@YasinQiu
The \epsilon is small.
Many papers explain how to choose the minimum \epsilon or different scales of \epsilon.
Here are some for your reference:
- DeepFool: a simple and accurate method to fool deep neural networks
- FGSM: attacks with different scales of \epsilon
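The two references can be made concrete with a minimal NumPy sketch on a toy affine binary classifier f(x) = w·x + b. The linear model is our own choice, so that gradients are analytic; real attacks would differentiate through the network. `deepfool_linear` is DeepFool's closed-form step for the linear case, r = -f(x) w / ||w||², the smallest L2 perturbation that reaches the decision boundary. `fgsm` applies x' = x + ε·sign(∇ₓL) for a logistic loss at several values of ε. Both function names are hypothetical.

```python
import numpy as np


def deepfool_linear(x, w, b):
    """DeepFool's closed form for an affine binary classifier f(x) = w.x + b:
    the minimal L2 perturbation that moves x onto the boundary f = 0.
    (The full method iterates this step on a linearized deep network.)"""
    f = w @ x + b
    return -f * w / (w @ w)


def fgsm(x, y, w, b, eps):
    """FGSM for the logistic loss L = log(1 + exp(-y * f(x))), y in {-1, +1}:
    x_adv = x + eps * sign(dL/dx)."""
    margin = y * (w @ x + b)
    grad = -y * w / (1.0 + np.exp(margin))  # dL/dx = -y * sigmoid(-margin) * w
    return x + eps * np.sign(grad)


# Usage on a toy 2-D classifier.
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 1.0])  # f(x) = 1.5, so x is classified as +1
r = deepfool_linear(x, w, b)
print("DeepFool perturbation:", r, "L2 norm:", np.linalg.norm(r))
for eps in (0.01, 0.1, 1.0):
    x_adv = fgsm(x, 1, w, b, eps)
    print("eps =", eps, "-> f(x_adv) =", w @ x_adv + b)
```

In this toy example each FGSM perturbation lowers the margin by 3ε. So ε = 0.01 and 0.1 leave the prediction unchanged, while ε = 1.0 flips it. This matches the concern that too large an epsilon can change the gold label.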
The author has given a reference in the launch files.
As for the explicit value, should it be around 1e-1?
@zhuchen03 @PantherYan thx ~!!!