layneh / self-adaptive-training


[TPAMI2022 & NeurIPS2020] Official implementation of Self-Adaptive Training

License: MIT License

Python 75.76% Shell 24.24%

machine-learning computer-vision generalization overfitting label-noise adversarial-robustness

self-adaptive-training's Introduction

Self-Adaptive Training

This is the PyTorch implementation of:

- the NeurIPS'2020 paper Self-Adaptive Training: beyond Empirical Risk Minimization, and
- the journal version Self-Adaptive Training: Bridging the Supervised and Self-Supervised Learning.

Self-adaptive training significantly improves the generalization of deep networks under noise and enhances self-supervised representation learning. It also advances the state of the art in learning with noisy labels, adversarial training, and linear evaluation of the learned representations.

News

2021.10: The code of Selective Classification for SAT has been released here.
2021.01: We have released the journal version of Self-Adaptive Training, which is a unified algorithm for both supervised and self-supervised learning. Code for self-supervised learning will be available soon.
2020.09: Our work has been accepted at NeurIPS'2020.

Requirements

Python >= 3.6
PyTorch >= 1.0
CUDA
Numpy

Usage

Standard training

main.py contains the training and evaluation functions for the standard training setting.

Runnable scripts

Training and evaluation using the default parameters

We provide our training scripts in the directory scripts/. As a concrete example, the command below trains the default model (i.e., ResNet-34) on the CIFAR10 dataset with uniform label noise injected (e.g., 40%):
```
$ bash scripts/cifar10/run_sat.sh [TRIAL_NAME]
```
The argument TRIAL_NAME is optional; it helps us distinguish different trials of the same experiment without modifying the training script. Evaluation is performed automatically when training finishes.
Additional arguments
- noise-rate: the percentage of data that is corrupted
- noise-type: the type of random corruption (i.e., corrupted_label, Gaussian, random_pixel, shuffled_pixel)
- sat-es: initial epochs of our approach
- sat-alpha: the momentum term $\alpha$ of our approach
- arch: the architecture of backbone model, e.g., resnet34/wrn34
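To make the roles of sat-es and sat-alpha concrete, here is a minimal pure-Python sketch of the target update as the paper describes it: soft targets start as the given labels and, once the initial epochs have passed, are updated as an exponential moving average of the model's predictions. The function names, the `epoch >= es` convention, and the example values are illustrative assumptions; this is not the repository's code.

```python
import math

def sat_update_targets(targets, probs, alpha=0.9):
    """EMA update per sample and class: t <- alpha * t + (1 - alpha) * p."""
    return [
        [alpha * t + (1.0 - alpha) * p for t, p in zip(t_row, p_row)]
        for t_row, p_row in zip(targets, probs)
    ]

def maybe_update_targets(epoch, targets, probs, es=60, alpha=0.9):
    """Keep the original targets during the first `es` epochs (assumed convention)."""
    if epoch >= es:
        return sat_update_targets(targets, probs, alpha)
    return targets

def sat_loss(targets, probs, eps=1e-12):
    """Cross entropy against soft targets, re-weighted by w_i = max_j t_ij."""
    weights = [max(t_row) for t_row in targets]
    total = 0.0
    for w, t_row, p_row in zip(weights, targets, probs):
        ce = -sum(t * math.log(p + eps) for t, p in zip(t_row, p_row))
        total += w * ce
    return total / sum(weights)

# One sample, two classes: the label says class 0, the model prefers class 1.
targets = [[1.0, 0.0]]
probs = [[0.2, 0.8]]
print(maybe_update_targets(epoch=70, targets=targets, probs=probs))
```

A larger sat-alpha makes the targets move more slowly towards the model's predictions, and a larger sat-es delays the point at which they start moving at all.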

Results on CIFAR datasets under uniform label noise

Test Accuracy(%) on CIFAR10

Noise Rate	0.2	0.4	0.6	0.8
ResNet-34	94.14	92.64	89.23	78.58
WRN-28-10	94.84	93.23	89.42	80.13

Test Accuracy(%) on CIFAR100

Noise Rate	0.2	0.4	0.6	0.8
ResNet-34	75.77	71.38	62.69	38.72
WRN-28-10	77.71	72.60	64.87	44.17

Runnable scripts for reproducing double-descent phenomenon

You can use the command as below to train the default model (i.e., ResNet-18) on CIFAR10 dataset with 16.67% uniform label noise injected (i.e., 15% label error rate):

$ bash scripts/cifar10/run_sat_dd_parallel.sh [TRIAL_NAME]
$ bash scripts/cifar10/run_ce_dd_parallel.sh [TRIAL_NAME]
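The relation between the 16.67% noise rate and the 15% label error rate can be checked with a short calculation. The sketch below assumes that each corrupted sample receives a label drawn uniformly from all classes, including its original one, so a corrupted sample keeps its correct label with probability 1/K; the function names are hypothetical.

```python
def label_error_rate(noise_rate, num_classes):
    """Fraction of labels that end up wrong under uniform label noise.

    Assumes a corrupted sample gets a label drawn uniformly from all
    `num_classes` classes, so it stays correct with probability 1/num_classes.
    """
    return noise_rate * (num_classes - 1) / num_classes

def noise_rate_for_error(error_rate, num_classes):
    """Inverse mapping: noise rate needed to reach a given label error rate."""
    return error_rate * num_classes / (num_classes - 1)

# CIFAR10 has 10 classes: a noise rate of 1/6 (about 16.67%) gives 15% wrong labels.
print(round(label_error_rate(1 / 6, 10), 4))
print(round(noise_rate_for_error(0.15, 10), 4))
```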

Double-descent ERM vs. single-descent self-adaptive training

Double-descent ERM vs. single-descent self-adaptive training on the error-capacity curve. The vertical dashed line represents the interpolation threshold.

Double-descent ERM vs. single-descent self-adaptive training on the epoch-capacity curve. The dashed vertical line represents the initial epoch E_s of our approach.

Adversarial training

We use TRADES, a state-of-the-art adversarial training algorithm, as our baseline. main_adv.py contains the training and evaluation functions for the adversarial training setting on the CIFAR10 dataset.

Training scripts

Training and evaluation using the default parameters

We provide our training scripts in the directory scripts/cifar10. As a concrete example, the command below trains the default model (i.e., WRN34-10) on the CIFAR10 dataset, using a PGD-10 attack ($\epsilon$=0.031) to generate adversarial examples:
```
$ bash scripts/cifar10/run_trades_sat.sh [TRIAL_NAME]
```
Additional arguments
- beta: hyper-parameter $1/\lambda$ in TRADES that controls the trade-off between natural accuracy and adversarial robustness
- sat-es: initial epochs of our approach
- sat-alpha: the momentum term $\alpha$ of our approach
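For orientation, the role of beta can be read off the TRADES objective as it is commonly written. The form below is a sketch: it assumes an $\ell_\infty$ threat model of radius $\epsilon$ and a cross-entropy natural loss, and it shows plain TRADES only, not how self-adaptive training modifies the targets.

$$
\min_f \; \mathbb{E}_{(x,y)} \Big[ \mathrm{CE}\big(f(x), y\big) + \beta \max_{\|x'-x\|_\infty \le \epsilon} \mathrm{KL}\big(f(x) \,\|\, f(x')\big) \Big], \qquad \beta = 1/\lambda
$$

A larger beta puts more weight on the robustness term and typically trades natural accuracy for robust accuracy.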

Robust evaluation script

Evaluate robust WRN-34-10 models on CIFAR10 under PGD-20 attack:

  $ python pgd_attack.py --model-dir "/path/to/checkpoints"

This command evaluates the 71st to 100th checkpoints in the specified path.

Results

Self-Adaptive Training mitigates the overfitting issue and consistently improves TRADES.


We provide the checkpoint of our best-performing model in Google Drive and compare its natural and robust accuracy with TRADES below.

Attack (submitted by) \ Method	TRADES	TRADES + SAT
None (initial entry)	84.92	83.48
PGD-20 (initial entry)	56.68	58.03
MultiTargeted-2000 (initial entry)	53.24	53.46
Auto-Attack+ (Francesco Croce)	53.08	53.29

Reference

For technical details, please check the conference version or the journal version of our paper.

@inproceedings{huang2020self,
  title={Self-Adaptive Training: beyond Empirical Risk Minimization},
  author={Huang, Lang and Zhang, Chao and Zhang, Hongyang},
  booktitle={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

@article{huang2021self,
  title={Self-Adaptive Training: Bridging the Supervised and Self-Supervised Learning},
  author={Huang, Lang and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2101.08732},
  year={2021}
}

Contact

If you have any questions about this code, feel free to open an issue or contact [email protected].


self-adaptive-training's Issues

Checkpoints for TRADES

Hi,

Thanks for making the code public. I am particularly intrigued by the improvement in robust accuracy (in combination with TRADES). This seems to be a very strong result (it could be improved further using https://arxiv.org/abs/1905.13736), and I am really interested in testing it further. Would it be possible for you to make the adversarially trained network's checkpoints public?

Could you provide the code of the MultiTargeted-2000 attack?

Could you provide the code of the MultiTargeted-2000 attack? I hope to test your model under this attack.

Low natural accuracy and robust accuracy when testing on the CIFAR100 dataset

Hi,

I followed the training procedure to run self-adaptive training; however, during testing, both the accuracy and the robust_accuracy drop to about 0.2.

The only change I made to pgd_attack.py was switching the dataset loading from CIFAR-10 to CIFAR-100; the model is configured with a 100-class output.

I used the default run_sat.sh settings to train the adversarial defense model.

What is the root cause of the low accuracy, and how can I resolve it?

Thanks & Regards!
Momo

About exponential moving average

The ablation study in the paper shows that the exponential moving average is very important, but is EMA not implemented in the code?

Selective classification experiment missing

Hi, it seems that the selective classification experiment part is missing. (Both training and evaluation).

Can you upload this code please?
Thanks

About the computation power of "Self-Adaptive Training: beyond Empirical Risk Minimization"

Hello,

Thanks for sharing your code and for your outstanding contributions to adversarial learning.

I wonder how much computation power is needed to run the default settings, e.g., batch_size=128.

I tried to reproduce your experiment on a server with a 2080Ti (11 GB of memory), and the server restarted.

Should I use a smaller batch size to reproduce it?

Could you list the GPU architectures that are able to run it? How much GPU memory is needed?

Thanks & Regards!
Momo

How do you make the train_loader output the index of each sample?

As done here: https://github.com/LayneH/self-adaptive-training/blob/master/main_adv.py#L189
The standard torch CIFAR10 data loader only outputs (inputs, targets) pairs. How do you make it also output the index of each sample?
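One common way to obtain sample indices, sketched here without claiming it is what this repository does, is to wrap the dataset so that each item also carries its own index. The wrapper below is a hypothetical helper; it only relies on the map-style protocol (`__len__` and `__getitem__`), so it should work with any dataset that returns (inputs, target) pairs.

```python
class IndexedDataset:
    """Wrap a map-style dataset so each item is (inputs, target, index)."""

    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        inputs, target = self.base[index]
        # Returning the index lets per-sample state (e.g. soft targets)
        # be looked up and updated inside the training loop.
        return inputs, target, index

# Toy stand-in for a real dataset: a list of (inputs, target) pairs.
toy = IndexedDataset([("img0", 3), ("img1", 7)])
print(toy[1])
```

When such a wrapper is passed to a PyTorch DataLoader, each batch would be expected to unpack as `inputs, targets, indices`, because the default collation batches every element of the returned tuple.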

About the evaluation problem after adversarial-training

Thanks a lot for sharing.

I ran the training program on our workstation with a 2080 GPU and batchsize=64.

However, the following problem occurred during evaluation:

Files already downloaded and verified
evaluating /home/cdhk409/xiaoyang/Python/adv_training/self-adaptive-training/trained_models/checkpoint_76.tar...
Traceback (most recent call last):
File "pgd_attack.py", line 106, in <module>
main()
File "pgd_attack.py", line 93, in main
model.load_state_dict(torch.load(model_path)['state_dict'])
File "/home/cdhk409/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.conv1.weight", "module.block1.layer.0.bn1.weight", "module.block1.layer.0.bn1.bias", "module.block1.layer.0.bn1.running_mean", "module.block1.layer.0.bn1.running_var", "module.block1.layer.0.conv1.weight", "module.block1.layer.0.bn2.weight", "module.block1.layer.0.bn2.bias", "module.block1.layer.0.bn2.running_mean", "module.block1.layer.0.bn2.running_var", "module.block1.layer.0.conv2.weight", "module.block1.layer.0.convShortcut.weight", "module.block1.layer.1.bn1.weight", "module.block1.layer.1.bn1.bias", "module.block1.layer.1.bn1.running_mean", "module.block1.layer.1.bn1.running_var", "module.block1.layer.1.conv1.weight", "module.block1.layer.1.bn2.weight", "module.block1.layer.1.bn2.bias", "module.block1.layer.1.bn2.running_mean", "module.block1.layer.1.bn2.running_var", "module.block1.layer.1.conv2.weight", "module.block1.layer.2.bn1.weight", "module.block1.layer.2.bn1.bias", "module.block1.layer.2.bn1.running_mean", "module.block1.layer.2.bn1.running_var", "module.block1.layer.2.conv1.weight", "module.block1.layer.2.bn2.weight", "module.block1.layer.2.bn2.bias", "module.block1.layer.2.bn2.running_mean", "module.block1.layer.2.bn2.running_var", "module.block1.layer.2.conv2.weight", "module.block1.layer.3.bn1.weight", "module.block1.layer.3.bn1.bias", "module.block1.layer.3.bn1.running_mean", "module.block1.layer.3.bn1.running_var", "module.block1.layer.3.conv1.weight", "module.block1.layer.3.bn2.weight", "module.block1.layer.3.bn2.bias", "module.block1.layer.3.bn2.running_mean", "module.block1.layer.3.bn2.running_var", "module.block1.layer.3.conv2.weight", "module.block1.layer.4.bn1.weight", "module.block1.layer.4.bn1.bias", "module.block1.layer.4.bn1.running_mean", "module.block1.layer.4.bn1.running_var", "module.block1.layer.4.conv1.weight", "module.block1.layer.4.bn2.weight", "module.block1.layer.4.bn2.bias", "module.block1.layer.4.bn2.running_mean", "module.block1.layer.4.bn2.running_var", 
"module.block1.layer.4.conv2.weight", "module.block2.layer.0.bn1.weight", "module.block2.layer.0.bn1.bias", "module.block2.layer.0.bn1.running_mean", "module.block2.layer.0.bn1.running_var", "module.block2.layer.0.conv1.weight", "module.block2.layer.0.bn2.weight", "module.block2.layer.0.bn2.bias", "module.block2.layer.0.bn2.running_mean", "module.block2.layer.0.bn2.running_var", "module.block2.layer.0.conv2.weight", "module.block2.layer.0.convShortcut.weight", "module.block2.layer.1.bn1.weight", "module.block2.layer.1.bn1.bias", "module.block2.layer.1.bn1.running_mean", "module.block2.layer.1.bn1.running_var", "module.block2.layer.1.conv1.weight", "module.block2.layer.1.bn2.weight", "module.block2.layer.1.bn2.bias", "module.block2.layer.1.bn2.running_mean", "module.block2.layer.1.bn2.running_var", "module.block2.layer.1.conv2.weight", "module.block2.layer.2.bn1.weight", "module.block2.layer.2.bn1.bias", "module.block2.layer.2.bn1.running_mean", "module.block2.layer.2.bn1.running_var", "module.block2.layer.2.conv1.weight", "module.block2.layer.2.bn2.weight", "module.block2.layer.2.bn2.bias", "module.block2.layer.2.bn2.running_mean", "module.block2.layer.2.bn2.running_var", "module.block2.layer.2.conv2.weight", "module.block2.layer.3.bn1.weight", "module.block2.layer.3.bn1.bias", "module.block2.layer.3.bn1.running_mean", "module.block2.layer.3.bn1.running_var", "module.block2.layer.3.conv1.weight", "module.block2.layer.3.bn2.weight", "module.block2.layer.3.bn2.bias", "module.block2.layer.3.bn2.running_mean", "module.block2.layer.3.bn2.running_var", "module.block2.layer.3.conv2.weight", "module.block2.layer.4.bn1.weight", "module.block2.layer.4.bn1.bias", "module.block2.layer.4.bn1.running_mean", "module.block2.layer.4.bn1.running_var", "module.block2.layer.4.conv1.weight", "module.block2.layer.4.bn2.weight", "module.block2.layer.4.bn2.bias", "module.block2.layer.4.bn2.running_mean", "module.block2.layer.4.bn2.running_var", "module.block2.layer.4.conv2.weight", 
"module.block3.layer.0.bn1.weight", "module.block3.layer.0.bn1.bias", "module.block3.layer.0.bn1.running_mean", "module.block3.layer.0.bn1.running_var", "module.block3.layer.0.conv1.weight", "module.block3.layer.0.bn2.weight", "module.block3.layer.0.bn2.bias", "module.block3.layer.0.bn2.running_mean", "module.block3.layer.0.bn2.running_var", "module.block3.layer.0.conv2.weight", "module.block3.layer.0.convShortcut.weight", "module.block3.layer.1.bn1.weight", "module.block3.layer.1.bn1.bias", "module.block3.layer.1.bn1.running_mean", "module.block3.layer.1.bn1.running_var", "module.block3.layer.1.conv1.weight", "module.block3.layer.1.bn2.weight", "module.block3.layer.1.bn2.bias", "module.block3.layer.1.bn2.running_mean", "module.block3.layer.1.bn2.running_var", "module.block3.layer.1.conv2.weight", "module.block3.layer.2.bn1.weight", "module.block3.layer.2.bn1.bias", "module.block3.layer.2.bn1.running_mean", "module.block3.layer.2.bn1.running_var", "module.block3.layer.2.conv1.weight", "module.block3.layer.2.bn2.weight", "module.block3.layer.2.bn2.bias", "module.block3.layer.2.bn2.running_mean", "module.block3.layer.2.bn2.running_var", "module.block3.layer.2.conv2.weight", "module.block3.layer.3.bn1.weight", "module.block3.layer.3.bn1.bias", "module.block3.layer.3.bn1.running_mean", "module.block3.layer.3.bn1.running_var", "module.block3.layer.3.conv1.weight", "module.block3.layer.3.bn2.weight", "module.block3.layer.3.bn2.bias", "module.block3.layer.3.bn2.running_mean", "module.block3.layer.3.bn2.running_var", "module.block3.layer.3.conv2.weight", "module.block3.layer.4.bn1.weight", "module.block3.layer.4.bn1.bias", "module.block3.layer.4.bn1.running_mean", "module.block3.layer.4.bn1.running_var", "module.block3.layer.4.conv1.weight", "module.block3.layer.4.bn2.weight", "module.block3.layer.4.bn2.bias", "module.block3.layer.4.bn2.running_mean", "module.block3.layer.4.bn2.running_var", "module.block3.layer.4.conv2.weight", "module.bn1.weight", "module.bn1.bias", 
"module.bn1.running_mean", "module.bn1.running_var", "module.fc.weight", "module.fc.bias".
Unexpected key(s) in state_dict: "conv1.weight", "block1.layer.0.bn1.weight", "block1.layer.0.bn1.bias", "block1.layer.0.bn1.running_mean", "block1.layer.0.bn1.running_var", "block1.layer.0.bn1.num_batches_tracked", "block1.layer.0.conv1.weight", "block1.layer.0.bn2.weight", "block1.layer.0.bn2.bias", "block1.layer.0.bn2.running_mean", "block1.layer.0.bn2.running_var", "block1.layer.0.bn2.num_batches_tracked", "block1.layer.0.conv2.weight", "block1.layer.0.convShortcut.weight", "block1.layer.1.bn1.weight", "block1.layer.1.bn1.bias", "block1.layer.1.bn1.running_mean", "block1.layer.1.bn1.running_var", "block1.layer.1.bn1.num_batches_tracked", "block1.layer.1.conv1.weight", "block1.layer.1.bn2.weight", "block1.layer.1.bn2.bias", "block1.layer.1.bn2.running_mean", "block1.layer.1.bn2.running_var", "block1.layer.1.bn2.num_batches_tracked", "block1.layer.1.conv2.weight", "block1.layer.2.bn1.weight", "block1.layer.2.bn1.bias", "block1.layer.2.bn1.running_mean", "block1.layer.2.bn1.running_var", "block1.layer.2.bn1.num_batches_tracked", "block1.layer.2.conv1.weight", "block1.layer.2.bn2.weight", "block1.layer.2.bn2.bias", "block1.layer.2.bn2.running_mean", "block1.layer.2.bn2.running_var", "block1.layer.2.bn2.num_batches_tracked", "block1.layer.2.conv2.weight", "block1.layer.3.bn1.weight", "block1.layer.3.bn1.bias", "block1.layer.3.bn1.running_mean", "block1.layer.3.bn1.running_var", "block1.layer.3.bn1.num_batches_tracked", "block1.layer.3.conv1.weight", "block1.layer.3.bn2.weight", "block1.layer.3.bn2.bias", "block1.layer.3.bn2.running_mean", "block1.layer.3.bn2.running_var", "block1.layer.3.bn2.num_batches_tracked", "block1.layer.3.conv2.weight", "block1.layer.4.bn1.weight", "block1.layer.4.bn1.bias", "block1.layer.4.bn1.running_mean", "block1.layer.4.bn1.running_var", "block1.layer.4.bn1.num_batches_tracked", "block1.layer.4.conv1.weight", "block1.layer.4.bn2.weight", "block1.layer.4.bn2.bias", "block1.layer.4.bn2.running_mean", "block1.layer.4.bn2.running_var", 
"block1.layer.4.bn2.num_batches_tracked", "block1.layer.4.conv2.weight", "block2.layer.0.bn1.weight", "block2.layer.0.bn1.bias", "block2.layer.0.bn1.running_mean", "block2.layer.0.bn1.running_var", "block2.layer.0.bn1.num_batches_tracked", "block2.layer.0.conv1.weight", "block2.layer.0.bn2.weight", "block2.layer.0.bn2.bias", "block2.layer.0.bn2.running_mean", "block2.layer.0.bn2.running_var", "block2.layer.0.bn2.num_batches_tracked", "block2.layer.0.conv2.weight", "block2.layer.0.convShortcut.weight", "block2.layer.1.bn1.weight", "block2.layer.1.bn1.bias", "block2.layer.1.bn1.running_mean", "block2.layer.1.bn1.running_var", "block2.layer.1.bn1.num_batches_tracked", "block2.layer.1.conv1.weight", "block2.layer.1.bn2.weight", "block2.layer.1.bn2.bias", "block2.layer.1.bn2.running_mean", "block2.layer.1.bn2.running_var", "block2.layer.1.bn2.num_batches_tracked", "block2.layer.1.conv2.weight", "block2.layer.2.bn1.weight", "block2.layer.2.bn1.bias", "block2.layer.2.bn1.running_mean", "block2.layer.2.bn1.running_var", "block2.layer.2.bn1.num_batches_tracked", "block2.layer.2.conv1.weight", "block2.layer.2.bn2.weight", "block2.layer.2.bn2.bias", "block2.layer.2.bn2.running_mean", "block2.layer.2.bn2.running_var", "block2.layer.2.bn2.num_batches_tracked", "block2.layer.2.conv2.weight", "block2.layer.3.bn1.weight", "block2.layer.3.bn1.bias", "block2.layer.3.bn1.running_mean", "block2.layer.3.bn1.running_var", "block2.layer.3.bn1.num_batches_tracked", "block2.layer.3.conv1.weight", "block2.layer.3.bn2.weight", "block2.layer.3.bn2.bias", "block2.layer.3.bn2.running_mean", "block2.layer.3.bn2.running_var", "block2.layer.3.bn2.num_batches_tracked", "block2.layer.3.conv2.weight", "block2.layer.4.bn1.weight", "block2.layer.4.bn1.bias", "block2.layer.4.bn1.running_mean", "block2.layer.4.bn1.running_var", "block2.layer.4.bn1.num_batches_tracked", "block2.layer.4.conv1.weight", "block2.layer.4.bn2.weight", "block2.layer.4.bn2.bias", "block2.layer.4.bn2.running_mean", 
"block2.layer.4.bn2.running_var", "block2.layer.4.bn2.num_batches_tracked", "block2.layer.4.conv2.weight", "block3.layer.0.bn1.weight", "block3.layer.0.bn1.bias", "block3.layer.0.bn1.running_mean", "block3.layer.0.bn1.running_var", "block3.layer.0.bn1.num_batches_tracked", "block3.layer.0.conv1.weight", "block3.layer.0.bn2.weight", "block3.layer.0.bn2.bias", "block3.layer.0.bn2.running_mean", "block3.layer.0.bn2.running_var", "block3.layer.0.bn2.num_batches_tracked", "block3.layer.0.conv2.weight", "block3.layer.0.convShortcut.weight", "block3.layer.1.bn1.weight", "block3.layer.1.bn1.bias", "block3.layer.1.bn1.running_mean", "block3.layer.1.bn1.running_var", "block3.layer.1.bn1.num_batches_tracked", "block3.layer.1.conv1.weight", "block3.layer.1.bn2.weight", "block3.layer.1.bn2.bias", "block3.layer.1.bn2.running_mean", "block3.layer.1.bn2.running_var", "block3.layer.1.bn2.num_batches_tracked", "block3.layer.1.conv2.weight", "block3.layer.2.bn1.weight", "block3.layer.2.bn1.bias", "block3.layer.2.bn1.running_mean", "block3.layer.2.bn1.running_var", "block3.layer.2.bn1.num_batches_tracked", "block3.layer.2.conv1.weight", "block3.layer.2.bn2.weight", "block3.layer.2.bn2.bias", "block3.layer.2.bn2.running_mean", "block3.layer.2.bn2.running_var", "block3.layer.2.bn2.num_batches_tracked", "block3.layer.2.conv2.weight", "block3.layer.3.bn1.weight", "block3.layer.3.bn1.bias", "block3.layer.3.bn1.running_mean", "block3.layer.3.bn1.running_var", "block3.layer.3.bn1.num_batches_tracked", "block3.layer.3.conv1.weight", "block3.layer.3.bn2.weight", "block3.layer.3.bn2.bias", "block3.layer.3.bn2.running_mean", "block3.layer.3.bn2.running_var", "block3.layer.3.bn2.num_batches_tracked", "block3.layer.3.conv2.weight", "block3.layer.4.bn1.weight", "block3.layer.4.bn1.bias", "block3.layer.4.bn1.running_mean", "block3.layer.4.bn1.running_var", "block3.layer.4.bn1.num_batches_tracked", "block3.layer.4.conv1.weight", "block3.layer.4.bn2.weight", "block3.layer.4.bn2.bias", 
"block3.layer.4.bn2.running_mean", "block3.layer.4.bn2.running_var", "block3.layer.4.bn2.num_batches_tracked", "block3.layer.4.conv2.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "bn1.num_batches_tracked", "fc.weight", "fc.bias".

I want to evaluate epochs 1 to 100, but the same issue also occurs with your default evaluation epoch range. The checkpoint files are present, and the model path parameter points to their directory.

How can I resolve the issue?

Thanks & Regards!
Momo
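The key lists in the traceback above differ only by the "module." prefix: the model is wrapped in DataParallel and expects prefixed keys, while the checkpoint stores unprefixed ones. A possible workaround, offered as a sketch rather than the maintainers' fix, is to rename the keys before calling load_state_dict. The helper names below are hypothetical and operate on a plain dictionary.

```python
def add_module_prefix(state_dict, prefix="module."):
    """Prefix every key that does not already start with `prefix`."""
    return {
        (key if key.startswith(prefix) else prefix + key): value
        for key, value in state_dict.items()
    }

def strip_module_prefix(state_dict, prefix="module."):
    """Remove `prefix` from every key that starts with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# Toy checkpoint with placeholder values instead of tensors.
checkpoint = {"conv1.weight": 1, "fc.bias": 2}
print(sorted(add_module_prefix(checkpoint)))

# Assumed usage with the names from the traceback (not executed here):
# state = torch.load(model_path)['state_dict']
# model.load_state_dict(add_module_prefix(state))
```

Which direction is needed depends on whether the model is wrapped in DataParallel at load time: use `add_module_prefix` when it is, and `strip_module_prefix` when a DataParallel checkpoint is loaded into an unwrapped model.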

tabular data/ noisy instances

Hi,
thanks for sharing your implementation. I have two questions about it:

Does it also work on tabular data?
Is it possible to identify the noisy instances (return the noisy IDs or the clean set)?

Thanks!

Error when loading pretrained checkpoint.

I use your code models/wideresnet.py to load the given checkpoint model-wideres-epoch78.pth. But it raised the error:

RuntimeError: Error(s) in loading state_dict for DataParallel: Unexpected key(s) in state_dict:
"module.sub_block1.layer.0.bn1.weight", "module.sub_block1.layer.0.bn1.bias", "module.sub_block1.layer.0.bn1.running_mean", "module.sub_block1.layer.0.bn1.running_var", "module.sub_block1.layer.0.bn1.num_batches_tracked", "module.sub_block1.layer.0.conv1.weight", "module.sub_block1.layer.0.bn2.weight", "module.sub_block1.layer.0.bn2.bias", "module.sub_block1.layer.0.bn2.running_mean", "module.sub_block1.layer.0.bn2.running_var", "module.sub_block1.layer.0.bn2.num_batches_tracked", "module.sub_block1.layer.0.conv2.weight", "module.sub_block1.layer.0.convShortcut.weight",
"module.sub_block1.layer.1.bn1.weight", "module.sub_block1.layer.1.bn1.bias", "module.sub_block1.layer.1.bn1.running_mean", "module.sub_block1.layer.1.bn1.running_var", "module.sub_block1.layer.1.bn1.num_batches_tracked", "module.sub_block1.layer.1.conv1.weight", "module.sub_block1.layer.1.bn2.weight", "module.sub_block1.layer.1.bn2.bias", "module.sub_block1.layer.1.bn2.running_mean", "module.sub_block1.layer.1.bn2.running_var", "module.sub_block1.layer.1.bn2.num_batches_tracked", "module.sub_block1.layer.1.conv2.weight",
"module.sub_block1.layer.2.bn1.weight", "module.sub_block1.layer.2.bn1.bias", "module.sub_block1.layer.2.bn1.running_mean", "module.sub_block1.layer.2.bn1.running_var", "module.sub_block1.layer.2.bn1.num_batches_tracked", "module.sub_block1.layer.2.conv1.weight", "module.sub_block1.layer.2.bn2.weight", "module.sub_block1.layer.2.bn2.bias", "module.sub_block1.layer.2.bn2.running_mean", "module.sub_block1.layer.2.bn2.running_var", "module.sub_block1.layer.2.bn2.num_batches_tracked", "module.sub_block1.layer.2.conv2.weight",
"module.sub_block1.layer.3.bn1.weight", "module.sub_block1.layer.3.bn1.bias", "module.sub_block1.layer.3.bn1.running_mean", "module.sub_block1.layer.3.bn1.running_var", "module.sub_block1.layer.3.bn1.num_batches_tracked", "module.sub_block1.layer.3.conv1.weight", "module.sub_block1.layer.3.bn2.weight", "module.sub_block1.layer.3.bn2.bias", "module.sub_block1.layer.3.bn2.running_mean", "module.sub_block1.layer.3.bn2.running_var", "module.sub_block1.layer.3.bn2.num_batches_tracked", "module.sub_block1.layer.3.conv2.weight",
"module.sub_block1.layer.4.bn1.weight"

How can this be combined with label smoothing?
