
Cream's Introduction

Neural Architecture Design and Search

This is a collection of our NAS and Vision Transformer work

TinyCLIP (@ICCV'23): TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

EfficientViT (@CVPR'23): EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

TinyViT (@ECCV'22): TinyViT: Fast Pretraining Distillation for Small Vision Transformers

MiniViT (@CVPR'22): MiniViT: Compressing Vision Transformers with Weight Multiplexing

CDARTS (@TPAMI'22): Cyclic Differentiable Architecture Search

AutoFormerV2 (@NeurIPS'21): Searching the Search Space of Vision Transformer

iRPE (@ICCV'21): Rethinking and Improving Relative Position Encoding for Vision Transformer

AutoFormer (@ICCV'21): AutoFormer: Searching Transformers for Visual Recognition

Cream (@NeurIPS'20): Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

We also implemented our NAS algorithms on Microsoft NNI (Neural Network Intelligence).

News

  • ☀️ Hiring research interns for next-generation model design and efficient large-model inference: [email protected]
  • 💥 Sep, 2023: Code for TinyCLIP is now released.
  • 💥 Jul, 2023: TinyCLIP accepted to ICCV'23.
  • 💥 May, 2023: Code for EfficientViT is now released.
  • 💥 Mar, 2023: EfficientViT accepted to CVPR'23.
  • 💥 Jul, 2022: Code for TinyViT is now released.
  • 💥 Apr, 2022: Code for MiniViT is now released.
  • 💥 Mar, 2022: MiniViT has been accepted by CVPR'22.
  • 💥 Feb, 2022: Code for CDARTS is now released.
  • 💥 Feb, 2022: CDARTS has been accepted by TPAMI'22.
  • 💥 Jan, 2022: Code for AutoFormerV2 is now released.
  • 💥 Oct, 2021: AutoFormerV2 has been accepted by NeurIPS'21, code will be released soon.
  • 💥 Aug, 2021: Code for AutoFormer is now released.
  • 💥 July, 2021: iRPE code (with CUDA Acceleration) is now released. Paper is here.
  • 💥 July, 2021: iRPE has been accepted by ICCV'21.
  • 💥 July, 2021: AutoFormer has been accepted by ICCV'21.
  • 💥 July, 2021: AutoFormer is now available on arXiv.
  • 💥 Oct, 2020: Code for Cream is now released.
  • 💥 Oct, 2020: Cream was accepted to NeurIPS'20.

Works

TinyCLIP is a novel cross-modal distillation method for large-scale language-image pre-trained models. The method introduces two core techniques: affinity mimicking and weight inheritance. This work unleashes the capacity of small CLIP models, fully leveraging large-scale models as well as pre-training data and striking the best trade-off between speed and accuracy.

TinyCLIP overview
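For readers who want a concrete picture, below is a minimal sketch of what affinity mimicking could look like: the student is trained to match the teacher's image-to-text and text-to-image similarity distributions. The function name, temperature, and exact loss form are illustrative assumptions, not the released TinyCLIP code.

```python
import torch
import torch.nn.functional as F

def affinity_mimicking_loss(img_s, txt_s, img_t, txt_t, tau=0.04):
    # img_*, txt_*: L2-normalized embeddings of shape (batch, dim)
    logits_s = img_s @ txt_s.t() / tau  # student image-text affinities
    logits_t = img_t @ txt_t.t() / tau  # teacher image-text affinities
    # KL between teacher and student affinity distributions, in both directions
    loss_i2t = F.kl_div(F.log_softmax(logits_s, dim=1),
                        F.softmax(logits_t, dim=1), reduction="batchmean")
    loss_t2i = F.kl_div(F.log_softmax(logits_s.t(), dim=1),
                        F.softmax(logits_t.t(), dim=1), reduction="batchmean")
    return (loss_i2t + loss_t2i) / 2
```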

EfficientViT is a family of high-speed vision transformers. It is built with a new memory-efficient building block with a sandwich layout, and an efficient cascaded group attention operation that mitigates redundancy in attention computation.

EfficientViT overview
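A simplified sketch of the cascaded group attention idea described above: each head attends over its own channel slice, and the output of the previous head is added to the next head's input. The class name, per-head projections, and plain softmax attention are illustrative assumptions, not the official EfficientViT implementation.

```python
import torch
import torch.nn as nn

class CascadedGroupAttentionSketch(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkvs = nn.ModuleList(
            [nn.Linear(self.head_dim, 3 * self.head_dim) for _ in range(num_heads)])
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                         # x: (B, N, dim)
        feats = x.chunk(self.num_heads, dim=-1)   # one channel slice per head
        outs, prev = [], 0
        for i, qkv in enumerate(self.qkvs):
            feat = feats[i] + prev                # cascade: add previous head's output
            q, k, v = qkv(feat).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
            out = attn.softmax(dim=-1) @ v
            outs.append(out)
            prev = out
        return self.proj(torch.cat(outs, dim=-1))
```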

TinyViT is a new family of tiny and efficient vision transformers pretrained on large-scale datasets with our proposed fast distillation framework. The central idea is to transfer knowledge from large pretrained models to small ones. The logits of the large teacher models are sparsified and stored on disk in advance to save memory cost and computational overhead.

TinyViT overview
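A hedged sketch of the pre-stored sparse teacher logits: only the top-k values and indices are saved, so the teacher never has to run during student training. The file format, key names, and loss form below are assumptions for illustration rather than the actual TinyViT pipeline.

```python
import torch

@torch.no_grad()
def save_topk_logits(teacher, loader, k=10, path="teacher_logits.pt"):
    # loader is assumed to yield (images, labels); records are stored per batch
    records = []
    teacher.eval()
    for images, _ in loader:
        logits = teacher(images)                 # (batch, num_classes)
        values, indices = logits.topk(k, dim=1)  # keep only top-k per sample
        records.append((values.cpu(), indices.cpu()))
    torch.save(records, path)

def sparse_kd_loss(student_logits, values, indices, T=1.0):
    # Rebuild a (renormalized) teacher distribution from the stored top-k entries.
    teacher_probs = torch.zeros_like(student_logits)
    teacher_probs.scatter_(1, indices, (values / T).softmax(dim=1))
    log_p_student = (student_logits / T).log_softmax(dim=1)
    return -(teacher_probs * log_p_student).sum(dim=1).mean()
```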

MiniViT is a new compression framework that achieves parameter reduction in vision transformers while retaining the same performance. The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks. Specifically, we make the weights shared across layers, while imposing a transformation on the weights to increase diversity. Weight distillation over self-attention is also applied to transfer knowledge from large-scale ViT models to weight-multiplexed compact models.

MiniViT overview
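A conceptual sketch of weight multiplexing: consecutive blocks share one weight tensor, and each block applies a small layer-specific transformation to restore diversity. The layer name and the particular transformation (a diagonal scale plus bias) are assumptions, not the MiniViT code.

```python
import torch
import torch.nn as nn

class MultiplexedLinear(nn.Module):
    def __init__(self, dim, num_layers):
        super().__init__()
        self.shared_weight = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.shared_weight)
        # lightweight per-layer transformations (here: per-output scale + bias)
        self.scales = nn.Parameter(torch.ones(num_layers, dim))
        self.biases = nn.Parameter(torch.zeros(num_layers, dim))

    def forward(self, x, layer_idx):
        # same shared weight for every layer, modulated per layer for diversity
        w = self.shared_weight * self.scales[layer_idx].unsqueeze(1)
        return x @ w.t() + self.biases[layer_idx]
```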

In this work, we propose new joint optimization objectives and a novel Cyclic Differentiable ARchiTecture Search framework, dubbed CDARTS. To account for the structural difference between the search and evaluation networks, CDARTS builds a cyclic feedback mechanism between them via introspective distillation.

CDARTS overview

In this work, instead of searching architectures within a predefined search space, we propose, with the help of AutoFormer, to first search the search space itself in order to automatically find a better one, and then search architectures within the resulting space. In addition, we provide insightful observations and guidelines for general vision transformer design.

AutoFormerV2 overview

AutoFormer is a new one-shot architecture search framework dedicated to vision transformer search. It entangles the weights of different vision transformer blocks in the same layers during supernet training. Benefiting from this strategy, the trained supernet allows thousands of subnets to be very well trained: the performance of these subnets with weights inherited from the supernet is comparable to those retrained from scratch.

AutoFormer overview
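A minimal sketch of what weight entanglement means in practice: subnets with different dimensions share the same underlying weight matrix, and a sampled subnet simply slices the leading rows and columns, so training one subnet also updates the others. The class and argument names are illustrative, not the released AutoFormer code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntangledLinear(nn.Module):
    def __init__(self, max_in, max_out):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(max_out, max_in))
        self.bias = nn.Parameter(torch.zeros(max_out))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x, sample_in, sample_out):
        # every sampled (sample_in, sample_out) shares the top-left block of weight
        w = self.weight[:sample_out, :sample_in]
        b = self.bias[:sample_out]
        return F.linear(x[..., :sample_in], w, b)

# usage: layer = EntangledLinear(640, 640); y = layer(x, sample_in=576, sample_out=576)
```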

Image RPE (iRPE for short) methods are new relative position encoding methods dedicated to 2D images, considering directional relative distance modeling as well as the interactions between queries and relative position embeddings in the self-attention mechanism. The proposed iRPE methods are simple and lightweight, and can be easily plugged into transformer blocks. Experiments demonstrate that, solely due to the proposed encoding methods, DeiT and DETR obtain up to 1.5% (top-1 Acc) and 1.3% (mAP) stable improvements over their original versions on ImageNet and COCO respectively, without tuning any extra hyperparameters such as learning rate and weight decay. Our ablation and analysis also yield interesting findings, some of which run counter to previous understanding.

iRPE overview
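To make the directional relative distance modeling more concrete, here is a schematic re-implementation of the piecewise bucketing idea: small relative distances keep their own bucket, while larger ones are merged logarithmically and clipped. The constants, rounding, and clipping details are illustrative and may differ from the released irpe.py.

```python
import math
import torch

def piecewise_index_sketch(diff, alpha=2.0, beta=8.0, gamma=16.0):
    # diff: integer-valued relative distances (any shape)
    diff = diff.float()
    idx = diff.clone()
    mask = diff.abs() > alpha
    log_part = torch.log(diff[mask].abs() / alpha) / math.log(gamma / alpha)
    idx[mask] = torch.sign(diff[mask]) * torch.clamp(
        alpha + log_part * (beta - alpha), max=beta)
    return idx.round().long()   # bucket indices roughly in [-beta, beta]

# e.g. piecewise_index_sketch(torch.arange(-20, 21)) maps 41 distances to ~17 buckets
```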

[Paper] [Models - Google Drive] [Models - Baidu Disk (password: wqw6)] [Slides] [BibTex]

In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop.
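As a mental model only, the prioritized-path bookkeeping can be pictured as below; the class name and the simple "keep the top-k, pick the best" policy are simplified assumptions, not the repository's PrioritizedBoard (which additionally matches teachers to students with a meta network and considers complexity).

```python
class PrioritizedPathBoard:
    def __init__(self, capacity=10):
        self.capacity = capacity
        self.board = []                      # list of (score, architecture)

    def update(self, architecture, score):
        # insert the candidate, then keep only the top-`capacity` paths seen so far
        self.board.append((score, architecture))
        self.board.sort(key=lambda item: item[0], reverse=True)
        del self.board[self.capacity:]

    def pick_teacher(self):
        # simplest policy: the current best path serves as the distillation teacher
        return self.board[0][1] if self.board else None
```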

Bibtex

@InProceedings{tinyclip,
    title     = {TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance},
    author    = {Wu, Kan and Peng, Houwen and Zhou, Zhenghong and Xiao, Bin and Liu, Mengchen and Yuan, Lu and Xuan, Hong and Valenzuela, Michael and Chen, Xi (Stephen) and Wang, Xinggang and Chao, Hongyang and Hu, Han},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {21970-21980}
}

@InProceedings{liu2023efficientvit,
    title     = {EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention},
    author    = {Liu, Xinyu and Peng, Houwen and Zheng, Ningxin and Yang, Yuqing and Hu, Han and Yuan, Yixuan},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023},
}

@InProceedings{tiny_vit,
  title={TinyViT: Fast Pretraining Distillation for Small Vision Transformers},
  author={Wu, Kan and Zhang, Jinnian and Peng, Houwen and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
  booktitle={European conference on computer vision (ECCV)},
  year={2022}
}

@InProceedings{MiniViT,
    title     = {MiniViT: Compressing Vision Transformers With Weight Multiplexing},
    author    = {Zhang, Jinnian and Peng, Houwen and Wu, Kan and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {12145-12154}
}

@article{CDARTS,
  title={Cyclic Differentiable Architecture Search},
  author={Yu, Hongyuan and Peng, Houwen and Huang, Yan and Fu, Jianlong and Du, Hao and Wang, Liang and Ling, Haibin},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year={2022}
}

@InProceedings{S3,
  title     = {Searching the Search Space of Vision Transformer},
  author    = {Chen, Minghao and Wu, Kan and Ni, Bolin and Peng, Houwen and Liu, Bei and Fu, Jianlong and Chao, Hongyang and Ling, Haibin},
  booktitle = {Conference and Workshop on Neural Information Processing Systems (NeurIPS)},
  year      = {2021}
}

@InProceedings{iRPE,
    title     = {Rethinking and Improving Relative Position Encoding for Vision Transformer},
    author    = {Wu, Kan and Peng, Houwen and Chen, Minghao and Fu, Jianlong and Chao, Hongyang},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {10033-10041}
}

@InProceedings{AutoFormer,
    title     = {AutoFormer: Searching Transformers for Visual Recognition},
    author    = {Chen, Minghao and Peng, Houwen and Fu, Jianlong and Ling, Haibin},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {12270-12280}
}

@article{Cream,
  title={Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search},
  author={Peng, Houwen and Du, Hao and Yu, Hongyuan and Li, Qi and Liao, Jing and Fu, Jianlong},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

License

Licensed under the MIT License.

Cream's People

Contributors

crj1998, dependabot[bot], dominickzhang, eltociear, hongyuanyu, microsoftopensource, penghouwen, silent-chen, tapphughesn, wkcn, xinyuliu-jeffrey, z7zuqer


Cream's Issues

Question: Dense prediction

Thanks for your work, I'm looking forward to seeing the AutoFormer code.
As far as I understand, you concentrate on the image classification task.
Will you release the code setup for dense prediction tasks such as segmentation or detection?

Some problems about search_for_layer

Thanks for your excellent work!
I am confused by the code of search_structure_supernet.py/search_for_layer
How should I understand the following?
order = [2, 3, 4, 1, 0, 2, 3, 4, 1, 0]
limits = [3, 3, 3, 2, 2, 4, 4, 4, 4, 4]

Installation problem of irpe's CUDA operators

Hi, I ran into a problem when using the CUDA operators.

"UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.warnings.warn(msg.format('we could not find ninja.'))

I am not sure if it is due to a compatibility issue between the neural_renderer package and the latest version of PyTorch (1.9).

Thank you.

Architecture Details of AutoFormerV2

Hi, I haven't found the specific architecture details of the S3 models in your paper. Similarly, I haven't found the model config parameters in the GitHub folder Cream/AutoFormerV2. Could you provide the specific model parameters, such as the MLP ratio, for the AutoFormerV2 family? Thank you very much~

RuntimeError with multihead_super.py

Hi, the forward function in multihead_super.py contains the following code:
qkv = self.qkv(x).reshape(B, N, 3, self.sample_num_heads, -1).permute(2, 0, 3, 1, 4)
I think this line computes the qkv tensors per head, but when running the code I get the following error:
qkv = self.qkv(x).reshape(B, N, 3, self.sample_num_heads, -1).permute(2, 0, 3, 1, 4) RuntimeError: shape '[64, 197, 3, 3, -1]' is invalid for input of size 9682944
The last dimension is not divisible. Could you please tell me how to solve this? I am a beginner in the NAS field; please correct me if I have misunderstood anything.

FLOPs in the paper

Hi, I have a question about the FLOPs reported in the paper. In Table 5, Cream-S is listed with 287M FLOPs, but I calculated 318M FLOPs based on the architecture in the appendix.

Hypernet training code

I came across your nice work on NAS. I wanted to experiment with a different search space, and I guess I need to train the hypernet before I can run the search. Is the hypernet training code released?

Modify supernet for finetuning on a different dataset

I tried using the supernet checkpoint to train on a different dataset, but I'm getting a size mismatch error:

python supernet_train.py --data-path .\dslr_split --gp --change_qk --relative_position --mode super --dist-eval --cfg .\experiments\supernet\supernet-B.yaml --resume .\supernet-base.pth --epochs 600 --warmup-epochs 20 --output .\outputs1_B_ft_500_dslr --batch-size 32

Not using distributed mode

Namespace(aa='rand-m9-mstd0.5-inc1', amp=True, batch_size=32, cfg='.\experiments\supernet\supernet-B.yaml', change_qkv=True, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='.\dslr_split', data_set='AMZN', decay_epochs=30, decay_rate=0.1, device='cuda', dist_eval=True, dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, epochs=600, eval=False, gp=True, inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, lr_power=1.0, max_relative_position=14, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, mode='super', model='', model_ema=False, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, no_abs_pos=False, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='.\outputs1_B_ft_500_dslr', patch_size=16, patience_epochs=10, pin_mem=True, platform='pai', post_norm=False, recount=1, relative_position=True, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='.\supernet-base.pth', rpe_type='bias', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, teacher_model='', train_interpolation='bicubic', warmup_epochs=20, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
Creating SuperVisionTransformer
{'SUPERNET': {'MLP_RATIO': 4.0, 'NUM_HEADS': 10, 'EMBED_DIM': 640, 'DEPTH': 16}, 'SEARCH_SPACE': {'MLP_RATIO': [3.0, 3.5, 4.0], 'NUM_HEADS': [9, 10], 'DEPTH': [14, 15, 16], 'EMBED_DIM': [528, 576, 624]}}
number of params: 79539231
Traceback (most recent call last):
File "supernet_train.py", line 397, in
main(args)
File "supernet_train.py", line 323, in main
model_without_ddp.load_state_dict(checkpoint['model'])
File "C:\Users\iqbal.conda\envs\Autoformer\lib\site-packages\torch\nn\modules\module.py", line 1045, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Vision_TransformerSuper:
size mismatch for head.weight: copying a param with shape torch.Size([1000, 640]) from checkpoint, the shape in current model is torch.Size([31, 640]).
size mismatch for head.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([31]).

Can you please let me know what exactly I'm doing wrong? Thanks
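One common workaround for this kind of class-count mismatch, sketched below under the assumption that only the classifier head differs between the checkpoint and the new dataset; this is not an official fix, and `model_without_ddp` is the variable from the traceback above.

```python
import torch

checkpoint = torch.load("supernet-base.pth", map_location="cpu")
state_dict = checkpoint["model"]
for key in ("head.weight", "head.bias"):
    state_dict.pop(key, None)               # 1000-class ImageNet head vs 31-class head
missing, unexpected = model_without_ddp.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)             # should list only the freshly initialized head
```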

Question about the MMN network design

Hi, thanks for the nice work and code!

The MMN is a single fully-connected layer defined and used in
https://github.com/microsoft/Cream/blob/main/lib/models/structures/supernet.py#L92
https://github.com/microsoft/Cream/blob/main/lib/models/structures/supernet.py#L127
and then used by the board in
https://github.com/microsoft/Cream/blob/main/lib/models/PrioritizedBoard.py#L32

I wonder why the MMN size depends on the slice size (i.e., the number of images) and concatenates their class probabilities, instead of using the same weights for each image?

About the supernet scheduler

In lib/utils/util.py, the custom scheduler function is lambda step: (cfg.LR - step / ITERS). Should it be lambda step: (1 - step / ITERS) instead?
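Without claiming which behaviour the authors intended, the standalone check below shows how torch.optim.lr_scheduler.LambdaLR treats the lambda as a multiplicative factor on the base LR, which is why a plain linear decay is usually written as `1 - step / ITERS`; the optimizer and values are illustrative.

```python
import torch

ITERS = 100
opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.5)
# LambdaLR multiplies the base lr by the lambda's return value at each step,
# so the factor normally starts at 1 and decays towards 0.
sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda step: 1 - step / ITERS)
for _ in range(3):
    opt.step()
    sched.step()
print(opt.param_groups[0]["lr"])   # 0.5 * (1 - 3 / 100) = 0.485
```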

Maybe a bug

In tool/train.py, logger.info is not used only in the rank-0 process; calling logger.info from the other processes raises an error, for example at lines 148 and 173.

Question about iRPE

Hi, thanks for your work. Just a quick question.

Line 194 of irpe.py:

def _rp_2d_product(diff, **kwargs):
    beta_int = int(kwargs['beta'])
    S = 2 * beta_int + 1
    # the output of piecewise index function is in [-beta_int, beta_int]
    r = piecewise_index(diff[:, :, 0], **kwargs) + beta_int  # [0, 2 * beta_int]
    c = piecewise_index(diff[:, :, 1], **kwargs) + beta_int  # [0, 2 * beta_int]
    pid = r * S + c
    return pid

r and c are supposed to be the piecewise indices for the row and column respectively, but I can't understand why r is multiplied by the number of buckets (or the square root of the number of buckets in the product method)?

Thank you!
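For intuition (an illustrative note, not a reply from the maintainers): with S = 2 * beta_int + 1 possible bucket values per axis, pid = r * S + c is simply a row-major flattening of the 2D bucket (r, c) into a unique 1D id, so every (row, column) combination gets its own embedding slot.

```python
beta_int = 2
S = 2 * beta_int + 1    # 5 buckets per axis, indices 0..4 after the +beta_int shift
r, c = 3, 1             # row bucket and column bucket
pid = r * S + c         # 16 -- unique for every (r, c) pair in [0, S) x [0, S)
print(pid)
```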

Possible error in simulate_sgd_update for MetaMatchingNetwork

Hello, thanks for your contribution!
I have found something confusing in the code. It would be great if you could help with these:

In lib/models/MetaMatchingNetwork.py, the student weights are updated using SGD in lines 19-25. But lines 43-44 read:

def simulate_sgd_update(self, w, g, optimizer):
    return g * optimizer.param_groups[-1]['lr'] + w

which simulates the updated student weights via gradient ascent in order to take part in the calculation of teacher weights.

It seems that the update in simulate_sgd_update should be consistent with the SGD update of the student weights in line 23, but in practice gradient ascent is used in one case while gradient descent is used in the other.

P.S. It is also a little confusing to simulate the updated student weights right before you actually update them. Why not just take the updated weights from the student model? Those are the real updated weights, after all.
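For reference, a sketch of the discrepancy being described (not a confirmed fix): a descent-consistent simulation of a plain SGD step without momentum or weight decay would flip the sign used above.

```python
def simulate_sgd_update(self, w, g, optimizer):
    # plain SGD descent: w_new = w - lr * g; the snippet above returns w + lr * g
    return w - optimizer.param_groups[-1]['lr'] * g
```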

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:

I encountered a runtime error when I tried to search for an architecture based on your code.

/opt/conda/conda-bld/pytorch_1565272279342/work/torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
  File "tools/train.py", line 300, in <module>
    main()
  File "tools/train.py", line 259, in main
    est=model_est, local_rank=args.local_rank)
  File "/opt/tiger/cream/lib/core/train.py", line 55, in train_epoch
    output = model(input, random_cand)
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/tiger/cream/lib/models/structures/supernet.py", line 121, in forward
    x = self.forward_features(x, architecture)
  File "/opt/tiger/cream/lib/models/structures/supernet.py", line 113, in forward_features
    x = blocks[arch](x)
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/timm/models/efficientnet_blocks.py", line 133, in forward
    x = self.bn1(x)
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/torch/nn/functional.py", line 1656, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled

Traceback (most recent call last):
  File "tools/train.py", line 300, in <module>
    main()
  File "tools/train.py", line 259, in main
    est=model_est, local_rank=args.local_rank)
  File "/opt/tiger/cream/lib/core/train.py", line 67, in train_epoch
    loss.backward()
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/tiger/.conda/envs/Cream/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [320]] is at version 2507; expected version 2506 instead. Hint: the backtr
ace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I tried to locate the source of the error, and I found that it appears whenever the code updates the meta network or adds the kd_loss to the final loss.
How can I fix this problem?

AssertionError: Invalid type <class 'NoneType'> for key LR_NOISE; valid types = {<class 'str'>, <class 'bool'>, <class 'list'>, <class 'float'>, <class 'int'>, <class 'tuple'>}

Traceback (most recent call last):
File "./tools/main.py", line 13, in
from lib.config import cfg
File "/opt/tiger/xiaxin/work/Cream/tools/../lib/config.py", line 74, in
__C.LR_NOISE = None
File "/home/tiger/.local/lib/python3.7/site-packages/yacs/config.py", line 158, in setattr
type(value), name, _VALID_TYPES
File "/home/tiger/.local/lib/python3.7/site-packages/yacs/config.py", line 525, in _assert_with_logging
assert cond, msg
AssertionError: Invalid type <class 'NoneType'> for key LR_NOISE; valid types = {<class 'str'>, <class 'bool'>, <class 'list'>, <class 'float'>, <class 'int'>, <class 'tuple'>}
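A possible workaround, assuming the installed yacs version simply does not accept None as a config value (a guess based on the error message, not an official patch): give the key a typed sentinel in lib/config.py and translate it back to None where it is consumed.

```python
from yacs.config import CfgNode as CN

__C = CN()
__C.LR_NOISE = 0.0                                   # typed sentinel instead of None
# ...wherever the value is read later:
lr_noise = __C.LR_NOISE if __C.LR_NOISE else None    # treat 0.0 as "no LR noise"
```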

Problem of the _calculate_2nd_gradient

When I run the Cream NAS with pytorch==1.7.1, the following problem appears:

File "/app/nni_lib/trainer.py", line 294, in _calculate_2nd_gradient
grad_outputs=grad_student_val,
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/init.py", line 192, in grad
inputs, allow_unused)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3136, 1]], which is output 0 of TBackward, is at version 5; expected version 4 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Could there be a problem in the second gradient calculation?

grad_teacher = torch.autograd.grad(
    students_weight[0],
    self.model.rand_parameters(teacher_cand, self.pick_method=='meta'),
    grad_outputs=grad_student_val,
)

About the parameter config settings of AutoFormer

Hi, thanks for the great work on AutoFormer.

I wonder how I can set the parameter-limit configs for the search process.
For example, would you please provide the configs for AutoFormer-base, so that we can get more familiar with configuring --min-param-limits and --param-limits?

Error: nvcc fatal : Unsupported gpu architecture 'compute_86'

Hi, thanks for your great work. But I have some problems when I run the following command:
cd /rpe_ops
python setup.py install --user

FAILED: /home/UserDirectory/hongshengz/Stark-main/lib/models/stark/rpe_ops/build/temp.linux-x86_64-3.8/rpe_index_cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/UserDirectory/hongshengz/Stark-main/lib/models/stark/rpe_ops/build/temp.linux-x86_64-3.8/rpe_index_cuda.o.d -DWITH_CUDA -I/home/UserDirectory/hongshengz/anaconda3/lib/python3.8/site-packages/torch/include -I/home/UserDirectory/hongshengz/anaconda3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/UserDirectory/hongshengz/anaconda3/lib/python3.8/site-packages/torch/include/TH -I/home/UserDirectory/hongshengz/anaconda3/lib/python3.8/site-packages/torch/include/THC -I/home/UserDirectory/hongshengz/anaconda3/include/python3.8 -c -c /home/UserDirectory/hongshengz/Stark-main/lib/models/stark/rpe_ops/rpe_index_cuda.cu -o /home/UserDirectory/hongshengz/Stark-main/lib/models/stark/rpe_ops/build/temp.linux-x86_64-3.8/rpe_index_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=rpe_index_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
nvcc fatal : Unsupported gpu architecture 'compute_86'

My environment is:
RTX3090
CUDA:11.4
torch_version: 1.8.1
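Two common workarounds, offered as suggestions rather than a verified fix for this setup: the failing command invokes /usr/bin/nvcc, which is typically an older system toolkit that predates sm_86, so either point the build at the CUDA 11.x toolkit or restrict the target architectures to something the available nvcc understands. The paths and values below are examples only.

```python
# These environment variables need to be set before torch.utils.cpp_extension is
# imported, e.g. at the very top of rpe_ops/setup.py or exported in the shell.
import os

os.environ.setdefault("CUDA_HOME", "/usr/local/cuda-11.4")    # use the newer toolkit
os.environ.setdefault("TORCH_CUDA_ARCH_LIST", "7.5+PTX")      # arch an older nvcc knows
```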

RuntimeError: CUDA error: invalid device function?

We compiled the CUDA version of the iRPE module with the setup.py file in DETR-with-iRPE. When we start to train the model, we get the following issue:

  File "***/rpe_attention/rpe_attention_function.py", line 330, in rpe_multi_head_attention_forward
attn_output_weights_view.add_(rpe_k(q_view, height=hw[0], width=hw[1]))

RuntimeError: CUDA error: invalid device function

The environment of our project is :

pytorch:1.9.1
python:3.8
torchvision: 0.10.1
cudatoolkit: 10.2.89

I debugged the training process; the main issue is that the outputs of rpe_k(...) and rpe_q(...) cannot be added to attn_output_weights_view. Could you give a suggestion?

Does `embed dim` vary across Transformer blocks?

I was confused by this question when I read the AutoFormer paper. According to the GIF in the README.md, the embed dim parameter does vary across blocks. So how do you deal with the shortcut when the input embed dim differs from the output embed dim of a Transformer block? And how do you deal with the dimension mismatch between one block's output embedding and the next block's input embedding?

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one.

Hello,

Thanks for your great work! Your idea is brilliant!

When I tried to run 'python ./tools/main.py train ./experiments/configs/train/train.yaml', it looks like something went wrong in backpropagation:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


11/06 02:53:08 PM | Training on Process 0 with 2 GPUs.
11/06 02:53:09 PM | Supernet created, param count: 29809073
11/06 02:53:09 PM | resolution: 224
11/06 02:53:09 PM | choice number: 6
11/06 02:53:15 PM | Using torch DistributedDataParallel. Install NVIDIA Apex for Apex DDP.
11/06 02:53:15 PM | Scheduled epochs: 120
11/06 02:53:31 PM | Train: 0 [ 0/5004 ( 0%)] Loss: 6.901703 (6.9017) KD-Loss: 0.000000 (0.0000) Prec@1: 0.0000 ( 0.0000) Prec@5: 0.0000 ( 0.0000) Time: 1.926s, 132.92/s (1.926s, 132.92/s) LR: 5.000e-01 Data: 1.326 (1.326)
Traceback (most recent call last):
File "tools/train.py", line 239, in
main()
File "tools/train.py", line 217, in main
est=model_est, local_rank=args.local_rank)
File "/home/tiger/Cream/tools/../lib/core/train.py", line 51, in train_epoch
output = model(input, random_cand)
File "/opt/tiger/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/opt/tiger/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 459, in forward
self.reducer.prepare_for_backward([])
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at /pytorch/torch/csrc/distributed/c10d/reducer.cpp:518)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7efbdb652273 in /opt/tiger/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::prepare_for_backward(std::vector<torch::autograd::Variable, std::allocatortorch::autograd::Variable > const&) + 0x734 (0x7efc25d088e4 in /opt/tiger/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: + 0x69795c (0x7efc25cf795c in /opt/tiger/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #3: + 0x1d3ab4 (0x7efc25833ab4 in /opt/tiger/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: _PyMethodDef_RawFastCallKeywords + 0x254 (0x55b1da967744 in /opt/tiger/conda/bin/python)
frame #5: _PyCFunction_FastCallKeywords + 0x21 (0x55b1da967861 in /opt/tiger/conda/bin/python)
frame #6: _PyEval_EvalFrameDefault + 0x52f8 (0x55b1da9d36e8 in /opt/tiger/conda/bin/python)
frame #7: _PyEval_EvalCodeWithName + 0x2f9 (0x55b1da917539 in /opt/tiger/conda/bin/python)
frame #8: _PyFunction_FastCallDict + 0x1d5 (0x55b1da918635 in /opt/tiger/conda/bin/python)
frame #9: _PyObject_Call_Prepend + 0x63 (0x55b1da936e53 in /opt/tiger/conda/bin/python)
frame #10: PyObject_Call + 0x6e (0x55b1da929dbe in /opt/tiger/conda/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x1e42 (0x55b1da9d0232 in /opt/tiger/conda/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x2f9 (0x55b1da917539 in /opt/tiger/conda/bin/python)
frame #13: _PyFunction_FastCallDict + 0x1d5 (0x55b1da918635 in /opt/tiger/conda/bin/python)
frame #14: _PyObject_Call_Prepend + 0x63 (0x55b1da936e53 in /opt/tiger/conda/bin/python)
frame #15: + 0x16ba3a (0x55b1da96ea3a in /opt/tiger/conda/bin/python)
frame #16: _PyObject_FastCallKeywords + 0x49b (0x55b1da96f8fb in /opt/tiger/conda/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x4a96 (0x55b1da9d2e86 in /opt/tiger/conda/bin/python)
frame #18: _PyEval_EvalCodeWithName + 0x2f9 (0x55b1da917539 in /opt/tiger/conda/bin/python)
frame #19: _PyFunction_FastCallKeywords + 0x387 (0x55b1da966f57 in /opt/tiger/conda/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x14dc (0x55b1da9cf8cc in /opt/tiger/conda/bin/python)
frame #21: _PyFunction_FastCallKeywords + 0xfb (0x55b1da966ccb in /opt/tiger/conda/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x416 (0x55b1da9ce806 in /opt/tiger/conda/bin/python)
frame #23: _PyEval_EvalCodeWithName + 0x2f9 (0x55b1da917539 in /opt/tiger/conda/bin/python)
frame #24: PyEval_EvalCodeEx + 0x44 (0x55b1da918424 in /opt/tiger/conda/bin/python)
frame #25: PyEval_EvalCode + 0x1c (0x55b1da91844c in /opt/tiger/conda/bin/python)
frame #26: + 0x22ab74 (0x55b1daa2db74 in /opt/tiger/conda/bin/python)
frame #27: PyRun_FileExFlags + 0xa1 (0x55b1daa37eb1 in /opt/tiger/conda/bin/python)
frame #28: PyRun_SimpleFileExFlags + 0x1c3 (0x55b1daa380a3 in /opt/tiger/conda/bin/python)
frame #29: + 0x236195 (0x55b1daa39195 in /opt/tiger/conda/bin/python)
frame #30: _Py_UnixMain + 0x3c (0x55b1daa392bc in /opt/tiger/conda/bin/python)
frame #31: __libc_start_main + 0xeb (0x7efc2c63309b in /lib/x86_64-linux-gnu/libc.so.6)
frame #32: + 0x1db062 (0x55b1da9de062 in /opt/tiger/conda/bin/python)

Is there any solution?

SuperNet training log file?

Hi~ Thanks for your excellent work. Could you provide the log file from your supernet training? I find that Prec@1 is extremely low when training the supernet, as shown below:
[image]
When I trained SPOS, the accuracy already reached around 30 within the first 6 epochs.

Questions about search space of Cream

Hi, thank you for your great work! I am interested in Cream, but I ran into some problems when reading both the paper and the source code.

Question 1:

In supernet.py:

  arch_def = [
      # stage 0, 112x112 in
      ['ds_r1_k3_s1_e1_c16_se0.25'],
      # stage 1, 112x112 in
      ['ir_r1_k3_s2_e4_c24_se0.25', 'ir_r1_k3_s1_e4_c24_se0.25', 'ir_r1_k3_s1_e4_c24_se0.25',
       'ir_r1_k3_s1_e4_c24_se0.25'],
      # stage 2, 56x56 in
      ['ir_r1_k5_s2_e4_c40_se0.25', 'ir_r1_k5_s1_e4_c40_se0.25', 'ir_r1_k5_s2_e4_c40_se0.25',
       'ir_r1_k5_s2_e4_c40_se0.25'],
      # stage 3, 28x28 in
      ['ir_r1_k3_s2_e6_c80_se0.25', 'ir_r1_k3_s1_e4_c80_se0.25', 'ir_r1_k3_s1_e4_c80_se0.25',
       'ir_r2_k3_s1_e4_c80_se0.25'],
      # stage 4, 14x14in
      ['ir_r1_k3_s1_e6_c96_se0.25', 'ir_r1_k3_s1_e6_c96_se0.25', 'ir_r1_k3_s1_e6_c96_se0.25',
       'ir_r1_k3_s1_e6_c96_se0.25'],
      # stage 5, 14x14in
      ['ir_r1_k5_s2_e6_c192_se0.25', 'ir_r1_k5_s1_e6_c192_se0.25', 'ir_r1_k5_s2_e6_c192_se0.25',
       'ir_r1_k5_s2_e6_c192_se0.25'],
      # stage 6, 7x7 in
      ['cn_r1_k1_s1_c320_se0.25'],
  ]

There are specific numbers of blocks for each stage. In this case, there are 4,4,5,4,4 blocks.

However, in your paper the repeat number ranges from 4 to 6, so the code doesn't match the description in the paper.

image

Question 2:

Another question is about the skip connection operation. In your paper, the description is as below:

image

But I cannot find the skip connection in your search space.

self.choices = [[x, y] for x in choices['kernel_size']

There are only 6 operations in your Search Space.

Some questions about experiments in the search phase

Thanks for your excellent work!
Q1: I am confused by the configuration file ./experiments/configs/train/train.yaml.
How should I understand the following?
PRE_PROB: (0.05,0.2,0.05,0.5,0.05,0.15)
Where do these numbers come from?

Q2: In the lib/core/train.py file, 'random_cand' represents the randomly sampled path. What I want to ask is: which number is used to represent SkipConnect?
Related code:

Lib/core/train.py:
random_cand = prioritized_board.get_cand_with_prob(prob)
random_cand.insert(0, [0])
random_cand.append([0])
——————————————————
Lib/models/PrioritizedBoard.py:
def get_cand_with_prob(self, prob=None):
    if prob is None:
        get_random_cand = [
            np.random.choice(
                self.choice_num,
                item).tolist() for item in self.sta_num]
    else:
        get_random_cand = [
            np.random.choice(
                self.choice_num,
                item,
                prob).tolist() for item in self.sta_num]
    return get_random_cand
——————————————————
If a number is used to represent SkipConnect, why is choice_num set to 6?

I'm sorry that I couldn't figure out these two parts by myself, and I look forward to the authors' answer.

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu!

When I tried to resume training from the checkpoint file, I encountered a RuntimeError.
The error message is as below:

Traceback (most recent call last):
  File "tools/train.py", line 243, in <module>
    main()
  File "tools/train.py", line 218, in main
    train_metrics = train_epoch(epoch, model, loader_train, optimizer,
  File "/opt/tiger/cream/tools/../lib/core/train.py", line 68, in train_epoch
    optimizer.step()
  File "/usr/local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 67, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/torch/optim/sgd.py", line 106, in step
    buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cpu!

Part of my train.yaml is as follows:

AUTO_RESUME: True
DATA_DIR: '/opt/tiger/data/ImageNet'
MODEL: 'Supernet_Training'
RESUME_PATH: './experiments/workspace/train/11041800-Supernet_Training/checkpoint-0.pth.tar'
SAVE_PATH: './experiments/workspace/train'

How can I deal with this error and resume training using the checkpoint file?
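A common workaround for this class of error, sketched under the assumption that the checkpoint stores the optimizer state under an 'optimizer' key (this is not necessarily the repository's resume logic, and `resume_path`, `model`, and `optimizer` are placeholders): load with map_location and move the restored momentum buffers to the parameters' device before continuing training.

```python
import torch

checkpoint = torch.load(resume_path, map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])
optimizer.load_state_dict(checkpoint["optimizer"])

device = next(model.parameters()).device
for state in optimizer.state.values():     # momentum buffers and other state tensors
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.to(device)
```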

meta_value questions

Hi,

Thank you for your quick responses so far; I hope you can bear with me.
I have two questions regarding the following piece of code (see https://github.com/microsoft/Cream/blob/main/lib/models/PrioritizedBoard.py#L27)

        elif self.cfg.SUPERNET.PICK_METHOD == 'meta':
            meta_value, cand_idx, teacher_cand = -1000000000, -1, None
            for now_idx, item in enumerate(self.prioritized_board):
                inputx = item[4]
                output = F.softmax(model(inputx, random_cand), dim=1)
                weight = model.module.forward_meta(output - item[5])
                if weight > meta_value:
                    meta_value = weight
                    cand_idx = now_idx
                    teacher_cand = self.prioritized_board[cand_idx][3]
            assert teacher_cand is not None
            meta_value = torch.sigmoid(-weight)
    ...
    return meta_value, teacher_cand
  1. Why use torch.sigmoid(-weight) and not torch.sigmoid(-meta_value)? weight refers to the last entry in the board and might not belong to the selected teacher_cand.

  2. Why use torch.sigmoid(-weight) and not torch.sigmoid(weight)? Since weight describes the matching degree (higher is better; the teacher candidate is also selected this way), torch.sigmoid(-weight) goes towards 0. Subsequently, a high matching value gives kd_loss a lower impact on the loss function (https://github.com/microsoft/Cream/blob/main/lib/core/train.py#L64):
loss = (meta_value * kd_loss + (2 - meta_value) * valid_loss) / 2

Am I missing something?

Best,
Kevin

question about resuming

Hi,

Thanks for your help and your great NAS idea!

I have a question about resuming training:

If I want to resume from a checkpoint, from my standpoint I need the 10 best checkpoints plus the meta-network checkpoint and the model checkpoint. However, I can't find any code related to resuming the prioritized board and the meta-network.

How do you deal with it?

Thanks!

DS block in appendix is incorrect?

In Appendices B and C, the DS blocks use an expansion rate of 4, and DepthwiseSeparableConv from timm.models.efficientnet_blocks is used in your code. However, in the original timm implementation, DepthwiseSeparableConv has no expansion-rate parameter. Is that correct?

AttributeError: 'MetaMatchingNetwork' object has no attribute 'TTA'

Traceback (most recent call last):
File "tools/train.py", line 239, in
main()
File "tools/train.py", line 222, in main
local_rank=args.local_rank, logger=logger)
File "/Cream/tools/../lib/core/train.py", line 168, in validate
reduce_factor = cfg.TTA
AttributeError: 'MetaMatchingNetwork' object has no attribute 'TTA'

feature logits problem when num_classes is small

Hello,

Thanks for your great work! Your idea is brilliant!

The meta network uses feature logits as input to compute the fitness between sub-networks. But when the number of classes is small, such as 2, the information loss in the feature logits could be very severe, which makes the meta network hard to learn.

So have you tried using the features before the last fully-connected layer to train the meta network? If so, could you share the results?

Best,

Is the backbone table in Appendix A of the paper incorrect?

The backbone in Appendix A does not correspond to the figure in Appendix B; the stride and input shape seem to be written incorrectly in some places, please take a look. In addition, is the 1x1 conv before the pooling an ordinary convolution? The code does not seem to follow that design.
