changlin31 / bossnas
(ICCV 2021) BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
Hi, thanks for your excellent work~
It is inspiring and practical for improving the sub-net ranking correlations. But I have a few questions.
Unsupv. EB is better than Supv. class. Do you have a theoretical explanation for this?
I have some doubts about how to search on HyTra with datasets other than ImageNet. Is it possible?
I tried to run the search on CIFAR10 but it gives me this error with a HyTracifar10 config file:
RuntimeError: Given groups=1, weight of size [256, 1024, 1, 1], expected input[256, 512, 4, 4] to have 1024 channels, but got 512 channels instead.
This is the config file I created for this purpose. Since the config files were not entirely clear to me (my fault), I simply mixed the NATScifar10 config file and the HytraImagenet config file to obtain a HyTracifar10 version. The model and the dataset seem to be created and loaded correctly, so I think there is some kind of size mismatch.
So far I have only tried CIFAR10, but my intention is to generalize the process to different datasets (not only the most famous ones). I would like to know whether this generalization is already possible with your code (or with slight modifications), or whether the work was meant to run only on the main datasets.
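For readers trying to interpret the RuntimeError above, here is a minimal sketch of the channel check that a 2D convolution performs on its input. It is plain Python, not PyTorch code, and the helper name `check_conv_input` is made up for illustration. It only shows how the two shapes in the message relate: the weight `[256, 1024, 1, 1]` belongs to a 1x1 convolution built for 1024 input channels, while the feature map that reaches it has 512 channels at 4x4 resolution. Which layer of the supernet this is cannot be told from the message alone.

```python
def check_conv_input(weight_shape, input_shape, groups=1):
    """Mimic the channel check of a 2D convolution (illustrative helper).

    weight_shape: (out_channels, in_channels // groups, kH, kW)
    input_shape:  (batch, channels, H, W)
    Returns None when the shapes agree, otherwise a short description.
    """
    expected = weight_shape[1] * groups  # channels the layer was built for
    got = input_shape[1]                 # channels actually received
    if got != expected:
        return f"expected {expected} channels, but got {got} channels instead"
    return None


# Shapes copied from the RuntimeError quoted above.
message = check_conv_input((256, 1024, 1, 1), (256, 512, 4, 4))
print(message)

# A layer built for 512 input channels would accept the same feature map.
ok = check_conv_input((256, 512, 1, 1), (256, 512, 4, 4))
print(ok)
```

So the mismatch is in the channel dimension, not in the 4x4 spatial size. The thread does not show which setting in the config or which sampled path produces the 512-channel feature map.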
import copy
_base_ = 'base.py'
model = dict(
    type='SiameseSupernetsHyTra',
    pretrained=None,
    base_momentum=0.99,
    pre_conv=True,
    backbone=dict(
        type='SupernetHyTra',
    ),
    start_block=0,
    num_block=4,
    neck=dict(
        type='NonLinearNeckSimCLRProject',
        in_channels=2048,
        hid_channels=4096,
        out_channels=256,
        num_layers=2,
        sync_bn=False,
        with_bias=True,
        with_last_bn=False,
        with_avg_pool=True),
    head=dict(type='LatentPredictHead',
              size_average=True,
              predictor=dict(type='NonLinearNeckSimCLR',
                             in_channels=256, hid_channels=4096,
                             out_channels=256, num_layers=2, sync_bn=False,
                             with_bias=True, with_last_bn=False, with_avg_pool=False)))
data_source_cfg = dict(type='NATSCifar10', root='../data/cifar/', return_label=False)
train_dataset_type = 'BYOLDataset'
test_dataset_type = 'StoragedBYOLDataset'
img_norm_cfg = dict(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.201])
train_pipeline = [
    dict(type='RandomCrop', size=32, padding=4),
    dict(type='RandomHorizontalFlip'),
]
prefetch = False
if not prefetch:
    train_pipeline.extend([dict(type='ToTensor'), dict(type='Normalize', **img_norm_cfg)])
train_pipeline1 = copy.deepcopy(train_pipeline)
train_pipeline2 = copy.deepcopy(train_pipeline)
test_pipeline1 = copy.deepcopy(train_pipeline1)
test_pipeline2 = copy.deepcopy(train_pipeline2)
data = dict(
    imgs_per_gpu=256,  # total 256*4(gpu)*4(interval)=4096
    workers_per_gpu=2,
    train=dict(
        type=train_dataset_type,
        data_source=dict(split='train', **data_source_cfg),
        pipeline1=train_pipeline1, pipeline2=train_pipeline2),
    val=dict(
        type=test_dataset_type,
        data_source=dict(split='test', **data_source_cfg),
        pipeline1=test_pipeline1, pipeline2=test_pipeline2,),
    test=dict(
        type=test_dataset_type,
        data_source=dict(split='test', **data_source_cfg),
        pipeline1=test_pipeline1, pipeline2=test_pipeline2,))
optimizer = dict(type='LARS', lr=4.8, weight_decay=0.000001, momentum=0.9,
                 paramwise_options={
                     '(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay=0., lars_exclude=True),
                     'bias': dict(weight_decay=0., lars_exclude=True)})
use_fp16 = True
update_interval = 8
optimizer_config = dict(update_interval=update_interval, use_fp16=use_fp16)
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=0.,
    warmup='linear',
    warmup_iters=1,
    warmup_ratio=0.0001,  # cannot be 0
    warmup_by_epoch=True)
checkpoint_config = dict(interval=1)
total_epochs = 24
custom_hooks = [
    dict(type='BYOLHook', end_momentum=1., update_interval=update_interval),
    dict(type='RandomPathHook'),
    dict(
        type='ValBestPathHook',
        dataset=data['val'],
        bn_dataset=data['train'],
        initial=True,
        interval=2,
        optimizer_cfg=optimizer,
        lr_cfg=lr_config,
        imgs_per_gpu=256,
        workers_per_gpu=4,
        epoch_per_stage=6,
        resume_best_path='')  # e.g. 'path_rank/bestpath_2.yml'
]
cudnn_benchmark = True
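One detail in the config above may be worth double-checking: the comment on `imgs_per_gpu` mentions an interval of 4, while `update_interval` is set to 8. The sketch below computes the effective batch size for both values. It assumes that `update_interval` is the number of gradient-accumulation steps (which is how the comment reads) and that 4 GPUs are used (taken from the comment, not confirmed anywhere in this thread). The function name is made up for illustration.

```python
def effective_batch_size(imgs_per_gpu, num_gpus, update_interval):
    """Images contributing to one optimizer step, under the stated assumptions."""
    return imgs_per_gpu * num_gpus * update_interval


num_gpus = 4  # assumption: taken from the "4(gpu)" note in the config comment

from_comment = effective_batch_size(256, num_gpus, 4)  # interval in the comment
from_config = effective_batch_size(256, num_gpus, 8)   # update_interval in the config

print(from_comment)
print(from_config)
```

If the learning rate of 4.8 was tuned for a total batch of 4096, running with `update_interval = 8` on the same number of GPUs would double the effective batch size. Whether that matters for this search is not something the thread answers.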
Hi,
Your approach is very impressive; I was wondering if you're planning to release the weights of the supernets you trained? (I'm specifically interested in the HyTra supernet)
Hi,
I have another question about MACs. Could you elaborate on how you calculate that? Thanks.
Best
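As background for the MACs question, here is a minimal sketch of one common counting convention: one multiply-accumulate per weight per output position, ignoring bias, normalization and activation. This is not necessarily how the BossNAS authors counted, and the layers in the example are hypothetical rather than taken from any BossNet model.

```python
def conv2d_macs(in_channels, out_channels, kernel_size, out_h, out_w, groups=1):
    """MACs of a 2D convolution: every output element needs
    (in_channels / groups) * kH * kW multiply-accumulates."""
    kh, kw = kernel_size
    macs_per_output = (in_channels // groups) * kh * kw
    return out_channels * out_h * out_w * macs_per_output


def linear_macs(in_features, out_features):
    """MACs of a fully connected layer applied to a single vector."""
    return in_features * out_features


# Hypothetical layers, chosen only to exercise the formulas.
stem = conv2d_macs(3, 64, (7, 7), out_h=112, out_w=112)
depthwise = conv2d_macs(32, 32, (3, 3), out_h=56, out_w=56, groups=32)
classifier = linear_macs(2048, 1000)

print(stem, depthwise, classifier)
```

A whole-model count under this convention is the sum over all layers. Attention layers additionally need the query-key and attention-value matrix products, which this sketch does not cover.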
Thanks for your great code!
Do you provide the code to retrain the MobileNet search space models, such as BossNAS-M1 and M2?
I ran the search code for a small number of epochs. Can you share where exactly the best model architecture is stored when a custom NAS run is performed?
The .pth files are saved in work_dir, but I'm not sure where the corresponding architecture is stored, so that I can use a custom generated model together with these weights.
After running:
cd ranking_nats
python get_model_score_nats.py
I got:
kendall tau begin
BossNAS: KendalltauResult(correlation=-0.534180602248828, pvalue=0.0)
(-0.7180607093955225, 0.0)
SpearmanrResult(correlation=-0.7341493538551311, pvalue=0.0)
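When reading the negative values above, it may help to recall how the sign of Kendall's tau depends on the direction of the score. The sketch below is a plain-Python tau-a (no tie correction) on made-up numbers. It assumes, without confirmation from the script, that the ranking score behaves like a loss where lower is better. In that case a good predictor correlates negatively with accuracy, and negating the score flips the sign without changing the magnitude.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up data for four architectures (not from NATS-Bench).
accuracy = [90.1, 91.5, 93.2, 94.0]    # higher is better
loss_score = [0.80, 0.65, 0.70, 0.40]  # lower is better (assumed score type)

tau_loss = kendall_tau(loss_score, accuracy)
tau_negated = kendall_tau([-s for s in loss_score], accuracy)

print(tau_loss)
print(tau_negated)
```

With real data, `scipy.stats.kendalltau` computes the tau-b variant and also returns a p-value, which matches the `KendalltauResult` format printed above.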
Hi, thanks for your great work!
I used your search code to train the supernet, but I could not figure out how to search for candidate architectures from it.
I guess the validation hook serves this purpose, but I did not find the saved path information after training for one epoch. Are there other files I need to look at, or do I just need to wait for more epochs to be trained?
Could you advise me about that, thanks in advance for your time and help!
Best,
Haoran
Hi, great work!
I have some questions about the code and the paper:
In Section 3.3 of the paper, which covers the searching phase, the probability ensemble of the architecture population used to calculate the evaluation loss in equations (5) and (6) comes from the online network, but in the code it comes from the target network, which confuses me.
Still in Section 3.3, the paper mentions that the search uses an evolutionary algorithm. I read references [12] and [54] but still have no clue how the evolutionary algorithm is implemented in the code. Specifically, how is the architecture population evolved?
In hytra_supernet.py, the stage depths are set to [4,3,2,2]. Is there a particular reason for this? Why not use [4,4,4,4] so that all possible paths can be chosen?
Thanks a lot for your time and I'm looking forward to your reply!
Hi, many thanks for sharing your nice work. In the paper, formulations (1) and (6) both contain λ_k, but there seems to be no explanation of it. Could you please clarify it here?
Hello! I am trying to reproduce your model, but when I evaluate the pretrained model (BossNet-T0-80_8.pth), the acc1 is too low. Did I miss something? Can you help me?
The run command is as follows:
root@v-dev-11135821-66b7bdd9f5-l9rlv:/data/juicefs_hz_cv_v3/11135821/bak/BossNAS/retraining_hytra# python main.py --model bossnet_T0 --input-size 224 --batch-size 128 --eval --resume /data/juicefs_hz_cv_v3/11135821/bak/model/BossNet-T0-80_8.pth
Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=128, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/data/glusterfs_cv_04/public_data/imagenet/CLS-LOC/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=True, inat_category='name', input_size=224, local_rank=0, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='bossnet_T0', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='output/bossnet_T0-20210804-163815', patience_epochs=10, pin_mem=True, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='/data/juicefs_hz_cv_v3/11135821/bak/model/BossNet-T0-80_8.pth', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
Creating model: bossnet_T0
number of params: 38415960
Test: [ 0/261] eta: 0:39:36 loss: 1.9650 (1.9650) acc1: 68.2292 (68.2292) acc5: 90.1042 (90.1042) time: 9.1044 data: 4.8654 max mem: 5605
Test: [ 50/261] eta: 0:01:38 loss: 3.2916 (3.0472) acc1: 41.6667 (47.3039) acc5: 65.1042 (70.5372) time: 0.2928 data: 0.0004 max mem: 5605
Test: [100/261] eta: 0:01:01 loss: 2.9675 (3.1048) acc1: 46.8750 (45.7921) acc5: 69.2708 (69.9722) time: 0.2953 data: 0.0003 max mem: 5605
Test: [150/261] eta: 0:00:39 loss: 2.4230 (2.9457) acc1: 55.2083 (47.8960) acc5: 75.5208 (71.5370) time: 0.2989 data: 0.0003 max mem: 5605
Test: [200/261] eta: 0:00:20 loss: 2.6540 (2.9105) acc1: 48.4375 (48.1913) acc5: 68.7500 (71.3490) time: 0.3023 data: 0.0002 max mem: 5605
Test: [250/261] eta: 0:00:03 loss: 1.6506 (2.8344) acc1: 61.9792 (49.0310) acc5: 83.8542 (72.0431) time: 0.3036 data: 0.0003 max mem: 5605
Test: [260/261] eta: 0:00:00 loss: 1.6135 (2.8001) acc1: 65.6250 (49.5740) acc5: 86.4583 (72.5500) time: 0.3888 data: 0.0001 max mem: 5605
Test: Total time: 0:01:28 (0.3397 s / it)
Hi there, I am trying to search on NATS-Bench using CIFAR 10 dataset and encountered this error. Could you kindly help me with this?
Hi, I appreciate your time.
I found an issue in the code of the HyTra search phase. When searching for the second block, after the first evaluation the chosen best path of the second block is appended after the best path of the first block, and the training process is then conducted on a three-block structure.
The relevant code is as follows:
(val_hook.py)
if self.every_n_epochs(runner, block_inteval):
    best_path = results[0][0]
    best_path = [int(i) for i in list(best_path)]
    if len(model.best_paths) == model.start_block + 1:
        model.best_paths.pop()
    model.best_paths.append(best_path)
(siamese_supernets_hytra.py)
if self.start_block > 0:
    for i, best_path in enumerate(self.best_paths):
        img_v1 = self.online_backbone(img_v1,
                                      start_block=i,
                                      forward_op=best_path,
                                      block_op=True)[0]
        img_v2 = self.online_backbone(img_v2,
                                      start_block=i,
                                      forward_op=best_path,
                                      block_op=True)[0]
In other words, the search does not continue after only the frozen best path of the previous block. The best path of the current block chosen at each evaluation stage is also frozen and appended, so the path of the second block appears twice during searching. I can't understand why this is done.
This causes a problem if downsampling is used in the frozen best path of the second block. For instance, suppose the spatial resolution has already reached the smallest scale, 1/32, in the frozen previous best path of the second block. If downsampling occurs again in the current path of the second block as searching continues, there will be a shape mismatch error. This confuses me, and we did encounter this problem in our implementation.
I'm sorry if I haven't described the issue clearly. Thanks a lot for your time again and I'm looking forward to your reply.
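To make the bookkeeping described in this issue easier to follow, here is a small sketch that replays the quoted val_hook.py update on a plain Python list. The path encodings are hypothetical, the helper name is made up, and the sketch assumes `best_paths` holds exactly `start_block` entries when the search for the current block begins. It only mirrors the quoted lines and does not reproduce the real hook or model classes.

```python
def update_best_paths(best_paths, start_block, new_best_path):
    """Replay the quoted update: drop the current block's previous best
    path if one is already stored, then append the new one."""
    if len(best_paths) == start_block + 1:
        best_paths.pop()
    best_paths.append(new_best_path)
    return best_paths


start_block = 1              # searching the second block
best_paths = [[0, 1, 0, 1]]  # hypothetical frozen path for block 0

update_best_paths(best_paths, start_block, [1, 1, 0])  # first evaluation
after_first = [list(p) for p in best_paths]

update_best_paths(best_paths, start_block, [0, 1, 1])  # second evaluation
after_second = [list(p) for p in best_paths]

# Blocks that `for i, best_path in enumerate(self.best_paths)` would replay.
replayed_blocks = list(range(len(best_paths)))

print(after_first)
print(after_second)
print(replayed_blocks)
```

Under these assumptions the list always ends up with `start_block + 1` entries after an evaluation, so the quoted forward loop also replays a stored path for the block currently being searched. That matches the behaviour the issue describes.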