
aw_nas's Introduction

aw_nas: A Modularized and Extensible NAS Framework

Maintained by NICS-EFC Lab (Tsinghua University) and Novauto Technology Co. Ltd. (Beijing China).

Introduction

Neural Architecture Search (NAS) has received extensive attention due to its capability to discover neural network architectures in an automated manner. aw_nas is a NAS framework with various NAS algorithms implemented in a modularized manner. Currently, aw_nas can be used to reproduce the results of many mainstream NAS algorithms, e.g., ENAS, DARTS, SNAS, FBNet, OFA, and predictor-based NAS. We have also applied NAS algorithms to various applications and scenarios with aw_nas, including NAS for classification, detection, text modeling, hardware fault tolerance, adversarial robustness, and hardware inference efficiency.

The hardware-related profiling and parsing interface is designed to be general and easy to use. Along with the flow and interface, aw_nas provides latency tables and correction models for multiple hardware platforms. See Hardware related for more details.

Contributions are all welcome, including new NAS component implementation, new NAS applications, bug fixes, documentation, and so on.

Components of a NAS system

There are multiple actors that are working together in a NAS system, and they can be categorized into these components:

  • search space
  • controller
  • weights manager
  • evaluator
  • objective

The interface between these components is reasonably well-defined. We use the class awnas.rollout.base.BaseRollout to represent the interface object between all these components. Usually, a search space defines one or more rollout types (subclasses of BaseRollout). For example, the basic cell-based search space cnn (class awnas.common.CNNSearchSpace) corresponds to two rollout types: discrete rollouts, which are used in RL-based controllers, EVO-based controllers, etc. (class awnas.rollout.base.Rollout); and differentiable rollouts, which are used in gradient-based NAS (class awnas.rollout.base.DifferentiableRollout).
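As a rough illustration of the rollout-as-interface idea, the sketch below uses simplified stand-in classes. The class names mirror the ones mentioned above, but the attributes (genotype, arch_weights, perf) and the set_perf method are assumptions made for this example and may not match the actual aw_nas API.

```python
# Simplified stand-ins, NOT the real aw_nas classes: attribute and method
# names are assumptions chosen for illustration only.
class BaseRollout:
    """Interface object passed between controller, weights manager and evaluator."""

    NAME = "base"

    def __init__(self):
        self.perf = {}  # performance records, filled in by an evaluator

    def set_perf(self, value, name="reward"):
        self.perf[name] = value
        return self


class Rollout(BaseRollout):
    """Discrete rollout: one concrete op choice per edge."""

    NAME = "discrete"

    def __init__(self, genotype):
        super().__init__()
        self.genotype = genotype


class DifferentiableRollout(BaseRollout):
    """Differentiable rollout: a weight per candidate op on each edge."""

    NAME = "differentiable"

    def __init__(self, arch_weights):
        super().__init__()
        self.arch_weights = arch_weights


# A controller would produce rollouts; an evaluator would then score them.
discrete = Rollout(genotype=[("sep_conv_3x3", 0, 2), ("skip_connect", 1, 2)])
discrete.set_perf(0.93)
relaxed = DifferentiableRollout(arch_weights=[[0.7, 0.3], [0.1, 0.9]])
```

Because every component only depends on the rollout object, a controller and an evaluator can be swapped independently as long as they agree on the rollout type.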

NAS framework

Here is a graphical illustration of the NAS flow and corresponding method calls. And here is a brief technical summary of aw_nas, including some reproduction results and descriptions of the hardware cost prediction models. This technical summary is also available on arXiv (the GitHub and arXiv versions might differ slightly).

Install

Using a virtual python environment is encouraged. For example, with Anaconda, you could run conda create -n awnas python==3.7.3 pip first.

  • Supported python versions: 2.7, 3.6, 3.7
  • Supported PyTorch versions: >=1.0.0, <1.5.0 (currently, some patches to DataParallel replication are not compatible with 1.5.0 and later)

To install awnas, run pip install -r requirements.txt. If you do not want to install the detection extras (required for running search on the detection datasets VOC/COCO), omit the ",det" extras during installation (see the last line of the requirements file). Note that on RTX 3090, the torch==1.2.0 pinned in requirements.txt no longer works: using it causes the program to hang indefinitely. Check the comments in requirements.cu110.txt.

Architecture plotting depends on the graphviz package, so make sure graphviz is installed. For example, on Ubuntu you can run sudo apt-get install graphviz.

Usage

After installation, you can run awnas --help to see what sub-commands are available.

Output of an example run (version 0.3.dev3):

07/04 11:41:44 PM plugin              INFO: Check plugins under /home/foxfi/awnas/plugins
07/04 11:41:44 PM plugin              INFO: Loaded plugins:
Usage: awnas [OPTIONS] COMMAND [ARGS]...

  The awnas NAS framework command-line interface. Use `AWNAS_LOG_LEVEL`
  environment variable to modify the log level.

Options:
  --version             Show the version and exit.
  --local_rank INTEGER  the rank of this process  [default: -1]
  --help                Show this message and exit.

Commands:
  search                   Searching for architecture.
  mpsearch                 Multiprocess searching for architecture.
  random-sample            Random sample architectures.
  sample                   Sample architectures, pickle loading controller...
  eval-arch                Eval architecture from file.
  derive                   Derive architectures.
  mptrain                  Multiprocess final training of architecture.
  train                    Train an architecture.
  test                     Test a final-trained model.
  gen-sample-config        Dump the sample configuration.
  gen-final-sample-config  Dump the sample configuration for final training.
  registry                 Print registry information.

Prepare data

When running an awnas program, the data of a dataset with name=<NAME> is assumed to be under AWNAS_DATA/<NAME>, where the AWNAS_DATA base directory is read from the environment variable AWNAS_DATA. If that environment variable is not set, the default is AWNAS_HOME/data, where AWNAS_HOME is an environment variable that defaults to ~/awnas.
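The lookup order described above can be sketched in a few lines of Python. This is a minimal illustration, not the code aw_nas itself uses: the function name resolve_data_dir and the dataset directory names in the example calls are hypothetical.

```python
import os


def resolve_data_dir(name, environ=None):
    """Return the directory expected to hold the dataset called `name`.

    Hypothetical helper: it only mirrors the documented fallback order.
    """
    env = os.environ if environ is None else environ
    # AWNAS_HOME defaults to ~/awnas.
    home = env.get("AWNAS_HOME", os.path.expanduser("~/awnas"))
    # AWNAS_DATA defaults to AWNAS_HOME/data.
    base = env.get("AWNAS_DATA", os.path.join(home, "data"))
    return os.path.join(base, name)


# An explicit AWNAS_DATA wins; otherwise the path is derived from AWNAS_HOME.
explicit = resolve_data_dir("cifar10", {"AWNAS_DATA": "/mnt/datasets"})
derived = resolve_data_dir("ptb", {"AWNAS_HOME": "/opt/awnas"})
print(explicit)
print(derived)
```

Passing the environment as a dictionary keeps the helper easy to check without modifying the real process environment.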

  • Cifar-10/Cifar-100: No specific preparation needed.
  • PTB: bash scripts/get_data.sh ptb, the ptb data will be downloaded under ${DATA_BASE}/ptb directory. By default ${DATA_BASE} will be ~/awnas/data.
  • Tiny-ImageNet: bash scripts/get_data.sh tiny-imagenet, the tiny-imagenet data will be downloaded under ${DATA_BASE}/tiny-imagenet directory.
  • Detection datasets VOC/COCO: bash scripts/get_data.sh voc and bash scripts/get_data.sh coco

Run NAS search

ENAS: Try running an ENAS [Pham et al., ICML 2018] search (the results, including the configuration backup and the search log, are saved in <TRAIN_DIR>):

awnas search examples/basic/enas.yaml --gpu 0 --save-every <SAVE_EVERY> --train-dir <TRAIN_DIR>

There are several sections in the configuration file that describe the configurations of different components in the NAS framework. For example, in examples/basic/enas.yaml, the configuration sections are organized as follows:

  1. a cell-based CNN search space: The search space is an extended version from the 5-primitive micro search space in the original ENAS paper.
  2. cifar-10 dataset
  3. RL-learned controller with the embed_lstm RNN network
  4. shared weights based evaluator
  5. shared weights based weights manager: super net
  6. classification objective
  7. trainer: the orchestration of the overall NAS search flow

For a detailed breakdown of the ENAS search configuration, please refer to the config notes.

DARTS: You can also run an improved version of the DARTS [Liu et al., ICLR 2018] search:

awnas search examples/basic/darts.yaml --gpu 0 --save-every <SAVE_EVERY> --train-dir <TRAIN_DIR>

We provide a walk-through of the components and flow here. Note that this configuration differs slightly from the original DARTS in two ways: 1) entropy_coeff: 0.01: an entropy regularization coefficient of 0.01 is used, which encourages the op distribution to be closer to one-hot; 2) use_prob: false: Gumbel-softmax sampling is used instead of directly using the probability.
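To make the two settings concrete, here is a dependency-free sketch of Gumbel-softmax sampling and an entropy penalty. It is an illustration under assumptions rather than the aw_nas implementation: the function names, the temperature value, and the way the entropy term is added to the loss are all chosen for this example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for x in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append(x - math.log(-math.log(u)))
    return softmax(noisy, temperature)


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(task_loss, logits, entropy_coeff=0.01):
    # Assumption: the entropy is added to a loss that is minimized, so a
    # positive coefficient pushes the op distribution toward one-hot.
    return task_loss + entropy_coeff * entropy(softmax(logits))


logits = [2.0, 0.5, -1.0]
probs = softmax(logits)  # what "directly using the probability" would use
sample = gumbel_softmax_sample(logits, temperature=0.5, rng=random.Random(0))
loss = regularized_loss(1.0, logits)
```

Lower temperatures make each sample closer to a hard one-hot choice, while higher temperatures make it closer to a uniform mixture.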

Results Reproduction: For the exact reproduction of the results of various popular methods, see the docs, configurations, and results under examples/mloss/.

Generate sample search config

To generate a sample configuration file for searching, use the awnas gen-sample-config utility. For example, if you want a sample search configuration for searching on NAS-Bench-101, run

awnas gen-sample-config -r nasbench-101 -d image ./sample_nb101.yaml

Then, check the sample_nb101.yaml file. For each component type, all classes that declare support for the nasbench-101 rollout type are listed in the file. Delete those you do not need, uncomment those you need, and change the default settings; the resulting config can then be used to run NAS on NAS-Bench-101.

Derive & Eval-arch

The awnas derive utility samples architectures using the trained NAS components. If the --test flag is off (the default), only the controller is loaded to sample rollouts; otherwise, the weights manager and trainer are also loaded to test these rollouts, and the sampled genotypes are sorted by performance in the output file.

An example run is to sample 10 genotypes, and save them into sampled_genotypes.yaml.

awnas derive search_cfg.yaml --load <checkpoint dir dumped during awnas search> -o sampled_genotypes.yaml -n 10 --test --gpu 0 --seed 123

Note that the files "controller/evaluator/trainer" in the <TRAIN_DIR>/<EPOCH>/ folders (dumped every <SAVE_EVERY> epochs) contain the state dicts of the components and can be loaded, while the final checkpoints "controller.pt/evaluator.pt" in the "<TRAIN_DIR>/final/" folder contain a whole pickle of the component object and cannot be loaded directly. If you forgot to specify the --save-every command-line argument and did not get state-dict checkpoints, you can load the final checkpoint and then dump the needed state-dict checkpoint with cd <TRAIN_DIR>/final/; python -c "import torch; controller = torch.load('./controller.pt'); controller.save('controller')".

The awnas eval-arch utility evaluates genotypes using the trained NAS components. Given a yaml file containing a list of genotypes, one can evaluate these genotypes using the saved NAS checkpoint:

awnas eval-arch search_cfg.yaml sampled_genotypes.yaml --load <checkpoint dir dumped during awnas search> --gpu 0 --seed 123

Final Training of Cell-based Architecture

The awnas.final sub-package provides the final training functionality for cell-based architectures. examples/basic/final_templates/final_template.yaml is a commonly used configuration template for the final training of architectures in an ENAS-like search space. To use that template, fill the final_model_cfg.genotypes field with the genotype string derived from the search process. An example genotype string is

CNNGenotype(normal_0=[('dil_conv_3x3', 1, 2), ('skip_connect', 1, 2), ('sep_conv_3x3', 0, 3), ('sep_conv_3x3', 2, 3), ('skip_connect', 3, 4), ('sep_conv_3x3', 0, 4), ('sep_conv_5x5', 1, 5), ('sep_conv_5x5', 0, 5)], reduce_1=[('max_pool_3x3', 0, 2), ('dil_conv_5x5', 0, 2), ('avg_pool_3x3', 1, 3), ('avg_pool_3x3', 2, 3), ('sep_conv_5x5', 1, 4), ('avg_pool_3x3', 1, 4), ('sep_conv_3x3', 1, 5), ('dil_conv_5x5', 3, 5)], normal_0_concat=[2, 3, 4, 5], reduce_1_concat=[2, 3, 4, 5])

Plugin mechanism

aw_nas provides a simple plugin mechanism to support adding new components or extending existing components outside the package. During initialization, all Python scripts (files whose names end with .py, except those whose names start with test_) under ~/awnas/plugins/ are imported, so the components defined in these files are registered automatically.
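The discovery rule above (import every .py file except those whose names start with test_) can be sketched as follows. This is a simplified stand-in, not the loader that aw_nas ships: the function names are hypothetical, and searching sub-directories recursively is an assumption.

```python
import importlib.util
import tempfile
from pathlib import Path


def find_plugin_files(plugin_dir):
    """List .py files under plugin_dir, skipping names that start with test_."""
    root = Path(plugin_dir).expanduser()
    # Assumption: sub-directories are searched recursively.
    return sorted(p for p in root.rglob("*.py") if not p.name.startswith("test_"))


def import_plugins(plugin_dir):
    """Import each plugin file so that its module-level registrations run."""
    modules = []
    for path in find_plugin_files(plugin_dir):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        modules.append(module)
    return modules


# Demo on a throwaway directory instead of the real ~/awnas/plugins/.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "my_ops.py").write_text("REGISTERED = True\n")
    Path(tmp, "test_my_ops.py").write_text("raise RuntimeError('skipped')\n")
    Path(tmp, "notes.txt").write_text("not a plugin\n")
    loaded = import_plugins(tmp)
    loaded_names = [m.__name__ for m in loaded]
```

In the demo, only my_ops.py is imported; the test file would raise if it were executed, which shows that the test_ prefix filter is applied.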

For example, to reproduce FBNet [Wu et al., CVPR 2019], we add the implementation of the FBNet primitive blocks in examples/plugins/fbnet/fbnet_plugin.py, and register these primitives using aw_nas.ops.register_primitive. To reuse most of the code of the DiffSuperNet implementation (used by DARTS [Liu et al., ICLR 2018], SNAS [Xie et al., ICLR 2018], etc.), we create a class WeightInitDiffSuperNet that inherits from DiffSuperNet; the only difference is an additional weight initialization tailored for FBNet. In addition, an objective LatencyObjective is implemented, which calculates the loss as a weighted sum of the latency loss and the cross-entropy loss.
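A weighted sum of a latency term and the cross-entropy loss can be sketched in plain Python as below. This is not the LatencyObjective code from the plugin: the function names, the expected-latency formulation over a per-op latency table, the coefficient, and all numbers are assumptions made for illustration.

```python
import math


def cross_entropy(class_probs, target):
    """Negative log-likelihood of the target class."""
    return -math.log(class_probs[target])


def expected_latency(op_probs, op_latencies):
    """Latency of one layer, averaged over its op distribution."""
    return sum(p * lat for p, lat in zip(op_probs, op_latencies))


def latency_aware_loss(class_probs, target, layers, latency_coeff=0.1):
    """Weighted sum: cross-entropy + latency_coeff * total expected latency."""
    ce = cross_entropy(class_probs, target)
    latency = sum(expected_latency(probs, lats) for probs, lats in layers)
    return ce + latency_coeff * latency


# Hypothetical two-layer network; each layer has two candidate ops with
# made-up latencies (e.g. milliseconds read from a latency table).
layers = [
    ([0.8, 0.2], [2.0, 5.0]),
    ([0.5, 0.5], [1.0, 3.0]),
]
loss = latency_aware_loss([0.7, 0.2, 0.1], target=0, layers=layers)
```

Raising latency_coeff shifts the search toward faster ops at the cost of task accuracy, so it acts as the trade-off knob between the two terms.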

The plugin modules for implementing Neural Architecture Search for Adversarial Robustness are under examples/plugins/robustness. For example, various objectives for adversarial robustness evaluation are defined. A new search space with varying node input degrees is defined, since dense connection is an important property for adversarial robustness, whereas the ENAS/DARTS search spaces constrain the node input degrees to be less than or equal to 2. Several supernets (weights_manager) are implemented with an adversarial example cache, to avoid regenerating adversarial samples for the same sub-network multiple times.

Besides defining new components, you can also use this mechanism for monkey-patching tricks. For example, there are various fixed-point plugins under examples/research/ftt-nas/fixed_point_plugins/. In these plugins, primitives such as nn.Conv2d and nn.Linear are patched to be modules with quantization and fault injection functionality.

Hardware-related: Hardware profiling and parsing

See Hardware related for the flow and example of hardware profiling and parsing.

Develop New Components

See Develop New Components for the development guide of new components.

Researches

This codebase is related to the following research works (*: equal contribution; ^: co-corresponding author):

  • Wenshuo Li*, Xuefei Ning*, Guangjun Ge, Xiaoming Chen, Yu Wang, Huazhong Yang, FTT-NAS: Discovering Fault-Tolerant Neural Architecture, in ASP-DAC'20.
  • Xuefei Ning, Guangjun Ge, Wenshuo Li, Zhenhua Zhu, Yin Zheng, Xiaoming Chen, Zhen Gao, Yu Wang, and Huazhong Yang, FTT-NAS: Discovering Fault-Tolerant Neural Architecture, in https://arxiv.org/abs/2003.10375, in TODAES'21. instructions
  • Shulin Zeng*, Hanbo Sun*, Yu Xing, Xuefei Ning, Yi Shan, Xiaoming Chen, Yu Wang, Huazhong Yang, Black Box Search Space Profiling for Accelerator-Aware Neural Architecture Search, in ASP-DAC 2020. instructions
  • Xuefei Ning, Yin Zheng, Tianchen Zhao, Yu Wang, Huazhong Yang, A Generic Graph-based Neural Architecture Encoding Scheme for Predictor-based NAS, in ECCV'20 and TPAMI'23, https://arxiv.org/abs/2004.01899. instructions
  • Xuefei Ning, Changcheng Tang, Wenshuo Li, Zixuan Zhou, Shuang Liang, Huazhong Yang, Yu Wang, Evaluating Efficient Performance Estimators of Neural Architectures, in NeurIPS'21, https://arxiv.org/abs/2008.03064. instructions
  • Xuefei Ning*, Junbo Zhao*, Wenshuo Li, Tianchen Zhao, Yin Zheng, Huazhong Yang, Yu Wang, Multi-shot NAS for Discovering Adversarially Robust Convolutional Neural Architectures at Targeted Capacities, in https://arxiv.org/abs/2012.11835, 2020. instructions
  • Tianchen Zhao*, Xuefei Ning*, Songyi Yang, Shuang Liang, Peng Lei, Jianfei Chen, Huazhong Yang, Yu Wang, BARS: Joint Search of Cell Topology and Layout for Accurate and Efficient Binary ARchitectures, in https://arxiv.org/abs/2011.10804, 2020. instructions
  • Hanbo Sun*, Chenyu Wang*, Zhenhua Zhu, Xuefei Ning^, Guohao Dai, Huazhong Yang, Yu Wang^, Gibbon: Efficient Co-Exploration of NN Model and Processing-In-Memory Architecture, in DATE'22 and TCAD'23. instructions
  • Zixuan Zhou*, Xuefei Ning*, Yi Cai, Jiashu Han, Yiping Deng, Yuhan Dong, Huazhong Yang, Yu Wang, CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS, in ECCV'22. instructions
  • Xuefei Ning*, Zixuan Zhou*, Junbo Zhao, Tianchen Zhao, Yiping Deng, Changcheng Tang, Shuang Liang, Huazhong Yang, Yu Wang, TA-GATES: An Encoding Scheme for Neural Network Architectures, in NeurIPS'22. instructions
  • Junbo Zhao*, Xuefei Ning*, Enshu Liu, Binxin Ru, Zixuan Zhou, Tianchen Zhao, Chen Chen, Jiajin Zhang, Qingmin Liao, Yu Wang, Dynamic Ensemble of Low-fidelity Experts: Mitigating NAS "Cold-Start", in AAAI'22. instructions
  • Enshu Liu*, Xuefei Ning*, Zinan Lin*, Huazhong Yang, Yu Wang, OMS-DPM: Optimizing the Model Schedule for Diffusion Probabilistic Models, in ICML'23.

See the sub-directories under examples/research/ for more details.

If you find this codebase helpful, you can cite the following research for now.

@misc{ning2020awnas,
      title={aw_nas: A Modularized and Extensible NAS framework},
      author={Xuefei Ning and Changcheng Tang and Wenshuo Li and Songyi Yang and Tianchen Zhao and Niansong Zhang and Tianyi Lu and Shuang Liang and Huazhong Yang and Yu Wang},
      year={2020},
      eprint={2012.10388},
      archivePrefix={arXiv},
      primaryClass={cs.NE}
}

References

  • FBNet Wu, Bichen, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. "Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10734-10742. 2019.
  • ENAS Pham, Hieu, Melody Guan, Barret Zoph, Quoc Le, and Jeff Dean. "Efficient Neural Architecture Search via Parameters Sharing." In International Conference on Machine Learning, pp. 4095-4104. 2018.
  • DARTS Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "DARTS: Differentiable Architecture Search." In International Conference on Learning Representations. 2018.
  • SNAS Xie, Sirui, Hehui Zheng, Chunxiao Liu, and Liang Lin. "SNAS: stochastic neural architecture search." In International Conference on Learning Representations. 2018.
  • OFA Cai, Han, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. "Once-for-All: Train One Network and Specialize it for Efficient Deployment." In International Conference on Learning Representations. 2019.

Unit Tests

coverage percentage (Version 0.4.0-dev1)

Run pytest -x ./tests to run the unit tests.

The tests for NAS-Bench-101 and NAS-Bench-201 are skipped by default. To run them, run pytest with the AWNAS_TEST_NASBENCH environment variable set: AWNAS_TEST_NASBENCH=1 pytest -x ./tests/test_nasbench*. Other tests are skipped because they might be very slow (see the test outputs, where they are marked as "s", and the test cases under tests/).

Contact Us

  • Submit issues on GitHub for technical problems or improvement ideas. We are a small team, but we'll try our best to respond in time.
  • Contact us at [email protected] (Xuefei Ning) and [email protected] (Yu Wang) to discuss NAS or Efficient DL.
  • Our team is recruiting visiting students and engineers. If you're interested, check the information on our website.

aw_nas's People

Contributors

a-lincui, a-suozhang, dididoes, floyedshen, iltshade, patrick22414, sl-zeng, tangchangcheng, walkerning, youcaijun98, zhouzx17, zzzdavid

aw_nas's Issues

Generate sample config error and FBNet support?

Hi, thank you for sharing this great work for NAS!

While running the following command for generating configs:
awnas gen-sample-config -r nasbench-101 -d image ./sample_nb101.yaml
I get error:
Error: Invalid value for "-r" / "--rollout-type": invalid choice: nasbench-101. (choose from discrete, differentiable, mutation, dense_discrete, dense_mutation, ofa, ssd_ofa, compare, general, layer2, layer2-differentiable, macro-stagewise, macro-stagewise-diff, macro-sink-connect-diff, micro-dense, micro-dense-diff)

Seems like nasbench-101 is not supported as the -r argument, and I am wondering what would be the way to generate configs for other NAS approaches like nasbench101 and FBNet?
Thank you!

Missing data dumping code for NAS-Bench-201 in gates.

Dear authors,
Thanks for this nice work.
I try to run the experiments on NAS benchmarks of GATES. However, I found that there was no data dumping code for NAS-Bench-201 though I have successfully repeated the experiments on NAS-Bench-201 for GATES. Specifically, how to generate the PKL file 'nasbench201.pkl' and 'nasbench201_valid.pkl' when running the scripts/nasbench/train_nasbench201_pkl.py?
Look forward to your reply.
Shun Lu

The program is stuck in an endless loop, when run awnas search examples/nasbench/nasbench-101_gates_sa.yaml

When I run "awnas search examples/nasbench/nasbench-101_gates_sa.yaml --gpu 0 --save-every 10 --train-dir /public/data1/users/ziyechen/awnas/logs/nasbench-101_gates_sa", the program is stuck in an endless loop.

I find that the program is stuck at line 633 of https://github.com/walkerning/aw_nas/blob/master/aw_nas/btcs/nasbench_101.py:
    try:
        ss.nasbench._check_spec(new_rollout.genotype)
    except api.OutOfDomainError:
        continue
I printed the mutated genotype and found many 'none' operations.

I guess the mutation of the operation in the 'else' clause may be wrong, since it may replace the old operation with the 'none' operation. I think we should change 'new_ops = np.random.randint(0, ss.num_op_choices, size=1)[0]' to 'new_ops = np.random.randint(0, ss.num_op_choices-1, size=1)[0]', since the last operation is the 'none' operation.
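The effect of the proposed bound change can be checked with a small experiment. The sketch below uses the standard library's random.randrange, which follows the same half-open convention as np.random.randint(low, high) (the upper bound is excluded), so that it runs without NumPy. The op list is hypothetical; it only assumes that 'none' sits in the last position, as described above.

```python
import random

# Hypothetical op list with 'none' in the last position.
OPS = ["conv3x3-bn-relu", "conv1x1-bn-relu", "maxpool3x3", "none"]
num_op_choices = len(OPS)

rng = random.Random(0)

# randrange(0, n) draws from 0..n-1, so the last index ('none') can appear.
with_none = {OPS[rng.randrange(0, num_op_choices)] for _ in range(1000)}

# randrange(0, n - 1) draws from 0..n-2, so 'none' is never selected.
without_none = {OPS[rng.randrange(0, num_op_choices - 1)] for _ in range(1000)}

print(sorted(with_none))
print(sorted(without_none))
```

With the reduced upper bound, the mutation can no longer produce a 'none' op, so the spec check would no longer reject the mutated genotype for that reason.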

image

Similar DAG encoding scheme to an existing paper

Hi! The presented DAG encoding scheme is very similar to the asynchronous message passing proposed in our existing paper "D-VAE: A Variational Autoencoder for Directed Acyclic Graphs", published in NeurIPS-2019. Unfortunately, it is not cited or discussed. Could you have a check?

Best,
Muhan

Should we prevent over-regularization?

In every one-shot parameter training step, only a subset of the parameters is active, especially when mepa_sample_size is small. By default, we apply weight decay to all of the super net's parameters in every training step. Is this an "over-regularization", or a desired behavior (which I will refer to as "auto-regularization")? When some parameters are not active in any of the sampled architectures, maybe they should not be regularized, at least at the very beginning of training. Otherwise, the unsampled paths might be under-trained while the architectures that are sampled more often are trained even better, which could lead to insufficient exploration.

However, when the controller is reasonably well trained, a less-sampled path just does not work well in the architecture, and thus the reduced training and over-regularization that these paths get is an "auto-regularization" of the super network. (But do we really need this auto-regularization, given that the only use of the super network is to be a performance indicator of its sub-networks?)
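The two alternatives discussed above can be compared with a toy SGD step. This is a sketch only: parameters are plain floats keyed by hypothetical names, and the decay_inactive switch is an illustrative option, not an existing aw_nas setting.

```python
def sgd_step(params, grads, active, lr=0.1, weight_decay=1e-2, decay_inactive=True):
    """One SGD step with L2 weight decay.

    params: name -> weight; grads: name -> task gradient (active params only);
    active: names of the parameters used by the sampled architectures.
    """
    updated = {}
    for name, weight in params.items():
        grad = grads.get(name, 0.0)
        # Decay everything (current default) or only the active parameters.
        if decay_inactive or name in active:
            grad += weight_decay * weight
        updated[name] = weight - lr * grad
    return updated


params = {"edge0.conv": 1.0, "edge1.pool": 1.0}
grads = {"edge0.conv": 0.5}  # edge1.pool was not sampled in this step
active = {"edge0.conv"}

decay_all = sgd_step(params, grads, active, decay_inactive=True)
decay_active_only = sgd_step(params, grads, active, decay_inactive=False)
```

With decay_inactive=True, the unsampled edge1.pool weight still shrinks at every step even though it receives no task gradient; with decay_inactive=False it is left untouched until it is sampled.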

nasbench.api.OutOfDomainError: unsupported op none

When I run awnas search examples/nasbench/nasbench-101_sa.yaml --gpu 0 --save-every 10 --train-dir /public/data1/users/ziyechen/awnas/logs/nasbench-101_sa, an error occurs:

Traceback (most recent call last):
  File "/public/data1/users/ziyechen/.conda/envs/aw_nas/bin/awnas", line 33, in <module>
    sys.exit(load_entry_point('aw-nas', 'console_scripts', 'awnas')())
  File "/public/data1/users/ziyechen/.conda/envs/aw_nas/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/public/data1/users/ziyechen/.conda/envs/aw_nas/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/public/data1/users/ziyechen/.conda/envs/aw_nas/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/public/data1/users/ziyechen/.conda/envs/aw_nas/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/public/data1/users/ziyechen/.conda/envs/aw_nas/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/public/data1/users/ziyechen/aw_nas-master/aw_nas/main.py", line 235, in search
    trainer.train()
  File "/public/data1/users/ziyechen/aw_nas-master/aw_nas/trainer/simple.py", line 293, in train
    finished_e_steps, finished_c_steps)
  File "/public/data1/users/ziyechen/aw_nas-master/aw_nas/trainer/simple.py", line 204, in _controller_update
    step_loss=step_loss))
  File "/public/data1/users/ziyechen/aw_nas-master/aw_nas/btcs/nasbench_101.py", line 706, in evaluate_rollouts
    query_res = rollout.search_space.nasbench.query(rollout.genotype)
  File "/public/data1/users/ziyechen/nasbench/nasbench/api.py", line 237, in query
    fixed_stat, computed_stat = self.get_metrics_from_spec(model_spec)
  File "/public/data1/users/ziyechen/nasbench/nasbench/api.py", line 364, in get_metrics_from_spec
    self._check_spec(model_spec)
  File "/public/data1/users/ziyechen/nasbench/nasbench/api.py", line 391, in _check_spec
    % (op, self.config['available_ops']))
nasbench.api.OutOfDomainError: unsupported op none (available ops = ['conv3x3-bn-relu', 'conv1x1-bn-relu', 'maxpool3x3'])

RNN search space

  • trainer/controller/rollout code can be shared.

Other things that need new implementations:

  • search space definition.
  • dataset
  • rnn_super_net and diff_rnn_super_net implementation...

Support multi-GPU (data parallelism) in `weights_manager`

This seems useful for searching on ImageNet. All the interfaces can be left unchanged, except that when a weights manager is initialized, it should call DataParallel on itself. When assembling the candidate network, it should also pass the parallelized weights manager to the candidate network.

NAS + performance predictor implementation?

  • Interface classes: Population, Model Record, Mutation, Mutation Rollout
  • population-based controllers
    • random mutation sampler
    • RL learned mutation sampler
  • morphism-based weights manager
    • a simple one for the current search space (not morphism, just keep the weights for the edges that are not changed).
  • tune evaluator
    • a naive version with no training curve extrapolation prediction incorporated.
    • performance prediction
  • Async trainer
    • multiprocessing dispatcher
    • ray dispatcher
    • error handling

For V0.4: Code Merge, clean up and documentation

  • clean up examples organization.
  • clean up unused codes, such as trainer/archnetwork_trainer.py, evaluator/bftune.py, controller/compare.py, controller/archnetwork_controller.py...
  • README
  • Hardware related profiling & parsing flow. (flow description and example)
  • Add a Dockerfile seems unnecessary for a Python package with clear dependencies
  • more primer doc:
    • cfg breakup of ENAS/DARTS
    • orchestration pic, including method calls
    • add DARTS instruction message: python file location, method explain
    • the differences of the saved ckpt, what should be used in derive...
  • RNN example configs, search/final

Research docs

  • README/configs/scripts for FTT-NAS
  • README/configs/scripts for search space profiling
  • README/configs/scripts for GATES
  • README/configs/scripts for Surgery

More RL agents: A2C/PPO

  • PPO: no value critic loss included now
  • include value critic? Before tuning the RL agent, we should identify whether the learning of the controller is actually the problem. Maybe this is not the key problem of current NAS; the correlation/efficiency of the evaluator, a more flexible or better search space, and more objectives might be more important.
  • DDPG for continuous/pseudo-continuous search space decisions

Performance issues to profile

  • About sub_named_parameters: From a simple profiling run, named_parameters actually takes a significant amount of time, but I'm not sure whether this is due to the sub_named_parameters calculation. Profile what performance difference the candidate_cached_named_parameters and candidate_member_mask switches bring.
  • Caching can be done at lower levels of sub_named_parameters; we can run profiling to see whether using a weakref.WeakValueDictionary to cache op_type to named_parameters helps performance. I think we're good, as this is certainly not the bottleneck.
  • Let's see if del candidate_net after use can help reduce the memory significantly.

Ray dependency check

It's been a long time since I last used the ray dispatcher. We should test whether it still works, and maybe add a unit test.

Hardware: Profiling Network Generation Problem

Description

In function assemble_profiling_nets:

def assemble_profiling_nets(profiling_primitives,

The first conv layer is hard-coded.

But, profiling_primitives also contains a first conv layer, such as:

{'prim_type': 'conv_3x3', 'spatial_size': 224, 'C': 3, 'C_out': 41, 'stride': 2, 'affine': True}

This layer would not be added to geno, thus causing an infinite loop at:

because the input first conv layer will never be put into geno in the following net generations.

Adversarial Supernet training

Dear author,

Thanks a lot for developing such a good framework. I am interested in adversarial robustness. Could you show me how to implement adversarial supernet training by using the plugins? Thanks a lot for your time.

Best wishes,
Jia

About FTT-NAS GPU search problem...

Hello author, after completing the environment installation and configuration according to your instructions, I first ran the architecture search on CPU and found that both ENAS and FTT-NAS work without problems. When I search on GPU, ENAS still works, but FTT-NAS fails, and I don't know how to solve this.
The error message is as follows:
fault_injection.py", line 244, in get_reward
    perfs = self.get_perfs(inputs, outputs, targets, cand_net)
fault_injection.py", line 254, in get_perfs
    outputs_f = cand_net.forward_one_step_callback(inputs, callback=self.inject)
super_net.py", line 175, in forward_one_step_callback
    callback(context.last_state, context)
fault_injection.py", line 337, in inject
    context.last_state = self.injector.inject(state, n_mac=n_mac)
fault_injection.py", line 180, in inject
    return eval("self.inject_" + self.mode)(out, **kwargs)
fault_injection.py", line 159, in inject_fixed
    size=size_)])
prod() received an invalid combination of arguments - got (out=NoneType, axis=NoneType, )

Dataloader stuck

Occasionally, the data loader in the search process gets stuck at some point, usually the first time the controller queue is used. This might be related to this issue: pytorch/pytorch/issues/1355.
pytorch/pytorch/issues/1355#issuecomment-308587289 says this issue might be related to shm running out. But 32G of shm is configured, and the actual usage is never close to that.
I tried adding some swap space to avoid the data handling thread being killed due to running out of memory (this did not work).

It seems this might also be caused by calling iter on the data loader too early and then not using it for a long while.

Regarding TA-GATES dataset

Hello,

The pre-processed data you provided seems to have a specific representation.
Could you provide the preprocessing code, or explain how you produced it?

Thank you!

I seem to have found a small bug in your FTT-NAS project...

First of all, thank you for taking the time to modify your FTT-NAS guidance document.
After you modify the FTT-NAS README, I proceeded step by step according to your instructions, but in the end there was an error like the previous question, which is stuck at
aw_nas/aw_nas/objective/fault_injection.py 159 line np.random.randint(0, 2 * _n_err,size=size_)

I think the problem is in aw_nas/aw_nas/objective/fault_injection.py on line 152:
size_ = fault_ind.sum().cpu().data

The size parameter of np.random.randint does not seem to support Tensor-type variables; it only supports an int or a tuple. So I tried modifying your code as follows:
size_ = fault_ind.sum().cpu().data --> size_ = fault_ind.sum().cpu().numpy()

After this modification, the bug seems to disappear, and the project now works normally.
So, if you have time, please confirm whether this bug appears when FTT-NAS uses GPU to search, and whether my solution is appropriate, thank you very much!

After you revised the project last time, I tried to run the FTT-NAS search. It was normal at first, but after 10 epochs, an error was reported when running the test() method...

Hi, helpful author, we meet again... Although I would prefer not to keep asking questions this way...
The error message is as follows:
[image: screenshot of the error message]
The error seems to occur in your eval_queue method. I compared this method before and after your modification: the version before the modification works normally, but this problem occurs after the modification.
If you have time, could you confirm whether this method really has a small problem as I described? Thank you very much.
Maybe this is a problem on my side...

question about supernet training.

Hi, thanks for your great work! After setting up the environment, I ran "bash run_cfg.sh ../../plugins/robustness/cfgs/stagewise_predbased/search-multishoteval_stagewisess_fgsm_gates_2000M.yaml", and the error is as follows.

[image: screenshot of the error message]

How can I solve it? Best wishes!

How to tune each rollout and how to use the Pareto-based EA

Hi, I tried to set "inner_controller_type: pareto-evo" and "evaluator_type: tune" for robust predictor-based search, but it seems multishot-robnas does not support these. Could you elaborate on how to tune each rollout and how to use the Pareto-based EA? Thank you so much.

Question about the supernet training?

Hi, thanks for your great work! I set up awnas and the environment.
When preparing the environment, I used "ln -s `readlink -f plugins/robustness` ${HOME}/awnas/plugins".
However, the error is as follows:

"aw_nas.utils.registry.RegistryError: No registry item dense_rob available in registry search_space."

How can I solve the problem?
Best wishes!

Differentiable relaxation of arch

Description

Implement biased-path reparametrization instead of learning-signal-based optimization for learning the controller.

Differentiable relaxation of the sampling provides a more controllable way of backpropagating gradients through the discrete arch r.v.s, as the 'hard/soft' level can control the bias-variance trade-off of the learning process.

Details:

  • Need a new DifferentiableSuperNet weights manager and a new DifferentiableController controller;
    • In the DifferentiableController, the sampling probabilities of the node/op on each edge are modeled as global parameters. At first, we can sample the operation only.
    • The conditional dependency implied by an RNN-based controller is preferred in principle, as the architecture decisions/actions should be dependent, especially the choices of input nodes for one step. And I suppose that with independent node/op decisions for every step, the search will be trapped more quickly in a local minimum and will not explore very well. But how can we sample all the architecture parameters when there are so many discrete decisions in the whole sampling process? Alternatively, we can use a network similar to the RL controller networks to sample the path, and use the global parameters as the learnable prior of every op/edge; when there is no sampling for an edge, just use the prior op distribution on that edge...
  • The rollout object passed between the weights manager and the controller is also different from the current Rollout, as the arch representation is different now... So rollouts must have their subclasses too... The weights manager (the consumer/assembler) and the controller (the producer/sampler) are generally not agnostic to the rollout type, so it's reasonable to add an interface to specify which type of rollout a controller produces and a weights manager can take. The main script can be responsible for checking whether these rollout interfaces match. The handling of DifferentiableRollout in the trainer is different too... e.g. the mepa trainer should call set_perf with the in-graph loss tensor when using a differentiable rollout (eval should pass self._criterion instead of _ce_loss_mean in), but call set_perf with acc or a detached loss when using DiscreteRollout...
  • Reuse some of the controller network code, and add support for sampling and returning actions as one-hot samples (could be "soft" relaxed ones, e.g. samples from a gumbel-softmax as a relaxation of categorical samples). Cannot reuse the code, though, as the differentiable relaxation needs to sample for every op and edge...
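
To make the 'hard/soft' knob concrete, here is a minimal pure-Python sketch of Gumbel-softmax sampling over the op logits of a single edge. It is an illustration under assumptions, not a proposed implementation: gumbel_softmax_sample and its parameters are hypothetical names, and no gradients are computed here. In an autograd framework, the hard variant would typically be paired with a straight-through estimator.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, hard=False, rng=random):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / temperature)."""
    eps = 1e-20  # guards against log(0)
    noisy = []
    for logit in logits:
        u = rng.random()
        gumbel = -math.log(-math.log(u + eps) + eps)  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    # Hard variant: one-hot vector at the argmax of the relaxed sample.
    idx = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == idx else 0.0 for i in range(len(soft))]


# Hypothetical logits for three candidate ops on one edge.
op_logits = [1.0, 0.5, -0.5]
soft_sample = gumbel_softmax_sample(op_logits, 1.0, rng=random.Random(0))
hard_sample = gumbel_softmax_sample(op_logits, 1.0, hard=True, rng=random.Random(0))
```

A lower temperature pushes the soft sample towards one-hot (less bias, higher gradient variance), while a higher temperature smooths it (more bias, lower variance).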

Refactor evolutionary controller

Currently, the evolutionary controller for each search space is implemented separately! This is not elegant since there are multiple copies of the same logic (e.g., tournament selection in controller.sample, the killing mechanism in controller.step, and the population save & load utility). Let us change evolution to use a single controller (implement RegularizedEvoController), and implement a mutate method in the Rollout class definitions.
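
A rough sketch of the intended split, assuming regularized evolution in its usual form (tournament selection of a parent, then removal of the oldest member). ToyRollout, ToyRegularizedEvoController, and toy_perf are hypothetical names for illustration only, not the aw_nas classes: the controller holds the shared logic, and only mutate is specific to the search space.

```python
import collections
import random


class ToyRollout:
    """Hypothetical rollout: the arch is a list of op indices."""

    def __init__(self, arch, num_ops=4):
        self.arch = list(arch)
        self.num_ops = num_ops
        self.perf = None

    def mutate(self, rng):
        # Search-space-specific part: change one position to a different op.
        child = list(self.arch)
        pos = rng.randrange(len(child))
        child[pos] = rng.choice([op for op in range(self.num_ops) if op != child[pos]])
        return ToyRollout(child, self.num_ops)


class ToyRegularizedEvoController:
    """Search-space-agnostic logic: tournament selection and oldest-first killing."""

    def __init__(self, population_size, tournament_size, seed=0):
        # A bounded deque drops the oldest member when a new one is appended.
        self.population = collections.deque(maxlen=population_size)
        self.tournament_size = tournament_size
        self.rng = random.Random(seed)

    def sample(self):
        k = min(self.tournament_size, len(self.population))
        candidates = self.rng.sample(list(self.population), k)
        parent = max(candidates, key=lambda r: r.perf)
        return parent.mutate(self.rng)

    def step(self, rollout):
        self.population.append(rollout)


def toy_perf(rollout):
    return sum(rollout.arch)  # placeholder for a real evaluation


controller = ToyRegularizedEvoController(population_size=5, tournament_size=3)
for _ in range(5):
    init = ToyRollout([0, 0, 0])
    init.perf = toy_perf(init)
    controller.step(init)
for _ in range(20):
    child = controller.sample()
    child.perf = toy_perf(child)
    controller.step(child)
```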

Also, implement ParetoEvoController for Pareto evolutionary search with multiple objectives!
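
As a reminder of the selection criterion such a controller would rely on, here is a small sketch of non-dominated filtering. It assumes every objective is to be maximized; dominates and pareto_front are hypothetical helper names, and the performance tuples are made-up (accuracy, negative latency) pairs.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(perfs):
    """Keep the performance tuples that no other tuple dominates."""
    return [
        p
        for i, p in enumerate(perfs)
        if not any(dominates(q, p) for j, q in enumerate(perfs) if j != i)
    ]


# Made-up (accuracy, -latency) pairs; both objectives are maximized.
perfs = [(0.9, -5.0), (0.8, -3.0), (0.7, -4.0), (0.95, -6.0)]
front = pareto_front(perfs)  # (0.7, -4.0) is dominated by (0.8, -3.0)
```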

The ModelRecord class should be changed into a search-space-agnostic one too (seems okay now? needs some tests). The current population requires a template final-training cfg to work; this can actually be unnecessary. A mutation rollout should proxy method calls and attribute lookups to the mutated rollout. But to enable access to the mutation and the parent rollout, it should also provide several special calls/attributes. This info should actually be put into ModelRecord by the evaluator, and also be managed by Population. The controller does not need to access the ModelRecord.
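
The proxying could look roughly like the sketch below, which relies on Python's __getattr__ fallback. ToyMutationRollout and ToyRollout are hypothetical stand-ins, and the (position, new op) tuple used for the mutation is an assumed representation.

```python
class ToyRollout:
    """Hypothetical plain rollout."""

    def __init__(self, arch):
        self.arch = arch
        self.perf = None

    def set_perf(self, value):
        self.perf = value


class ToyMutationRollout:
    """Exposes `parent` and `mutation`; everything else goes to the mutated rollout."""

    def __init__(self, parent, mutation, mutated):
        self.parent = parent
        self.mutation = mutation
        self._mutated = mutated

    def __getattr__(self, name):
        # Called only when normal lookup fails, so `parent` and `mutation`
        # are served directly and all other names are proxied.
        if name == "_mutated":
            # Avoid infinite recursion if the proxy is not fully initialized.
            raise AttributeError(name)
        return getattr(self._mutated, name)


parent = ToyRollout([0, 1, 2])
mutated = ToyRollout([0, 3, 2])
rollout = ToyMutationRollout(parent, mutation=(1, 3), mutated=mutated)
rollout.set_perf(0.9)  # proxied: sets perf on the mutated rollout
```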

Minor issues running hardware latency profiling

Hi, I discovered a few small issues while running OFA latency profiling with mobilenet v2 block.

MobileNetV2 Block argument list

lambda expansion, C, C_out, stride, kernel_size, affine=True: MobileNetV2Block(expansion, C, C_out, stride, kernel_size, affine=affine))

This line causes an error:

AssertionError: The passed parameters are different from the formal parameter list of primitive type `mobilenet_v2_block`, expected odict_keys(['expansion', 'C', 'C_out', 'stride', 'kernel_size', 'affine']), got ['C', 'C_out', 'stride', 'affine', 'kernel_size', 'activation', 'use_se', 'expansion']
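
The mismatch can be reproduced outside of aw_nas with inspect.signature. The sketch below is an assumption about the kind of check involved, not the actual aw_nas code: check_params is a hypothetical helper, and the activation and use_se defaults in the wider lambda are placeholders.

```python
import inspect


def check_params(primitive_fn, passed_names):
    """Compare passed parameter names with the callable's formal parameter list."""
    expected = list(inspect.signature(primitive_fn).parameters)
    return sorted(expected) == sorted(passed_names), expected


# Parameter list as in the quoted lambda (the body is replaced by None here).
narrow = lambda expansion, C, C_out, stride, kernel_size, affine=True: None
# Hypothetical variant that also accepts the two extra names from the error message.
wide = lambda expansion, C, C_out, stride, kernel_size, affine=True, activation="relu", use_se=False: None

passed = ["C", "C_out", "stride", "affine", "kernel_size", "activation", "use_se", "expansion"]
narrow_ok, narrow_expected = check_params(narrow, passed)
wide_ok, _ = check_params(wide, passed)
```

Here narrow_ok is False because activation and use_se are not in the formal parameter list, while wide_ok is True.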

Another is a typo:

https://github.com/walkerning/aw_nas/blob/master/aw_nas/hardware/ofa_obj.py#L82

It might need to be:

use_ses = use_ses or [
    None,
] * len(strides)

Otherwise it causes a problem:

TypeError: 'NoneType' object is not subscriptable
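
A stand-alone sketch of the defaulting pattern suggested above; pair_strides_with_se is a hypothetical function, not the code in ofa_obj.py. Without the default, indexing or zipping a use_ses of None raises the TypeError shown.

```python
def pair_strides_with_se(strides, use_ses=None):
    """Pair each stride with its use_se flag, defaulting the flags to None."""
    # Replace a missing list with one None per stride before it is used.
    use_ses = use_ses or [None] * len(strides)
    return list(zip(strides, use_ses))


default_pairs = pair_strides_with_se([1, 2, 2])
explicit_pairs = pair_strides_with_se([1, 2], [True, False])
```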

A hardware friendly framework

  • latency profiling flow: under development
  • fixed-point arithmetic: previously implemented via patching, as a patch plugin
  • multi-precision fixed-point search or a pruning search space: should we refer to APQ to see what is worth doing?
