carpedm20 / enas-pytorch

2.7K 108.0 494.0 13.2 MB

PyTorch implementation of "Efficient Neural Architecture Search via Parameters Sharing"

License: Apache License 2.0

Python 98.62% Shell 1.38%

pytorch neural-architecture-search google-brain

enas-pytorch's Introduction

Efficient Neural Architecture Search (ENAS) in PyTorch

PyTorch implementation of Efficient Neural Architecture Search via Parameters Sharing.

ENAS reduces the computational requirement (GPU-hours) of Neural Architecture Search (NAS) by 1000x via parameter sharing between models that are subgraphs within a large computational graph. The original paper reports state-of-the-art (SOTA) results on Penn Treebank language modeling.

**[Caveat] Use official code from the authors: link**

Prerequisites

Python 3.6+
PyTorch==0.3.1
tqdm, scipy, imageio, graphviz, tensorboardX

Usage

Install prerequisites with:

conda install graphviz
pip install -r requirements.txt

To train ENAS to discover a recurrent cell for RNN:

python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 \
               --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001

python main.py --network_type rnn --dataset wikitext

To train ENAS to discover a CNN architecture (in progress):

python main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True \
               --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1

Alternatively, you can use your own dataset by placing the text or image files as follows:

data
├── YOUR_TEXT_DATASET
│   ├── test.txt
│   ├── train.txt
│   └── valid.txt
├── YOUR_IMAGE_DATASET
│   ├── test
│   │   ├── xxx.jpg (name doesn't matter)
│   │   ├── yyy.jpg (name doesn't matter)
│   │   └── ...
│   ├── train
│   │   ├── xxx.jpg
│   │   └── ...
│   └── valid
│       ├── xxx.jpg
│       └── ...
├── image.py
└── text.py
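
Before starting a long run on a custom text dataset, it can help to confirm the layout above. The following is only a sketch: the function name `missing_text_splits` is hypothetical and not part of this repository, and the split file names are taken from the tree above.

```python
from pathlib import Path

# Split names taken from the directory tree above.
SPLITS = ("train", "valid", "test")


def missing_text_splits(dataset_dir):
    """Return the split files (e.g. 'train.txt') that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [f"{split}.txt" for split in SPLITS
            if not (root / f"{split}.txt").is_file()]
```

For example, `missing_text_splits("data/YOUR_TEXT_DATASET")` returns an empty list when all three files are in place.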

To generate a GIF of the generated samples:

python generate_gif.py --model_name=ptb_2018-02-15_11-20-02 --output=sample.gif

More configurations can be found here.

Results

Efficient Neural Architecture Search (ENAS) is composed of two sets of learnable parameters: the controller LSTM parameters θ and the shared parameters ω. The two sets are trained alternately, and only the trained controller is used to derive novel architectures.

1. Discovering Recurrent Cells

The controller LSTM decides 1) which activation function to use and 2) which previous node to connect to.
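
The two decisions per node can be illustrated with a small sketch. It is not the repository's controller: uniform random choices stand in for the LSTM policy, and the activation list and the name `sample_cell` are assumptions made for illustration.

```python
import random

# Assumed activation list; the issues below mention 4 activation functions.
ACTIVATIONS = ["tanh", "ReLU", "identity", "sigmoid"]


def sample_cell(num_blocks, rng):
    """Sample one recurrent cell as a list of (previous_node, activation).

    Node 0 reads the cell input, so it only needs an activation
    (previous_node is None). Node i > 0 connects to one earlier node.
    Uniform sampling replaces the controller LSTM in this sketch.
    """
    cell = []
    for node in range(num_blocks):
        prev = None if node == 0 else rng.randrange(node)
        cell.append((prev, rng.choice(ACTIVATIONS)))
    return cell


cell = sample_cell(num_blocks=4, rng=random.Random(0))
for node, (prev, act) in enumerate(cell):
    print(f"node {node}: input from {prev}, activation {act}")
```

Because every node may only connect to an earlier node, any sampled cell is a valid directed acyclic graph.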

The RNN cells ENAS discovered for the Penn Treebank and WikiText-2 datasets:

Best discovered ENAS cell for Penn Treebank at epoch 27:

You can see the details of training (e.g. reward, entropy, loss) with:

tensorboard --logdir=logs --port=6006

2. Discovering Convolutional Neural Networks

The controller LSTM samples 1) which computation operation to use and 2) which previous node to connect to.

The CNN network ENAS discovered for CIFAR-10 dataset:

(in progress)

3. Designing Convolutional Cells

(in progress)

Reference

Author

Taehoon Kim / @carpedm20

enas-pytorch's Issues

network doesn't seem to be saved

I'm getting this error
File "C:\Users\Neda\Anaconda3\lib\site-packages\pygraphviz\agraph.py", line 1364, in _run_prog raise IOError(b"".join(errors).decode(self.encoding)) OSError: Format: "dot" not recognized. Use one of:

which is related to File "D:\Neda\Pytorch\NAS\ENAS-pytorch-master_carpedm20_original\utils.py", line 85, in draw_network graph.layout(prog='dot')

So it cannot save the network, and the networks folder in logs is empty. Could someone please point me in the right direction?

Has anyone been able to run the CNN module? It seems that the author has given up on this project

Retraining from scratch yields worse results

Hello,

As described in the paper (2.2 Deriving Architectures), I tried to retrain the best derived model from scratch, but surprisingly it gives a worse result than keeping the original (shared) weights.
I expected training the best model (dag) from scratch to be faster and to eventually reach a better perplexity, but that is not the case.

I do the following:

1. Launch ENAS with the --load_path argument, which loads a previous run, and --mode test, which calls a custom test method inside the trainer class.
2. In the test method, reset the shared weights with self.shared.reset_parameters().
3. Derive the best model (dag).
4. Train this model from scratch, iterating over the train set for N epochs (as in the train_shared method).

The following picture shows the loss and ppl during the "normal" training (first slope) and after resetting the shared weights (second slope). The second slope only trains the same best model (dag).

Does anyone have an idea why resetting the shared weights and retraining from scratch performs so badly?

Error when I tried to train the RNN

When I tried to train the RNN with the command
python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035
--shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001
an error occurs at util.detach(h). It seems that when h is a tensor, this function recurses infinitely, which causes the error.
Can you tell me why this is happening?
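
One plausible explanation, offered as an assumption and not as a confirmed diagnosis: on PyTorch releases newer than the pinned 0.3.1, a check against the old `Variable` type may no longer match plain tensors, so the function keeps iterating into the tensor. The sketch below shows a duck-typed alternative that treats anything with a `detach()` method as a leaf. `FakeTensor` is a hypothetical stand-in so the example runs without PyTorch; `torch.Tensor` offers the same `detach()` method.

```python
def detach(h):
    """Detach a hidden state that is a tensor or a nested tuple/list of tensors."""
    if hasattr(h, "detach"):
        # Tensor-like leaf: stop recursing and cut it from the graph.
        return h.detach()
    # Container: detach every element and rebuild it as a tuple.
    return tuple(detach(v) for v in h)


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor, used only in this sketch."""

    def __init__(self, value):
        self.value = value

    def detach(self):
        return FakeTensor(self.value)


hidden = (FakeTensor(1.0), (FakeTensor(2.0), FakeTensor(3.0)))
detached = detach(hidden)
print(type(detached).__name__, detached[1][1].value)
```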

errors when running

When I run this code with: python main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True
--controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1

I get some errors. Would you please tell me what changes I should make to this code before I run it? Looking forward to your reply.

Why each activation function in each node has different id?

@dukebw Hi, thanks for your code. I have a specific question:
In your code, you build the embedding as follows:

        num_total_tokens = sum(self.num_tokens)

        self.encoder = torch.nn.Embedding(num_total_tokens,
                                          args.controller_hid)

This code shows that each previous node and activation function in different nodes have different ids. I am wondering about it. It would be great if you could help check this.

Errors When running

@dukebw Hi, thanks for your work. When I run this code I run into some problems.

1. When I run it using run.sh with the default settings, I get:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
File "main.py", line 48, in
main(args)
File "main.py", line 30, in main
trnr = trainer.Trainer(args, dataset)
File "/home/axi/ENAS-pytorch-master-3/trainer.py", line 160, in init
self.build_model()
File "/home/axi/ENAS-pytorch-master-3/trainer.py", line 192, in build_model
self.shared.cuda()
File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in cuda
return self._apply(lambda t: t.cuda(device_id))
File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
module._apply(fn)
File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
module._apply(fn)
File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 124, in _apply
param.data = fn(param.data)
File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in
return self._apply(lambda t: t.cuda(device_id))
File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 66, in cuda
return new_type(self.size()).copy(self, async)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generic/THCStorage.cu:66

This happens even though I have 3 GPUs and 10 GB of memory.
2. When I run it using: python main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1, I get:
2018-04-29 19:01:57,957:INFO::[] Make directories : logs/cifar_2018-04-29_19-01-57
Traceback (most recent call last):
File "main.py", line 48, in
main(args)
File "main.py", line 26, in main
dataset = data.image.Image(args.data_path)
File "/home/axi/ENAS-pytorch-master-2/data/image.py", line 8, in init
if args.datset == 'cifar10':
AttributeError: 'str' object has no attribute 'datset'
After I made some changes, I got other errors such as:
2018-04-29 18:49:24,745:INFO::[] Make directories : logs/cifar10_2018-04-29_18-49-24
Files already downloaded and verified
2018-04-29 18:49:27,464:INFO::regularizing:
Traceback (most recent call last):
File "main.py", line 48, in
main(args)
File "main.py", line 30, in main
trnr = trainer.Trainer(args, dataset)
File "/home/axi/ENAS-pytorch-master-1/trainer.py", line 139, in init
self.cuda)
File "/home/axi/ENAS-pytorch-master-1/utils.py", line 148, in batchify
data = data.narrow(0, 0, nbatch * bsz)
AttributeError: 'DataLoader' object has no attribute 'narrow'
or
2018-04-29 18:22:50,192:INFO::[*] Make directories : logs/cifar10_2018-04-29_18-22-50
Files already downloaded and verified
2018-04-29 18:22:55,041:INFO::regularizing:
Traceback (most recent call last):
File "main.py", line 48, in
main(args)
File "main.py", line 30, in main
trnr = trainer.Trainer(args, dataset)
File "/home/axi/ENAS-pytorch-master-1/trainer.py", line 139, in init
self.cuda)
File "/home/axi/ENAS-pytorch-master-1/utils.py", line 147, in batchify
nbatch = data.size // bsz
AttributeError: 'DataLoader' object has no attribute 'size'

Would you please tell me what changes I should make before I run the code? Thanks for your response.

Can you add an option to use or not use the GPU when calling PyTorch?

Otherwise, I will do it myself and check it in! Thanks for your time!

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

When training using any of the example configurations from the documentation I get the error:
"RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED"

Reproducing
For example running:
python main.py --network_type rnn --dataset wikitext

My system configuration
CUDA 10.1
Python 3.7.3
PyTorch 1.1.0
Arch Linux
GPU: RTX 2070

Other PyTorch applications work just fine.

Full output (from pipenv environment):

% python main.py --network_type rnn --dataset wikitext                                                                    oliver@oliver
2019-06-14 16:30:31,585:INFO::[*] Make directories : logs/wikitext_2019-06-14_16-30-31
2019-06-14 16:30:49,909:INFO::regularizing:
2019-06-14 16:30:54,743:INFO::# of parameters: 169,315,278
2019-06-14 16:30:54,834:INFO::[*] MODEL dir: logs/wikitext_2019-06-14_16-30-31
2019-06-14 16:30:54,834:INFO::[*] PARAM path: logs/wikitext_2019-06-14_16-30-31/params.json
Traceback (most recent call last):
  File "main.py", line 54, in <module>
    main(args)
  File "main.py", line 34, in main
    trnr.train()
  File "/home/oliver/code/ENAS-pytorch/trainer.py", line 222, in train
    self.train_shared(dag=dag)
  File "/home/oliver/code/ENAS-pytorch/trainer.py", line 305, in train_shared
    dags)
  File "/home/oliver/code/ENAS-pytorch/trainer.py", line 251, in get_loss
    output, hidden, extra_out = self.shared(inputs, dag, hidden=hidden)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/code/ENAS-pytorch/models/shared_rnn.py", line 235, in forward
    logit, hidden = self.cell(x_t, hidden, dag)
  File "/home/oliver/code/ENAS-pytorch/models/shared_rnn.py", line 354, in cell
    output = self.batch_norm(output)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 83, in forward
    exponential_average_factor, self.eps)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/functional.py", line 1697, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Debugging
Debugging the parameters passed to batch_norm, I found that the following are all on the CUDA device: input, weight, bias, running_mean, running_var. That all seems reasonable.
The remaining variables look reasonable as well.

Will the code repository be maintained?

gpu_nums> 1

To run on multiple GPUs, the forward pass of self.shared should use the parameters held in a ModuleList (such as self._w_h, which is a ModuleList).
Otherwise it raises an error (RuntimeError: tensors are on different GPUs), because the parameters used in self.forward(xx) are stored in a plain list and are not replicated to the other GPUs.

RuntimeError: saved_forINTERNAL ASSERT FAILED at

I encountered this strange error. Here is the output, thank you.

Traceback (most recent call last):
File "D:/xiangmu/ENAS-pytorch-master/main.py", line 56, in
main(args)
File "D:/xiangmu/ENAS-pytorch-master/main.py", line 35, in main
trnr.train()
File "D:\xiangmu\ENAS-pytorch-master\trainer.py", line 223, in train
self.train_shared(dag=dag)
File "D:\xiangmu\ENAS-pytorch-master\trainer.py", line 314, in train_shared
loss.backward()
File "C:\Users\sunhaonan.conda\envs\enas\lib\site-packages\torch_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "C:\Users\sunhaonan.conda\envs\enas\lib\site-packages\torch\autograd_init_.py", line 156, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: saved_forINTERNAL ASSERT FAILED at "..\torch\csrc\autograd\saved_variable.cpp":133, please report a bug to PyTorch. No grad_fn for non-leaf saved tensor

The embedding/encoder may not be working the way you think it is

In issue #33 I wondered why the Embedding encoder was as large as it was. The response from @carpedm20 was:

You can assume the same activation in a different place to have the same semantics (embedding) or not. I assumed it's different because activation in different locations may have a separate role.

That makes sense. However, I've been running through this section of the code (Controller.sample()) in the debugger trying to understand what's going on... when mode is 0 (the case when activation func is being selected) then sum(self.num_tokens[:mode]) is 0. So the line:

inputs = utils.get_variable(
                action[:, 0] + sum(self.num_tokens[:mode]),
                requires_grad=False)

is always just the action[:, 0] component, which is a value from 0 to 3 (one of the 4 activation functions in the activation function list), since sum(self.num_tokens[:0]) is 0.

And when mode is 1, the sum(self.num_tokens[:mode]) is always 4 - so not sure how you can get anything higher than len(args.shared_rnn_activations)+self.args.num_blocks here. mode can only take on values of 0 or 1. Either I'm missing something or maybe it's a bug?

With self.args.num_blocks = 6, I see that self.num_tokens is [4, 1, 4, 2, 4, 3, 4, 4, 4, 5, 4, 6, 4] and sum(self.num_tokens) = 49.

To actually use all of those 49 entries of the embedding table I suspect that what we need for inputs is something along the lines of:

    inputs = utils.get_variable(
                action[:, 0] + sum(self.num_tokens[:block_idx]) ,
                requires_grad=False)
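
The difference between the two offsets can be checked without PyTorch. The sketch below is an illustration under stated assumptions: it treats every entry of num_tokens as one decision step, takes mode to be the step index modulo 2, and uses the hypothetical helper name `reachable_rows`. It counts which embedding rows each offset scheme can ever index.

```python
NUM_ACTIVATIONS = 4  # len(args.shared_rnn_activations) in this report
NUM_BLOCKS = 6       # self.args.num_blocks in this report

num_tokens = [NUM_ACTIVATIONS]
for idx in range(NUM_BLOCKS):
    num_tokens += [idx + 1, NUM_ACTIVATIONS]


def reachable_rows(offset_key):
    """Rows hit when step i is offset by sum(num_tokens[:offset_key(i)])."""
    rows = set()
    for step, size in enumerate(num_tokens):
        offset = sum(num_tokens[:offset_key(step)])
        rows.update(offset + action for action in range(size))
    return rows


by_mode = reachable_rows(lambda step: step % 2)  # offset depends on mode (0 or 1)
by_step = reachable_rows(lambda step: step)      # one offset per decision step

print(len(by_mode), len(by_step), sum(num_tokens))  # 10 49 49
```

Under these assumptions the mode-based offset only ever touches rows 0 to 9, while a per-step offset covers all 49 rows of the embedding table.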

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I encountered this strange error. Here is the output

$ python main.py 
2020-10-17 06:19:37,971:INFO::[*] Make directories : logs/ptb_2020-10-17_06-19-37
2020-10-17 06:19:45,686:INFO::regularizing:
2020-10-17 06:19:56,858:INFO::# of parameters: 146,014,000
2020-10-17 06:19:57,208:INFO::[*] MODEL dir: logs/ptb_2020-10-17_06-19-37
2020-10-17 06:19:57,208:INFO::[*] PARAM path: logs/ptb_2020-10-17_06-19-37/params.json
/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
2020-10-17 06:19:57,872:INFO::max hidden 3.5992980003356934
2020-10-17 06:19:58,043:INFO::abs max grad 0
/home/ubuntu/ENAS-pytorch/trainer.py:323: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  self.args.shared_grad_clip)
2020-10-17 06:19:58,879:INFO::abs max grad 0.05615033581852913
2020-10-17 06:19:59,448:INFO::max hidden 9.425106048583984
2020-10-17 06:19:59,774:INFO::abs max grad 0.0575626865029335
2020-10-17 06:20:01,810:INFO::abs max grad 0.12187317758798599
2020-10-17 06:20:03,771:INFO::abs max grad 0.5459710359573364
2020-10-17 06:20:07,741:INFO::max hidden 15.914213180541992
2020-10-17 06:20:17,945:INFO::abs max grad 0.8663018941879272
2020-10-17 06:20:41,948:INFO::| epoch   0 | lr 20.00 | raw loss 8.39 | loss 8.39 | ppl  4402.23
2020-10-17 06:21:21,796:INFO::| epoch   0 | lr 20.00 | raw loss 7.20 | loss 7.20 | ppl  1343.73
2020-10-17 06:21:26,601:INFO::max hidden 20.534639358520508
2020-10-17 06:22:06,855:INFO::| epoch   0 | lr 20.00 | raw loss 7.00 | loss 7.00 | ppl  1093.28
2020-10-17 06:22:07,417:INFO::max hidden 22.71334457397461
2020-10-17 06:22:19,596:INFO::clipped 1 hidden states in one forward pass. max clipped hidden state norm: 25.37160301208496
Traceback (most recent call last):
  File "main.py", line 54, in <module>
    main(args)
  File "main.py", line 34, in main
    trnr.train()
  File "/home/ubuntu/ENAS-pytorch/trainer.py", line 222, in train
    self.train_shared(dag=dag)
  File "/home/ubuntu/ENAS-pytorch/trainer.py", line 313, in train_shared
    loss.backward()
  File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 1000]], which is output 0 of AddBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Hidden State of RNN (not averaged ?)

Hello guys,
Thank you very much for this work 👍! I have a quick question about the code:
Why does the cell method return output, h[self.args.num_blocks - 1]? It looks like h is only one loose end, whereas in the paper it looks like h = output = the average over the loose ends.
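
For reference, "average over the loose ends" can be sketched as follows. This only illustrates the idea raised in the question and is not the repository's code: scalars stand in for hidden vectors, and the name `loose_end_average` is hypothetical.

```python
def loose_end_average(prev_nodes, values):
    """Average the values of nodes that feed no other node.

    prev_nodes[i] is the index of the node that node i reads from
    (None for the first node); values[i] is the output of node i.
    """
    used = {p for p in prev_nodes if p is not None}
    loose = [i for i in range(len(prev_nodes)) if i not in used]
    return loose, sum(values[i] for i in loose) / len(loose)


# Node 0 feeds nodes 1 and 2; node 1 feeds node 3.
# Nodes 2 and 3 feed nothing, so they are the loose ends.
loose, avg = loose_end_average([None, 0, 0, 1], [0.5, 1.0, 2.0, 4.0])
print(loose, avg)  # [2, 3] 3.0
```

Returning only the last node's value matches this average only when the last node happens to be the single loose end.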

Reproduce ENAS results on RNN

Hello @carpedm20 ,

Thanks a lot for this nice implementation of the ENAS paper. Did you manage to reproduce their results by retraining the model from scratch?

Thanks,
Best

RuntimeError: grad can be implicitly created only for scalar outputs

I encountered this strange error. Here is the output, thank you.
Previously, the error said that tensors could not be on the CPU and GPU at the same time. After I added .cuda() to the loss, it started showing this error instead.

Traceback (most recent call last):
File "D:/xiangmu/ENAS-pytorch-master/main.py", line 56, in
main(args)
File "D:/xiangmu/ENAS-pytorch-master/main.py", line 35, in main
trnr.train()
File "D:\xiangmu\ENAS-pytorch-master\trainer.py", line 223, in train
self.train_shared(dag=dag)
File "D:\xiangmu\ENAS-pytorch-master\trainer.py", line 317, in train_shared
loss.backward()
File "C:\Users\sunhaonan.conda\envs\enas\lib\site-packages\torch_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "C:\Users\sunhaonan.conda\envs\enas\lib\site-packages\torch\autograd_init_.py", line 150, in backward
grad_tensors_ = make_grads(tensors, grad_tensors)
File "C:\Users\sunhaonan.conda\envs\enas\lib\site-packages\torch\autograd_init_.py", line 51, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs

About 1x1 convolution

Does anyone have an idea of how to handle the 1x1 convolution used for channel alignment in a parameter-sharing CNN, given that the sizes change as the graph changes?

Cannot use test and reproduce the result?

@dukebw Hi, thanks for your code. I downloaded and ran it, but ran into 3 main problems:

1. It does not seem to reproduce the result in the paper. I ran it using run.sh with the default settings, and the eval ppl stays around 80~100 until the end of training (150 epochs).
2. There is no test function in the Trainer class. I added one using the evaluation method (passing self.test_data as the argument). However, the ppl is around 1500. Even when I pass self.train_data, self.eval_data or self.valid_data, it's also around 1500.
3. After training is done, when I call either test() or derive() with the --load_path argument, self.shared.load_state_dict in load_model() raises "KeyError: unexpected key batch_norm.weight in state_dict". I also printed self.shared.state_dict.keys() and the content loaded by torch.load from the checkpoint, and found that the checkpoint contains 4 batch-normalization parameters ("batch_norm.weight", "batch_norm.bias", "batch_norm.running_mean", "batch_norm.running_var"), while the model's own state dict does not.

It would be great if you could help check these.

AttributeError: 'DataLoader' object has no attribute 'size'

Hello, guys. When I run this command:
python main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1
I got an AttributeError: 'DataLoader' object has no attribute 'size'.
This is at line 148 of utils.py. The 'DataLoader' object also has no attribute 'narrow' or 'view'. The version of PyTorch on my computer is 0.4.1. Is my version wrong?
Does anyone have any ideas about this problem?
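
The attributes named in the error (size, narrow, view) are tensor operations, which suggests that batchify expects one flat sequence of token ids and is being handed an image DataLoader instead. The pure-Python sketch below illustrates what such a helper typically does, assuming it follows the common language-model pattern of narrow, reshape and transpose; it is not a copy of utils.py.

```python
def batchify(tokens, bsz):
    """Split a flat token sequence into bsz parallel columns.

    Returns a list of rows; row t holds the t-th token of each column,
    mirroring narrow -> view(bsz, -1) -> transpose on a 1-D tensor.
    """
    nbatch = len(tokens) // bsz
    tokens = tokens[:nbatch * bsz]  # drop the remainder (the "narrow" step)
    columns = [tokens[i * nbatch:(i + 1) * nbatch] for i in range(bsz)]
    return [list(row) for row in zip(*columns)]


print(batchify(list(range(10)), bsz=3))  # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```

A DataLoader yields (image, label) batches and has no flat length to slice this way, so the CNN path would need its own data handling.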

Is Discovering Convolutional Neural Networks also still in progress?

Controller.encoder seems much too large

From the Controller constructor:

class Controller(torch.nn.Module):
    def __init__(self, args):
        torch.nn.Module.__init__(self)
        self.args = args
        self.forward_evals = 0
        if self.args.network_type == 'rnn':
            # NOTE(brendan): `num_tokens` here is just the activation function
            # for every even step,
            self.num_tokens = [len(args.shared_rnn_activations)]
            for idx in range(self.args.num_blocks):
                self.num_tokens += [idx + 1,
                                    len(args.shared_rnn_activations)]
            self.func_names = args.shared_rnn_activations
        elif self.args.network_type == 'cnn':
            self.num_tokens = [len(args.shared_cnn_types),
                               self.args.num_blocks]
            self.func_names = args.shared_cnn_types

        num_total_tokens = sum(self.num_tokens) #why sum the tokens here?
        #Shouldn't this be: num_total_tokens = len(args.shared_rnn_activations)+self.args.num_blocks
        self.encoder = torch.nn.Embedding(num_total_tokens,
                                          args.controller_hid)

It seems like num_total_tokens doesn't need to be summation of the self.num_tokens - in the case where self.args.num_blocks = 6, that number is 49. Yet from what I can tell, the largest number you can ever get where the embedding is used in Controller.forward() is going to be len(args.shared_rnn_activations)+self.args.num_blocks (in this case that would be 10)
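
The two sizes discussed here can be reproduced with plain Python. The loop below copies the RNN branch of the constructor above; the function name `controller_num_tokens` and the concrete values (4 activations, 6 blocks) are taken from this report or introduced for illustration.

```python
def controller_num_tokens(num_activations, num_blocks):
    """Rebuild num_tokens as in the RNN branch of the constructor above."""
    num_tokens = [num_activations]
    for idx in range(num_blocks):
        num_tokens += [idx + 1, num_activations]
    return num_tokens


tokens = controller_num_tokens(num_activations=4, num_blocks=6)
print(tokens)       # [4, 1, 4, 2, 4, 3, 4, 4, 4, 5, 4, 6, 4]
print(sum(tokens))  # 49, the current embedding size
print(4 + 6)        # 10, the size proposed in this report
```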

Can't run the code, SyntaxError: invalid syntax

I suppose I'm the only one with this problem, seeing everyone else can actually run the program.

I tried both

python3 main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 \
               --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001

python3 main.py --network_type rnn --dataset wikitext

but an error comes out with

File "main.py", line 28
    raise NotImplementedError(f"{args.dataset} is not supported")
                                                               ^
SyntaxError: invalid syntax

Does anyone have any idea why? I'm using Ubuntu on Docker, with PyTorch installed, as well as the packages listed in requirements.txt except pygraphviz (it failed to install, but this shouldn't raise any errors until it's actually called in utils.py, which I commented out anyway).
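
The caret points at the end of an f-string literal. f-strings were added in Python 3.6, so an older python3 inside the Docker image would produce exactly this SyntaxError; the Prerequisites section also lists Python 3.6+. A quick check, where the helper name `supports_fstrings` is hypothetical:

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """Return True for Python 3.6 or newer, where f-strings are available."""
    return tuple(version_info[:2]) >= (3, 6)


if not supports_fstrings():
    raise SystemExit("main.py uses f-strings; run it with Python 3.6 or newer")
```

Running `python3 --version` inside the container shows which interpreter is actually being used.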

pygraphviz can not be installed

$ sudo pip3 install pygraphviz
WARNING: The directory '/home/usr1/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting pygraphviz
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7e/b1/d6d849ddaf6f11036f9980d433f383d4c13d1ebcfc3cd09bc845bda7e433/pygraphviz-1.5.zip (117 kB)
|████████████████████████████████| 117 kB 11.0 MB/s
Installing collected packages: pygraphviz
Running setup.py install for pygraphviz ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3.8 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0z76ssje/pygraphviz/setup.py'"'"'; file='"'"'/tmp/pip-install-0z76ssje/pygraphviz/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-_abu592r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/pygraphviz
cwd: /tmp/pip-install-0z76ssje/pygraphviz/
Complete output (34 lines):
running install
Trying dpkg
dpkg-query: no path found matching pattern graphviz
Could not run dpkg
Trying pkg-config
Package libcgraph was not found in the pkg-config search path.
Perhaps you should add the directory containing `libcgraph.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libcgraph' found
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-0z76ssje/pygraphviz/setup.py", line 70, in
setup(
File "/usr/local/lib/python3.8/site-packages/setuptools/init.py", line 145, in setup
return distutils.core.setup(**attrs)
File "/usr/local/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/local/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-install-0z76ssje/pygraphviz/setup_commands.py", line 44, in modified_run
self.include_path, self.library_path = get_graphviz_dirs()
File "/tmp/pip-install-0z76ssje/pygraphviz/setup_extra.py", line 162, in get_graphviz_dirs
include_dirs, library_dirs = _try_configure(include_dirs, library_dirs, _pkg_config)
File "/tmp/pip-install-0z76ssje/pygraphviz/setup_extra.py", line 117, in _try_configure
i, l = try_function()
File "/tmp/pip-install-0z76ssje/pygraphviz/setup_extra.py", line 72, in _pkg_config
output = S.check_output(['pkg-config', '--libs-only-L', 'libcgraph'])
File "/usr/local/lib/python3.8/subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/local/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['pkg-config', '--libs-only-L', 'libcgraph']' returned non-zero exit status 1.
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/bin/python3.8 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0z76ssje/pygraphviz/setup.py'"'"'; file='"'"'/tmp/pip-install-0z76ssje/pygraphviz/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-_abu592r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/pygraphviz Check the logs for full command output.
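
The log shows the installer asking pkg-config for libcgraph and failing, which suggests the Graphviz development files are not installed. On Debian-based systems these usually come from the graphviz and libgraphviz-dev packages, although that is an assumption about this particular machine. The sketch below repeats the same pkg-config probe from Python so it can be checked before retrying pip; the function name `has_libcgraph` is hypothetical.

```python
import shutil
import subprocess


def has_libcgraph():
    """Return True if pkg-config can locate libcgraph (Graphviz dev files)."""
    if shutil.which("pkg-config") is None:
        # pkg-config itself is missing, so the probe cannot succeed.
        return False
    result = subprocess.run(["pkg-config", "--exists", "libcgraph"])
    return result.returncode == 0


print("libcgraph found:", has_libcgraph())
```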

Best DAG doesn't seem to be saved on its own

I notice that in Trainer.train() self.save_model() is called at certain times to save all of the shared weights in the "super graph" (my terminology), but I don't see that the best dag is tracked such that at the end of train() we have the best DAG/model RNN for the PTB task. save_model() saves all of the weights for the entire shared weight space, but doesn't show which DAG (which sub-graph of the larger graph) represents the best RNN constructed during training - unless I'm missing something?
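
One way to keep the best sub-graph alongside the shared weights is sketched below. Everything in it is hypothetical and not part of the repository: the class `BestDagTracker`, the JSON file format, and the assumption that a DAG can be written as a JSON-serializable list of [previous_node, activation] pairs.

```python
import json


class BestDagTracker:
    """Keep the lowest-perplexity DAG seen so far (hypothetical helper)."""

    def __init__(self):
        self.best_ppl = float("inf")
        self.best_dag = None

    def update(self, dag, ppl):
        """Record dag if it beats the current best; return True when it does."""
        if ppl < self.best_ppl:
            self.best_ppl, self.best_dag = ppl, dag
            return True
        return False

    def dump(self, path):
        """Write the best DAG and its perplexity to a JSON file."""
        with open(path, "w") as f:
            json.dump({"ppl": self.best_ppl, "dag": self.best_dag}, f)
```

Calling `update()` after each validation pass and `dump()` next to every `save_model()` call would leave a record of which sub-graph the saved weights should be evaluated with.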

Do you think parameter sharing works efficiently for CNNs?

I don't think it can finish searching CNN architectures in 11.5 hours, because parameter sharing is not as efficient there as it is for RNNs.

RuntimeError: cuda runtime error (59) : device-side assert triggered

Running with the default settings on 1 GPU leads to this error after running successfully for a lot of epochs. (both wikitext and ptb)

train_shared| loss: 2.844:  11%|███████▏                                                           | 3500/32634 [00:40<05:19, 91.09it/s]
2018-03-07 05:26:19,885:INFO::| epoch  83 | lr 1.20 | loss 2.97 | ppl    19.43
train_shared| loss: 2.750:  16%|██████████▊                                                        | 5250/32634 [00:59<04:58, 91.77it/s]
2018-03-07 05:26:39,208:INFO::| epoch  83 | lr 1.20 | loss 2.87 | ppl    17.70
train_shared| loss: 2.619:  21%|██████████████▎                                                    | 7000/32634 [01:19<05:03, 84.33it/s]
2018-03-07 05:26:58,999:INFO::| epoch  83 | lr 1.20 | loss 2.95 | ppl    19.18
train_shared| loss: 2.627:  27%|█████████████████▉                                                 | 8750/32634 [01:38<04:20, 91.85it/s]
2018-03-07 05:27:18,144:INFO::| epoch  83 | lr 1.20 | loss 3.26 | ppl    25.94
train_shared| loss: 4.249:  32%|█████████████████████▏                                            | 10500/32634 [01:57<04:00, 91.96it/s]
2018-03-07 05:27:37,222:INFO::| epoch  83 | lr 1.20 | loss 2.93 | ppl    18.67
train_shared| loss: 2.546:  38%|████████████████████████▊                                         | 12250/32634 [02:17<04:00, 84.66it/s]
2018-03-07 05:27:56,890:INFO::| epoch  83 | lr 1.20 | loss 2.96 | ppl    19.35
train_shared| loss:   nan:  43%|████████████████████████████▎                                     | 14000/32634 [02:37<03:35, 86.57it/s]
2018-03-07 05:28:17,371:INFO::| epoch  83 | lr 1.20 | loss nan | ppl      nan
train_shared| loss:   nan:  43%|████████████████████████████▍                                     | 14035/32634 [02:37<03:34, 86.72it/s
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorCopy.cu line=100 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 45, in <module>
    main(args)
  File "main.py", line 34, in main
    trainer.train()
  File "/home/karan/metalearning-project/ENAS-pytorch/trainer.py", line 94, in train
    self.train_controller()
  File "/home/karan/metalearning-project/ENAS-pytorch/trainer.py", line 225, in train_controller
    dags, log_probs, entropies = self.controller.sample(with_details=True)
  File "/home/karan/metalearning-project/ENAS-pytorch/models/controller.py", line 96, in sample
    is_embed=block_idx==0)
  File "/home/karan/metalearning-project/ENAS-pytorch/models/controller.py", line 67, in forward
    logits = self.decoders[block_idx](hx)
  File "/home/karan/anaconda2/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/karan/anaconda2/envs/torch/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/karan/anaconda2/envs/torch/lib/python3.6/site-packages/torch/nn/functional.py", line 835, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorCopy.cu:100
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorRandom.cuh:179: void sampleMultinomialOnce(long *, long, int, T *, T *) [with T = float, AccT = float]: block: [0,0,0], thread: [0,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorRandom.cuh:179: void sampleMultinomialOnce(long *, long, int, T *, T *) [with T = float, AccT = float]: block: [0,0,0], thread: [1,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorRandom.cuh:179: void sampleMultinomialOnce(long *, long, int, T *, T *) [with T = float, AccT = float]: block: [0,0,0], thread: [2,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorRandom.cuh:179: void sampleMultinomialOnce(long *, long, int, T *, T *) [with T = float, AccT = float]: block: [0,0,0], thread: [3,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
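
The failed assertion in sampleMultinomialOnce checks that every sampling probability is greater than or equal to zero, and a NaN fails that comparison. Since the log shows the shared loss turning into nan shortly before the crash, the controller was most likely asked to sample from NaN probabilities. The sketch below shows the idea of validating logits before sampling, written in plain Python so it runs without a GPU; `checked_softmax` is a hypothetical helper, not a function from the repository.

```python
import math


def checked_softmax(logits):
    """Return softmax probabilities, or raise if any logit is NaN or infinite."""
    if not all(math.isfinite(x) for x in logits):
        raise ValueError(f"non-finite logits before sampling: {logits}")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = checked_softmax([2.0, 1.0, 0.1])

# A NaN logit is reported with a readable error instead of a device-side assert.
try:
    checked_softmax([float("nan"), 1.0])
    caught = False
except ValueError:
    caught = True
```

In the trainer, an analogous check on the logits tensor just before sampling would point at the step where the values first go bad, rather than at an unrelated CUDA call later on.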

ModuleNotFoundError: No module named 'basenet'

Do I need additional files to import basenet?

SyntaxError: invalid syntax

Hello, everyone. I have set up ENAS-pytorch on my computer. When I run this command:
python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001
I get a SyntaxError, as shown in the picture:

AttributeError: 'str' object has no attribute 'datset'

Traceback (most recent call last):
  File "main.py", line 48, in <module>
    main(args)
  File "main.py", line 26, in main
    dataset = data.image.Image(args.data_path)
  File "/home/comp/csxinhe/Code/ENAS-pytorch/data/image.py", line 8, in __init__
    if args.datset == 'cifar10':
AttributeError: 'str' object has no attribute 'datset'
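
The traceback shows two separate problems: main.py passes `args.data_path` (a string) where image.py expects an object with attributes, and image.py reads `datset` instead of `dataset`. A minimal sketch of the corrected call pattern follows. The `Image` class here is only a stand-in for `data.image.Image`, and the `name` attribute is invented for illustration.

```python
from argparse import Namespace


class Image:
    """Stand-in for data.image.Image; the real class builds the data loaders."""

    def __init__(self, args):
        # Expect the full argument namespace and read `dataset`, not `datset`.
        if args.dataset == "cifar10":
            self.name = "CIFAR-10"
        else:
            self.name = args.dataset


args = Namespace(dataset="cifar10", data_path="data/cifar10")
dataset = Image(args)  # pass the namespace itself, not args.data_path
print(dataset.name)
```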

The recommended GPU

What is the recommended GPU for ENAS? Is the NVIDIA GeForce RTX 2080 Ti a good choice? Thanks.

CUDA out of memory

First off, thanks for making this, looks great!

I downloaded the repo and I'm trying to run the examples to test it out before moving on. Unfortunately, training runs out of CUDA memory almost immediately. I'm running on a GTX 1050 with 4GB of RAM (about 3GB available to use for training), same as the 980 you mentioned you were running on? I was just wondering if you had any ideas about what could be causing this issue! Full error message below.

python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001
2018-02-16 22:22:54,351:INFO::[*] Make directories : logs/ptb_2018-02-16_22-22-54
2018-02-16 22:22:59,204:INFO::# of parameters: 146,014,000
2018-02-16 22:22:59,315:INFO::[*] MODEL dir: logs/ptb_2018-02-16_22-22-54
2018-02-16 22:22:59,316:INFO::[*] PARAM path: logs/ptb_2018-02-16_22-22-54/params.json
train_shared:   0%|   | 0/14524 [00:00<?, ?it/s]
/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/models/controller.py:96: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  probs = F.softmax(logits)
/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/models/controller.py:97: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  log_prob = F.log_softmax(logits)
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 45, in <module>
    main(args)
  File "main.py", line 34, in main
    trainer.train()
  File "/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/trainer.py", line 87, in train
    self.train_shared()
  File "/home/mjhutchinson/Documents/Machine Learning/ENAS-pytorch/trainer.py", line 143, in train_shared
    loss.backward()
  File "/home/mjhutchinson/.conda/envs/pytorch/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/mjhutchinson/.conda/envs/pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu:58

If there's any other info that would be helpful, please let me know!
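
A rough back-of-the-envelope estimate helps explain why about 3GB can be tight. The log reports 146,014,000 parameters. Assuming float32 storage (PyTorch's default) and, as an upper bound, one gradient value per parameter, the weights and gradients alone take roughly 1.1 GiB, before any activations kept for back-propagation through time or the CUDA context itself. This is only a sketch of the arithmetic, not a measurement.

```python
NUM_PARAMS = 146_014_000   # from the "# of parameters" log line
BYTES_PER_FLOAT32 = 4      # assumption: parameters are stored as float32

weights_bytes = NUM_PARAMS * BYTES_PER_FLOAT32
grads_bytes = weights_bytes  # upper bound: one gradient value per parameter
total_gib = (weights_bytes + grads_bytes) / 2**30

print(f"weights: {weights_bytes / 2**30:.2f} GiB")
print(f"weights + gradients: {total_gib:.2f} GiB")
```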

a bug related to CNN search?

Hi, thanks for sharing the code. I am implementing the CNN part. I think block_idx in the forward function should be taken modulo 2 in the CNN case, since you use only two softmaxes. Could you check it? Thanks.

ENAS-pytorch/models/controller.py

Lines 173 to 177 in 25c4a89

    for block_idx in range(2*(self.args.num_blocks - 1) + 1):
        logits, hidden = self.forward(inputs,
                                      hidden,
                                      block_idx,
                                      is_embed=(block_idx == 0))
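
The suggestion above can be sketched as a small index-mapping function. This assumes, as the post does, that the CNN controller has only two decoder heads that are reused in alternation, while the RNN controller has one head per step. `decoder_index` and `num_decoders` are hypothetical names, not identifiers from controller.py.

```python
def decoder_index(block_idx, network_type, num_decoders=2):
    """Pick which decoder (softmax head) handles a given sampling step.

    Assumption: the CNN controller alternates between `num_decoders` shared
    heads, while the RNN controller has one head per step.
    """
    if network_type == "cnn":
        return block_idx % num_decoders
    return block_idx


num_blocks = 4
steps = range(2 * (num_blocks - 1) + 1)  # same loop bound as in controller.py
cnn_indices = [decoder_index(i, "cnn") for i in steps]
rnn_indices = [decoder_index(i, "rnn") for i in steps]
print(cnn_indices)
print(rnn_indices)
```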

Cifar10 CNN problem

utils.py, line 150: nbatch = data.size(0) // bsz
AttributeError: 'DataLoader' object has no attribute 'size'
-> nbatch = len(data)

image.py, line 9:
if args == 'cifar' -> if args == 'data/cifar'

image.py, lines 30-31 and 38-39:
-> batch_size=200, shuffle=True,
-> num_workers=4, pin_memory=True)
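
For the `nbatch` error, one option is a helper that accepts both kinds of input: tensor-like text data that exposes `size(0)`, and loader-like image data whose length is already the number of batches. The sketch below uses two stand-in classes so that it runs without PyTorch. `num_batches`, `FakeTensor` and `FakeLoader` are hypothetical names.

```python
def num_batches(data, bsz):
    """Number of batches for tensor-like or loader-like data."""
    if hasattr(data, "size"):
        # Tensor-like text data: rows are split into batches of size bsz.
        return data.size(0) // bsz
    # Loader-like image data: len() already counts batches.
    return len(data)


class FakeTensor:
    """Stand-in for a 2-D tensor; only size(0) is modelled."""

    def __init__(self, rows):
        self.rows = rows

    def size(self, dim):
        return self.rows


class FakeLoader:
    """Stand-in for a DataLoader; only len() is modelled."""

    def __init__(self, batches):
        self.batches = batches

    def __len__(self):
        return self.batches


text_batches = num_batches(FakeTensor(1000), bsz=64)
image_batches = num_batches(FakeLoader(250), bsz=64)
```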

Dependencies Versions

Can you please list the versions of all the dependencies of this repository?

AttributeError: 'Namespace' object has no attribute 'num_workers'

kukby@kukby-GI5KN54:~/ENAS-pytorch-master$ python3 main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1
2020-03-03 20:59:42,792:INFO::[*] Make directories : logs/cifar_2020-03-03_20-59-42
Files already downloaded and verified
Traceback (most recent call last):
  File "main.py", line 54, in <module>
    main(args)
  File "main.py", line 26, in main
    dataset = data.image.Image(args)
  File "/home/kukby/ENAS-pytorch-master/data/image.py", line 30, in __init__
    num_workers=args.num_workers, pin_memory=True)
AttributeError: 'Namespace' object has no attribute 'num_workers'
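
The error means the parsed arguments never define `num_workers`, although image.py reads it. Two possible workarounds are sketched below: registering the flag with argparse, or falling back to a default where the value is read. The `--num_workers` flag and the default of 2 are assumptions for illustration, not settings taken from the repository's config.py.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default="cifar")
# Hypothetical flag: gives data/image.py the attribute it reads.
parser.add_argument("--num_workers", type=int, default=2)

# An explicit argument list keeps the example independent of sys.argv.
args = parser.parse_args(["--dataset", "cifar"])

# Alternative at the point of use: fall back when the attribute is absent.
num_workers = getattr(args, "num_workers", 2)
print(args.num_workers, num_workers)
```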

version of dependencies

I think it's necessary to give a list of dependency versions. There have been so many changes in Python packages such as scipy that it is quite difficult to set up the environment and reproduce the results without the exact versions.

TypeError: iteration over a 0-d tensor

I tried running:

python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001

But got:


2018-09-20 11:34:56,560:INFO::[*] Make directories : logs/ptb_2018-09-20_11-34-56
2018-09-20 11:35:01,015:INFO::regularizing:
2018-09-20 11:35:06,032:INFO::# of parameters: 146,014,000
2018-09-20 11:35:06,127:INFO::[*] MODEL dir: logs/ptb_2018-09-20_11-34-56
2018-09-20 11:35:06,128:INFO::[*] PARAM path: logs/ptb_2018-09-20_11-34-56/params.json
/home/phil/anaconda3/envs/conda_env/lib/python3.7/site-packages/torch/nn/functional.py:995: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
/home/phil/anaconda3/envs/conda_env/lib/python3.7/site-packages/torch/nn/functional.py:1006: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "main.py", line 48, in <module>
    main(args)
  File "main.py", line 34, in main
    trnr.train()
  File "/home/phil/devel/ENAS-pytorch/trainer.py", line 216, in train
    self.train_shared()
  File "/home/phil/devel/ENAS-pytorch/trainer.py", line 297, in train_shared
    hidden = utils.detach(hidden)
  File "/home/phil/devel/ENAS-pytorch/utils.py", line 130, in detach
    return tuple(detach(v) for v in h)
  File "/home/phil/devel/ENAS-pytorch/utils.py", line 130, in <genexpr>
    return tuple(detach(v) for v in h)
  File "/home/phil/devel/ENAS-pytorch/utils.py", line 130, in detach
    return tuple(detach(v) for v in h)
  File "/home/phil/devel/ENAS-pytorch/utils.py", line 130, in <genexpr>
    return tuple(detach(v) for v in h)
  File "/home/phil/devel/ENAS-pytorch/utils.py", line 130, in detach
    return tuple(detach(v) for v in h)
  File "/home/phil/anaconda3/envs/conda_env/lib/python3.7/site-packages/torch/tensor.py", line 381, in __iter__
    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor

Running PyTorch 0.4.1 on Python 3.7 (also tried Python 3.6.6 with PyTorch 0.4.1 and had the same issue).
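
PyTorch 0.4 merged Variable into Tensor. If utils.detach decides when to stop recursing by comparing against the Variable type (an assumption, since the function body is not shown here), that check would no longer match, and the recursion would keep iterating into the tensor until it reaches a 0-d element, which matches the repeated frames in the traceback. A duck-typed version avoids depending on the exact type. The sketch below uses a stand-in `FakeTensor` class so that it runs without PyTorch.

```python
def detach(h):
    """Detach a hidden state, or a nested tuple/list of hidden states."""
    if hasattr(h, "detach"):
        # Tensor-like object: cut it off from the autograd graph.
        return h.detach()
    return tuple(detach(v) for v in h)


class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, value):
        self.value = value

    def detach(self):
        return FakeTensor(self.value)


hidden = (FakeTensor(1.0), (FakeTensor(2.0), FakeTensor(3.0)))
detached = detach(hidden)
print(detached[1][1].value)
```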

Command to retrain searched model

Thanks for your implementation!

May I ask for the command to retrain the searched model, like ./scripts/ptb_final.sh in the original repo? Or is this feature not yet implemented?

In the CNN, how do you concatenate intermediate outputs coming from different amounts of pooling?

I'm trying to reimplement the CNN part but I'm wondering: how do you concatenate the intermediate results coming from different layers, some with and some without pooling? For example, Fig. 7 in the paper or the image in the readme both show that some layers take as input the concat of the intermediate output of a max pooling layer and some other conv layer before it. But if you assume that you have inputs of 224x224, strides=1 in the convs and the right amount of padding, the output of a conv layer will be n_filtersx224x224, while the pooling output (assuming stride 2) will be n_filtersx112x112. How do you concatenate the two intermediate outputs?

Understanding the output during training

Hey,

Thanks a lot for implementing ENAS in PyTorch! I am able to run ENAS with the Penn Treebank dataset, and I am trying to understand the output during training. At the beginning, the output shows some information about the gradients, for example:

2018-08-29 23:53:50,910:INFO::abs max grad 0.5459082126617432
2018-08-29 23:53:56,523:INFO::abs max grad 0.569364070892334
2018-08-29 23:54:02,990:INFO::abs max grad 0.6024199724197388
2018-08-29 23:54:11,814:INFO::max hidden 16.342870712280273

Could anyone help me understand the output a little? Why does the output stop showing ppl, and why is the epoch number always 0? How can I monitor the ppl directly?

Thanks!
Yue

AssertionError: Torch not compiled with CUDA enabled

I tried to run the PTB example with the following command (there is no GPU on my laptop):
python main.py --network_type rnn --dataset ptb --num_gpu 0

The program ran for a while, and an error occurred.

Can anyone give me some advice? Thanks!

REINFORCE

It is clear that the controller falls into a local optimum and can't find better actions through REINFORCE. I think the unknown constant c in the reward c/valid ppl, the moving average baseline, and the temperature of the logits are what need to be fixed. See more details (especially the TODOs) in 497c2e7.
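
The three quantities named above fit together roughly as follows: the reward is c divided by validation perplexity, an exponential moving average of past rewards serves as the baseline, and a temperature rescales the logits before the softmax. This is a sketch under stated assumptions, not the code in 497c2e7. The values c = 80.0, decay = 0.95 and temperature = 5.0 are placeholders, and all function names are hypothetical.

```python
import math


def reward_from_ppl(valid_ppl, c=80.0):
    """Reward c / valid_ppl; c = 80.0 is an assumed value."""
    return c / valid_ppl


def update_baseline(baseline, reward, decay=0.95):
    """Exponential moving average of past rewards; decay is assumed."""
    if baseline is None:
        return reward
    return decay * baseline + (1.0 - decay) * reward


def softmax_with_temperature(logits, temperature=5.0):
    """A higher temperature flattens the distribution, encouraging exploration."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


baseline = None
advantages = []
for ppl in [400.0, 200.0, 100.0]:
    reward = reward_from_ppl(ppl)
    baseline = update_baseline(baseline, reward)
    advantages.append(reward - baseline)  # scales the log-probability gradient

sharp = softmax_with_temperature([2.0, 0.0], temperature=1.0)
flat = softmax_with_temperature([2.0, 0.0], temperature=5.0)
```

With a fixed c, lower perplexity gives a larger reward, and the advantage only stays positive while rewards keep outpacing the baseline. A higher temperature keeps the second action's probability from collapsing too early.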

Improvement on requirement installation

Just in case, for people on Ubuntu who are not using conda:

sudo apt-get install graphviz-dev graphviz

will install the dependency for pygraphviz.

Consulting perplexity test question with RNN cell in ENAS

Hello, Kim. I have been testing the RNN architecture shown in Figure 6 of the paper. However, the perplexity I get is about 84 at around epoch 41, which does not match the 55.8 reported in Table 1, Section 3.2 of the ENAS paper. The details of my experiment are as follows:
In the code, I use the "single" mode in config.py to train the architecture from Figure 6. The DAG used is {-1: [Node(id=0, name='tanh')], -2: [Node(id=0, name='tanh')], 0: [Node(id=1, name='tanh')], 1: [Node(id=2, name='ReLU'), Node(id=3, name='tanh')], 2: [Node(id=4, name='ReLU'), Node(id=5, name='tanh'), Node(id=6, name='tanh')], 6: [Node(id=7, name='ReLU')], 7: [Node(id=8, name='ReLU')], 8: [Node(id=9, name='ReLU'), Node(id=10, name='ReLU'), Node(id=11, name='ReLU')], 3: [Node(id=12, name='avg')], 4: [Node(id=12, name='avg')], 5: [Node(id=12, name='avg')], 9: [Node(id=12, name='avg')], 10: [Node(id=12, name='avg')], 11: [Node(id=12, name='avg')], 12: [Node(id=13, name='h[t]')]}.

The dataset used is PTB. For the Penn Treebank experiments, ω is trained for about 400 steps, each on a minibatch of 64 examples, where the gradient ∇ω is computed using back-propagation through time, truncated at 35 time steps. I evaluate ppl over the entire validation set (batch size = 1).

The weights are trained using SGD with an initial learning rate of 20, which decays by a factor of 0.96 after 15 epochs. A total of 150 epochs were run. The hidden size is 1000 and the embedding size is 1000, and the number of activation function blocks is 12. The total number of parameters is (1000 + 1000) * 1000 * 12 = 24M.

As for techniques, dropout = 0.5, and I set activation_regularization, temporal_activation_regularization, temporal_activation_regularization_amount to True in config.py to use the weight penalty techniques in the code. Weight tying is also used in the code. Additionally, I augment the simple transformations between nodes in the constructed recurrent cell with highway connections (Zilly et al., 2017).

Could you please tell me whether I have done something wrong with the learning rate or other configuration when testing the DAG from Figure 6 of the paper? I look forward to your reply.
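
The parameter estimate and learning-rate schedule described in this post can be written out as a quick check. The sketch assumes the decay is applied once per epoch after epoch 15, which the post does not state explicitly. `shared_lr` and its argument names are hypothetical, not taken from config.py.

```python
hidden_size = 1000
embed_size = 1000
num_blocks = 12

# Estimate from the post: one (embed + hidden) x hidden matrix per block.
num_params = (embed_size + hidden_size) * hidden_size * num_blocks


def shared_lr(epoch, initial_lr=20.0, decay=0.96, decay_after=15):
    """Learning rate at `epoch`, assuming one decay step per epoch after `decay_after`."""
    return initial_lr * decay ** max(epoch - decay_after, 0)


print(num_params)
for epoch in (0, 15, 16, 41, 150):
    print(epoch, round(shared_lr(epoch), 4))
```

Under this assumed schedule the learning rate falls below 0.1 by epoch 150, so comparing the schedule actually in use against the one in the original implementation may be a useful first step.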

Question about INF network parameters

When I was running your code, I found that some network parameters became INF. Is there any suggestion for solving this problem? Thanks.
By the way, I am curious about the test ppl after training.
