
CliqueNet

This repository is for Convolutional Neural Networks with Alternately Updated Clique (to appear in CVPR 2018, Oral presentation),

by Yibo Yang, Zhisheng Zhong, Tiancheng Shen, and Zhouchen Lin.

citation

If you find CliqueNet useful in your research, please consider citing:

@article{yang18,
 author={Yibo Yang and Zhisheng Zhong and Tiancheng Shen and Zhouchen Lin},
 title={Convolutional Neural Networks with Alternately Updated Clique},
 journal={arXiv preprint arXiv:1802.10419},
 year={2018}
}

table of contents

  • Introduction
  • Usage
  • Ablation experiments
  • Comparison with state of the arts
  • Results on ImageNet

Introduction

CliqueNet is a recently proposed convolutional neural network architecture in which any pair of layers in the same block is connected bilaterally (Fig 1). Each layer is both the input and the output of any other layer in the block, so information flow is maximized. During propagation, the layers are updated alternately (Fig 2), so that each layer always receives feedback information from the layers that were updated more recently. We show that the refined features are more discriminative and lead to better performance. On benchmark classification datasets including CIFAR-10, CIFAR-100, SVHN, and ILSVRC 2012, we achieve results that are better than or comparable to the state of the art with fewer parameters. This repo contains the code of our project and also provides some experimental results that are not included in the paper.

Fig 1. An illustration of a block with 4 layers. Node 0 denotes the input layer of this block.

Fig 2. Alternate updating rule in CliqueNet. "{}" denotes the concatenation operator.
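
To make the alternate updating rule concrete, below is a minimal PyTorch-style sketch of one block: a Stage-I initialization pass followed by Stage-II refinement with reused weights. It is an illustration of Fig 2 only, not the repository's implementation; the class, module, and variable names (CliqueBlockSketch, W0, W, num_loops) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CliqueBlockSketch(nn.Module):
    # Hypothetical illustration of one clique block with T layers and k
    # filters per layer.  W0[j] maps the block input (node 0) to layer j;
    # W[i][j] maps layer i to layer j.  The same W[i][j] is used in both
    # Stage-I and Stage-II, so refinement adds no extra parameters.
    def __init__(self, in_channels, k, T, num_loops=1):
        super().__init__()
        self.T, self.num_loops = T, num_loops
        self.W0 = nn.ModuleList(
            [nn.Conv2d(in_channels, k, 3, padding=1, bias=False) for _ in range(T)])
        self.W = nn.ModuleList([
            nn.ModuleList([
                nn.Conv2d(k, k, 3, padding=1, bias=False) if i != j else nn.Identity()
                for j in range(T)])
            for i in range(T)])

    def forward(self, x0):
        # Stage-I: initialize layer j from the input and the already computed
        # layers 1..j-1 (a feed-forward, DenseNet-like pass).
        x = []
        for j in range(self.T):
            s = self.W0[j](x0) + sum(self.W[i][j](x[i]) for i in range(j))
            x.append(F.relu(s))
        stage_I = list(x)

        # Stage-II: re-update every layer from all *other* layers, in order,
        # so that later updates already see the refreshed earlier layers.
        for _ in range(self.num_loops):
            for j in range(self.T):
                s = sum(self.W[i][j](x[i]) for i in range(self.T) if i != j)
                x[j] = F.relu(s)
        stage_II = x

        # CliqueNet (II+II): block feature = {X_0, Stage-II}, transit = Stage-II.
        # The (I+II) variant would concatenate x0 with stage_I here instead.
        block_feature = torch.cat([x0] + stage_II, dim=1)
        transit_feature = torch.cat(stage_II, dim=1)
        return block_feature, transit_feature

The actual models in this repo additionally use batch normalization with ReLU pre-activation, and optionally bottleneck layers, attentional transition, and compression in the transitions between blocks, all of which this sketch omits.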

Usage

  • Our experiments are conducted with TensorFlow in Python 2.
  • Clone this repo: git clone https://github.com/iboing/CliqueNet
  • An example to train a model on CIFAR or SVHN (a concrete invocation is given right after this list):
python train.py --gpu [gpu id] --dataset [cifar-10 or cifar-100 or SVHN] --k [filters per layer] --T [total number of layers in the three blocks] --dir [path to save models]
  • Additional techniques (optional): if you want to use attentional transition, bottleneck architecture, or compression strategy in our paper, add --if_a True, --if_b True, and --if_c True, respectively.
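
For instance, to train the CliqueNet (k = 36, T = 12) configuration from Tab 3 on CIFAR-10 with GPU 0 (the GPU id and the save path are placeholders):

python train.py --gpu 0 --dataset cifar-10 --k 36 --T 12 --dir ./checkpoints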

Ablation experiments

With its feedback connections, CliqueNet alternately re-updates earlier layers using the layers that have just been updated, producing refined features. Since the weights between layers are reused multiple times, a deeper representation space can be attained with a fixed number of parameters. To test the effectiveness of this feature refinement, we analyze the features generated at different stages by training different versions of CliqueNet. As illustrated in Fig 3, CliqueNet (I+I) only uses the Stage-I feature. CliqueNet (I+II) uses the Stage-I feature concatenated with the input layer as the block feature, but transits the Stage-II feature into the next block. CliqueNet (II+II) only uses the refined (Stage-II) features.

Fig 3. A schema for CliqueNet (i+j), with i, j ∈ {I, II}.

| Model | Block feature | Transit | Error (%) |
| --- | --- | --- | --- |
| CliqueNet (I+I) | { X_0, Stage-I } | Stage-I | 6.64 |
| CliqueNet (I+II) | { X_0, Stage-I } | Stage-II | 6.10 |
| CliqueNet (II+II) | { X_0, Stage-II } | Stage-II | 5.76 |

Tab 1. Results of different versions of CliqueNet.

To run the experiments above, please modify train.py as:

from models.cliquenet_I_I import build_model

for CliqueNet(I+I), and

from models.cliquenet_I_II import build_model

for CliqueNet(I+II).

We further consider a setting where the feedback is only partially applied. Concretely, with k=64 and T=15, we use the Stage-II feature but run only the first X update steps of Stage-II (see Fig 2). X=0 is then exactly the CliqueNet (I+I) case, and X=5 corresponds to CliqueNet (II+II).
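
In terms of the hypothetical block sketch given after Fig 2, X simply truncates the Stage-II pass after its first X layer updates; a sketch:

import torch.nn.functional as F

def stage_II_partial(block, x, X):
    # Run only the first X layer-update steps of Stage-II on the list of
    # Stage-I feature maps `x` produced by the block sketch above.
    # X = 0 leaves the Stage-I features untouched (CliqueNet (I+I));
    # X = block.T refines every layer (CliqueNet (II+II)).
    for j in range(min(X, block.T)):
        s = sum(block.W[i][j](x[i]) for i in range(block.T) if i != j)
        x[j] = F.relu(s)
    return x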

| Model | CIFAR-10 error (%) | CIFAR-100 error (%) |
| --- | --- | --- |
| CliqueNet (X=0) | 5.83 | 24.79 |
| CliqueNet (X=1) | 5.63 | 24.65 |
| CliqueNet (X=2) | 5.54 | 24.37 |
| CliqueNet (X=3) | 5.41 | 23.75 |
| CliqueNet (X=4) | 5.20 | 24.04 |
| CliqueNet (X=5) | 5.12 | 23.73 |

Tab 2. Performance of CliqueNets with different X.

To run the experiments with different X, modify train.py as:

from models.cliquenet_X import build_model

and set the value of X in ./models/cliquenet_X.py

Comparison with state of the arts

The results listed below demonstrate the superiority of CliqueNet over DenseNet when no additional techniques (bottleneck, compression, etc.) are used.

| Model | FLOPs | Params | CIFAR-10 | CIFAR-100 | SVHN |
| --- | --- | --- | --- | --- | --- |
| DenseNet (k = 12, T = 36) | 0.53G | 1.0M | 7.00 | 27.55 | 1.79 |
| DenseNet (k = 12, T = 96) | 3.54G | 7.0M | 5.77 | 23.79 | 1.67 |
| DenseNet (k = 24, T = 96) | 13.78G | 27.2M | 5.83 | 23.42 | 1.59 |
| CliqueNet (k = 36, T = 12) | 0.91G | 0.94M | 5.93 | 27.32 | 1.77 |
| CliqueNet (k = 64, T = 15) | 4.21G | 4.49M | 5.12 | 23.98 | 1.62 |
| CliqueNet (k = 80, T = 15) | 6.45G | 6.94M | 5.10 | 23.32 | 1.56 |
| CliqueNet (k = 80, T = 18) | 9.45G | 10.14M | 5.06 | 23.14 | 1.51 |

Tab 3. Main results on CIFAR and SVHN without data augmentation.

Because a larger T leads to a higher computational cost and slightly more parameters, we prefer to increase k rather than T in our experiments. For a fairer comparison, we also consider the setting where DenseNet and CliqueNet use exactly the same k and T; see Tab 4.

| Model | Params | CIFAR-10 | CIFAR-100 |
| --- | --- | --- | --- |
| DenseNet (k=12, T=36) | 1.02M | 7.00 | 27.55 |
| CliqueNet (k=12, T=36) | 1.05M | 5.79 | 26.85 |
| DenseNet (k=24, T=18) | 0.99M | 7.13 | 27.70 |
| CliqueNet (k=24, T=18) | 0.99M | 6.04 | 26.57 |
| DenseNet (k=36, T=12) | 0.96M | 6.89 | 27.54 |
| CliqueNet (k=36, T=12) | 0.94M | 5.93 | 27.32 |

Tab 4. Comparisons with the same k and T.

Note that the result of DenseNet (k=12, T=36) is the one reported in the original paper. The others are implemented by ourselves under the same experimental settings.

Results on ImageNet

Our code for experiments on ImageNet with TensorFlow will be released soon.

Here we provide a PyTorch version to train a CliqueNet on ImageNet. An example to run:

python train_imagenet.py [path to the imagenet dataset]

(By default, CliqueNet-S3 is trained with a batch size of 160 and attentional transition enabled.)

The PyTorch pre-trained model can be downloaded here (Google Drive): S3_model.
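
A minimal sketch, assuming standard PyTorch checkpoint loading, for using the downloaded S3_model weights; how the model itself is constructed is not prescribed here. If the checkpoint was saved from a model wrapped in nn.DataParallel, its keys carry a "module." prefix that must be stripped before load_state_dict succeeds (see the related issue below).

import torch
import torch.nn as nn

def load_pretrained(model: nn.Module, checkpoint_path: str) -> nn.Module:
    # Load the downloaded checkpoint (e.g. the S3_model file) onto the CPU.
    # This sketch assumes the raw state_dict was saved; if the file stores a
    # wrapper dict, its 'state_dict' entry would be needed instead.
    state_dict = torch.load(checkpoint_path, map_location='cpu')
    # Keys saved from an nn.DataParallel model start with "module."; strip
    # that prefix so they match a plain, single-GPU model.
    state_dict = {k[len('module.'):] if k.startswith('module.') else k: v
                  for k, v in state_dict.items()}
    model.load_state_dict(state_dict)
    return model.eval()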


cliquenet's Issues

About the number of layers in blocks

The description of the number of layers per block seems to mix the case with bottleneck layers and the case without bottleneck layers, judging from your paper and your code.
All the models for ImageNet have bottleneck layers. The caption of Table 3 and the code indicate that the number of layers refers to the number of growth steps. One model for CIFAR, CliqueNet (k = 150, T = 30), has bottleneck layers too. The caption of Table 4 says that T is the total number of layers in the three blocks; on the contrary, your code indicates that the number is actually 15, with the same meaning as for the ImageNet models.
If my understanding is correct, the description in the paper can easily mislead readers. It would be better to clarify it.

Ambiguity in the explanation: Figure 1 and Table 1.

In the paper, the "Bottom layers" column of Table 1 should be labeled "Top layers" and vice versa. All layers except the input layer are being updated, yet X0, the input layer, is listed under the "Bottom layers" column, so I think the column names should be switched.
I base this on the assumption that the top layers update the bottom layers.
If I misunderstood, please explain and correct me.

Thanks

How to apply Grad-CAM to CliqueNet

CliqueNet's structure is different from most CNNs, such as DenseNet.
I can use Grad-CAM on DenseNet, but I don't know how to apply it to CliqueNet.
Please give some suggestions, thanks.
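
A possible starting point (an editorial sketch, not from the repository authors) is the standard hook-based Grad-CAM recipe: capture the last block's feature map in a forward hook, capture its gradient in a backward hook, weight the channels by the spatially averaged gradients of the target class score, and ReLU the weighted sum. The code below is generic PyTorch; target_module stands for whichever module in CliqueNet produces the final feature map and is an assumption, and the model is assumed to return class logits.

import torch
import torch.nn.functional as F

def grad_cam(model, image, target_module, class_idx=None):
    # Generic hook-based Grad-CAM sketch; image has shape (1, 3, H, W).
    activations, gradients = [], []
    h_fwd = target_module.register_forward_hook(
        lambda m, inp, out: activations.append(out))
    h_bwd = target_module.register_full_backward_hook(   # PyTorch >= 1.8
        lambda m, gin, gout: gradients.append(gout[0]))
    try:
        logits = model(image)                        # assumed to be class logits
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        act, grad = activations[0], gradients[0]     # both (1, C, h, w)
        weights = grad.mean(dim=(2, 3), keepdim=True)  # channel importance
        cam = F.relu((weights * act).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode='bilinear',
                            align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        return cam.squeeze()                         # heat map in [0, 1]
    finally:
        h_fwd.remove()
        h_bwd.remove()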

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Hello, I use Python 3.6 to run the program,
but I ran into a problem like this:
if os.path.exists(result_dir) == False:
File "D:\Downloads\Software\python37\lib\genericpath.py", line 19, in exists
os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
If you have encountered this problem and solved it, please let me know; I would appreciate it. Thank you.

How to use the datasets?

I have downloaded the code and the dataset; how do I run the program?
My environment is Python 3 with TensorFlow 1.5. I have changed the code to Python 3, but I still cannot run it.

It seems there is a bug in build_cliquenet (PyTorch)

My input size is (1, 3, 128, 128).


after output = self.pool(output), I got an error:

Traceback (most recent call last):
  File "D:/projects/machine_learning/gta5/train/clique_train.py", line 176, in <module>
    Net()
  File "D:/projects/machine_learning/gta5/train/clique_train.py", line 38, in Net
    out = model.forward(x)
  File "D:\projects\machine_learning\gta5\model\pytorch\cliquenet.py", line 70, in forward
    feature_I_list.append(self.list_gb[i](block_feature_I))
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\projects\machine_learning\gta5\model\pytorch\utils.py", line 65, in forward
    output = self.pool(output)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\pooling.py", line 547, in forward
    self.padding, self.ceil_mode, self.count_include_pad)
RuntimeError: Given input size: (152x32x32). Calculated output size: (152x0x0). Output size is too small at c:\programdata\miniconda3\conda-bld\pytorch_1524549877902\work\aten\src\thcunn\generic/SpatialAveragePooling.cu:63

Inconsistent number of parameters for CliqueNet (k = 150; T = 30) on CIFAR-10

There is a model, CliqueNet (k = 150, T = 30), in Table 4 of your paper. I trained it using the command:
python train.py --gpu 0 --dataset cifar-10 --k 150 --T 30 --dir ./checkpoints --if_a True --if_b True --if_c True. However, the total number of parameters comes out as 10.48M, not the 10.02M written in your paper. It is the only model with an inconsistent number of parameters; for all the other models, your code gives the same number of parameters as the paper.
Would you please check this conflict?

Maybe something wrong in "./models/utils.py"

The code at line 350 in ./models/utils.py may be wrong:
stage_II = tf.concat((stage_II, blob_dict[str(layer_id)]), axis=3)
The blob_dict here should perhaps be blob_dict_new, since this is the Stage-II feature that is fed into the next block.

Error in load_state_dict with your pretrained model

Hi! Thanks for a nice network.
I have a problem. When I try to continue training with the pretrained model that you shared, I get:
KeyError: 'unexpected key "module.block1.conv_param.1.weight" in state_dict'

Is Nesterov momentum used for ImageNet?

There is a sentence in your paper:

We train our models using stochastic gradient descent (SGD) with 0.9 Nesterov momentum and 10^-4 weight decay.

But at line 77 in train_imagenet.py, nesterov=True is not set in torch.optim.SGD(). So is Nesterov momentum actually used for the ImageNet models?

os question

Hi, I have a problem:
Traceback (most recent call last):
if not exists(result_dir):
os.mkdir(result_dir)
TypeError: mkdir: can't specify None for path argument
Can you tell me how to fix it? Thanks.

Is Xavier initialization used for the weights of fc on ImageNet?

The parameters are initialized according to [12] and the weights of fully connected layer are using Xavier initialization [10].

is written in your paper. However, the PyTorch code for ImageNet only uses the default initialization for the fully connected layer, not Xavier initialization.
So, is Xavier initialization actually used for the fully connected layer on ImageNet?

Maybe something wrong in ‘/imagenet_pytorch/utils.py’?

I noticed that in file /imagenet_pytorch/utils.py, line 152:

block_feature_II = torch.cat((block_feature_II, self.blob_dict_list[1][str(layer_id)]), 1)

Maybe self.blob_dict_list[1] should be replaced with self.blob_dict_list[self.loop_num] to get the final loop result instead of the first loop result?
