
CliqueNet

This repository is for Convolutional Neural Networks with Alternately Updated Clique (to appear in CVPR 2018, Oral presentation),

by Yibo Yang, Zhisheng Zhong, Tiancheng Shen, and Zhouchen Lin.

citation

If you find CliqueNet useful in your research, please consider citing:

@article{yang18,
 author={Yibo Yang and Zhisheng Zhong and Tiancheng Shen and Zhouchen Lin},
 title={Convolutional Neural Networks with Alternately Updated Clique},
 journal={arXiv preprint arXiv:1802.10419},
 year={2018}
}

table of contents

  • Introduction
  • Usage
  • Ablation experiments
  • Comparison with state of the arts
  • Results on ImageNet

Introduction

CliqueNet is a recently proposed convolutional neural network architecture in which any pair of layers in the same block is connected bilaterally (Fig 1). Each layer is both the input and the output of any other layer in the block, so information flow is maximized. During propagation, the layers are updated alternately (Fig 2), so that each layer always receives feedback information from the layers that were updated more recently. We show that the refined features are more discriminative and lead to better performance. On benchmark classification datasets including CIFAR-10, CIFAR-100, SVHN, and ILSVRC 2012, we achieve results that are better than or comparable to the state of the art with fewer parameters. This repo contains the code of our project and also provides some experimental results that are not included in the paper.

Fig 1. An illustration of a block with 4 layers. Node 0 denotes the input layer of this block.

Fig 2. Alternate updating rule in CliqueNet. "{}" denotes the concatenation operator.
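
To make the alternate updating rule concrete, below is a minimal PyTorch-style sketch of one block: a Stage-I initialization pass followed by Stage-II refinement with reused weights. It is an illustration of Fig 2 only, not the repository's implementation; the class, module, and variable names (CliqueBlockSketch, W0, W, num_loops) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CliqueBlockSketch(nn.Module):
    # Hypothetical illustration of one clique block with T layers and k
    # filters per layer.  W0[j] maps the block input (node 0) to layer j;
    # W[i][j] maps layer i to layer j.  The same W[i][j] is used in both
    # Stage-I and Stage-II, so refinement adds no extra parameters.
    def __init__(self, in_channels, k, T, num_loops=1):
        super().__init__()
        self.T, self.num_loops = T, num_loops
        self.W0 = nn.ModuleList(
            [nn.Conv2d(in_channels, k, 3, padding=1, bias=False) for _ in range(T)])
        self.W = nn.ModuleList([
            nn.ModuleList([
                nn.Conv2d(k, k, 3, padding=1, bias=False) if i != j else nn.Identity()
                for j in range(T)])
            for i in range(T)])

    def forward(self, x0):
        # Stage-I: initialize layer j from the input and the already computed
        # layers 1..j-1 (a feed-forward, DenseNet-like pass).
        x = []
        for j in range(self.T):
            s = self.W0[j](x0) + sum(self.W[i][j](x[i]) for i in range(j))
            x.append(F.relu(s))
        stage_I = list(x)

        # Stage-II: re-update every layer from all *other* layers, in order,
        # so that later updates already see the refreshed earlier layers.
        for _ in range(self.num_loops):
            for j in range(self.T):
                s = sum(self.W[i][j](x[i]) for i in range(self.T) if i != j)
                x[j] = F.relu(s)
        stage_II = x

        # CliqueNet (II+II): block feature = {X_0, Stage-II}, transit = Stage-II.
        # The (I+II) variant would concatenate x0 with stage_I here instead.
        block_feature = torch.cat([x0] + stage_II, dim=1)
        transit_feature = torch.cat(stage_II, dim=1)
        return block_feature, transit_feature

The actual models in this repo additionally use batch normalization with ReLU pre-activation, and optionally bottleneck layers, attentional transition, and compression in the transitions between blocks, all of which this sketch omits.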

Usage

  • Our experiments are conducted with TensorFlow in Python 2.
  • Clone this repo: git clone https://github.com/iboing/CliqueNet
  • An example to train a model on CIFAR or SVHN (a concrete invocation is given right after this list):
python train.py --gpu [gpu id] --dataset [cifar-10 or cifar-100 or SVHN] --k [filters per layer] --T [total number of layers in the three blocks] --dir [path to save models]
  • Additional techniques (optional): if you want to use attentional transition, bottleneck architecture, or compression strategy in our paper, add --if_a True, --if_b True, and --if_c True, respectively.
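
For instance, to train the CliqueNet (k = 36, T = 12) configuration from Tab 3 on CIFAR-10 with GPU 0 (the GPU id and the save path are placeholders):

python train.py --gpu 0 --dataset cifar-10 --k 36 --T 12 --dir ./checkpoints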

Ablation experiments

With its feedback connections, CliqueNet alternately re-updates earlier layers using the layers that have just been updated, producing refined features. Since the weights between layers are reused multiple times, a deeper representation space can be attained with a fixed number of parameters. To test the effectiveness of this feature refinement, we analyze the features generated at different stages by training different versions of CliqueNet. As illustrated in Fig 3, CliqueNet (I+I) only uses the Stage-I feature. CliqueNet (I+II) uses the Stage-I feature concatenated with the input layer as the block feature, but transits the Stage-II feature into the next block. CliqueNet (II+II) only uses the refined (Stage-II) features.

Fig 3. A schema for CliqueNet (i+j), with i, j ∈ {I, II}.

| Model | Block feature | Transit | Error (%) |
| --- | --- | --- | --- |
| CliqueNet (I+I) | { X_0, Stage-I } | Stage-I | 6.64 |
| CliqueNet (I+II) | { X_0, Stage-I } | Stage-II | 6.10 |
| CliqueNet (II+II) | { X_0, Stage-II } | Stage-II | 5.76 |

Tab 1. Results of different versions of CliqueNet.

To run the experiments above, please modify train.py as:

from models.cliquenet_I_I import build_model

for CliqueNet(I+I), and

from models.cliquenet_I_II import build_model

for CliqueNet(I+II).

We further consider a setting where the feedback is only partially applied. Concretely, with k=64 and T=15, we use the Stage-II feature but run only the first X update steps of Stage-II (see Fig 2). X=0 is then exactly the CliqueNet (I+I) case, and X=5 corresponds to CliqueNet (II+II).
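
In terms of the hypothetical block sketch given after Fig 2, X simply truncates the Stage-II pass after its first X layer updates; a sketch:

import torch.nn.functional as F

def stage_II_partial(block, x, X):
    # Run only the first X layer-update steps of Stage-II on the list of
    # Stage-I feature maps `x` produced by the block sketch above.
    # X = 0 leaves the Stage-I features untouched (CliqueNet (I+I));
    # X = block.T refines every layer (CliqueNet (II+II)).
    for j in range(min(X, block.T)):
        s = sum(block.W[i][j](x[i]) for i in range(block.T) if i != j)
        x[j] = F.relu(s)
    return x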

| Model | CIFAR-10 error (%) | CIFAR-100 error (%) |
| --- | --- | --- |
| CliqueNet (X=0) | 5.83 | 24.79 |
| CliqueNet (X=1) | 5.63 | 24.65 |
| CliqueNet (X=2) | 5.54 | 24.37 |
| CliqueNet (X=3) | 5.41 | 23.75 |
| CliqueNet (X=4) | 5.20 | 24.04 |
| CliqueNet (X=5) | 5.12 | 23.73 |

Tab 2. Performance of CliqueNets with different X.

To run the experiments with different X, modify train.py as:

from models.cliquenet_X import build_model

and set the value of X in ./models/cliquenet_X.py

Comparison with state of the arts

The results listed below demonstrate the superiority of CliqueNet over DenseNet when no additional techniques (bottleneck, compression, etc.) are used.

| Model | FLOPs | Params | CIFAR-10 | CIFAR-100 | SVHN |
| --- | --- | --- | --- | --- | --- |
| DenseNet (k = 12, T = 36) | 0.53G | 1.0M | 7.00 | 27.55 | 1.79 |
| DenseNet (k = 12, T = 96) | 3.54G | 7.0M | 5.77 | 23.79 | 1.67 |
| DenseNet (k = 24, T = 96) | 13.78G | 27.2M | 5.83 | 23.42 | 1.59 |
| CliqueNet (k = 36, T = 12) | 0.91G | 0.94M | 5.93 | 27.32 | 1.77 |
| CliqueNet (k = 64, T = 15) | 4.21G | 4.49M | 5.12 | 23.98 | 1.62 |
| CliqueNet (k = 80, T = 15) | 6.45G | 6.94M | 5.10 | 23.32 | 1.56 |
| CliqueNet (k = 80, T = 18) | 9.45G | 10.14M | 5.06 | 23.14 | 1.51 |

Tab 3. Main results on CIFAR and SVHN without data augmentation.

Because a larger T leads to a higher computational cost and slightly more parameters, we prefer to increase k rather than T in our experiments. For a fairer comparison, we also consider the setting where DenseNet and CliqueNet use exactly the same k and T; see Tab 4.

| Model | Params | CIFAR-10 | CIFAR-100 |
| --- | --- | --- | --- |
| DenseNet (k=12, T=36) | 1.02M | 7.00 | 27.55 |
| CliqueNet (k=12, T=36) | 1.05M | 5.79 | 26.85 |
| DenseNet (k=24, T=18) | 0.99M | 7.13 | 27.70 |
| CliqueNet (k=24, T=18) | 0.99M | 6.04 | 26.57 |
| DenseNet (k=36, T=12) | 0.96M | 6.89 | 27.54 |
| CliqueNet (k=36, T=12) | 0.94M | 5.93 | 27.32 |

Tab 4. Comparisons with the same k and T.

Note that the result of DenseNet (k=12, T=36) is the one reported in the original paper. The others are implemented by ourselves under the same experimental settings.

Results on ImageNet

Our code for experiments on ImageNet with TensorFlow will be released soon.

Here we provide a PyTorch version to train a CliqueNet on ImageNet. An example to run:

python train_imagenet.py [path to the imagenet dataset]

(By default, CliqueNet-S3 is trained with a batch size of 160 and attentional transition enabled.)

The PyTorch pre-trained model can be downloaded here (Google Drive): S3_model.
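
A minimal sketch, assuming standard PyTorch checkpoint loading, for using the downloaded S3_model weights; how the model itself is constructed is not prescribed here. If the checkpoint was saved from a model wrapped in nn.DataParallel, its keys carry a "module." prefix that must be stripped before load_state_dict succeeds (see the related issue below).

import torch
import torch.nn as nn

def load_pretrained(model: nn.Module, checkpoint_path: str) -> nn.Module:
    # Load the downloaded checkpoint (e.g. the S3_model file) onto the CPU.
    # This sketch assumes the raw state_dict was saved; if the file stores a
    # wrapper dict, its 'state_dict' entry would be needed instead.
    state_dict = torch.load(checkpoint_path, map_location='cpu')
    # Keys saved from an nn.DataParallel model start with "module."; strip
    # that prefix so they match a plain, single-GPU model.
    state_dict = {k[len('module.'):] if k.startswith('module.') else k: v
                  for k, v in state_dict.items()}
    model.load_state_dict(state_dict)
    return model.eval()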


cliquenet's Issues

About the number of layers in blocks

The description of the number of layers per block seems to mix the case with bottleneck layers and the case without bottleneck layers, judging from your paper and your code.
All the models for ImageNet have bottleneck layers. The caption of Table 3 and the code indicate that the number of layers refers to the number of growth steps. One model for CIFAR, CliqueNet (k = 150, T = 30), has bottleneck layers too. The caption of Table 4 says that T is the total number of layers in the three blocks; on the contrary, your code indicates that the number is actually 15, with the same meaning as for the ImageNet models.
If my understanding is correct, the description in the paper can easily mislead readers. It would be better to clarify it.

Ambiguity in the explanation: Figure 1 and Table 1.

In the paper, the "Bottom layers" column of Table 1 should be labeled "Top layers" and vice versa. All layers except the input layer are being updated, yet X0, the input layer, is listed under the "Bottom layers" column, so I think the column names should be switched.
I base this on the assumption that the top layers update the bottom layers.
If I misunderstood, please explain and correct me.

Thanks

How to apply Grad-CAM to CliqueNet

CliqueNet's structure is different from most CNNs, such as DenseNet.
I can use Grad-CAM on DenseNet, but I don't know how to apply it to CliqueNet.
Please give some suggestions, thanks.
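
A possible starting point (an editorial sketch, not from the repository authors) is the standard hook-based Grad-CAM recipe: capture the last block's feature map in a forward hook, capture its gradient in a backward hook, weight the channels by the spatially averaged gradients of the target class score, and ReLU the weighted sum. The code below is generic PyTorch; target_module stands for whichever module in CliqueNet produces the final feature map and is an assumption, and the model is assumed to return class logits.

import torch
import torch.nn.functional as F

def grad_cam(model, image, target_module, class_idx=None):
    # Generic hook-based Grad-CAM sketch; image has shape (1, 3, H, W).
    activations, gradients = [], []
    h_fwd = target_module.register_forward_hook(
        lambda m, inp, out: activations.append(out))
    h_bwd = target_module.register_full_backward_hook(   # PyTorch >= 1.8
        lambda m, gin, gout: gradients.append(gout[0]))
    try:
        logits = model(image)                        # assumed to be class logits
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        act, grad = activations[0], gradients[0]     # both (1, C, h, w)
        weights = grad.mean(dim=(2, 3), keepdim=True)  # channel importance
        cam = F.relu((weights * act).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode='bilinear',
                            align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        return cam.squeeze()                         # heat map in [0, 1]
    finally:
        h_fwd.remove()
        h_bwd.remove()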

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Hello, I use Python 3.6 to run the program,
but I ran into a problem like this:
if os.path.exists(result_dir) == False:
File "D:\Downloads\Software\python37\lib\genericpath.py", line 19, in exists
os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
If you have encountered this problem and solved it, please let me know; I would appreciate it. Thank you.

How to use the datasets?

I have downloaded the code and the dataset; how do I run the program?
My environment is Python 3 with TensorFlow 1.5. I have changed the code to Python 3, but I still cannot run it.

It seems there is a bug in build_cliquenet (PyTorch)

My input size is (1, 3, 128, 128).


after output = self.pool(output), I got an error:

Traceback (most recent call last):
  File "D:/projects/machine_learning/gta5/train/clique_train.py", line 176, in <module>
    Net()
  File "D:/projects/machine_learning/gta5/train/clique_train.py", line 38, in Net
    out = model.forward(x)
  File "D:\projects\machine_learning\gta5\model\pytorch\cliquenet.py", line 70, in forward
    feature_I_list.append(self.list_gb[i](block_feature_I))
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\projects\machine_learning\gta5\model\pytorch\utils.py", line 65, in forward
    output = self.pool(output)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\pooling.py", line 547, in forward
    self.padding, self.ceil_mode, self.count_include_pad)
RuntimeError: Given input size: (152x32x32). Calculated output size: (152x0x0). Output size is too small at c:\programdata\miniconda3\conda-bld\pytorch_1524549877902\work\aten\src\thcunn\generic/SpatialAveragePooling.cu:63

Inconsistent number of parameters for CliqueNet (k = 150; T = 30) on CIFAR-10

There is a model, CliqueNet (k = 150, T = 30), in Table 4 of your paper. I trained it using the command:
python train.py --gpu 0 --dataset cifar-10 --k 150 --T 30 --dir ./checkpoints --if_a True --if_b True --if_c True. However, the total number of parameters comes out as 10.48M, not the 10.02M written in your paper. It is the only model with an inconsistent number of parameters; for all the other models, your code gives the same number of parameters as the paper.
Would you please check this conflict?

Maybe something wrong in "./models/utils.py"

The code at line 350 in ./models/utils.py may be wrong:
stage_II = tf.concat((stage_II, blob_dict[str(layer_id)]), axis=3)
The blob_dict here should perhaps be blob_dict_new, since this is the Stage-II feature that is fed into the next block.

Error in load_state_dict with your pretrained model

Hi! Thanks for a nice network.
I have a problem. When I try to continue training with the pretrained model that you shared, I get:
KeyError: 'unexpected key "module.block1.conv_param.1.weight" in state_dict'

Is Nesterov momentum used for ImageNet?

There is a sentence in your paper:

We train our models using stochastic gradient descent (SGD) with 0.9 Nesterov momentum and 10^-4 weight decay.

But at line 77 in train_imagenet.py, nesterov=True is not set in torch.optim.SGD(). So is Nesterov momentum actually used for the ImageNet models?

os question

Hi, I have a problem:
Traceback (most recent call last):
if not exists(result_dir):
os.mkdir(result_dir)
TypeError: mkdir: can't specify None for path argument
Can you tell me how to fix it? Thanks.

Is Xavier initialization used for the weights of fc on ImageNet?

The parameters are initialized according to [12] and the weights of fully connected layer are using Xavier initialization [10].

is written in your paper. However, the PyTorch code for ImageNet only uses the default initialization for the fully connected layer, not Xavier initialization.
So, is Xavier initialization actually used for the fully connected layer on ImageNet?

Maybe something wrong in ‘/imagenet_pytorch/utils.py’?

I noticed that in file /imagenet_pytorch/utils.py, line 152:

block_feature_II = torch.cat((block_feature_II, self.blob_dict_list[1][str(layer_id)]), 1)

Maybe self.blob_dict_list[1] should be replaced with self.blob_dict_list[self.loop_num] to get the final loop result instead of the first loop result?
