
densenet's People

Contributors

ajschumacher, cmasch, gaohuang, liuzhuang13, nikhil-kasukurthi, okason97, taineleau


densenet's Issues

A question about network structure

Hi, I want to design a DenseNet with a small number of layers (e.g. L = 30 or so). How should I set the size of each dense block and k (L denotes the network depth and k the growth rate)? Are there any rules?
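For what it's worth, here is how I understand the CIFAR configurations in the paper: three dense blocks of equal size, with the depth L also counting the initial convolution, the two transition convolutions and the final classifier. The helper below is only an illustrative sketch in plain Python, not code from this repo:

    # Sketch: layers per dense block for the CIFAR-style DenseNets in the paper.
    # Without bottleneck, L = 3*n + 4; with bottleneck (two convs per layer), L = 6*n + 4.
    def layers_per_block(depth, bottleneck=False):
        convs_per_layer = 2 if bottleneck else 1
        n, remainder = divmod(depth - 4, 3 * convs_per_layer)
        assert remainder == 0, "depth must be of the form 3*n + 4 (or 6*n + 4 with bottleneck)"
        return n

    print(layers_per_block(40))                    # 12 layers in each of the 3 blocks
    print(layers_per_block(100, bottleneck=True))  # 16 bottleneck layers in each block

So, as far as I can tell, L = 30 would not divide evenly into three equal blocks; L = 28 (8 layers per block) or L = 31 (9 per block) would.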

Convolution after ReLU in Dense Layer Question

I've seen that you use:

BN -> ReLU -> Conv3x3 -> Dropout

in the normal case, or

BN -> ReLU -> Conv1x1 -> Dropout -> BN -> ReLU -> Conv3x3 -> Dropout

when using bottleneck. The question is why? Most networks use e.g.

Conv3x3 -> BN -> ReLU -> Dropout

Why did you invert the order? Did you get better results this way?
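Just to make the ordering concrete for other readers, the pre-activation composite function described above looks roughly like this. This is a tf.keras sketch for illustration only (the repo itself is in Torch), and the function name is mine:

    import tensorflow as tf
    from tensorflow.keras import layers

    def dense_layer(x, growth_rate, drop_rate=0.0):
        # pre-activation order: BN -> ReLU -> Conv -> (Dropout)
        y = layers.BatchNormalization()(x)
        y = layers.Activation('relu')(y)
        y = layers.Conv2D(growth_rate, 3, padding='same', use_bias=False)(y)
        if drop_rate > 0:
            y = layers.Dropout(drop_rate)(y)
        # the new feature maps are concatenated onto the input, not added
        return layers.Concatenate()([x, y])

    x = tf.keras.Input(shape=(32, 32, 24))
    y = dense_layer(x, growth_rate=12)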

Thanks in advance!

error using CAddTable and ConcatTable

There are a lot of residual-style blocks in your densenet. I tried to build a few in my own network, but Torch gives me errors when I use CAddTable and ConcatTable. Could you please give some advice? The code I use is here:

a = image.load('test.png')
input = torch.Tensor(1, 3, 60, 60)
input[1] = a
input = input:cuda()

local conv_block_1 = nn.Sequential()
conv_block_1:add(cudnn.SpatialConvolution(3, 16, 5, 5, 1, 1, 2, 2)) -- (60+2*2-5)/1+1 = 60
conv_block_1:add(cudnn.SpatialBatchNormalization(16))
conv_block_1:add(cudnn.ReLU(true))

local conv_block_2 = nn.Sequential()
conv_block_2:add(cudnn.SpatialConvolution(16, 32, 5, 5, 1, 1, 2, 2)) -- (60+2*2-5)/1+1 = 60
conv_block_2:add(cudnn.SpatialBatchNormalization(32))
conv_block_2:add(cudnn.ReLU(true))

local conv_block_3 = nn.Sequential()
conv_block_3:add(cudnn.SpatialConvolution(32, 16, 5, 5, 1, 1, 2, 2)) -- (60+2*2-5)/1+1 = 60
conv_block_3:add(cudnn.SpatialBatchNormalization(16))
conv_block_3:add(cudnn.ReLU(true))

local concat_block_1 = nn.ConcatTable()
concat_block_1:add(conv_block_1)
concat_block_1:add(conv_block_3)

local add_block_1 = nn.Sequential()
add_block_1:add(concat_block_1)
add_block_1:add(nn.CAddTable(true))
add_block_1:add(cudnn.ReLU(true))

local model = nn.Sequential()
model:add(conv_block_1)
model:add(conv_block_2)
model:add(conv_block_3)
model:add(add_block_1)
model:cuda()
model:forward(input)

and the error reads like this:
In 4 module of nn.Sequential:
In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 1 module of nn.Sequential:
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:102: input has to contain: 3 feature maps, but received input of size: 1 x 16 x 60 x 60
stack traceback:

About a tensorflow implementation

I've followed one of the TensorFlow implementations of DenseNet (https://github.com/ikhlestov/vision_networks) to reproduce DenseNet-BC-100-12.
The TensorFlow implementation seems nearly equivalent to the one in this repo,
but I couldn't reach the ~4.5% error (the best I got was about ~4.8%).
Could you give me any idea why that might be? I have already compared the two codebases very carefully but couldn't find a difference.

results on cifar100

Hi,

Thanks for the great work and the released code. I have run DenseNet-BC (L=190, k=40) on CIFAR-100+ several times, but it is hard to reproduce the reported 17.18 error. My training script looks like this, simply replacing the dataset with CIFAR-100 and without the efficient setting:
python demo.py --depth 190 --growth_rate 40 --save ckpts --batch_size 64 --valid_size 0
The best result I got is about 17.3x. Do you think this result is acceptable, or did I miss anything? Thanks a lot.

Nice figures !

Hey,
I am sorry to ask this, but your figures are really nice. I have no experience drawing neural-network figures and would like to follow your style, if you let me :)
Could you please tell me what you used to make such simple and nice-looking figures?
Thanks !

CUDA out of memory when using memory-efficient DenseNet for deployment

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0129 15:24:34.494936 153936 DenseBlock_layer.cu:203] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0129 18:17:47.501026 32543 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***

When I deploy the trained memory-efficient DenseNet model with MATLAB, I always encounter this problem. I think it could be solved by resetting the CUDA memory after each forward pass; can you give me some advice on how to solve it?

Purpose of the first convolution

In your network architecture for the CIFAR and ImageNet datasets, what is the purpose of the first convolution (before the pooling and dense block 1)? For ImageNet you use two convolution blocks before entering the first dense block, while for CIFAR just one; any reason? Thanks

Wide-DenseNet

Congrats on the best paper award at CVPR 2017!
I'm troubled by DenseNet's memory consumption. Would you share your Wide-DenseNet implementation and pre-trained models publicly?

Best!

Memory efficient implementation of Caffe

Hi,
I saw this Caffe implementation, which is memory efficient:
https://github.com/Tongcheng/DN_CaffeScript

And I also noticed this in the wiki:

Memory efficient implementation (newly added feature on June 6, 2017)

There is an option -optMemory which is very useful for reducing GPU memory footprint when training a DenseNet. By default, the value is set to 2, which activates the shareGradInput function 
....

Does that Caffe implementation use the memory-efficient approach described above?

Thanks.

Convolution before entering the first dense block for the ImageNet dataset

Hi, there

For the ImageNet dataset, DenseNet uses a 7x7 conv before entering the first dense block.
I also read the CondenseNet paper, which uses a 3x3 conv before entering the first block.
I wonder if I can change the 7x7 conv to 3x3 and keep the pooling unchanged (since it makes DenseNet more parameter-efficient). Does it hurt DenseNet's performance on ImageNet?
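For concreteness, the two stems being compared look roughly like this. This is a tf.keras sketch for illustration only (the repo is in Torch), and whether the 3x3 variant keeps stride 2 is my assumption:

    import tensorflow as tf
    from tensorflow.keras import layers

    def imagenet_stem(x, channels=64):
        # DenseNet's ImageNet stem: 7x7 stride-2 conv, BN, ReLU, then 3x3 stride-2 max pooling
        y = layers.Conv2D(channels, 7, strides=2, padding='same', use_bias=False)(x)
        y = layers.BatchNormalization()(y)
        y = layers.Activation('relu')(y)
        return layers.MaxPooling2D(3, strides=2, padding='same')(y)

    def small_stem(x, channels=64):
        # the 3x3 alternative asked about, keeping the pooling that follows unchanged
        return layers.Conv2D(channels, 3, strides=2, padding='same', use_bias=False)(x)

    inputs = tf.keras.Input(shape=(224, 224, 3))
    features = imagenet_stem(inputs)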

The number of parameters

I use the following setting, as suggested on GitHub:
L=40, k=12, no bottleneck.
However, the parameter count is not 1M, it's 0.6M.
The same problem happens when I turn bottleneck on; I get a different parameter count than the reported one.
Please tell me what I am missing. Thank you.

Calling the model:

dn_opt = {}
dn_opt.depth = 40
dn_opt.dataset = 'cifar10'
model = paths.dofile('densenet.lua')(dn_opt)
model:cuda()
print(model:getParameters():size())

In densenet.lua

local growthRate = 12

-- dropout rate, set it to 0 to disable dropout, non-zero number to enable dropout and set drop rate
local dropRate = 0

-- #channels before entering the first denseblock
local nChannels = 2 * growthRate

-- compression rate at transition layers
local reduction = 0.5

-- whether to use bottleneck structures
local bottleneck = false

Output of the parameter size

599050
[torch.LongStorage of size 1]

Great results! CIFAR-100 top1 accuracy ~ 100%

Not a bug, just praise: within a few hours the CIFAR-100 accuracy goes up to 100%!

 | Epoch: [190][423/782]    Time 0.166  Data 0.000  Err 0.0518  top1   0.000  top5   0.000
 | Epoch: [190][424/782]    Time 0.166  Data 0.000  Err 0.1035  top1   3.125  top5   0.000
 | Epoch: [190][425/782]    Time 0.166  Data 0.000  Err 0.0389  top1   1.562  top5   0.000

Legendary! Time to increase the test set.

./checkpoints.lua:52: attempt to call method 'clearState' (a nil value)

  1. Great work, a milestone! Never seen top5 go down so seamlessly.

  2. A little bug:

th main.lua -netType densenet -depth 40 -dataset cifar10 -batchSize 64 -nEpochs 300 -optnet true
...
 | Test: [1][157/157]    Time 0.021  Data 0.000  top1  87.500 ( 83.420)  top5  50.000 ( 55.870) 
 * Finished epoch # 1     top1:  83.420  top5:  55.870
 * Best model   83.42   55.87   

./checkpoints.lua:52: attempt to call method 'clearState' (a nil value)

Deep-Narrow DenseNet

I was wondering if you ever tried the extreme case growth_rate = 1 with a very deep network? Just as an exercise I implemented a fully-connected dense block with growth_rate = 1 and depth = 50 on a 2D dataset so I could visualize what each neuron was learning, and the results were very nice.
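For anyone curious, here is a minimal sketch of the kind of fully-connected dense block described above (growth rate 1, depth 50, 2D inputs); tf.keras is used purely for illustration and the names are mine:

    import tensorflow as tf
    from tensorflow.keras import layers

    def narrow_dense_block(x, depth=50):
        # every layer adds a single new unit and sees the outputs of all previous ones
        for _ in range(depth):
            new_feature = layers.Dense(1, activation='relu')(x)
            x = layers.Concatenate()([x, new_feature])
        return x

    inputs = tf.keras.Input(shape=(2,))   # a 2D toy dataset
    outputs = layers.Dense(1, activation='sigmoid')(narrow_dense_block(inputs))
    model = tf.keras.Model(inputs, outputs)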

The layers within the second and third dense blocks don't assign the least weight to the outputs of the transition layer in my trained model

I am not sure if it's appropriate to open this issue in the GitHub project; it is a question about the heatmap in your paper.

I trained a DenseNet on C10+ with L = 40 and k = 12, the same as yours, and then I inspected the weights of a trained model with 94.6% accuracy, but I didn't get the same result as your observation 3. In my test, the layers within the second and third dense blocks assign considerable weight to the outputs of the transition layer.

For example, the first conv layer in the second dense block has 0.013281956 average weight on the 1st transition layer's output (168 channels, i.e. all the input channels); the second conv layer has 0.011933382 average weight on the 1st transition layer's output (first 168 channels) and 0.024417713 average weight on the 12 channels output by the first conv layer. This is reasonable because closer channels are more important. The remaining layers have similar weight distributions over the old and new channels, and the situation is similar in dense block 3.

My densenet and training code is aligned to yours, including augmentation and input norm, see https://github.com/seasonyc/densenet/blob/master/densenet.py and https://github.com/seasonyc/densenet/blob/master/cifar10-test.py. The model file is in https://github.com/seasonyc/densenet/blob/master/dense_augmodel-ep0300-loss0.112-acc0.999-val_loss0.332-val_acc0.946.h5, and my code to count the weights is in https://github.com/seasonyc/densenet/blob/master/weights-verify.py.
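For readers who want to reproduce this check, the quantity discussed above (the average absolute weight a conv layer assigns to a range of input channels) can be computed along these lines. This is only a numpy sketch, assuming a Keras Conv2D kernel of shape (kh, kw, in_channels, out_channels); the layer name in the comment is hypothetical:

    import numpy as np

    def mean_abs_weight(kernel, in_start, in_end):
        # average absolute weight over input channels [in_start, in_end) and all filters
        return float(np.abs(kernel[:, :, in_start:in_end, :]).mean())

    # e.g. for the first conv layer of dense block 2, something like:
    #   kernel = model.get_layer('block2_conv1').get_weights()[0]   # hypothetical layer name
    #   print(mean_abs_weight(kernel, 0, 168))   # weight on the transition-layer channels
    kernel = np.random.randn(3, 3, 180, 12)      # dummy kernel just to make this runnable
    print(mean_abs_weight(kernel, 0, 168), mean_abs_weight(kernel, 168, 180))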

I know that models trained at different times differ, and even the features of the conv filters differ, but I believe the weight distributions are statistically similar. So although we have different models, we should see similar results.

I did this verification because observation 3 seems a little unreasonable to me: the 1st conv layer makes heavy use of the information from the previous dense block, and then the 2nd conv layer ignores the information in hundreds of channels and only uses the information from 12 channels. Can the 1st conv layer really compress hundreds of channels into 12 channels through training?

Do you want to double-check this?

Thanks
YC

Why did you use MomentumOptimizer? and dropout...

Hello
When I read about DenseNet, I implemented it in TensorFlow (using MNIST data).

My questions are:

  1. In my experiments, AdamOptimizer performed better than MomentumOptimizer.
    Is this specific to MNIST? I have not yet run experiments on CIFAR.

  2. In the case of dropout, I apply it only to the bottleneck layers, not to the transition layers. Is this right?

  3. Is Batch Normalization applied only during training, or during both training and testing?

  4. I wonder what global average pooling is and how to implement it in TensorFlow (see the sketch after this list).
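On question 4: global average pooling replaces each feature map with its spatial average, leaving one value per channel. A TensorFlow/tf.keras sketch (illustration only):

    import tensorflow as tf

    feature_maps = tf.random.normal([8, 8, 8, 12])        # dummy NHWC batch
    pooled = tf.reduce_mean(feature_maps, axis=[1, 2])    # (8, 8, 8, 12) -> (8, 12)
    # or, equivalently, with a Keras layer:
    pooled_keras = tf.keras.layers.GlobalAveragePooling2D()(feature_maps)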

Please let me know if there are any special reasons for these choices.
And if you have time to look at the TensorFlow code, I'd appreciate it if you could check whether I implemented it correctly.
https://github.com/taki0112/Densenet-Tensorflow

Thank you

TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

Traceback (most recent call last):
File "densenet.py", line 162, in
run()
File "densenet.py", line 160, in run
run_model(data, image_dim, label_count, 40)
File "densenet.py", line 94, in run_model
current, features = block(current, layers, 16, 12, is_training, keep_prob)
File "densenet.py", line 72, in block
current = tf.concat(3, (current, tmp))
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1061, in concat
dtype=dtypes.int32).get_shape(
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 611, in convert_to_tensor
as_ref=False)
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 676, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 121, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 376, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).name))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

Can you tell me what the problem is?
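Assuming this comes from the TensorFlow 1.0 API change, where tf.concat's arguments were swapped from tf.concat(axis, values) to tf.concat(values, axis), the call at densenet.py line 72 would need to become something like:

    # densenet.py, line 72 (old, pre-1.0 argument order):
    #   current = tf.concat(3, (current, tmp))
    # TensorFlow >= 1.0 expects the values first and the axis second:
    current = tf.concat([current, tmp], axis=3)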

ImageNet test

I'm trying to train a DenseNet on the ImageNet dataset, but it doesn't converge well.
Have you ever tried DenseNet on the ImageNet dataset?
Please share it if you have any successful DenseNet configuration for ImageNet.

CIFAR validation loss decreases then increases after learning rate change

Hello, I have one question about training DenseNet: the validation loss decreases sharply and then increases after the learning rate is changed from 0.1 to 0.01.
I trained the DenseNet (depth 40, k=12) on CIFAR-100 with this TensorFlow implementation:
https://github.com/YixuanLi/densenet-tensorflow
I only modified the code to follow your data preprocessing (subtract the channel mean, then divide by the std).
However, the validation loss looks weird (see the figure below). I have the following two questions:

[Figure: validation loss curve, cifar100_d_40_k_12]
(1) Did you encounter the same problem when training on the CIFAR-100 dataset (or could it be an error in the TensorFlow implementation)?
(2) Did your validation loss include the L2 (weight decay) term?
The validation error itself seems fine (25.53%, 1.1% higher than in the paper).
Thanks in advance

Why not share the first BN and ReLU?

Hi,

The features go through BN-ReLU-Conv-BN-ReLU-Conv, and then the features from different layers are concatenated. Since BN is applied per channel and ReLU is applied element-wise, why not share the first BN-ReLU? The features would go through Conv-BN-ReLU-Conv-BN-ReLU, and the outputs of the ReLU would be concatenated. Is there any difference?

Thanks.

Pretrained weights for the 0.8M parameters config

Hi,
Could you please upload ImageNet weights for DenseNet-BC (L=100, k=12), which has only 0.8M parameters? It is compact, and when we extend the network for semantic segmentation, this really helps control the number of parameters.

If you know of any other resources with weights for this configuration, I would be grateful if you could let me know.

question about standardization

I find that in the training and testing phases, the dataset is standardized as a whole, i.e. the mean and variance are computed from the entire CIFAR-10 dataset.
However, when the model is deployed, images are fed in individually; how should we preprocess each image?
What mean and variance should we use when the input image belongs to one of the classes but is not part of the CIFAR-10 dataset?
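For context, the whole-dataset standardization described above is typically implemented so that the statistics are computed once on the training set and then reused for every image at deployment time, including images that are not in CIFAR-10. A numpy/tf.keras sketch (illustration only; this is not code from the repo):

    import numpy as np
    import tensorflow as tf

    # statistics computed once over the CIFAR-10 training images
    (train_images, _), _ = tf.keras.datasets.cifar10.load_data()
    channel_mean = train_images.mean(axis=(0, 1, 2))
    channel_std = train_images.std(axis=(0, 1, 2))

    def preprocess(image):
        # the same training-set statistics are reused for any single image at deployment
        return (image.astype(np.float32) - channel_mean) / channel_std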

Parameters and computation

Hi there and great work! I actually figured out the very same concept myself prior to finding out that you had already tested and published it. ✍(◔◡◔) Some of the design decisions I made were different, so I'd like to compare.

Where you report results on the CIFARs, it would be very helpful if you could also list the number of parameters used and, possibly, an estimate of the amount of computation. That is really necessary for serious comparisons and for perfecting even this very architecture. Adding your training logs would also give great insight.

As for how to measure the amount of computation, that's quite hard to do, so I'd recommend at least measuring training time, which is a very inexact measure but provides some insight.

I got 19.5% on CIFAR-100+ with mean and std not adjusted (the whole dataset just scaled to [0..1] values), with 24M params and forward+backward running at 220 sec/epoch on a GTX Titan X, using the best dense-type architecture I had designed previously (I could only experiment on a single GTX Titan X; I don't have a lot of computational resources). It didn't have pre-activation. It would most likely at least match the results you've published for DenseNet (L=100, k=24) on CIFAR-100+ if I had used the right dataset preprocessing (with std and mean adjusted). My code is at https://github.com/ibmua/Breaking-Cifar/blob/master/models/hoard-2-x.lua (it uses 4-space tabs; to achieve that result I used depth=2 sequences=2, and here's a log of the end of training: https://github.com/ibmua/Breaking-Cifar/blob/master/logs/load_59251794/log.txt). Note that I used grouped convolutions, which are only accessible via Soumith's "cudnn" bindings, so if you want to try this you probably want to clone the whole repo. Also note that I didn't use any Dropout (haven't even tried it).

What is the proper way of counting parameters?

Hi author, as stated in both the repo and the paper, the number of parameters of densenet-100-12 is 7.0M and of densenet-100-24 is 27.72M. However, when I examine the parameters in the following way:

-- main.lua, line 32
-- Create model
local model, criterion = models.setup(opt, checkpoint)

params = model:getParameters()
print(#params)

I get 4.06M for densenet-100-12 and 16.11M for densenet-100-24. Am I counting them the wrong way?

DenseNet on Pascal VOC

Hi, I think DenseNet is a promising model, and I am trying to use it as the backbone of Faster R-CNN for object detection. I chose DenseNet-169 pretrained on ImageNet to replace the ResNet-50 backbone of Faster R-CNN and used the same hyperparameter configuration as the ResNet-50 version. However, the DenseNet version trains to a worse result than the ResNet-50 version (roughly 3% lower on the VOC2012 test set). Can you help me analyse the reason or give me any advice? @liuzhuang13 Many thanks!

about nninit package

I ran into an error saying 'could not find nninit package', and I am not sure whether this package is actually needed.
It seems the package is not used at all; should we just remove the require?
And since the nninit package is not installed by default, I think the README file should mention it.

error when loading pretrained model??

When I load the 201-layer model pretrained on ImageNet, it outputs the error message below:

torch/install/share/lua/5.1/nn/SpatialMaxPooling.lua:47: attempt to index field 'THNN' (a nil value)

It seems the pretrained model is not compatible with the latest Torch packages. By the way, I am using the latest versions of the Torch packages. Can you provide a pretrained model saved with the latest version of Torch?

Median of best test error or test error after training?

Hi, guys. In your paper you compare your results with other methods. Did you report the median of the best test error during training, or the median of the test error after training? What is the common way to report results?
Also, why didn't you report results using both data augmentation and dropout?

Add layer bugs

Hi,

check out the code at

https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua#L54

The parameter requires nOutChannels, but you pass the growth rate.

Also, the add layer only adds a direct connection to the next layer, so you form n-1 connections; as far as I understood from your paper, there should also be connections to layers n-2, n-3, etc. for every layer in the block.
I'm guessing this is not the version you used to train the models.

Thanks for sharing.

validation top1 error is odd

Hi,
I trained a DenseNet on CIFAR-10 according to the paper, and the top1 error on the test set looks odd. The result is saved in this file:
net-train.pdf
How can I make the top1 error curve smooth?
Thanks

I tried to reproduce the Wide-DenseNet-BC results on CIFAR-10, but got 0.5% higher error than yours

I tested Wide-DenseNet-BC (L=40, k=48) on CIFAR-10 with augmentation, see https://github.com/seasonyc/densenet/blob/bf99d7f459ca7754c37ff58c6610eb76e93f7990/cifar10-test.py#L217 in https://github.com/seasonyc/densenet,
but could only get a 4.5% error rate.

I tried to tune some hyperparameters, e.g. dropout, weight decay and learning rate, but could never get a better result. Now I am trying the learning-rate decay used for Wide ResNet training, i.e. an initial rate of 0.1 multiplied by 0.2 every 60 epochs, but I doubt it will make much difference.
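For reference, the Wide ResNet-style decay described above (start at 0.1, multiply by 0.2 every 60 epochs) can be written as a Keras callback like this; a sketch only, not code from either repo:

    import tensorflow as tf

    def wrn_schedule(epoch, lr):
        # 0.1 for epochs 0-59, 0.02 for 60-119, 0.004 for 120-179, ...
        return 0.1 * (0.2 ** (epoch // 60))

    lr_callback = tf.keras.callbacks.LearningRateScheduler(wrn_schedule)
    # model.fit(x_train, y_train, epochs=300, callbacks=[lr_callback])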

Could you give me any suggestions about this?

Thanks
YC

Why 3 dense blocks, instead of downsampling

Hey there,

First of all, let me congratulate the authors. This is a very solid architecture that resembles cortical computation.

I have a question regarding the choice of dense blocks.
Due to the spatial size of the feature maps, dense connections are partitioned into blocks, creating iso-resolution maps in each block and transition layers that downsample between blocks.

Another option would be getting rid of blocks, connecting every layer with every other layer regardless of spatial size by using downsampling when there is a resolution mismatch.

Is there an experimental (e.g. worse performance, overfitting) or computational (e.g. more parameters) reason for not reporting this?

Thanks,
Ozgur

DenseNet architecture question

I may be misunderstanding the architecture, but why does DenseNet concatenate the feature maps from the current layer and pass them forward to later layers, instead of using "true" residual connections?
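For what it's worth, the difference can be sketched like this (tf.keras, illustration only):

    import tensorflow as tf
    from tensorflow.keras import layers

    x = tf.keras.Input(shape=(32, 32, 16))
    f_x = layers.Conv2D(16, 3, padding='same')(x)   # some transformation of x

    y_residual = layers.Add()([x, f_x])             # residual: sum, still 16 channels
    y_dense = layers.Concatenate()([x, f_x])        # DenseNet: concat, 32 channels,
                                                    # f_x stays visible to later layers unchanged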

DenseNet structure on imagenet

Hi author,

I notice that in your ImageNet experiments, instead of having the same number of layers in each block, you set a different number of layers for each block. Did you design this following some pattern, or just by trying different combinations?
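For reference, the per-block layer counts in the paper's ImageNet models are, as far as I recall:

    # (layers per dense block), growth rate k = 32 for all of these
    imagenet_configs = {
        'DenseNet-121': (6, 12, 24, 16),
        'DenseNet-169': (6, 12, 32, 32),
        'DenseNet-201': (6, 12, 48, 32),
        'DenseNet-264': (6, 12, 64, 48),
    }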

DenseNet on ImageNet

I've just read your paper, which is really interesting.
I was wondering whether you have tried training a DenseNet on ImageNet?
Thank you
