forresti / squeezenet

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters

License: BSD 2-Clause "Simplified" License

squeezenet's Introduction

The Caffe-compatible files that you are probably looking for:

SqueezeNet_v1.0/train_val.prototxt          #model architecture
SqueezeNet_v1.0/solver.prototxt             #additional training details (learning rate schedule, etc.)
SqueezeNet_v1.0/squeezenet_v1.0.caffemodel  #pretrained model parameters

If you find SqueezeNet useful in your research, please consider citing the SqueezeNet paper:

@article{SqueezeNet,
    Author = {Forrest N. Iandola and Song Han and Matthew W. Moskewicz and Khalid Ashraf and William J. Dally and Kurt Keutzer},
    Title = {SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and $<$0.5MB model size},
    Journal = {arXiv:1602.07360},
    Year = {2016}
}

Helpful hints:

Getting the SqueezeNet model: git clone <this repo>. In this repository, we include Caffe-compatible files for the model architecture, the solver configuration, and the pretrained model (4.8MB uncompressed).
Batch size. We have experimented with batch sizes ranging from 32 to 1024. In this repo, our default batch size is 512. If implemented naively on a single GPU, a batch size this large may result in running out of memory. An effective workaround is to use hierarchical batching (sometimes called "delayed batching"). Caffe supports hierarchical batching by processing train_val.prototxt>batch_size training samples concurrently in memory. After solver.prototxt>iter_size iterations, the gradients are summed and the model is updated. Mathematically, the batch size is batch_size * iter_size. In the included prototxt files, we have set (batch_size=32, iter_size=16), but any combination of batch_size and iter_size that multiplies to 512 will produce equivalent results. In fact, with the same random number generator seed, the model will be fully reproducible if trained multiple times. Finally, note that in Caffe iter_size is applied while training on the training set but not while testing on the test set.
Implementing Fire modules. In the paper, we describe the expand portion of the Fire layer as a collection of 1x1 and 3x3 filters. Caffe does not natively support a convolution layer that has multiple filter sizes. To work around this, we implement expand1x1 and expand3x3 layers and concatenate the results together in the channel dimension.
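The hierarchical batching hint above can be checked numerically. The following is a minimal sketch in pure Python, not Caffe code: it assumes a toy one-parameter least-squares model, assumes each micro-batch gradient is a mean over its samples, and assumes the accumulated sum is divided by the number of micro-batches. All names (mean_gradient, accumulated_gradient, combos) are made up for illustration.

```python
import random

def mean_gradient(w, samples):
    """Mean gradient of 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in samples) / len(samples)

def accumulated_gradient(w, samples, batch_size):
    """Sum per-micro-batch mean gradients, then divide by the number of micro-batches."""
    chunks = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    return sum(mean_gradient(w, chunk) for chunk in chunks) / len(chunks)

random.seed(34)  # fixed seed so repeated runs use identical toy data
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(512)]
w = 0.3  # arbitrary toy weight

full_batch = mean_gradient(w, data)
# Every (batch_size, iter_size) pair below multiplies to 512.
combos = [(b, 512 // b) for b in (16, 32, 64, 128, 256, 512)]
for batch_size, iter_size in combos:
    grad = accumulated_gradient(w, data, batch_size)
    print(batch_size, iter_size, abs(grad - full_batch) < 1e-9)
```

Under these assumptions every combination gives the same gradient as the single batch of 512, up to floating-point rounding, which is the sense in which the combinations are equivalent.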
The SqueezeNet team has released a few variants of SqueezeNet. Each of these includes pretrained models, and the non-compressed versions include training protocols, too.

SqueezeNet v1.0 (in this repo), the base model described in our SqueezeNet paper.

Compressed SqueezeNet v1.0, as described in the SqueezeNet paper.

SqueezeNet v1.0 with Residual Connections, which delivers higher accuracy without increasing the model size.

SqueezeNet v1.0 with Dense→Sparse→Dense (DSD) Training, which delivers higher accuracy without increasing the model size.

SqueezeNet v1.1 (in this repo), which requires 2.4x less computation than SqueezeNet v1.0 without diminishing accuracy.

Community adoption of SqueezeNet:

SqueezeNet in the MXNet framework, by Guo Haria

SqueezeNet in the Chainer framework, by Eddie Bell

SqueezeNet in the Keras framework, by dt42.io

SqueezeNet in the Tensorflow framework, by Domenick Poster

SqueezeNet in the PyTorch framework, by Marat Dukhan

SqueezeNet in the CoreML framework

Neural Art using SqueezeNet, by Pavel Gonchar

SqueezeNet compression in Ristretto, by Philipp Gysel

If you like SqueezeNet, you might also like SqueezeNext! (SqueezeNext paper, SqueezeNext code)

squeezenet's Issues

SqueezeNet out of memory with batch size (512) smaller than AlexNet (1024)

Hi, it's magical to see the parameters squeezed so much; great work. I found two issues when I ran "caffe time" on the model on a Titan X:

SqueezeNet is slower than AlexNet with the same batch size of 256;
SqueezeNet:

I0722 09:33:39.867264 18424 caffe.cpp:377] Average Forward pass: 128.444 ms.
I0722 09:33:39.867269 18424 caffe.cpp:379] Average Backward pass: 307.341 ms.
I0722 09:33:39.867275 18424 caffe.cpp:381] Average Forward-Backward: 436.085 ms.

AlexNet:
I0722 09:34:11.348625 18438 caffe.cpp:377] Average Forward pass: 91.4737 ms.
I0722 09:34:11.348630 18438 caffe.cpp:379] Average Backward pass: 175.433 ms.
I0722 09:34:11.348635 18438 caffe.cpp:381] Average Forward-Backward: 267.041 ms.

SqueezeNet out of memory with batch size (512) smaller than AlexNet (1024)

Did I do something wrong, or are these issues a result of increasing the number of layers?

Thank you so much.

SqueezeNet v1.1 with Residual Connections with Dense→Sparse→Dense (DSD) Training

Hi, is there any plan to, or is it possible to provide the model for SqueezeNet v1.1 with Residual Connections trained via Dense→Sparse→Dense (DSD) Training?

SqueezeNet with Deep Compression

How can SqueezeNet be compressed with Deep Compression? Can you share the training code for SqueezeNet with Deep Compression?

deploy.prototxt

Do you have a plan to release deploy.prototxt?
I plan to run classify.py from Caffe to test squeezenet performance for images I have and compare it with alexnet

tensorflow- After hundreds of epochs, my total_loss stay around 0.6~0.7, and not decreased

I wrote my own network. Every time I train it, total_loss decreases to 0.6~0.7 and then stops going down. I tried fine-tuning the bias and stddev, but nothing worked.
What is more, the accuracy does not increase; it looks like a random number...
What can I do to fix the network?

build my own squeezenet

@forresti I want to use your idea to squeeze my own big network, but I have no idea how to implement it. Can you give me some ideas for squeezing any big model? Can I use the original model to fine-tune the squeezed model?

regarding performance improvement for AlexNet

Hi,
I want to know the performance improvement for SqueezeNet with reference to AlexNet.
Any idea?

Thanks.
William. J.

why the conv10 in SqueezeNet1.0 has a pad of 1?

conv10 is a layer that has only 1x1 conv filters. Could anyone please tell me why the pad is set to 1? Thanks.

1.1 deploy.prototxt

crop_size is 227 instead of 224?

Why v1.1 does not have deploy.prototxt

Hello,
I am wondering why v1.1 does not have deploy.prototxt. Missed commit? Thank you.

The SqueezeNet deploy.caffemodel files have all 0.0 weight and bias data

The comments on the SqueezeNet models indicate that the .caffemodel files include the training data. The deploy.caffemodel files' weights and biases are 0.0. Is this intentional? It wasn't evident from the comments. I see the squeezenet_v1.1.caffemodel does have non-zero weights and biases.

Look at any of the parameter data

import caffe

# Load the deploy network in test mode together with the weights under inspection.
net = caffe.Net("deploy.prototxt", caffe.TEST, weights="deploy.caffemodel")
print(net.params['conv1'][0].data[...])  # weights
print(net.params['conv1'][1].data[...])  # biases

Fine-tuning SqueezeNet

When fine-tuning SqueezeNet, should some layers be frozen?

Thanks

Darknet/YOLO or Caffe/R-FCN + Squeezenet

I think a very interesting combination with SqueezeNet is RFCN or YOLO for object detection. I'm trying to port SqueezeNet from Caffe to Darknet + YOLO.

Could someone help to review it?

It is a port from v1.1

squeezenet.cfg

[net]
batch=64
subdivisions=1
height=227
width=227
channels=3
momentum=0.9
decay=0.0005

learning_rate=0.001
policy=steps
steps=20,40,60,80,20000,30000
scales=5,5,2,2,.1,.1
max_batches=40000

[crop]
crop_width=227
crop_height=227
flip=0
angle=0
saturation = 1.5
exposure = 1.5

# SqueezeNet: conv1
[convolutional]
filters=64
size=3
stride=2
activation=relu

# SqueezeNet: pool1
[maxpool]
size=3
stride=2

# SqueezeNet: fire2/squeeze1x1
[convolutional]
filters=16
size=1
activation=relu

# SqueezeNet: fire2/expand1x1
[convolutional]
filters=64
size=1
activation=relu

# SqueezeNet: fire2/expand3x3
[convolutional]
filters=64
size=3
pad=1
activation=relu

# SqueezeNet: fire2/concat
[route]
layers=-3

# SqueezeNet: fire3/squeeze1x1
[convolutional]
filters=16
size=1
activation=relu

# SqueezeNet:fire3/expand1x1
[convolutional]
filters=64
size=1
activation=relu

# SqueezeNet: fire3/expand3x3
[convolutional]
filters=64
size=3
pad=1
activation=relu

# SqueezeNet: fire3/concat
[route]
layers=-3

# SqueezeNet: pool3
[maxpool]
size=3
stride=2

# SqueezeNet: fire4/squeeze1x1
[convolutional]
filters=32
size=1
activation=relu

# SqueezeNet: fire4/expand1x1
[convolutional]
filters=128
size=1
activation=relu

# SqueezeNet: fire4/expand3x3
[convolutional]
filters=128
size=3
pad=1
activation=relu

# SqueezeNet: fire4/concat
[route]
layers=-3

# SqueezeNet: fire5/squeeze1x1
[convolutional]
filters=32
size=1
activation=relu

# SqueezeNet: fire5/expand1x1
[convolutional]
filters=128
size=1
activation=relu

# SqueezeNet: fire5/expand3x3
[convolutional]
filters=128
size=3
pad=1
activation=relu

# SqueezeNet: fire5/concat
[route]
layers=-3

# SqueezeNet: pool5
[maxpool]
size=3
stride=2

# SqueezeNet: fire6/squeeze1x1
[convolutional]
filters=48
size=1
activation=relu

# SqueezeNet: fire6/expand1x1
[convolutional]
filters=192
size=1
activation=relu

# SqueezeNet: fire6/expand3x3
[convolutional]
filters=192
size=3
pad=1
activation=relu

# SqueezeNet: fire6/concat
[route]
layers=-3

# SqueezeNet: fire7/squeeze1x1
[convolutional]
filters=48
size=1
activation=relu

# SqueezeNet: fire7/expand1x1
[convolutional]
filters=192
size=1
activation=relu

# SqueezeNet: fire7/expand3x3
[convolutional]
filters=192
size=3
pad=1
activation=relu

# SqueezeNet: fire7/concat
[route]
layers=-3

# SqueezeNet: fire8/squeeze1x1
[convolutional]
filters=64
size=1
activation=relu

# SqueezeNet: fire8/expand1x1
[convolutional]
filters=256
size=1
activation=relu

# SqueezeNet: fire8/expand3x3
[convolutional]
filters=256
size=3
pad=1
activation=relu

# SqueezeNet: fire8/concat
[route]
layers=-3

# SqueezeNet: fire9/squeeze1x1
[convolutional]
filters=64
size=1
activation=relu

# SqueezeNet: fire9/expand1x1
[convolutional]
filters=256
size=1
activation=relu

# SqueezeNet: fire9/expand3x3
[convolutional]
filters=256
size=3
pad=1
activation=relu

# SqueezeNet: fire9/concat
[route]
layers=-3

# SqueezeNet: drop9
[dropout]
probability=.5

# SqueezeNet: conv10
[convolutional]
filters=1000
size=1
activation=relu

# SqueezeNet: pool10
[avgpool]

# YoLo: output = (5 * NUM + CLASSES) * SIDE^2
[connected]
output=784
activation=linear

# YoLo
[detection]
classes=1
coords=4
rescore=1
side=7
num=3
softmax=0
sqrt=1
jitter=.2

object_scale=1
noobject_scale=.5
class_scale=1
coord_scale=5

I'm not sure exactly how to port these cases:

    weight_filler {
      type: "xavier"
    }

    weight_filler {
      type: "gaussian"
      mean: 0.0
      std: 0.01
    }

Concat layers are strange too; I don't know what index to use on [route] frame=-?
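On the concat question, it may help to write down the data flow of one Fire module. The sketch below is plain Python shape bookkeeping, not Darknet or Caffe code. It assumes that both expand layers read the squeeze output and that their results are joined along the channel dimension; the filter counts are the fire2 values from the config above, while the 64x56x56 input shape and the helper names are hypothetical.

```python
def conv_shape(in_shape, filters, size, pad=0, stride=1):
    """Output (channels, height, width) of a convolution with floor rounding."""
    _, h, w = in_shape
    out_h = (h + 2 * pad - size) // stride + 1
    out_w = (w + 2 * pad - size) // stride + 1
    return (filters, out_h, out_w)

def fire_shape(in_shape, squeeze, expand1x1, expand3x3):
    """Shape after a Fire module: both expand branches read the squeeze output."""
    s = conv_shape(in_shape, squeeze, size=1)
    e1 = conv_shape(s, expand1x1, size=1)
    e3 = conv_shape(s, expand3x3, size=3, pad=1)
    assert e1[1:] == e3[1:], "branches must agree spatially to be concatenated"
    # Concatenation along channels adds the two branch widths.
    return (e1[0] + e3[0], e1[1], e1[2])

fire2_in = (64, 56, 56)  # hypothetical input shape, for illustration only
fire2_out = fire_shape(fire2_in, squeeze=16, expand1x1=64, expand3x3=64)
print(fire2_out)
```

If that reading is right, a port needs the 3x3 branch to take its input from the squeeze layer rather than from the preceding 1x1 expand layer, and the concat has to join both expand outputs. How that maps onto [route] indices is not settled by this sketch.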

Wrong Numbers?

Hello,
I'm trying to use this model, but the numbers in the various layers look weird.
Shouldn't the input be 3X224x224, as in the paper, instead of 3X227x227?
And what does it mean that the first dimension is 10?

btw, awesome work.

Reproduce AlexNet results.

I'm trying to reproduce this example http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb using SqueezeNet, but for this picture https://github.com/BVLC/caffe/blob/master/examples/images/cat.jpg predicted class is 278 which is n02119789 kit fox, Vulpes macrotis from https://github.com/HoldenCaulfieldRye/caffe/blob/master/data/ilsvrc12/synset_words.txt

Is this normal, or is something wrong?

Here is full code:

import numpy as np
import matplotlib.pyplot as plt

# The caffe module needs to be on the Python path;
import sys
caffe_root = '/home/myuser/Downloads/caffe/'  # Change this line! Keep the trailing slash.
sys.path.insert(0, caffe_root + 'python')

# If you get "No module named _caffe", either you have not built pycaffe or you have the wrong path.
import caffe

caffe.set_mode_cpu()

model_def = '/home/myuser/Desktop/GeneralDBCreator/python/SqueezeNet/SqueezeNet_v1.0/deploy.prototxt'
model_weights = '/home/myuser/Desktop/GeneralDBCreator/python/SqueezeNet/SqueezeNet_v1.0/squeezenet_v1.0.caffemodel'

net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

net.blobs['data'].reshape(1,        # batch size
                          3,         # 3-channel (BGR) images
                          227, 227)  # image size is 227x227

image_path= '/home/myuser/Desktop/GeneralDBCreator/python/cat.jpg' # Change this line !
image = caffe.io.load_image(image_path)

mu= np.array([104.0069879317889, 116.66876761696767, 122.6789143406786])
#mu= np.array([104, 117, 123])

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)           # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR
transformed_image = transformer.preprocess('data', image)
net.blobs['data'].data[...] = transformed_image

output = net.forward()

output_prob = output['prob'][0]  # the output probability vector for the first image in the batch

print('predicted class is: {}'.format(output_prob.argmax()))

labels_file = '/home/myuser/Desktop/GeneralDBCreator/python/synset_words.txt'
labels = np.loadtxt(labels_file, str, delimiter='\t')
print('output label: {}'.format(labels[output_prob.argmax()]))

ReLU after classifier

Hi, I have noticed that you put ReLU after classifier, which is not a common practice. Is there some reason for it?

layer {
  name: "conv10"
  type: "Convolution"
  bottom: "fire9/concat"
  top: "conv10"
  convolution_param {
    num_output: 1000
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      mean: 0.0
      std: 0.01
    }
  }
}
layer {
  name: "relu_conv10"
  type: "ReLU"
  bottom: "conv10"
  top: "conv10"
}
layer {
  name: "pool10"
  type: "Pooling"
  bottom: "conv10"
  top: "pool10"
  pooling_param {
    pool: AVE
    global_pooling: true
  }
}

Pre-trained model with the DSD technique applied?

Do you have any plan to release a pre-trained SqueezeNet model with the DSD technique applied? Thanks.

SqueezeNet training on cifar

The accuracy is 0, and the loss is too high all the time when I run the model on cifar10.
Do I need to delete avg pooling layer?

Type of squeezenet_v1.0.caffemodel is PCX image (image/x-pcx)?

Hi,
I have downloaded the provided model, but when I looked at its properties, the type is shown as PCX image, not binary?

How can I get a binary type?

Number of anchors reducing mAP

Any intuition for why increasing the number of anchors from 9 to 16 decreases mAP by 10%? I was expecting that, if it did not increase mAP, it would at least keep it the same. Is it due to replacing FC with a conv layer in your SqueezeDet vs. YOLO?

Initialization weights

This work is very exciting! The provided weights do work as expected. The prototxt works out of the box with the default ilsvrc2012 lmdb data that came with caffe's examples.

However, my training loss from scratch has not decreased even after the full 85k iterations. I tried rebuilding the latest version of caffe, running a second time, and increasing the batch size by 4x: none of these attempts seemed to help. Am I correct in understanding that the model is meant to be trained end-to-end without tricks like layer-by-layer training or anything like that?

To help me diagnose my problem, would it be possible for you to provide a reference set of initialization weights caffemodel (or/and one of your earliest intermediate snapshots)?

Thank you for your help!

Has anyone run SqueezeNet with the opencv dnn module?

I tried to run SqueezeNet in the opencv dnn module,
but I got this opencv error:
Assertion failed <dim <= 2> in cv::Mat::reshape..............................

Has anyone successfully run SqueezeNet in the opencv dnn module?

Image Width Issue

I am getting this:

2017-07-14 23:09:09.599393-0500 Light[10405:2914525] [core] Error Domain=com.apple.CoreML Code=1 "Input image feature image does not match model description" UserInfo={NSLocalizedDescription=Input image feature image does not match model description, NSUnderlyingError=0x1c0a5ae50 {Error Domain=com.apple.CoreML Code=1 "Image is not valid width 227, instead is 1280" UserInfo={NSLocalizedDescription=Image is not valid width 227, instead is 1280}}}

training from scratch, random seed

Hi
Can you explain why we need the random seed? I noticed the random seed for ImageNet classification is set to 34. I also trained a model for face verification; without a fixed random seed, it sometimes doesn't converge, but when I set the random seed to 1000, it always converges. Can you explain how to choose the random seed for different training tasks?

SqueezeNet benchmark

Hi,

SqueezeNet is really cool architecture! I have added it to my caffenet-variants benchmark and it looks even better than caffenet.
https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Architectures.md

Name	Accuracy	LogLoss	Comments
CaffeNet128-2048	0.470	2.36	Pool5 = 3x3,fc6-fc7=2048
CaffeNet128-4096	0.497	2.24	Pool5 = 3x3, fc6-fc7=4096
SqueezeNet128	0.530	2.08	Reference SqueezeNet solver, but linear lr_policy and batch_size=256 (320K iters)
SqueezeNet128+ELU	0.555	1.95	Reference solver, but linear lr_policy and batch_size=256 (320K iters).ELU

Note that, for speed reasons, I use image size = 128 px, so the performance of all nets is degraded compared to the classical 227px.

I'd like to suggest a slightly different solver setup for SqueezeNet.
According to my tests on caffenet128, a linear lr_policy works better than the squared one used in your solver:
https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Lr_policy.md

Name	Accuracy	LogLoss	Comments
Step 100K	0.470	2.36	Default caffenet solver, max_iter=320K
Poly lr, p=0.5, sqrt	0.483	2.29	bvlc_quick_googlenet_solver, All the way worse than "step", leading at finish
Poly lr, p=2.0, sqr	0.483	2.299
Poly lr, p=1.0, linear	0.493	_2.24_

Best regards, Dmytro.

SqueezeNet in PyTorch

SqueezeNet 1.0 and 1.1 are now available as a built-in model in the official pytorch/vision repo. In addition to model implementation, I pre-trained SqueezeNet 1.0 and 1.1 on ImageNet for PyTorch model zoo, and accuracy is even slightly better than the original Caffe models:

Model	Top-1 accuracy	Top-5 accuracy
SqueezeNet 1.0	58.000%	80.488%
SqueezeNet 1.1	58.184%	80.514%

Links: code in the repo, discussion is in the PR

Replace global pooling with explicitly defined window

While trying out the SqueezeNet variants (1.0 and 1.1) on a Jetson TX1 dev board with TensorRT 1.0.0, I got the following error:

Parameter check failed in addPooling, condition: windowSize.h > 0 && windowSize.w > 0 && windowSize.h*windowSize.w < MAX_KERNEL_DIMS_PRODUCT
error parsing layer type Pooling index 64

I believe this refers to the following layer in the definitions (identical in both variants):

layer {
  name: "pool10"
  type: "Pooling"
  bottom: "conv10"
  top: "pool10"
  pooling_param {
    pool: AVE
    global_pooling: true
  }
}

I've just got the following advice from NVIDIA:

TensorRT caffe parser doesn't support global pooling, so it's just taking the H and W parameters from the network definition, and those default to 0.
The API check is complaining that there isn't a valid pooling layer definition.
If you replace the global pooling with an explicitly defined window, TensorRT should work.

Alas, I'm not a Caffe expert, so I'm struggling a bit with how to do that. Can anyone suggest please how the SqueezeNet definitions should be updated, so as to maintain the recognition accuracy?
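One way to find the explicit window is to trace the spatial size up to conv10, since a global average pool is the same as an average pool whose window covers the whole input. The sketch below does that for v1.0 in plain Python. It rests on assumptions not stated in this thread: a 227x227 input, a 7x7 stride-2 conv1, three 3x3 stride-2 max pools (named pool1, pool4, pool8 here), Fire modules that preserve spatial size, Caffe rounding convolution sizes down and pooling sizes up, and pad: 1 on conv10. The function names are made up for illustration.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    """Convolution output size, assuming floor rounding."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride=1, pad=0):
    """Pooling output size, assuming ceil rounding."""
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1

def v1_0_conv10_size(input_size=227):
    """Spatial size of the conv10 output under the assumptions listed above."""
    size = conv_out(input_size, kernel=7, stride=2)  # conv1
    size = pool_out(size, kernel=3, stride=2)        # pool1
    size = pool_out(size, kernel=3, stride=2)        # pool4
    size = pool_out(size, kernel=3, stride=2)        # pool8
    size = conv_out(size, kernel=1, pad=1)           # conv10 with pad: 1
    return size

window = v1_0_conv10_size()
print(window)
```

Under these assumptions the window is 15, so global_pooling: true would become kernel_size: 15 with stride: 1, which should compute the same average. The value depends on the input size and on the variant, so it is worth confirming against net.blobs['conv10'].data.shape in pycaffe before editing the prototxt.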

Has anyone successfully trained Squeezenet with residual connections?

I have trained the two versions of Squeezenet with success, thanks @forresti!

When training the one with residual connections, I am stuck. Whatever learning policy I use, the one shipped in this repo or a plain step policy, I cannot train it to the results given in the paper. The accuracy is a bit lower than Squeezenet v1.0....

I know that I should post this in that repo, but I can't find an issues tab there....

Could anyone shed some light on this? Thanks in advance!

Padding in conv-10 layer

Hello
I wonder why we need to use pad=1 in conv-10 layer.
What is the goal to use padding with a 1x1 kernel ?
Thanks in advance for your help.
Alex

layer {
name: "conv10"
type: "Convolution"
bottom: "fire9/concat"
top: "conv10"
convolution_param {
num_output: 1000
pad: 1
kernel_size: 1
}
}

Layer for Feature Extraction

Hi, thanks for your sharing first of all.

Which layer is best for feature extraction? Have you run any tests on this?

Thanks.

model convert

I use TensorFlow. How can I convert the .caffemodel to .pkl? Thank you!!!

why can not get the output of the prob layer?

I fine-tuned on my own data based on the train_val.prototxt, in which I changed num_output to 12 (I prepared just 12 classes of people) and renamed conv10 to myconv10. When training, the accuracy reached 1 quickly, as shown below:

I0122 16:43:48.676445 13661 solver.cpp:218] Iteration 40 (0.0557035 iter/s, 718.088s/40 iters), loss = -nan
I0122 16:43:48.676497 13661 solver.cpp:237] Train net output #0: accuracy = 1
I0122 16:43:48.676512 13661 solver.cpp:237] Train net output #1: accuracy_top5 = 1
I0122 16:43:48.676530 13661 solver.cpp:237] Train net output #2: loss = -nan (* 1 = -nan loss)
I0122 16:43:48.676544 13661 sgd_solver.cpp:105] Iteration 40, lr = 0.03984

But sadly, when doing the prediction, I found that the output of the prob layer is nan. Here is the result:
output {'prob': array([[[[nan]],

    [[nan]],

    [[nan]],

    [[nan]],

    [[nan]],

    [[nan]],

    [[nan]],

    [[nan]],

    [[nan]],

    [[nan]],

    [[nan]],

    [[nan]]]], dtype=float32)}

Has anybody encountered this before?

Fine-tuning SqueezeNet

First, Thank you for sharing this awesome work,

I am trying to fine tune SqueezeNet to my own dataset (which is basically a subset of ImageNet labels),

Changes made in order to fine tune, inspired by this:

Changed name of conv10 to conv10-new.
Added param block to conv10-new to increase learning rate for this layer:

  param {
    lr_mult: 5
    decay_mult: 1
  }
  param {
    lr_mult: 10
    decay_mult: 0
  }

Changed conv10-new num_output to my own number of classes
Decreased solver base_lr by a factor of 10 to 0.004

(Tried several numbers, so far the above performed best)

While I was able to do it with AlexNet, with SqueezeNet my accuracy is about 20% lower, any tips for fine tuning?

About the training data resolution

Hi,
I have a question about the training data resolution of SqueezeNet. Are the images resized to 256x256, or resized so that the smaller side is 256 px? I have used both resolutions to test your pretrained model, and the 256x256 resolution seems to give higher accuracy.

conv10 layer has pad 1

As noticed by @milakov, it's quite surprising to see the following 1x1 convolutional layer having a padding of 1:

layer {
  name: "conv10"
  type: "Convolution"
  bottom: "fire9/concat"
  top: "conv10"
  convolution_param {
    num_output: 1000
    pad: 1
    kernel_size: 1
  }
}

Any comments?

SqueezeNet is slower when using GPU than when using CPU?

I'm trying to measure the speed of inference on a single image with SqueezeNet. When I run it on CPU, SqueezeNet seems fast enough (compared to VGG). But when it is on GPU, SqueezeNet gets much slower, even slower than on CPU.

Does anyone know why it gets slow on GPU? Should I do something on SqueezeNet when I run it on GPU?

Here are some results of experiments I have made for SqueezeNet vs VGG in terms of their speeds both on CPU and GPU.

On CPU, SqueezeNet is much faster than VGG16.

[inference time]
VGG average response time: 2.21110591888[sec/image]
SqueezeNet average response time: 0.288291954994[sec/image]

On GPU, VGG16 gets much faster, even faster than SqueezeNet, and SqueezeNet gets even slower than it is on CPU.

[inference time]
VGG16 average response time: 0.0961683591207[sec/image] # gets very fast
SqueezeNet average response time: 1.50337402026[sec/image] # gets very slow <= why?

Thanks!
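One thing worth ruling out with numbers like these is one-time setup cost being counted in the per-image time; whether that explains the result here is not known. The following is a minimal sketch of a timing harness that discards warm-up calls before measuring. The names time_inference and fake_inference are hypothetical, and fake_inference is only a stand-in for a real forward pass such as net.forward().

```python
import time

def time_inference(run_once, warmup=5, repeats=20):
    """Return mean seconds per call, excluding the warm-up calls."""
    for _ in range(warmup):
        run_once()  # absorbs one-time costs such as initialization
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) / repeats

def fake_inference():
    """Hypothetical stand-in for a forward pass."""
    return sum(i * i for i in range(10000))

seconds = time_inference(fake_inference)
print('%.6f sec/image' % seconds)
```

Running the same harness for both networks, on both CPU and GPU, makes the four numbers directly comparable.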

squeezenet of resnet?

The best-performing recent net is ResNet. Is there a SqueezeNet version of ResNet?

Train SqueezeNet from scratch ?

Hello guys, I want to train SqueezeNet from scratch with my own data instead of the ImageNet dataset. Is that possible? I am new to Caffe; I already know Tensorflow, but it seems there is no SqueezeNet support for Tensorflow.

SqueezeNet speed slower than Alexnet

@macd @forresti @antingshen @samster25 @terrychenism
Hi, I used the command
caffe.exe time --model=SqueezeNet_v1.1_deploy.prototxt -gpu 0 -iterations 100
to measure the time.
AlexNet: 11ms, SqueezeNet: 30ms.
Even with cudnn v4, SqueezeNet still takes two or even three times as long as AlexNet.
Do you have any advice?

v1.1 loss does not decrease

This is my training log.

https://gist.github.com/kli-nlpr/e0705a0d58a04178b8e6dbe554e7f072

The training loss is always about 6.9....

I use the same train_val.prototxt and solver.prototxt as yours.

Thanks.

v1.1 dimensions mismatch in the first layers

Hi, seems that original v1.0 had nice dimension relationships:
227 -> (227-7)/2+1=111 -> (111-3)/2+1=55 etc.
But in v1.1 we start to get:
227 -> (227-3)/2+1=113 -> (113-3)/2+1=56 etc.

To get the output of conv1 to be 111, the input image would have to be decreased to 223x223. I'm not sure exactly how Caffe handles this, but something seems mismatched in v1.1. Any idea?
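The arithmetic in this question can be traced layer by layer to see where a division stops being exact. The sketch below is plain Python and assumes that Caffe rounds convolution sizes down and pooling sizes up, with no padding on these layers. The layer list (conv1, pool1, pool3, pool5, all 3x3 with stride 2) is taken as an assumption about v1.1, and out_size is a made-up helper name.

```python
import math

def out_size(size, kernel, stride, round_up):
    """Spatial output size without padding; round_up=True models pooling."""
    steps = (size - kernel) / float(stride)
    return int(math.ceil(steps) if round_up else math.floor(steps)) + 1

# (name, kernel, stride, is_pooling) for the assumed v1.1 downsampling layers
layers = [("conv1", 3, 2, False), ("pool1", 3, 2, True),
          ("pool3", 3, 2, True), ("pool5", 3, 2, True)]

size = 227
trace = []
for name, kernel, stride, is_pooling in layers:
    exact = (size - kernel) % stride == 0  # does the stride divide evenly?
    size = out_size(size, kernel, stride, round_up=is_pooling)
    trace.append((name, size, exact))
    print(name, size, "exact" if exact else "rounded")
```

Under these assumptions conv1 and pool1 divide evenly (113, then 56), and rounding first appears at pool3 and pool5, where rounding up yields 28 and 14.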

Would you also share your caffe training/val log file?

Hello,
Thank you for your contribution and beautiful work.
May I ask you to also share with us the log files for your training ?
Thank you in advance

binarized SqueezeNet

Hi,

Has anyone tried to train a binary-weight/activation model of SqueezeNet? I'm trying to do so with XNOR-Net, but I can't get past 30% top-1 accuracy with binary weights only and 24% top-1 accuracy with binary weights and binary activations. I was expecting top-1 accuracies similar to binarized AlexNet (~50% with binary weights, ~40% with binary weights and binary activations, respectively).

Help is highly appreciated.

Thanks,
Alex

How to use this SqueezeNet question?

Hello, how can Squeezenet be used to compress other network models, such as Faster-rcnn? Can you explain in detail how to do this? Thank you

Image Preprocessing for stated top5 accuracy

What image preprocessing was performed on the imagenet images to achieve the stated feedforward top5 accuracy?
E.g. resize uniformly to 256 at the smallest dimension, then center crop
I've had a tough time figuring this out, and any help would be much appreciated.
Many thanks!
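For reference, the geometry of the pipeline named in the question (resize so the shorter side is 256, then center crop) can be written out as follows. This is a minimal sketch in plain Python and is not confirmed as the preprocessing behind the stated accuracy. The 227 crop size is an assumption, the function name is hypothetical, and the 1280x720 input is an arbitrary example.

```python
def resize_then_center_crop(width, height, short_side=256, crop=227):
    """Return (resized_w, resized_h, crop_box) for resize-then-center-crop."""
    scale = short_side / float(min(width, height))
    new_w = int(round(width * scale))
    new_h = int(round(height * scale))
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    # crop_box is (left, top, right, bottom) in resized-image pixels
    return new_w, new_h, (left, top, left + crop, top + crop)

result = resize_then_center_crop(1280, 720)
print(result)
```

The alternative mentioned elsewhere on this page, resizing directly to 256x256, distorts the aspect ratio instead of cropping the longer side, so the two pipelines feed the network different pixels.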

Comparing GPU memory usage to BVLC reference CaffeNet

Thanks for sharing this work. I am comparing the GPU memory utilization of the BVLC CaffeNet and SqueezeNet. The GPU Memory usage is not what I expect on Ubuntu 14.04 with a Titan X.

Idle:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     337MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   129MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
+-----------------------------------------------------------------------------+

After loading a caffe.Classifier with SqueezeNet's weights and deploy.prototxt with PyCaffe in a Jupyter notebook:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     337MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   131MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
|    0     13713    C   /usr/bin/python                                229MiB |
+-----------------------------------------------------------------------------+

While classifying with SqueezeNet: (t = timeit.Timer('net.predict([image], oversample=True).flatten().argsort()[:5]', 'from __main__ import net, image') t.timeit(100):)

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     337MiB |
|    0      2631    G   compiz                                         106MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   137MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
|    0     13713    C   /usr/bin/python                                543MiB |
+-----------------------------------------------------------------------------+

BVLC CaffeNet Comparison

Idle:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     337MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   133MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
+-----------------------------------------------------------------------------+

After creating a CaffeNet caffe.Classifier:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     338MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   139MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
|    0     14231    C   /usr/bin/python                                184MiB |
+-----------------------------------------------------------------------------+

While classifying with CaffeNet: (t = timeit.Timer('net.predict([image], oversample=True).flatten().argsort()[:5]', 'from __main__ import net, image') t.timeit(100):)

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     338MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   139MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
|    0     14231    C   /usr/bin/python                                465MiB |
+-----------------------------------------------------------------------------+

SqueezeNet appears to use more GPU memory than the reference BVLC CaffeNet. Am I missing something?

Top-1 Acc=61.0% on ImageNet, without any sacrifice compared with SqueezeNet v1.1.

Hi,
I've trained a new model based on SqueezeNet V1.1, and it achieved 61% top-1 accuracy on ImageNet without sacrificing parameter count or efficiency.
I've uploaded my model to this [https://github.com/miaow1988/SqueezeNet_v1.2] repository.
Would you please add my repository to your README.md file, so that more people can learn about this work?

Jie

Why does training use 2500+ MB when the model is only 5MB?

Is most of that the training image data? Thank you!
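A rough calculation suggests where memory of this order can come from: during training, the per-layer activations for a whole batch are held in memory, and they are far larger than the weights. The sketch below is plain Python under stated assumptions: float32 values, batch_size=32, a v1.0 conv1 output of 96 channels at 111x111, one data copy plus one gradient copy per blob, and roughly 1.25 million parameters. The helper name blob_mib is made up for illustration.

```python
BYTES_PER_FLOAT = 4

def blob_mib(batch, channels, height, width, copies=1):
    """Size in MiB of a float32 blob; copies=2 counts data plus gradient."""
    total_bytes = batch * channels * height * width * BYTES_PER_FLOAT * copies
    return total_bytes / float(2 ** 20)

# Assumed: v1.0 conv1 output of 96 channels at 111x111, batch_size=32.
conv1_activations = blob_mib(32, 96, 111, 111, copies=2)
# Assumed: roughly 1.25 million float32 parameters.
weights = 1.25e6 * BYTES_PER_FLOAT / float(2 ** 20)

print('conv1 output: %.1f MiB' % conv1_activations)
print('all weights:  %.1f MiB' % weights)
```

Under these assumptions the output of the first layer alone takes far more memory than all of the weights together, and every later layer adds its own blobs, so the model file size says little about training memory.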

In V1.1 train_val.prototxt, why are the TRAIN & TEST phase (in loss and accuracy layers) commented out?

Here's the snippet from the train_val.prototxt file for SqueezeNet V1.1. Thank you.

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "pool10"
  bottom: "label"
  top: "loss"
  #include {
  #  phase: TRAIN
  #}
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "pool10"
  bottom: "label"
  top: "accuracy"
  #include {
  #  phase: TEST
  #}
}
layer {
  name: "accuracy_top5"
  type: "Accuracy"
  bottom: "pool10"
  bottom: "label"
  top: "accuracy_top5"
  #include {
  #  phase: TEST
  #}
  accuracy_param {
    top_k: 5
  }
}