Introduction

This is the release code for the paper:

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, and Yu Qiao

The performance is as follows:

Accuracy (%)   MIT Indoor   SUN397
Mean           78.5         63.5
VLAD           83.9         70.1
FV             83.6         69.0
VSAD           84.9         71.7

Note: the encoding methods based on our scene_patchnet features surpass human performance on SUN397 (68.5%).

Feature

We release a concise and effective feature for MIT Indoor, denoted hybrid_PatchNet+VSAD in the paper, which obtains 86.1% accuracy. You can use it as a baseline or as a complementary feature for further study.
Accuracy on MIT Indoor   Dimension       Storage
86.1                     100*256*2*2     1.9 GB
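
A minimal sketch for getting started with the released feature (the variable names stored in the .mat file are not documented here, so inspect them first):

```matlab
% Load the released MIT Indoor VSAD feature and inspect its contents.
S = load('mit_hybrid_vsad.mat');
disp(fieldnames(S));    % discover the stored variable names
% Expected per-image dimensionality: 100*256*2*2 = 102,400
```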

Model

Our trained scene_patchnet and object_patchnet models are based on cuDNN v4. If your system is based on cuDNN v5, you can use the following script to convert the models from cuDNN v4 to cuDNN v5: https://github.com/yjxiong/caffe/blob/action_recog/python/bn_convert_style.py

Model                         Top-5 accuracy (%)
Object_patchnet_on_ImageNet   85.3
Scene_patchnet_on_Places205   82.7

They both take 128 * 128 patches as input.
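
For reference, here is a minimal sketch of running one 128 x 128 patch through a PatchNet with Caffe's MATLAB interface (matcaffe). The deploy/weights filenames, the mean value, and the blob name 'global_pool' are assumptions; check the released files for the actual names.

```matlab
% Minimal matcaffe sketch (filenames, mean value, and blob name are
% assumptions; check the released deploy files for the actual names).
caffe.set_mode_gpu();
caffe.set_device(0);
net = caffe.Net('scene_patchnet_deploy.prototxt', ...  % hypothetical name
                'scene_patchnet.caffemodel', 'test');  % hypothetical name

patch = imread('patch.jpg');              % any 128 x 128 RGB patch
input = single(patch(:, :, [3 2 1]));     % RGB -> BGR (Caffe convention)
input = permute(input, [2 1 3]);          % H x W -> W x H (Caffe layout)
input = input - 104;                      % crude mean subtraction (assumption)

net.forward({input});
feat = net.blobs('global_pool').get_data();  % hypothetical blob name
```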

Code

  • mit_hybrid_vsad.mat -- this feature can serve as a baseline or be concatenated with other features for further study; it is only 100*256*2*2 dimensions yet achieves 86.1% accuracy on MIT Indoor. You can download it from mit_hybrid_vsad.mat
  • extracting_feature_example.m -- a template for extracting the scene_patchnet feature or the object_patchnet probability; the scene_patchnet feature is the global-average-pooling feature, and the object_patchnet probability is the fully connected feature passed through a softmax
  • for_encoder_scene67.mat -- helper data for working with the MIT Indoor dataset, from VLFeat
  • for_encoder_sun397.mat -- helper data for working with the SUN397 dataset
  • mit_pca.mat -- our PCA matrix for the scene_patchnet feature on MIT Indoor, used in vsad_encoding_example.m
  • mit_vsad_codebook.mat -- our semantic codebook for MIT Indoor, used in vsad_encoding_example.m
  • multi_crop.m -- dense cropping on a 10 * 10 grid, used in extracting_feature_example.m (a schematic sketch appears after this list)
  • object_selection_256.mat -- the 256 objects selected from the 1000 ImageNet classes, applied to both MIT Indoor and SUN397
  • sun_pca.mat -- our PCA matrix for the scene_patchnet feature on SUN397, used in vsad_encoding_example.m
  • sun_vsad_codebook.mat -- our semantic codebook for SUN397, used in vsad_encoding_example.m
  • vsad_encoding_example.m -- an example of the VSAD encoding algorithm
  • vsad_encoding.m -- our VSAD encoding function
  • plot_mit_sun.m -- plots the figure at the bottom of this page
  • xticklabel_rotate.m -- used by plot_mit_sun.m to rotate the text in the figure
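
As a schematic for the dense-crop step in multi_crop.m (see the bullet above), here is a minimal sketch of a 10 x 10 grid of 128 x 128 crops; the 256 x 256 working image size is an assumption.

```matlab
% Schematic of a dense 10 x 10 grid of 128 x 128 crops (cf. multi_crop.m).
function patches = dense_crop_sketch(img)
    img = imresize(img, [256 256]);   % working size is an assumption
    n  = 10;                          % 10 x 10 grid, as in the README
    ps = 128;                         % patch size expected by the PatchNets
    xs = round(linspace(1, size(img, 2) - ps + 1, n));
    ys = round(linspace(1, size(img, 1) - ps + 1, n));
    patches = cell(n, n);
    for i = 1:n
        for j = 1:n
            patches{i, j} = img(ys(i):ys(i)+ps-1, xs(j):xs(j)+ps-1, :);
        end
    end
end
```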

Usage

1. Download code and model

2. Extract the scene_patchnet feature and the object_patchnet probability (extracting_feature_example.m, multi_crop.m)

3. VSAD encoding (vsad_encoding.m, vsad_encoding_example.m, mit_pca.mat, mit_vsad_codebook.mat, object_selection_256.mat); a schematic of the aggregation is sketched below
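
For orientation, below is a schematic of probability-weighted first- and second-order aggregation in the spirit of VSAD. This is a sketch under our own assumptions (notation, power and L2 normalization), not the authors' exact formulation; see vsad_encoding.m and vsad_encoding_example.m for that.

```matlab
% Schematic of probability-weighted aggregation in the spirit of VSAD.
% NOT the exact released formulation; see vsad_encoding.m for that.
%   F  : N x d  PCA-reduced scene-PatchNet descriptors (d = 100)
%   P  : N x K  object-PatchNet probabilities          (K = 256)
%   MU : K x d  semantic codebook (per-object mean descriptor)
function code = vsad_sketch(F, P, MU)           % requires R2016b+
    [~, d] = size(F);
    K = size(P, 2);
    code = zeros(K, 2 * d);
    for k = 1:K
        w = P(:, k);                             % soft-assignment weights
        diff = F - MU(k, :);                     % center on codeword k
        code(k, 1:d)     = sum(w .* diff, 1);    % first-order statistics
        code(k, d+1:end) = sum(w .* diff.^2, 1); % second-order statistics
    end
    code = code(:)';                             % 2*d*K-dimensional vector
    code = sign(code) .* sqrt(abs(code));        % power normalization
    code = code / max(norm(code), eps);          % L2 normalization
end
```

With d = 100 and K = 256 this yields a 2 x 100 x 256 = 51,200-dimensional code, matching the dimension discussed in the issue below.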

Contact

Figure Plot for Reference

[Figure: accuracy comparison on MIT Indoor and SUN397, produced by plot_mit_sun.m]

Issues

A small question

Thank you very much for providing the model and code. But I have a question: can a Caffe installation on a 32-bit operating system run this program?

Feature dimension of final classifier

@wangzheallen Thanks for sharing your interesting work!
I have a little question about the feature dimension of the final SVM classifier.
According to your paper and code, the dimension of the VSAD code (i.e., the features for the SVM classifier) is 2*len(f)*len(p), where f denotes descriptors from your scene-PatchNet's feature layer (with dimension 100, reduced from 1024) and p denotes codewords from the object-PatchNet's softmax layer (i.e., probabilities with dimension 256, reduced from 1000).
Even with dimension reduction, the feature dimension for the SVM is still very high (2 x 100 x 256 = 51,200 according to your paper). Will this cause problems for the final classification? What do you think? (Let me know if I am wrong somewhere!)
Thanks in advance!
