Introduction

This is the release code for the paper:

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, and Yu Qiao

The performance is as follows:

Accuracy (%)   MIT Indoor   SUN397
Mean           78.5         63.5
VLAD           83.9         70.1
FV             83.6         69.0
VSAD           84.9         71.7

Note: the encoding methods based on our scene_patchnet features surpass human performance on SUN397 (68.5%).

Feature

We release a concise and effective feature for MIT Indoor, denoted hybrid_PatchNet+VSAD in the paper, which obtains 86.1% accuracy. You can use it as a baseline or as a complementary feature for further study.
Accuracy on MIT Indoor   Dimension       Storage
86.1                     100*256*2*2     1.9 GB
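
A minimal sketch for getting started with the released feature (the variable names stored in the .mat file are not documented here, so inspect them first):

```matlab
% Load the released MIT Indoor VSAD feature and inspect its contents.
S = load('mit_hybrid_vsad.mat');
disp(fieldnames(S));    % discover the stored variable names
% Expected per-image dimensionality: 100*256*2*2 = 102,400
```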

Model

Our trained scene_patchnet and object_patchnet models are based on cuDNN v4. If your system is based on cuDNN v5, you can use the following script to convert the models from cuDNN v4 to cuDNN v5: https://github.com/yjxiong/caffe/blob/action_recog/python/bn_convert_style.py

Model                         Top-5 accuracy (%)
Object_patchnet_on_ImageNet   85.3
Scene_patchnet_on_Places205   82.7

They both take 128 * 128 patches as input.
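
For reference, here is a minimal sketch of running one 128 x 128 patch through a PatchNet with Caffe's MATLAB interface (matcaffe). The deploy/weights filenames, the mean value, and the blob name 'global_pool' are assumptions; check the released files for the actual names.

```matlab
% Minimal matcaffe sketch (filenames, mean value, and blob name are
% assumptions; check the released deploy files for the actual names).
caffe.set_mode_gpu();
caffe.set_device(0);
net = caffe.Net('scene_patchnet_deploy.prototxt', ...  % hypothetical name
                'scene_patchnet.caffemodel', 'test');  % hypothetical name

patch = imread('patch.jpg');              % any 128 x 128 RGB patch
input = single(patch(:, :, [3 2 1]));     % RGB -> BGR (Caffe convention)
input = permute(input, [2 1 3]);          % H x W -> W x H (Caffe layout)
input = input - 104;                      % crude mean subtraction (assumption)

net.forward({input});
feat = net.blobs('global_pool').get_data();  % hypothetical blob name
```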

Code

  • mit_hybrid_vsad.mat -- this feature can serve as a baseline or be concatenated with other features for further study; it is only 100*256*2*2 dimensions yet achieves 86.1% accuracy on MIT Indoor. You can download it from mit_hybrid_vsad.mat
  • extracting_feature_example.m -- a template for extracting the scene_patchnet feature or the object_patchnet probability; the scene_patchnet feature is the global-average-pooling feature, and the object_patchnet probability is the fully connected feature passed through a softmax
  • for_encoder_scene67.mat -- helper data for working with the MIT Indoor dataset, from VLFeat
  • for_encoder_sun397.mat -- helper data for working with the SUN397 dataset
  • mit_pca.mat -- our PCA matrix for the scene_patchnet feature on MIT Indoor, used in vsad_encoding_example.m
  • mit_vsad_codebook.mat -- our semantic codebook for MIT Indoor, used in vsad_encoding_example.m
  • multi_crop.m -- dense cropping on a 10 * 10 grid, used in extracting_feature_example.m (a schematic sketch appears after this list)
  • object_selection_256.mat -- the 256 objects selected from the 1000 ImageNet classes, applied to both MIT Indoor and SUN397
  • sun_pca.mat -- our PCA matrix for the scene_patchnet feature on SUN397, used in vsad_encoding_example.m
  • sun_vsad_codebook.mat -- our semantic codebook for SUN397, used in vsad_encoding_example.m
  • vsad_encoding_example.m -- an example of the VSAD encoding algorithm
  • vsad_encoding.m -- our VSAD encoding function
  • plot_mit_sun.m -- plots the figure at the bottom of this page
  • xticklabel_rotate.m -- used by plot_mit_sun.m to rotate the text in the figure
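
As a schematic for the dense-crop step in multi_crop.m (see the bullet above), here is a minimal sketch of a 10 x 10 grid of 128 x 128 crops; the 256 x 256 working image size is an assumption.

```matlab
% Schematic of a dense 10 x 10 grid of 128 x 128 crops (cf. multi_crop.m).
function patches = dense_crop_sketch(img)
    img = imresize(img, [256 256]);   % working size is an assumption
    n  = 10;                          % 10 x 10 grid, as in the README
    ps = 128;                         % patch size expected by the PatchNets
    xs = round(linspace(1, size(img, 2) - ps + 1, n));
    ys = round(linspace(1, size(img, 1) - ps + 1, n));
    patches = cell(n, n);
    for i = 1:n
        for j = 1:n
            patches{i, j} = img(ys(i):ys(i)+ps-1, xs(j):xs(j)+ps-1, :);
        end
    end
end
```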

Usage

1. Download code and model

2. Extract the scene_patchnet feature and the object_patchnet probability (extracting_feature_example.m, multi_crop.m)

3. VSAD encoding (vsad_encoding.m, vsad_encoding_example.m, mit_pca.mat, mit_vsad_codebook.mat, object_selection_256.mat); a schematic of the aggregation is sketched below
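
For orientation, below is a schematic of probability-weighted first- and second-order aggregation in the spirit of VSAD. This is a sketch under our own assumptions (notation, power and L2 normalization), not the authors' exact formulation; see vsad_encoding.m and vsad_encoding_example.m for that.

```matlab
% Schematic of probability-weighted aggregation in the spirit of VSAD.
% NOT the exact released formulation; see vsad_encoding.m for that.
%   F  : N x d  PCA-reduced scene-PatchNet descriptors (d = 100)
%   P  : N x K  object-PatchNet probabilities          (K = 256)
%   MU : K x d  semantic codebook (per-object mean descriptor)
function code = vsad_sketch(F, P, MU)           % requires R2016b+
    [~, d] = size(F);
    K = size(P, 2);
    code = zeros(K, 2 * d);
    for k = 1:K
        w = P(:, k);                             % soft-assignment weights
        diff = F - MU(k, :);                     % center on codeword k
        code(k, 1:d)     = sum(w .* diff, 1);    % first-order statistics
        code(k, d+1:end) = sum(w .* diff.^2, 1); % second-order statistics
    end
    code = code(:)';                             % 2*d*K-dimensional vector
    code = sign(code) .* sqrt(abs(code));        % power normalization
    code = code / max(norm(code), eps);          % L2 normalization
end
```

With d = 100 and K = 256 this yields a 2 x 100 x 256 = 51,200-dimensional code, matching the dimension discussed in the issue below.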

Contact

Figure Plot for Reference

[Figure: accuracy comparison on MIT Indoor and SUN397, produced by plot_mit_sun.m]

Issues

A small question

Thank you very much for providing the model and code. But I have a question: can a Caffe installation on a 32-bit operating system run this program?

Feature dimension of final classifier

@wangzheallen Thanks for sharing your interesting work!
I have a little question about the feature dimension of the final SVM classifier.
According to your paper and code, the dimension of the VSAD code (i.e., the features for the SVM classifier) is 2*len(f)*len(p), where f denotes descriptors from your scene-PatchNet's feature layer (with dimension 100, reduced from 1024) and p denotes codewords from the object-PatchNet's softmax layer (i.e., probabilities with dimension 256, reduced from 1000).
Even with dimension reduction, the feature dimension for the SVM is still very high (2 x 100 x 256 = 51,200 according to your paper). Will this cause problems for the final classification? What do you think? (Let me know if I am wrong somewhere!)
Thanks in advance!
