zhengshou / autoloc

AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos. ECCV'18.

License: MIT License

Python 4.93% Smarty 0.12% CMake 1.22% Makefile 0.27% Dockerfile 0.03% HTML 0.08% CSS 0.10% Jupyter Notebook 57.08% C++ 33.09% Shell 0.29% Cuda 2.41% MATLAB 0.36%

autoloc's Issues

video width

Hi, many thanks for your work! May I ask what format the video width has when it is used to clip [x1, x2]? I expected the video width to be a single number giving the number of snippets, but in the code it seems to be an array.

Could you share your final caffemodel for testing on both datasets?

Dear author:

Really great work!

Could you share the final caffemodel for both datasets so that we can generate the detection results without training the model ourselves?

That would be really nice of you!

File missing?

Dear author:

When I tried to compile your BVLC_Caffe, the build failed with an error saying that "caffe/layers/log_layer.hpp" was missing. I downloaded Caffe from the BVLC GitHub repository and overwrote its files with your version; after that, Caffe compiled successfully.

I'm not sure whether this is a bug in the repository or a mistake on my side.

Thanks a lot!

Regarding the paper: why does learning give better results than the supervision itself?

Hi,

Thanks for the nice work. I have thought about this question a lot but can't figure out the answer. Could you spare a few minutes to answer it?

My impression is that the localization branch and the classification branch share no parameters, so the classification branch is trained first and the localization branch afterwards, right? In other words, the OIC layer (or OIC selection) serves as the supervision for the localization branch. If all of that is correct, I don't understand why the AutoLoc result is better than the supervision (OIC selection). How can it outperform its own supervision if no parameters are shared?

This confused me a lot, and I could not find the answer in the paper. I would really appreciate your help. Hope to hear back from you soon!

Thanks!
June
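
For readers following this question, here is a minimal sketch of what "OIC selection" could look like on a 1-D class activation sequence: a candidate segment is scored by the mean activation just outside it minus the mean activation inside it, and lower is better. This is an illustration under assumptions, not the repository's oic_loss layer; the function name oic_score, the toy activations, and the inflation ratio of 0.25 are all hypothetical choices made for this example.

```python
def _mean(values):
    """Mean of a list, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def oic_score(activations, x1, x2, inflate=0.25):
    """Outer-minus-inner contrast for the segment [x1, x2).

    activations: per-snippet activation for one class (list of floats).
    inflate: assumed fraction of the segment length added on each side
             to form the outer region (hypothetical default).
    Lower values mean high activation inside and low activation around.
    """
    length = x2 - x1
    pad = max(1, round(inflate * length))
    outer_start = max(0, x1 - pad)
    outer_end = min(len(activations), x2 + pad)

    inner = activations[x1:x2]
    outer = activations[outer_start:x1] + activations[x2:outer_end]
    return _mean(outer) - _mean(inner)


# Toy sequence: the action occupies snippets 3..6.
acts = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
tight = oic_score(acts, 3, 7)   # segment aligned with the action
loose = oic_score(acts, 1, 9)   # segment that also covers background
print(tight, loose)
```

In this toy case the aligned segment gets the lower score, which is the sense in which such a contrast can rank candidate boundaries without boundary annotations.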

Code release date

Hello,

Is there a scheduled release date for the code?

Thank you,

label tool

Which temporal annotation tool did you use? Thank you.

OneDrive feature link has expired.

Hi, I tried to download the features from OneDrive. However, I found this message:

This item might not exist or is no longer available
This item might have been deleted, expired, or you might not have permission to view it. Contact the owner of this item for more information

Could you check and re-upload the feature, please?

Thank you.

'bbox_pred' layer shape mismatch when loading the pretrained model from autoloc_iter_200.caffemodel

Dear author:
I got a shape mismatch error when loading the pretrained model from your autoloc_iter_200.caffemodel.
The error is:
F1227 23:08:33.047071 149335 net.cpp:759] Cannot copy param 0 weights from layer 'bbox_pred'; shape mismatch. Source param shape is 26 128 1 1 (3328); target param shape is 12 128 1 1 (1536). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.

The num_output of layer 'bbox_pred' is computed as 2 * len(cfg.TRAIN.ANCHOR_SCALES), so the parameter shape should be 12 x 128 x 1 x 1, but the shape stored in the caffemodel is 26 x 128 x 1 x 1. Maybe the model was trained with 13 anchors rather than the 6 anchors listed in the config.

How can I fix this problem?
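
The arithmetic behind this diagnosis can be checked with a short sketch. It assumes only the relation quoted above, num_output = 2 * len(cfg.TRAIN.ANCHOR_SCALES), and the two shapes printed in the log; the helper names and the six placeholder scale values are hypothetical and are not taken from the repository's config.

```python
def bbox_pred_num_output(anchor_scales):
    """num_output of 'bbox_pred': two regression values per anchor."""
    return 2 * len(anchor_scales)


def anchors_from_num_output(num_output):
    """Invert the relation: how many anchors a saved layer was built for."""
    if num_output % 2 != 0:
        raise ValueError("num_output must be even (two values per anchor)")
    return num_output // 2


def param_count(shape):
    """Total number of weights for a 4-D Caffe blob shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# Placeholder values: only the COUNT of scales (6) matters here.
config_scales = [1, 2, 4, 8, 16, 32]

target_shape = (bbox_pred_num_output(config_scales), 128, 1, 1)
source_shape = (26, 128, 1, 1)  # shape reported for the saved caffemodel

print(target_shape, param_count(target_shape))
print(source_shape, param_count(source_shape))
print(anchors_from_num_output(source_shape[0]))
```

The counts reproduce the 1536 and 3328 values in the log, and inverting the relation for the saved layer gives 13 anchors, which is consistent with the guess above. Which 13 scales were used cannot be recovered from the shape alone.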

What is _frame_prior for when calculating the OIC loss?

Dear author:

I read the code of the oic_loss layer. I am wondering what the purpose of _frame_prior is when you select the most likely anchor at each temporal position.

Why don't you use the original scores directly instead of computing ref_scores with the following code?

len_frame = inner_frames[1, :, x] - inner_frames[0, :, x]
ref_scores = cur_scores * norm.pdf(len_frame, *cur_prior)
a = np.argmax(ref_scores)
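
To make the question concrete, the sketch below shows the effect of multiplying scores by a Gaussian density over segment length before taking the argmax. It is a stand-alone illustration, not the layer's code: it assumes the prior is a (mean, standard deviation) pair, uses the standard library's statistics.NormalDist in place of scipy's norm.pdf, and all numbers and the function name pick_anchor are made up.

```python
from statistics import NormalDist


def pick_anchor(scores, lengths, prior_mean, prior_std):
    """Return (index of best anchor, re-weighted scores).

    Each raw score is multiplied by the Gaussian density of the anchor's
    segment length, so anchors with implausible lengths are down-weighted.
    """
    prior = NormalDist(prior_mean, prior_std)
    ref_scores = [s * prior.pdf(length) for s, length in zip(scores, lengths)]
    best = max(range(len(ref_scores)), key=ref_scores.__getitem__)
    return best, ref_scores


# Made-up example: three anchors at one temporal position.
scores = [0.90, 0.80, 0.85]   # raw scores per anchor
lengths = [2, 30, 200]        # predicted segment lengths per anchor

raw_best = max(range(len(scores)), key=scores.__getitem__)
prior_best, _ = pick_anchor(scores, lengths, prior_mean=32, prior_std=16)
print(raw_best, prior_best)
```

With these numbers the raw argmax picks the very short segment, while the re-weighted argmax picks the anchor whose length is close to the assumed prior mean. Whether that is the author's motivation for _frame_prior is exactly what the question asks.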

number of validation videos

Dear author:
Why are features provided for only 2304 validation videos, while your paper states that the validation set contains 2383 videos?

Looking forward to your answer! Thanks a lot!
