
Segment-CNN

By Zheng Shou, Dongang Wang, and Shih-Fu Chang.

Introduction

Segment-CNN (S-CNN) is a segment-based deep learning framework for temporal action localization in untrimmed long videos.

This code has been tested on Ubuntu 14.04, with an NVIDIA GTX 980 (4 GB memory) for models based on C3D-v1.0 and with an NVIDIA Titan X (12 GB memory) for models based on C3D-v1.1.

The current code suffices to run the demo, reproduce our experimental results, and train your own models. Please use "Issues" to ask questions or report bugs. Thanks. [Mar. 2019: we have stopped maintaining new issues for this repository, because many people have successfully reproduced our results and most common questions have been raised and addressed in the closed issues.]

License

S-CNN is released under the MIT License (refer to the LICENSE file for details).

Citing

If you find S-CNN useful, please consider citing:

@inproceedings{scnn_shou_wang_chang_cvpr16,
  author = {Zheng Shou and Dongang Wang and Shih-Fu Chang},
  title = {Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs},
  year = {2016},
  booktitle = {CVPR}
}
@article{tran2017convnet,
  title={Convnet architecture search for spatiotemporal feature learning},
  author={Tran, Du and Ray, Jamie and Shou, Zheng and Chang, Shih-Fu and Paluri, Manohar},
  journal={arXiv preprint arXiv:1708.05038},
  year={2017}
}

We built this repo based on C3D and the THUMOS Challenge 2014. Please cite the following papers as well:

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, ICCV 2015.

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, arXiv 2014.

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014.

@misc{THUMOS14,
  author = "Jiang, Y.-G. and Liu, J. and Roshan Zamir, A. and Toderici, G. and Laptev, I. and Shah, M. and Sukthankar, R.",
  title = "{THUMOS} Challenge: Action Recognition with a Large Number of Classes",
  howpublished = "\url{http://crcv.ucf.edu/THUMOS14/}",
  year = {2014}
}

Installation:

  1. Download ffmpeg from https://www.ffmpeg.org/ to ./lib/preprocess/
  2. Compile 3D CNN:
    • Compile C3D_sample_rate, which is used for the proposal network and classification network
    • Compile C3D_overlap_loss, which is used for the localization network
    • Note that there is no need to build the unit test cases.
    • Hint: please refer to C3D-v1.0, C3D-v1.1, and Caffe for more details about compilation
  3. Download pre-trained models to ./models/ from Dropbox

Run demo:

  1. Change to the demo directory: cd ./demo/.
  2. Run the demo using the MATLAB code run_demo.m or the Python code run_demo.py.
  3. Find the final result in the folder ./pred/final/, either in .mat format (for MATLAB) or .csv format (for Python).
    • Note on the meaning of seg_swin. Each row stands for one candidate segment. As for each column:
      • 1: video name in THUMOS14 test set
      • 2: sliding window length measured by number of frames
      • 3: start frame index
      • 4: end frame index
      • 5: start time
      • 6: end time
      • 9: confidence score of being the class indicated in the column 11
      • 10: confidence score of being action/non-background
      • 11: the predicted action class (from the 20 action classes [index 1-20] and the background [index 0])
      • 12: sliding window overlap ratio; all 0.25 here, meaning consecutive sliding windows overlap by 75%.
    • Note on the meaning of res:
      • this matrix holds the confidence score of each class on each frame
      • each column corresponds to a frame and each row corresponds to an action class
      • the size of this matrix is the number of action classes (20 here) by the number of frames
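For Python users, the seg_swin column layout above can be consumed directly from the .csv output; a minimal sketch (the row values and the 0.7 threshold are illustrative, not taken from the repository code):

```python
# Column indices from the README (1-based there, 0-based here);
# columns 7-8 are undocumented and left untouched.
COL_CLS_SCORE, COL_ACTION_SCORE, COL_CLS = 8, 9, 10

def top_detections(rows, action_score_thresh=0.7):
    """Keep candidate segments whose action/non-background score (column 10)
    passes a threshold, sorted by class confidence (column 9)."""
    kept = [r for r in rows if float(r[COL_ACTION_SCORE]) >= action_score_thresh]
    return sorted(kept, key=lambda r: float(r[COL_CLS_SCORE]), reverse=True)

# One illustrative row following the documented 12-column layout
row = ['video_test_0000131', '128', '257', '384', '10.28', '15.36',
       '0', '0', '0.91', '0.85', '7', '0.25']
print(top_detections([row]))
```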

Our pre-trained models and pre-computed results of S-CNN (based on C3D-v1.0) on THUMOS Challenge 2014 action detection task:

  1. Models:
    • ./models/conv3d_deepnetA_sport1m_iter_1900000: C3D model pre-trained on Sports1M dataset by Tran et al;
    • ./models/THUMOS14/proposal/snapshot/SCNN_uniform16_binary_iter_30000: our trained S-CNN proposal network;
    • ./models/THUMOS14/classification/snapshot/SCNN_uniform16_cls20_iter_30000: our trained S-CNN classification network;
    • ./models/THUMOS14/localization/snapshot/SCNN_uniform16_cls20_with_overlap_loss_iter_30000: our trained S-CNN localization network.
  2. Results:
    • ./experiments/THUMOS14/network_proposal/result/res_seg_swin.mat: contains the output results of the proposal network. We keep segments whose confidence score of being action is >= 0.7 as candidate segments to feed into the following localization network;
    • ./experiments/THUMOS14/network_localization/result/res_seg_swin.mat: contains the output results of the localization network;
    • Evaluate mAP: run ./experiments/THUMOS14/eval/eval_scnn_thumos14.m; results are stored in ./experiments/THUMOS14/eval/res_scnn_thumos14.mat. We vary the overlap threshold (IoU) used in evaluation from 0.1 to 0.5.
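The evaluation sweeps the temporal overlap threshold; for reference, the temporal IoU between a predicted segment and a ground-truth segment, each given as (start, end) in seconds, can be sketched as:

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two 1-D temporal intervals (start, end)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A detection counts as correct at threshold t when temporal_iou(...) >= t
print(temporal_iou((49.2, 53.5), (50.0, 54.0)))
```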

Our pre-trained models and pre-computed results of S-CNN (based on C3D-v1.1) on THUMOS Challenge 2014 action detection task:

  1. Models:
    • ./models/c3d_resnet18_sports1m_r2_iter_2800000.caffemodel: C3D model pre-trained on Sports1M dataset by Tran et al;
    • ./models/THUMOS14/proposal/snapshot/c3d_resnet18_sports1m_r2_iter_27384.caffemodel: our trained S-CNN proposal network;
    • ./models/THUMOS14/classification/snapshot/c3d_resnet18_sports1m_r2_iter_14704.caffemodel: our trained S-CNN classification network;
    • ./models/THUMOS14/localization/snapshot/c3d_resnet18_sports1m_r2_iter_14704.caffemodel: our trained S-CNN localization network.
  2. Results:
    • ./experiments/THUMOS14_Res3D/network_proposal/result/res_seg_swin.mat: contains the output results of the proposal network. We keep segments whose confidence score of being action is >= 0.7 as candidate segments to feed into the following localization network;
    • ./experiments/THUMOS14_Res3D/network_localization/result/res_seg_swin.mat: contains the output results of the localization network;
    • Evaluate mAP: run ./experiments/THUMOS14_Res3D/eval/eval_scnn_thumos14.m; results are stored in ./experiments/THUMOS14/eval/res_scnn_thumos14.mat. We vary the overlap threshold (IoU) used in evaluation from 0.3 to 0.7.

Train your own S-CNN model (based on C3D-v1.0):

  1. We provide the parameter settings and the network architecture definition inside ./experiments/THUMOS14/network_proposal/, ./experiments/THUMOS14/network_classification/, ./experiments/THUMOS14/network_localization/ respectively.
  2. We also provide sample input data files to illustrate the input data file list format, which is slightly different from C3D's:
    • still, each row corresponds to one input segment
    • C3D_sample_rate (used for proposal and classification network):
      • format: video_frame_directory start_frame_index class_label stepsize
      • stepsize: used for adjusting the window length; it measures the step between two consecutive sampled frames in one segment (frame index of the current frame + stepsize = frame index of the subsequent frame). Note that each segment consists of 16 frames in total, so a window spans 16 × stepsize frames.
      • example: /dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 3 8
    • C3D_overlap_loss (used for localization network):
      • format: video_frame_directory start_frame_index class_label stepsize overlap
      • overlap: the overlap measured by IoU between the candidate segment and the corresponding ground truth segment
      • example: /dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 3 8 0.70701
  3. NOTE: please refer to C3D-v1.0 and Caffe for more general instructions on how to train a 3D CNN model.
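The two list formats above can be produced with a small helper; a sketch, assuming (per the note above) that each segment samples 16 frames at a fixed stepsize, so a window starting at `start_frame` covers 16 × stepsize frames (the helper names are hypothetical, not from the repository):

```python
NUM_FRAMES = 16  # each segment samples 16 frames in total (per the README)

def sample_rate_line(frame_dir, start_frame, label, stepsize):
    """One row in the C3D_sample_rate list format (proposal/classification)."""
    return '{} {} {} {}'.format(frame_dir, start_frame, label, stepsize)

def overlap_loss_line(frame_dir, start_frame, label, stepsize, overlap):
    """One row in the C3D_overlap_loss list format (localization): the same
    fields plus the IoU with the matched ground-truth segment."""
    return '{} {} {} {} {}'.format(frame_dir, start_frame, label, stepsize, overlap)

def window_length(stepsize):
    # 16 frames spaced `stepsize` apart cover a window of 16 * stepsize frames
    return NUM_FRAMES * stepsize

print(sample_rate_line(
    '/dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/', 2561, 3, 8))
print(window_length(8))  # 128
```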

Train your own S-CNN model (based on C3D-v1.1):

  1. We provide the parameter settings and the network architecture definition inside ./experiments/THUMOS14_Res3D/network_proposal/, ./experiments/THUMOS14_Res3D/network_classification/, ./experiments/THUMOS14_Res3D/network_localization/ respectively.
  2. We also provide sample input data files to illustrate the input data file list format, which is slightly different from C3D's:
    • still, each row corresponds to one input segment
    • C3D_sample_rate (used for proposal and classification network):
      • format: video_frame_directory start_frame_index class_label stepsize
      • stepsize: used for adjusting the window length; it measures the step between two consecutive sampled frames in one segment (frame index of the current frame + stepsize = frame index of the subsequent frame). Note that each segment consists of 16 frames in total, so a window spans 16 × stepsize frames.
      • example: /dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 3 8
    • C3D_overlap_loss (used for localization network):
      • format: video_frame_directory start_frame_index class_label stepsize overlap
      • overlap: the overlap measured by IoU between the candidate segment and the corresponding ground truth segment
      • example: /dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 3 8 0.70701
  3. NOTE: please refer to C3D-v1.1 and Caffe for more general instructions on how to train a 3D CNN model. Res3D uses 8 frames per clip to produce one label. Because S-CNN samples 16 frames out of a multi-scale temporal window that can be up to 512 frames long, we still keep 16 frames per clip in S-CNN.

scnn's People

Contributors

zhengshou


scnn's Issues

question_score

Fine-tuning on my dataset:
I0605 15:54:41.693938 12509 solver.cpp:142] Test score #0: 0.146302
I0605 15:54:41.693992 12509 solver.cpp:142] Test score #1: 15.5048
I0605 15:54:41.694018 12509 net.cpp:355] Serializing 30 layers
I0605 15:54:42.801959 12509 solver.cpp:159] Snapshotting to snapshot/SCNN_uniform16_cls20_iter_30000
I0605 15:54:47.106016 12509 solver.cpp:166] Snapshotting solver state to snapshot/SCNN_uniform16_cls20_iter_30000.solverstate
I0605 15:54:49.264902 12509 net.cpp:355] Serializing 30 layers
I0605 15:54:50.062696 12509 solver.cpp:159] Snapshotting to snapshot/SCNN_uniform16_cls20_iter_30000
I0605 15:54:55.804988 12509 solver.cpp:166] Snapshotting solver state to snapshot/SCNN_uniform16_cls20_iter_30000.solverstate
I0605 15:55:00.132752 12509 solver.cpp:100] Optimization Done.
I0605 15:55:00.132803 12509 finetune_net.cpp:30] Optimization Done.

Is Test score #0: 0.146302 the accuracy?
Test score #1: 15.5048
Is my result correct?

Detailed instructions on running demo

I want to run the demo using Python. Should I also have MATLAB installed?
Can you also mention the detailed steps to run the demo on CPU?
Can you please explain how to compile C3D_overlap_loss and C3D_sample_rate?

What is the role of ambiguous class in THUMOS

Hello,

Reading the code, all I can tell is that it is used for some kind of reconsideration of detections (you have to run intervaloverlapvalseconds again for this class), but I still do not understand how it helps.

Thank you and best regards

MEXaction2

Can you provide a link to download the MEXaction2 dataset other than the official website?

run demo problem about 000001.prob

When I run run_demo.py, the result shows No such file or directory: "pred/pro/output/000001.prob".

I want to ask how to solve this problem.
Many thanks!

The question about eval result

Hi Zheng:
I have read your work carefully.
I attempted to evaluate the test results on the 210 test videos based on your demo code, but the result is different from yours:

| IoU | 0.1  | 0.2  | 0.3  | 0.4  | 0.5  |
|-----|------|------|------|------|------|
| mAP | 47.7 | 43.5 | 36.3 | 28.7 | 19.0 |

These are my test results using your evaluation code:

| IoU | 0.1  | 0.2  | 0.3  | 0.4  | 0.5  |
|-----|------|------|------|------|------|
| mAP | 26.3 | 26.7 | 24.6 | 22.9 | 18.9 |

I don't know why there is such a big difference.
Thanks in advance!

prob files in pred/pro/output/ are not created

Hi. I'm running the Python demo file and I'm getting this error:

extract frames starts
extract frames done in 10.043 s
init sliding window starts
init sliding window done in 0.0075 s
generate proposal list starts
generate proposal list done in 0.0074 s
run proposal network starts
run proposal network done in 0.0414 s
read proposal results starts
Traceback (most recent call last):
File "run_demo.py", line 277, in
main()
File "run_demo.py", line 95, in main
prob = read_binary_blob(preddir+'pro/output/'+'{0:06}'.format(img_index+1)+'.prob')
File "run_demo.py", line 240, in read_binary_blob
f = open(filename, 'rb')
IOError: [Errno 2] No such file or directory: 'pred/pro/output/000001.prob'

I've seen that it is because the pred/pro/feature_extract.sh file does not write the prob files, but I do not know how to solve it.

Any help will be appreciated

Cannot use C3D v1.1 for demo

Hi Zhengshou,

Thank you very much for your work on S-CNN. I'm currently trying to run the demo using C3D v1.1 but I realized the demo code was written for v1.0.

I have modified ./demo/pred/pro/feature_extract.sh by replacing ../C3D-v1.0/C3D_sample_rate/build/tools/extract_image_features.bin with ../C3D-v1.1/C3D_sample_rate/build/tools/extract_image_features.

However after doing that, the demo_extract.log reflects the following error:

[libprotobuf ERROR google/protobuf/text_format.cc:288] Error parsing text-format caffe.NetParameter: 5:3: Unknown enumeration value of "VIDEO_DATA" for field "type".
F0503 16:05:01.509085 32524 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: pred/pro/demo_finetuning_feature_extract.prototxt
*** Check failure stack trace: ***
    @     0x7fd571fdd0ed  google::LogMessage::Fail()
    @     0x7fd571fdf056  google::LogMessage::SendToLog()
    @     0x7fd571fdcc1d  google::LogMessage::Flush()
    @     0x7fd571fdfa2a  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fd5729869d1  caffe::ReadNetParamsFromTextFileOrDie()
    @     0x7fd57291b3b7  caffe::Net<>::Net()
    @           0x403ce2  feature_extraction_pipeline<>()
    @     0x7fd5716db830  __libc_start_main
    @           0x403079  _start
Aborted (core dumped)

I am aware that C3D-v1.0 prototxt files will not work with v1.1 as stated here. However I do not know how to modify it to work. I have tried using "VideoData" and VideoData as suggested in the official example but with no success.

Reference for NMS

In the paper, you mentioned using NMS for post-processing. Can you give me any references that explain NMS in detail?
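For reference, temporal NMS on 1-D segments works exactly like NMS on 2-D boxes; a minimal greedy sketch (not the repository's exact implementation; the 0.4 threshold is illustrative):

```python
def temporal_nms(segments, iou_thresh=0.4):
    """segments: list of (start, end, score). Greedily keep the highest-scoring
    segment, drop any remaining segment whose temporal IoU with it exceeds the
    threshold, and repeat on what is left."""
    def iou(a, b):
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0

    remaining = sorted(segments, key=lambda s: s[2], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [s for s in remaining if iou(best, s) <= iou_thresh]
    return kept

dets = [(10.0, 14.0, 0.9), (10.5, 14.5, 0.8), (30.0, 34.0, 0.7)]
print(temporal_nms(dets))  # the second segment is suppressed by the first
```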

How are the labels for the proposal network determined?

Hello!

Thanks for sharing your work.

In Section 3.2, when assigning labels to segments, those with IoU greater than 0.7 are positive, those below 0.7 but above 0.5 are also positive, and those below 0.3 are negative.
If the IoU is greater than 0.3 but less than 0.5, how do you determine whether a segment is positive or negative?
Looking forward to your reply, thanks!
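For orientation, the thresholds as summarized in this question can be sketched as follows; this is an illustrative reading of the question's own summary, not the repository's code, and the 0.3-0.5 band is exactly the open question, so it is returned as ambiguous rather than resolved:

```python
def proposal_label(iou, mid=0.5, low=0.3):
    """Assign a proposal-network training label from the IoU with the
    best-matching ground-truth segment, per the thresholds quoted in the
    question: > 0.5 (including > 0.7) positive, < 0.3 negative.
    Returns 1 (positive), 0 (negative), or None (ambiguous band)."""
    if iou > mid:
        return 1
    if iou < low:
        return 0
    return None  # 0.3 <= iou <= 0.5: not specified in the question's summary

print(proposal_label(0.8), proposal_label(0.2), proposal_label(0.4))
```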

Share data generation scripts

Hi @zhengshou , thanks for your sharing.

Could you please provide scripts to generate train and test file lists? It would help me a lot. My interest is just for academic purposes.

Thanks in advance.

question about "extract feature on my test dataset"

Hi Zheng,
I have some question about "extract feature on my test dataset"
In "list_test_uniform16_proposal.lst", does the label column need to be ground truth, or can it be set arbitrarily?
In "list_test_uniform16_localization.lst", do the label and overlap columns need to be ground truth, or can they be set arbitrarily?
I ask because, when I run the demo, I find the label and overlap columns are all set to 0 in "demo_list_test_uniform16_localization.lst" and "demo_list_test_uniform16_proposal.lst".
I would appreciate some advice, thanks very much!

Question of the training details of training my own models

Hi zhengshou,
Thank you for your reply. I'm sorry to trouble you again. I'm very interested in your work, so I have some questions about how to prepare the training .lst for the proposal network and about the details of the training procedure.
I wrote code to generate the .lst, but I think there may be some misunderstanding.
First of all, in my code, for all videos (trimmed and untrimmed), I applied the sliding windows, set the label of every segment of the trimmed videos to 1 (positive), set the action instances of the untrimmed videos to 1 (positive), and set the background to 0 (negative). Am I right?
If I am right, there are a lot of rows, at least 100k rows, in the .lst. But as mentioned in your paper, the training iterations for all three stages are 30k. So I think my code to generate the training .lst is wrong.
My questions are as follows:
1. How should the trimmed videos be used? Do I need to apply sliding windows to them? If not, how should the training .lst for the trimmed videos be generated?
2. For the untrimmed videos, since they are much longer than the trimmed videos, there will be a lot of segments for each video; do I need to use them all? Here is my .lst sample:
BaseballPitch/v_BaseballPitch_g25_c07/ 73 1 2
BaseballPitch/v_BaseballPitch_g25_c07/ 81 1 2
BaseballPitch/v_BaseballPitch_g25_c07/ 1 1 4
BaseballPitch/v_BaseballPitch_g25_c07/ 17 1 4
BaseballPitch/v_BaseballPitch_g25_c07/ 33 1 4
BaseballPitch/v_BaseballPitch_g25_c07/ 49 1 4
BaseballPitch/video_validation_0000266/ 1 0 1
BaseballPitch/video_validation_0000266/ 5 0 1
BaseballPitch/video_validation_0000266/ 9 0 1
BaseballPitch/video_validation_0000266/ 13 0 1
BaseballPitch/video_validation_0000266/ 17 0 1
BaseballPitch/video_validation_0000266/ 21 0 1
BaseballPitch/video_validation_0000266/ 25 0 1
BaseballPitch/video_validation_0000266/ 29 0 1
BaseballPitch/video_validation_0000266/ 33 0 1
BaseballPitch/video_validation_0000266/ 37 0 1
Please help me check whether there are any mistakes in my .lst.
Hoping for your reply, thanks!
Best wishes,
Tim

finetune_proposal network

Hi,
I have some questions about finetuning S-CNN (proposal network).
list_train_uniform16_proposal.lst:
1./dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 1 8
2./dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2593 1 8
3./dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 1985 1 16
4./dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2049 1 16
/dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 4161 1 16
/dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 4225 1 16
1. 2561 is the index of the action's start frame, 1 is the positive label, and 8 is the stepsize. 2593 - 2561 = 32, so why is stepsize = 8?
2. Example: 2561 is the index of the action's start frame; the stepsize goes from 16 up to 512; computing the IoU with the ground truth → is the label positive or background?
/dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 1 16
/dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 1 32
/dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 1 64
/dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 1 128
/dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 1 256
/dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 1 512

Is my understanding correct?
Thanks very much.

extract_image_features.bin

I1130 15:35:33.930305 25336 net.cpp:322] Copying source layer fc7-1-convdeconv
I1130 15:35:34.657407 25336 net.cpp:322] Copying source layer relu7
I1130 15:35:34.657439 25336 net.cpp:322] Copying source layer drop7
I1130 15:35:34.657449 25336 net.cpp:322] Copying source layer predict
I1130 15:35:34.661358 25336 net.cpp:322] Copying source layer loss
E1130 15:35:34.663674 25336 extract_image_features.cpp:72] Extracting features for 1 batches
I1130 15:35:34.852460 25502 video_segmentation_data_layer.cpp:204] Restarting data prefetching from start.
E1130 15:35:35.188673 25336 extract_image_features.cpp:108] Extracted features of 4 images.
E1130 15:35:35.188693 25336 extract_image_features.cpp:112] Successfully extracted 4 features!
*** Error in `../CDC/build/tools/extract_image_features.bin': double free or corruption (out): 0x00000000028e12c0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f0ab5a437e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f0ab5a4c37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f0ab5a5053c]
/usr/local/lib/libprotobuf.so.9(_ZN6google8protobuf8internal28DestroyDefaultRepeatedFieldsEv+0x1f)[0x7f0aba6ceb5f]
/usr/local/lib/libprotobuf.so.9(_ZN6google8protobuf23ShutdownProtobufLibraryEv+0x8b)[0x7f0aba6cdf4b]
/usr/lib/x86_64-linux-gnu/libmirprotobuf.so.3(+0x233b9)[0x7f0aa08103b9]
/lib64/ld-linux-x86-64.so.2(+0x10de7)[0x7f0ac1a58de7]
/lib/x86_64-linux-gnu/libc.so.6(+0x39ff8)[0x7f0ab5a05ff8]
/lib/x86_64-linux-gnu/libc.so.6(+0x3a045)[0x7f0ab5a06045]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf7)[0x7f0ab59ec837]
../CDC/build/tools/extract_image_features.bin[0x40d399]

Anyone have a suggestion?

Bug in run_demo.py at line 194.

for item in range(row[2]-1, row[3]+1):

It appears that the loop should be for item in range(row[2]-1, row[3]) instead of for item in range(row[2]-1, row[3]+1) since, at boundary cases (end of video file), it gives an invalid-index error. The segment seems to be increased in size by one. E.g., if the values are starting frame = 1 and end frame = 16 with window size = 16, the number of values written to the res list is 17 instead of 16.

Please, help!
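The off-by-one reported above follows directly from Python's half-open range semantics; a minimal check:

```python
# range(a, b) is half-open: it yields b - a values, excluding b itself.
start_frame, end_frame = 1, 16  # a 16-frame window with 1-based frame indices

buggy = list(range(start_frame - 1, end_frame + 1))  # 17 indices: 0..16
fixed = list(range(start_frame - 1, end_frame))      # 16 indices: 0..15

print(len(buggy), len(fixed))  # 17 16
```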


issue about demo result

Hi, Thanks for your wonderful job!
When I run demo.py on video_0000131.mp4 and video_0000173.mp4, the proposal results differ from yours. I made the comparison based on your results in 'scnn/experiments/THUMOS14/network_proposal/result/res_seg_swin.mat'; is this comparison right? Could you give me some suggestions? Maybe the feature extraction process went wrong?

question about run "run_demo.m"

when I run the "run_demo.m" ,the following question is:

init sliding window starts
init sliding window done in 2.4115 s
generate proposal list starts
generate proposal list done in 1.06 s
run proposal network starts
run proposal network done in 0.11494 s
read proposal results starts
read proposal results done in 0.21835 s
generate localization list starts
generate localization list done in 0.000827 s
run localization results starts
run localization results done in 0.011421 s
read localization results starts
Improper assignment with rectangular empty matrix.

Error in run_demo (line 165)
seg_swin(:,9) = a;

I have tried many ways, but still have no solution. I would appreciate some advice, thanks very much!

Compile error while trying to compile C3D_overlap_loss

I get the following error while trying to run make all in C3D_overlap_loss.
I am using CPU and changed the Makefile.config accordingly.
Also, I have already compiled C3D.

In file included from src/caffe/net.cpp:8:0:
./include/caffe/common.hpp:107:17: error: 'cublasHandle_t' does not name a type
inline static cublasHandle_t cublas_handle() { return Get().cublas_handle_; }
^
./include/caffe/common.hpp:108:17: error: 'curandGenerator_t' does not name a type
inline static curandGenerator_t curand_generator() {
^
./include/caffe/common.hpp:133:3: error: 'cublasHandle_t' does not name a type
cublasHandle_t cublas_handle_;
^
./include/caffe/common.hpp:134:3: error: 'curandGenerator_t' does not name a type
curandGenerator_t curand_generator_;
^
./include/caffe/common.hpp:149:34: error: 'cublasStatus_t' was not declared in this scope
const char* cublasGetErrorString(cublasStatus_t error);
^
./include/caffe/common.hpp:150:34: error: 'curandStatus_t' was not declared in this scope
const char* curandGetErrorString(curandStatus_t error);
^
make: *** [build/src/caffe/net.o] Error 1

Changes to VideoDataLayer break test cases for C3D

This repository just instructs users to follow the C3D installation instructions, and as part of those instructions users are told to run make runtest to ensure C3D has been installed correctly. The edits made to the C3D implementation break the test cases, and since there is no mention of this in this repository, it is very misleading; i.e., it suggests that C3D has not been installed correctly.

Can you either fix the test cases or put a note into the README.md in this repo's root mentioning that the test cases for VideoDataLayer are expected to fail?

Questions about training S-CNN

Hi Zheng,

I have some questions about training S-CNN.

  1. In your S-CNN paper, you are using windows of size 16, 32, 64, ..., 512 with 16 frames. How did you generate the C3D volume mean files for the different window sizes? I know the number of frames is the same across window sizes, but you need to adjust the "sampling rate" when you are computing volume means, so there should be 6 different volume mean files. How do you select a volume mean file from the 6 files for training S-CNN? There is only one volume mean file in the GitHub repository: "train01_16_128_171_mean.binaryproto".

  2. How many positive/negative examples are sampled from each video given a window size? Can you share your "input data file" please? It would be really helpful.

Thank you!

Why not use cls network in prediction?

hey,
As you showed in the paper's Sec. 4.3, results fine-tuned on the classification network always perform better.
So why isn't it used in prediction? And if it is not used, why do we train it?

How to determine the train labels in untrimmed videos of proposal network?

@zhengshou
Thank you very much!
I have read it carefully!
For the proposal network, I ran into trouble producing the training set. I don't know how to determine the candidate segment labels. Should a candidate segment's label be 1 or 0?
I just want to know how to determine the training-set labels for candidate segments in untrimmed videos.
Do you have any script implementation?

Thank you in advance!

Semantic of files in experiments/THUMOS14/annotation and eval/tmp.txt

The first line of BaseballPitch_test.txt reads:
video_test_0000324 49.2 53.5
What is the meaning of the two latter numbers? I assume they are a start second and an end second?
Also, the first line in eval/tmp.txt reads
video_test_0000188 518.4 523.5 13 0.48942, which is not very obvious
Thank you and best regards,

metric_of_action_localization

Hi,Thanks for your sharing!
I am confused about the metric for action localization; I have 2 questions:
1. Every video should be predicted with a start time and an end time, as well as the video label. I want to know whether the label influences the mAP of action localization, and if so, how? I see that an IoU greater than 0.5 is OK.
2. In the test videos, some videos do not contain the action; what should the predicted result be for them? According to your paper, I think you did not produce predictions for those videos that do not contain the action.

Thanks for your kindly help and nice job!

Unable to read .prob files in pred/pro/output/

Hi,

When I execute ipython -i -- run_demo.py -i video/video_test_0000131.mp4 -f 25
I received the following error:

extract frames starts
extract frames done in 2.8206 s
init sliding window starts
init sliding window done in 0.0045 s
generate proposal list starts
generate proposal list done in 0.0028 s
run proposal network starts
run proposal network done in 0.5959 s
read proposal results starts

/home/ubuntu/meet/scnn/demo/run_demo.py in main()
     93     # read proposal results
     94     for img_index in range(len(seg_swin)):
---> 95         prob = read_binary_blob(preddir+'pro/output/'+'{0:06}'.format(img_index+1)+'.prob')
     96         seg_swin[img_index][9] = prob[1]
     97 

/home/ubuntu/meet/scnn/demo/run_demo.py in read_binary_blob(filename)
    238 
    239 def read_binary_blob(filename):
--> 240     f = open(filename, 'rb')
    241     s = struct.unpack('iiiii', f.read(20)) # the first five are integers
    242     length = s[0]*s[1]*s[2]*s[3]*s[4]

IOError: [Errno 2] No such file or directory: 'pred/pro/output/000001.prob'

On further inspection, it seems that C3D isn't writing .prob files in feature_extract.sh

Thank you for help in advance :)
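For context, the reader shown in the traceback parses a C3D binary blob: a header of five 32-bit ints (the blob shape) followed by the data values. A self-contained sketch; the float32 payload is an assumption consistent with the header read visible in the traceback, and this is not the repository's exact function:

```python
import os
import struct
import tempfile

def read_binary_blob(filename):
    """Read a C3D binary blob: five 32-bit ints (num, channels, length,
    height, width), then num*channels*length*height*width float32 values
    (float32 payload assumed)."""
    with open(filename, 'rb') as f:
        shape = struct.unpack('iiiii', f.read(20))
        count = shape[0] * shape[1] * shape[2] * shape[3] * shape[4]
        data = list(struct.unpack('{}f'.format(count), f.read(4 * count)))
    return shape, data

# Round-trip check with a tiny synthetic blob (shape 1x2x1x1x1, two values)
fd, path = tempfile.mkstemp(suffix='.prob')
with os.fdopen(fd, 'wb') as f:
    f.write(struct.pack('iiiii', 1, 2, 1, 1, 1))
    f.write(struct.pack('2f', 0.25, 0.75))
shape, data = read_binary_blob(path)
os.remove(path)
print(shape, data)  # (1, 2, 1, 1, 1) [0.25, 0.75]
```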

cannot download the pretrain model

Hi zhengshou:
I cannot download the pre-trained models from Dropbox. Is there something wrong with the Dropbox link? Thanks for your sharing; looking forward to your reply.

Input data file list

Hi @zhengshou , thanks for your sharing.

Could you please provide the input data files list you used to train and test the Proposal network? I saw that you shared some sample files, but I would appreciate a lot if you could provide me the original files you used... with all the rows. It would help me a lot. My interest is just for academic purposes.

Thanks in advance.

The problem of the run_demo.m and the demo_list_test_uniform16_proposal.lst

Hi zhengshou,
Thank you for your hard work; I am very interested in your work.
When I run run_demo.m I have some questions, so I hope you can give me some suggestions.
I ran the released code without modifying anything.
When I ran the line below:
system('./pred/pro/feature_extract.sh');
there was an error saying /pred/pro/feature_extract.sh: Aborted
and the log said:
I0315 15:31:35.568035 22271 video_data_layer.cpp:318] A total of 1589 video chunks.
I0315 15:31:35.568049 22271 video_data_layer.cpp:345] read video from frame/video_test_0000131/
F0315 15:31:35.568114 22271 video_data_layer.cpp:347] Check failed: ReadImageSequenceToVolumeDatum(file_list_[id].c_str(), 1, label_list_[id], new_length, new_height, new_width, sampling_rate, &datum)
and I guess the reason may be the length of the clips.
I checked demo_list_test_uniform16_proposal.lst, which is generated by the code from the test video. Since I have used C3D, I know that each row of the .lst is a clip of 16 frames.
But in demo_list_test_uniform16_proposal.lst, for example:
frame/video_test_0000131/ 1 0 1
frame/video_test_0000131/ 5 0 1
the first 1 is the start frame and the last 1 is the stepsize, which means the next frame index = 1 + 1; the length of the clip may be 4, and the start frame of the next clip is 5.
I wonder how to solve the error and how to ensure the length of the clips.
Best
Tim

label tool

Do you know any convenient temporal annotation tool? Thank you.

Questions about training my model

Hi Zheng:
I am attempting to train my own model and have some questions, as follows:

  1. experiments\THUMOS14\network_proposal\list_train_uniform16_proposal.lst:
    format: video_frame_directory start_frame_index class_label stepsize
    start_frame_index: is start_frame_index the action's starting time (frame)?
    stepsize: 8/16/32? (Is the stepsize chosen randomly?)
  2. experiments\THUMOS14\network_localization\list_train_uniform16_localization.lst:
    overlap? (How is the overlap computed?)
    Thanks in advance.

compile C3D-v1.1/C3D_sample_rate

CXX/LD -o .build_release/examples/siamese/convert_mnist_siamese_data.bin
.build_release/lib/libcaffe.so: undefined reference to `cv::VideoCapture::set(int, double)'
.build_release/lib/libcaffe.so: undefined reference to `cv::VideoCapture::open(cv::String const&)'
.build_release/lib/libcaffe.so: undefined reference to `cv::VideoCapture::release()'
.build_release/lib/libcaffe.so: undefined reference to `cv::VideoCapture::~VideoCapture()'
.build_release/lib/libcaffe.so: undefined reference to `cv::VideoCapture::isOpened() const'
.build_release/lib/libcaffe.so: undefined reference to `cv::VideoCapture::get(int) const'
.build_release/lib/libcaffe.so: undefined reference to `cv::VideoCapture::read(cv::_OutputArray const&)'
.build_release/lib/libcaffe.so: undefined reference to `cv::VideoCapture::VideoCapture()'
collect2: error: ld returned 1 exit status
make: *** [.build_release/tools/extract_image_features.bin] Error 1
make: *** Waiting for unfinished jobs....
(The same block of undefined-reference errors then repeats for every remaining target: upgrade_solver_proto_text.bin, compute_image_mean.bin, caffe.bin, upgrade_net_proto_binary.bin, extract_features.bin, convert_cifar_data.bin, upgrade_net_proto_text.bin, convert_imageset.bin, classification.bin, convert_mnist_siamese_data.bin, convert_mnist_data.bin.)
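For context, these undefined cv::VideoCapture references usually appear when linking against OpenCV 3, which moved video I/O into a separate opencv_videoio module. My workaround attempt (an assumption that C3D-v1.1 uses the stock Caffe Makefile, not a confirmed fix for this repo) is to make sure the extra OpenCV modules are in LIBRARIES:

```makefile
# Hypothetical Makefile fragment, assuming Caffe's stock Makefile and
# OpenCV >= 3, where VideoCapture lives in the separate opencv_videoio module.
ifeq ($(USE_OPENCV), 1)
	LIBRARIES += opencv_core opencv_highgui opencv_imgproc
	ifeq ($(OPENCV_VERSION), 3)
		LIBRARIES += opencv_imgcodecs opencv_videoio
	endif
endif
```

After changing LIBRARIES, a clean rebuild (make clean && make) is needed so the link step picks up the new libraries.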

demo problem

When I run the demo, there is an error: [Errno 2] No such file or directory: 'pred/pro/output/000001.prob'
So I checked demo_extract.log, and there is another problem:
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 5:3: Unknown enumeration value of "VIDEO_DATA" for field "type".
I found that the demo is based on C3D-v1.0, and its demo_finetuning_feature_extract.prototxt file is different from the C3D-v1.1 format. Can you tell me how to fix this problem?
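My guess is that the parse error comes from the old layer syntax used by C3D-v1.0 prototxt files, which modern Caffe (and thus C3D-v1.1) no longer accepts. A rough sketch of the difference (from memory, not copied from the repo's files; the exact v1.1 type string may differ, so check C3D-v1.1's own example prototxts):

```protobuf
# Old-style (C3D-v1.0 / legacy Caffe) syntax: layer types are enum values.
layers {
  name: "data"
  type: VIDEO_DATA
  ...
}

# New-style (C3D-v1.1 / modern Caffe) syntax: layer types are strings.
layer {
  name: "data"
  type: "VideoData"
  ...
}
```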
