
action-detection's Introduction

Temporal Action Detection with Structured Segment Networks

Note: We have released MMAction, a full-fledged action understanding toolbox based on PyTorch. It includes an implementation of SSN as well as other state-of-the-art frameworks for various tasks (action classification, temporal action detection, and spatio-temporal action detection). The lessons we learned in this repo have been incorporated into MMAction to make it better. We highly recommend switching to it. This repo will remain supported until further notice.


This repo holds the code and models for the SSN framework presented at ICCV 2017:

Temporal Action Detection with Structured Segment Networks, Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin, ICCV 2017, Venice, Italy.

[Arxiv Preprint]

A predecessor of the SSN framework was presented in

A Pursuit of Temporal Accuracy in General Activity Detection Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, and Xiaoou Tang, arXiv:1703.02716.

Contents



Usage Guide

Prerequisites

[back to top]

The training and testing of SSN are reimplemented in PyTorch for ease of use. The following software is needed to run SSN.

Other minor Python modules can be installed by running

pip install -r requirements.txt

We recommend setting up the temporal-segment-networks (TSN) project before running SSN. It helps deal with many of DenseFlow's dependency issues. However, this is optional, because we only use the DenseFlow tool from it.

GPUs are required for optical flow extraction and for running SSN. Usually 4 to 8 GPUs in a node ensure a smooth training experience.

Code and Data Preparation

[back to top]

Get the code

From now on we assume you have already set up PyTorch and have the DenseFlow tool ready from the TSN project.

Clone this repo with git; please remember to use --recursive:

git clone --recursive https://github.com/yjxiong/action-detection

Download Datasets

We support experimenting with two publicly available datasets for temporal action detection: THUMOS14 & ActivityNet v1.2. Here are some steps to download these two datasets.

  • THUMOS14: We use the validation videos for training and the testing videos for testing. You can download them from the THUMOS14 challenge website.
  • ActivityNet v1.2: this dataset is provided in the form of a YouTube URL list. You can use the official ActivityNet downloader to download the videos from YouTube.

After downloading the videos for each dataset, unzip them into a folder SRC_FOLDER.

Pretrained Models

We provide the pretrained reference models and initialization models in standard PyTorch format. There is no need to manually download the initialization models. They will be downloaded by the torch.model_zoo tool when necessary.
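
For reference, here is a minimal sketch of how such weights are fetched with PyTorch's model zoo utility. The URL below is a placeholder for illustration only, not an actual download link for this repo, which resolves the real URLs internally.

import torch.utils.model_zoo as model_zoo

# Hypothetical URL for illustration only.
weights_url = "https://example.com/ssn_init_bninception_rgb.pth"
state_dict = model_zoo.load_url(weights_url)  # downloads once, then loads from the local cache
# model.load_state_dict(state_dict)           # load into a compatible network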

Extract Frames and Optical Flow Images

To run training and testing, we need to decompose the videos into frames. The temporal stream networks also need optical flow or warped optical flow images as input.

We suggest using the tools provided in the TSN repo for this purpose. The following instructions are adapted from the TSN repo.

This can be done with the script scripts/extract_optical_flow.sh. The script takes three arguments:

  • SRC_FOLDER points to the folder where you put the video dataset
  • OUT_FOLDER points to the root folder where the extracted frames and optical images will be put in
  • NUM_WORKER specifies the number of GPUs to use in parallel for flow extraction; it must be larger than 1

The command for running optical flow extraction is as follows

bash scripts/extract_optical_flow.sh SRC_FOLDER OUT_FOLDER NUM_WORKER
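
For example, with hypothetical paths and 4 GPUs, the invocation could look like

bash scripts/extract_optical_flow.sh /data/thumos14/videos /data/thumos14/frames 4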

Prepare the Proposal Lists

Training and testing of SSN models rely on files called "proposal lists". A proposal list records the temporal action proposals for each video, together with the groundtruth action instances.

Because video decoders on different machines may output different numbers of frames, we provide the proposal lists in a normalized form. To start training and testing, one needs to adapt the proposal lists to the actual number of frames extracted for each video. To do this, run

python gen_proposal_list.py DATASET FRAMES_PATH
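
For example, for THUMOS14 with a hypothetical frame folder, this could be

python gen_proposal_list.py thumos14 /data/thumos14/frames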

Train TAG models

Due to the large number of inquiries about training TAG, we provide the following procedure to train binary actionness classifiers and generate proposals.

Generate sliding window proposals

First of all, we generate a series of sliding-window proposals.

  • THUMOS14
python gen_sliding_window_proposals.py validation rgb FRAME_PATH data/thumos14_sw_val_proposal_list.txt --dataset thumos14 
python gen_sliding_window_proposals.py testing rgb FRAME_PATH data/thumos14_sw_test_proposal_list.txt --dataset thumos14 
  • ActivityNet v1.2
python gen_sliding_window_proposals.py training rgb FRAME_PATH data/activitynet1.2_sw_train_proposal_list.txt --dataset activitynet --version 1.2
python gen_sliding_window_proposals.py validation rgb FRAME_PATH data/activitynet1.2_sw_val_proposal_list.txt --dataset activitynet --version 1.2

Training binary actionness classifier

Using the above proposals, we can train a binary actionness classifier.

python binary_train.py thumos14 MODALITY -b 16 --lr_steps 20 40 --epochs 45 

or

python binary_train.py activitynet1.2 MODALITY -b 16 --lr_steps 3 6 --epochs 7 
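
Here MODALITY is RGB or Flow, and the two streams are trained separately. For example, on THUMOS14 both actionness models could be trained as

python binary_train.py thumos14 RGB -b 16 --lr_steps 20 40 --epochs 45
python binary_train.py thumos14 Flow -b 16 --lr_steps 20 40 --epochs 45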

Obtaining actionness score

Pretrained actionness classifiers on THUMOS14 can be downloaded from the RGB Actionness Model and Flow Actionness Model links. To obtain actionness scores, run

python binary_test.py DATASET MODALITY SUBSET TRAINING_CHECKPOINT ACTIONNESS_RESULT_PICKLE 
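
For example, a hypothetical invocation for the RGB stream on the THUMOS14 validation subset (file names are illustrative only) could be

python binary_test.py thumos14 RGB validation thumos14_rgb_actionness_checkpoint.pth.tar thumos14_rgb_actionness.pkl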

Generating TAG proposals

THUMOS14

python gen_bottom_up_proposals.py ACTIONNESS_RESULT_PICKLE --dataset thumos14 --subset validation  --write_proposals data/thumos14_tag_val_proposal_list.txt  --frame_path FRAME_PATH
python gen_bottom_up_proposals.py ACTIONNESS_RESULT_PICKLE --dataset thumos14 --subset testing  --write_proposals data/thumos14_tag_test_proposal_list.txt  --frame_path FRAME_PATH

ActivityNet1.2

python gen_bottom_up_proposals.py ACTIONNESS_RESULT_PICKLE --dataset activitynet --subset training  --write_proposals data/activitynet1.2_tag_train_proposal_list.txt  --frame_path FRAME_PATH
python gen_bottom_up_proposals.py ACTIONNESS_RESULT_PICKLE --dataset activitynet --subset validation  --write_proposals data/activitynet1.2_tag_val_proposal_list.txt  --frame_path FRAME_PATH

where ACTIONNESS_RESULT_PICKLE can be multiple files (e.g., actionness predicted from both the RGB and Flow streams).
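
For example, to combine RGB and Flow actionness scores (hypothetical file names) when generating THUMOS14 validation proposals:

python gen_bottom_up_proposals.py thumos14_rgb_actionness.pkl thumos14_flow_actionness.pkl --dataset thumos14 --subset validation --write_proposals data/thumos14_tag_val_proposal_list.txt --frame_path FRAME_PATH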

Testing Trained Models

[back to top]

Evaluating on benchmark datasets

There are two steps to evaluate temporal action detection with our pretrained models.

First, we will extract the detection scores for all the proposals by running

python ssn_test.py DATASET MODALITY TRAINING_CHECKPOINT RESULT_PICKLE

Then, using these scores, we evaluate the detection performance by running

python eval_detection_results.py DATASET RESULT_PICKLE

This script will report the detection performance in terms of mean average precision at different IoU thresholds.
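
Putting the two steps together, a hypothetical run for the RGB stream on THUMOS14 (checkpoint and result file names are illustrative) could be

python ssn_test.py thumos14 RGB thumos14_rgb_checkpoint.pth.tar thumos14_rgb_scores.npz
python eval_detection_results.py thumos14 thumos14_rgb_scores.npz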

Using reference models for evaluation

We provide models trained on our machines so that you can test them before actually training any model yourself. The performance of these reference models is listed in the performance section.

To use these models, run the following command

python ssn_test.py DATASET MODALITY none RESULT_PICKLE --use_reference

Additionally, we provide models trained with Kinetics pretraining. To use them, run

python ssn_test.py DATASET MODALITY none RESULT_PICKLE --use_kinetics_reference

Training SSN

[back to top]

In the paper we report results with ImageNet pretraining, so we describe this case first.

Training with ImageNet pretrained models

Use the following commands to train SSN

  • THUMOS14
python ssn_train.py thumos14 MODALITY -b 16 --lr_steps 20 40 --epochs 45
  • ActivityNet v1.2
python ssn_train.py activitynet1.2 MODALITY -b 16 --lr_steps 3 6 --epochs 7

Here, MODALITY can be RGB or Flow, and DATASET can be thumos14 or activitynet1.2. You can find more details about this script by running

python ssn_train.py -h
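
For example, to train both streams on THUMOS14, the training is run once per modality:

python ssn_train.py thumos14 RGB -b 16 --lr_steps 20 40 --epochs 45
python ssn_train.py thumos14 Flow -b 16 --lr_steps 20 40 --epochs 45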

After training, there will be a checkpoint file whose name contains the information about dataset, architecture, and modality. This checkpoint file contains the trained model weights and can be used for testing.

Training with Kinetics pretrained models

Additionally, we provide the initialization models pretrained on the Kinetics dataset. This pretraining process is known to boost the detection performance. More details can be found on the pretrained model website.

To use these pretrained models, append the option --kin to the training command, e.g.

python ssn_train.py thumos14 MODALITY -b 16 --lr_steps 20 40 --epochs 45 --kin

and

python ssn_train.py activitynet1.2 MODALITY -b 16 --lr_steps 3 6 --epochs 7 --kin

The system will use PyTorch's model_zoo utilities to download the pretrained models for you.

Temporal Action Detection Performance

[back to top]

We provide a set of reference temporal action detection models. Their performance on the benchmark datasets is as follows. These results, together with download links for the models, can also be found on the project website.

THUMOS14

mAP@0.5 (%)                        | RGB   | Flow  | RGB+Flow
BNInception                        | 16.18 | 22.50 | 27.36
BNInception (Kinetics Pretrained)  | 21.31 | 27.93 | 32.50
InceptionV3                        | 18.28 | 23.30 | 28.00 (29.8*)
InceptionV3 (Kinetics Pretrained)  | 22.12 | 30.51 | 33.15 (34.3*)

* We filter the detection results with the classification model from UntrimmedNets to keep only those from the top-2 predicted action classes.

ActivityNet v1.2

Average mAP                        | RGB   | Flow  | RGB+Flow
BNInception                        | 24.85 | 21.69 | 26.75
BNInception (Kinetics Pretrained)  | 27.53 | 28.0  | 28.57
InceptionV3                        | 25.75 | 22.44 | 27.82
InceptionV3 (Kinetics Pretrained)  |       |       |
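
The RGB+Flow columns come from fusing the per-stream detection scores. As a rough illustration only (the score file names, the pickle layout, and the equal 1:1 weights below are assumptions, not the repo's actual format), late fusion of two per-stream result files could look like this:

import pickle
import numpy as np

# Hypothetical per-stream result files; in practice these would come from ssn_test.py.
with open("thumos14_rgb_result.pkl", "rb") as f:
    rgb_scores = pickle.load(f)
with open("thumos14_flow_result.pkl", "rb") as f:
    flow_scores = pickle.load(f)

# Assume each file maps video id -> per-proposal score array; average the two streams.
fused = {vid: 0.5 * np.asarray(rgb_scores[vid]) + 0.5 * np.asarray(flow_scores[vid])
         for vid in rgb_scores}

with open("thumos14_fused_result.pkl", "wb") as f:
    pickle.dump(fused, f)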

Other Info

[back to top]

Citation

Please cite the following paper if you find SSN useful in your research:

@inproceedings{SSN2017ICCV,
  author    = {Yue Zhao and
               Yuanjun Xiong and
               Limin Wang and
               Zhirong Wu and
               Xiaoou Tang and
               Dahua Lin},
  title     = {Temporal Action Detection with Structured Segment Networks},
  booktitle   = {ICCV},
  year      = {2017},
}

Related Projects

  • UntrimmedNets: Our latest framework for learning action recognition models from untrimmed videos (CVPR'17).
  • Kinetics Pretrained Models : TSN action recognition models trained on the Kinetics dataset.
  • TSN: state-of-the-art action recognition framework for trimmed videos (ECCV'16).
  • CES-STAR@ActivityNet : winning solution for ActivityNet challenge 2016, based on TSN.
  • EnhancedMV: real-time action recognition using motion vectors in video encodings.

Contact

For any questions, please file an issue or contact:

Yue Zhao: [email protected]
Yuanjun Xiong: [email protected]

action-detection's People

Contributors

kaidic, lyy-zz, tangshixiang, yjxiong, zhaoyue-zephyrus


action-detection's Issues

Own trained checkpoint file does not contain 'arch', 'best_loss', 'epoch', 'reg_stats'

I trained the model using ssn_train.py and got the weight pth.tar file.
But when I try to test with this weight file, an error occurs.
The weight file saved by ssn_train.py is not correctly configured for ssn_test, which expects 'arch', 'best_loss', 'epoch', 'reg_stats', and 'state_dict' as keys in the dictionary.
Do I need to manually change the configuration of the weight file?
And what is 'reg_stats'?

pytorch weight file

I'm trying to run 'ssn_test.py'.

I can see the arguments in the example command below, but where can I get TRAINING_CHECKPOINT?

python ssn_test.py DATASET MODALITY TRAINING_CHECKPOINT RESULT_PICKLE

There seems to be no way to download the trained PyTorch weight files from the authors.

issue about the thumos14_test_normalized_proposal_list.txt

Hello, I am trying to reproduce your amazing work. For convenience (computational cost), I only use the 213 videos that are later used for testing by the THUMOS14 evaluation toolkit. However, there might be something wrong with the groundtruth annotations in your thumos14_tag_test_normalized_proposal_list.txt file. For example, you can check the groundtruth annotations of the following three videos: video_test_0001292, video_test_0000270, video_test_0001496.
In your .txt file these three videos are negative, with 0 groundtruth instances, but in the THUMOS14 test annotations all of them include several groundtruth action instances. So when I run the SSN test script, my video count decreases from 213 to 210, and the final reproduced results are lower than those listed in the paper (about a 1.5% difference). Waiting for your reply, thanks so much!

Results for single stream network

Hi, I am implementing your model on my own. Could you please release the results for the single streams (e.g., RGB, optical flow) on THUMOS14? It would be very helpful for checking intermediate results.

By the way, regarding the two-stream networks, is the fusion applied to the frame-wise scores at test time?

Thanks a lot!

Issue about the annotation (start frame and end frame) calculation

Hello, another question please.
I wonder how you calculate the start and end frame of each groundtruth instance given the start and end time provided by the THUMOS14 dataset.
For example, in video_test_0000004 the start and end times are 0.2 and 1.1, etc. I multiply them by 30 (the frame rate), but my result (
6 | 33
342 | 366
558 | 624
849 | 891
30 | 45
624 | 669
909 | 951
)
is different from yours(
4 32
340 364
555 621
845 887
29 44
621 666
905 947
), so I want to know how you calculate it. Thanks very much!

Details about the temporal actionness grouping?

Hi @yjxiong ,
Could you please share the details of the temporal actionness grouping, if possible? What is the duration of each snippet, and how many RGB frames and optical flow frames are sampled from each snippet? Thanks.

Binary actionness classifier training in TAG

Hi, @yjxiong

Would you release your actionness classifier network architecture?
When I train the binary actionness classifier, it is difficult to converge on a big dataset (just over 200k extracted frames) with a simple CNN architecture. The RGB and flow modalities suffer from a similar problem.

The training dataset is organized as follows:
Positive: frames in all ActivityNet 1.3 action instances, sampling interval set to 6, then randomly choosing some frames.
Negative: frames in all ActivityNet 1.3 background, sampling interval set to 6, then randomly choosing some frames.

Is there any problem with this dataset organization? Should I use all positive/negative frames?

thanks a lot.

average recall rates of ActivityNet v1.3

Hi, I have read your paper "Temporal Action Detection with Structured Segment Networks" and have a question about the experiments. Can you give the average recall rate of your TAG method on ActivityNet v1.3? Thanks.

Interval of snippets ?

Hi
I have been reading your ICCV 2017 paper on action detection recently; it is really wonderful work.
I have a question about the paper. You state: "Given a video, a sequence of snippets will be extracted with a regular interval in between." I would like to know the size of the interval you used for THUMOS14 and ActivityNet.
Thank you so much.

Length and content of videos

Is there a guideline on how long the videos should be and how much activity they should contain?
e.g.

  • If I have a long video which is mostly background, should I cut the background parts out?
  • If I have a long video with lots of activity, should I split it into several short ones? I noticed that there is a number of proposals specified in dataset config:
    prop_per_video: 8

    Does that mean that only 8 proposals are used per video?

Problem of overlapped and unseen THUMOS14 categories

Hi Yuanjun,

I also want to evaluate the generalizability of our new proposal method as in Table 2 of "A Pursuit of Temporal Accuracy in General Activity Detection", but I am not sure which 10 classes are overlapped in THUMOS14. Can you list these classes? Thanks.

No module model_zoo

Hello, @yjxiong
I trained the network with 'ssn_train.py' and came across an error at the line 'import model_zoo' in 'ssn_models.py': no module named 'model_zoo'.

val_proposal_list about THUMOS14

Hi, I ran gen_proposal_list.py and then got data like the following:

0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0
13 0.4760 0.4823 0 0
0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0

In the above proposal data there are many invalid entries; is that correct?

Problem with detection results

Hi @yjxiong, when I use your pretrained model for testing, my result is worse than the paper, as follows:

+Detection Performance on thumos14------+--------+--------+--------+--------+--------+--------+---------+
| IoU thresh | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | Average |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
| mean AP | 0.5606 | 0.5084 | 0.4379 | 0.3422 | 0.2466 | 0.1589 | 0.0926 | 0.0399 | 0.0052 | 0.2658 |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+

Do you know why please?

How to get the best_iou and overlap_self values in thumos14_tag_val_normalized_proposal_list.txt

Hello @yjxiong, I am trying to reimplement your TAG method. For now I can get the actionness scores and do the grouping scheme, and after NMS I get 82 proposals for video_validation_0000201.mp4. I want to replace the proposals in your thumos14_tag_val_normalized_proposal_list.txt with mine to check whether my reimplementation is correct.
However, your txt file contains not only the proposals themselves but also the best_iou and overlap_self items, which are used to determine the incomplete and background proposals. I wonder how to calculate these two items, because there is no description of them in your paper.

e.g: thumos14_tag_val_normalized_proposal_list.txt

59

video_validation_0000201
1
1
1
8 0.8631 0.8926
32
8 0.7603 0.8995 0.8684 0.8956
8 0.6497 0.6497 0.8593 0.9047
0 0.0000 0.0000 0.1354 0.1626
##format
(label,best_iou,overlap_self,start_frame_norm,end_frame_norm)

Thank you.

About proposal list (no CliffDiving class in val list, missing video_test_0001496 in test list)

  1. Why are all the validation proposals (used for training) that you generated for CliffDiving videos labelled with 8 (Diving) rather than 5 (CliffDiving)? Wouldn't this hurt the performance of the CliffDiving classifier?
    For example, in thumos14_tag_val_normalized_proposal_list.txt:
# 380
video_validation_0000161
1
1
8
8 0.1826 0.2280
8 0.3393 0.3863
8 0.4268 0.4754
8 0.8228 0.8957
5 0.1826 0.2280
5 0.3393 0.3863
5 0.4268 0.4754
5 0.8228 0.8957
86
8 0.3272 0.3438 0.3458 0.4689
8 0.1725 0.1725 0.1972 0.4689
8 0.2917 0.3254 0.3522 0.4625
8 0.2999 0.2999 0.3139 0.4754
8 0.1530 0.1530 0.1777 0.4949
8 0.2458 0.2458 0.2977 0.4949
8 0.7752 0.8887 0.3458 0.3911
8 0.7883 0.8401 0.3425 0.3944
8 0.8001 0.8023 0.3393 0.3976
8 0.7602 0.9535 0.3490 0.3879
......
  2. thumos14_tag_test_normalized_proposal_list.txt has 200 videos, while there are 213 videos in TH14_Temporal_Annotations_Test\xgtf_renamed. Two reasonably missing videos are video_test_0000270 (its annotations say HammerThrow but the ground truth in the video is HairCut, which doesn't belong to the 20 classes) and video_test_0001292 (it only has ambiguous annotations).
    It seems that another missing video, video_test_0001496, could be included in the test list after correcting its annotations (the annotations say CricketShot while the ground truth is FrisbeeCatch).

proposal generation code

Would you please release the code for training the binary classifier used in TAG and for generating proposals with the watershed algorithm?

Problem about recall performance of TAG

Hi Yuanjun,

I generated the proposal list of THUMOS14 using "gen_proposal_list.py" and got "thumos14_tag_test_proposal_list.txt". Then I used the evaluation code in https://github.com/escorciav/daps/wiki/FAQs , where random scores are used for proposal retrieval.

However, I only got an AR of 39.14% (AR is calculated over [0.5:0.05:1.0]), which is lower than the 48.9% in the SSN paper.
Using this evaluation code, the performance of the following methods is (with 200 proposals):
DAPs: 33.96%
TAP( sparseprop): 23.13%
SCNN-prop: 37.01% (for performance of scnn-prop, I guess you need to read this: https://github.com/escorciav/daps/wiki )

So how did you evaluate these models? And how can I reproduce the TAG performance reported in the SSN paper? Thanks!

How does DataParallel work?

Sorry to bother you.
I want to ask how the DataLoader works. I find that no matter what the batch size is, every GPU processes at least (1+6+1)=8 proposals. Even if I set batch size=2, only 2 of the 8 GPUs are used and each GPU still processes 8 proposals. Why?

RuntimeError: out of memory when batch size is bigger than 2

Hi, when I train SSN on thumos14 with the command:
python ssn_train.py thumos14 MODALITY -b 16 --lr_steps 20 40 --epochs 45
(more specifically, sudo python3 ssn_train.py thumos14 RGB -b 16 --lr_steps 20 40 --epochs 45),
I get RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58.
Only when I reduce the batch size to 2 does the error disappear. The memory usage of one GPU is 9256MiB/12207MiB when the batch size is 2, but then I quickly get a NaN loss most of the time. I didn't modify the released code except for some small changes of names and paths, so I think it's unreasonable that the batch size has to be so small. Does anyone know why?

My environment info is as follows: Python 3.5, CUDA 7.5. nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:04:00.0 Off |                  N/A |
| 22%   58C    P2    72W / 250W |    325MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 00000000:05:00.0 Off |                  N/A |
| 26%   40C    P8    13W / 250W |     11MiB /  6081MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I installed PyTorch by the following commands:

pip3 install http://download.pytorch.org/whl/cu75/torch-0.3.0.post4-cp35-cp35m-linux_x86_64.whl 
pip3 install torchvision

The full log is as follows.

administrator@xxxx:xxxx/action-detection$ sudo python3 ssn_train.py thumos14 RGB -b 16 --lr_steps 20 40 --epochs 45

    Initializing SSN with base model: BNInception.
    SSN Configurations:
        input_modality:     RGB
        starting_segments:  2
        course_segments:    5
        ending_segments:    2
        num_segments:       9
        new_length:         1
        dropout_ratio:      0.8
        loc. regression:    ON
        bn_mode:            frozen

        stpp_configs:       (1, 1, 1)

/home/administrator/.local/lib/python3.5/site-packages/torch/nn/modules/module.py:482: UserWarning: src is not broadcastable to dst, but they have the same number of elements.  Falling back to deprecated pointwise behavior.
  own_state[name].copy_(param)
Freezing all BatchNorm2D layers
computing regression target normalizing constants


            SSNDataset: Proposal file data/thumos14_tag_val_proposal_list.txt parsed.

            There are 28231 usable proposals from 200 videos.
            6676 foreground proposals
            17950 incomplete_proposals
            3605 background_proposals

            Sampling config:
            FG/BG/INC: 1/1/6
            Video Centric: True

            Epoch size multiplier: 10

            Regression Stats:
            Location: mean -0.02322 std 0.08391
            Duration: mean -0.00504 std 0.19560

/usr/local/lib/python3.5/dist-packages/torchvision/transforms/transforms.py:156: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  "please use transforms.Resize instead.")


            SSNDataset: Proposal file data/thumos14_tag_test_proposal_list.txt parsed.

            There are 33634 usable proposals from 210 videos.
            7298 foreground proposals
            21316 incomplete_proposals
            5020 background_proposals

            Sampling config:
            FG/BG/INC: 1/1/6
            Video Centric: True

            Epoch size multiplier: 1

            Regression Stats:
            Location: mean -0.02322 std 0.08391
            Duration: mean -0.00504 std 0.19560

group: first_conv_weight has 1 params, lr_mult: 1, decay_mult: 1
group: first_conv_bias has 1 params, lr_mult: 2, decay_mult: 0
group: normal_weight has 71 params, lr_mult: 1, decay_mult: 1
group: normal_bias has 71 params, lr_mult: 2, decay_mult: 0
group: BN scale/shift has 0 params, lr_mult: 1, decay_mult: 0
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
inception_3a_1x1_bn torch.Size([576, 64, 28, 28])
inception_3a_3x3_bn torch.Size([576, 64, 28, 28])
inception_3a_double_3x3_2_bn torch.Size([576, 96, 28, 28])
inception_3a_pool_proj_bn torch.Size([576, 32, 28, 28])
Traceback (most recent call last):
  File "ssn_train.py", line 418, in <module>
    main()
  File "ssn_train.py", line 154, in main
    train(train_loader, model, activity_criterion, completeness_criterion, regression_criterion, optimizer, epoch)
  File "ssn_train.py", line 208, in train
    reg_target_var, prop_type_var)
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 68, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 78, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/share/v-yuphu/action-detection/ssn_models.py", line 255, in forward
    return self.train_forward(input, aug_scaling, target, reg_target, prop_type)
  File "/mnt/share/v-yuphu/action-detection/ssn_models.py", line 266, in train_forward
    base_out = self.base_model(input.view((-1, sample_len) + input.size()[-2:]))
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/share/v-yuphu/action-detection/model_zoo/bninception/pytorch_load.py", line 56, in forward
    data_dict[op[2]] = torch.cat(tuple(data_dict[x] for x in op[-1]), 1)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
THCudaCheckWarn FAIL file=/pytorch/torch/lib/THC/THCStream.cpp line=50 error=29 : driver shutting down
THCudaCheckWarn FAIL file=/pytorch/torch/lib/THC/THCStream.cpp line=50 error=29 : driver shutting down

How to run SSN on custom videos

Hi @yjxiong ,
Thanks for your cool work; it's really useful. I wonder how to run SSN action detection on custom videos (which are not included in ActivityNet or THUMOS) using a pretrained model. Can you give me some hints? Thank you in advance.

Questions about dataloader

First of all, I want to thank you.
I have read the main code of SSN and have a question about the training code.
The model and criterion are moved to the GPU with '.cuda()', but I found no '.cuda()' call for the input data of the model, which means the data stays in CPU memory. Why?

What's the size of extracted frames and optical flow images?

The authors suggest using the tools provided in the TSN repo to extract frames and optical flow images. I wonder whether I should resize the images the same way as in the TSN repo. What image size did the authors use?
One dataset used in the TSN repo is UCF101. The original videos' size is 320*256. The command in the TSN repo resizes images to 340*256 as follows:

python tools/build_of.py ${SRC_FOLDER} ${OUT_FOLDER} --num_worker ${NUM_WORKER} --new_width 340 --new_height 256 2>local/errors.log

And in *_train_val.prototxt the crop_size is 224.
The size of THUMOS14 videos is 320*180; in resnet/vgg/BNInception models input_size = 224, and in InceptionV3/inception models input_size = 299.

I got worse results when I used the reference models to evaluate my extracted images, which have not been resized (320*180). (The number of images is the same as the number of images extracted by DenseFlow.)

action-detection$ python3 ssn_test.py thumos14 RGB none score_thumos14_rgb_reference.npz --use_reference
action-detection$ python3 eval_detection_results.py thumos14 score_thumos14_rgb_reference.npz

rgb reference model
+Detection Performance on thumos14------+--------+--------+--------+--------+--------+--------+---------+
| IoU thresh | 0.10   | 0.20   | 0.30   | 0.40   | 0.50   | 0.60   | 0.70   | 0.80   | 0.90   | Average |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
| mean AP    | 0.4375 | 0.3839 | 0.3266 | 0.2430 | 0.1639 | 0.1051 | 0.0588 | 0.0244 | 0.0059 | 0.1943  |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+


action-detection$ python3 ssn_test.py thumos14 Flow none score_thumos14_flow_reference.npz --use_reference
action-detection$ python3 eval_detection_results.py thumos14 score_thumos14_flow_reference.npz

+Detection Performance on thumos14------+--------+--------+--------+--------+--------+--------+---------+
| IoU thresh | 0.10   | 0.20   | 0.30   | 0.40   | 0.50   | 0.60   | 0.70   | 0.80   | 0.90   | Average |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
| mean AP    | 0.4166 | 0.3769 | 0.3183 | 0.2538 | 0.1907 | 0.1208 | 0.0698 | 0.0250 | 0.0045 | 0.1974  |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+

How to train using RGB and flow modality?

Hi!
Thanks for your great work!
I read your README file and noticed that I can train your model using
python ssn_train.py thumos14 MODALITY -b 16 --lr_steps 20 40 --epochs 45
I can choose MODALITY to be RGB or Flow.
But how can I train the model using both RGB and Flow?
Or should I train them separately and combine them only at test time?
Thanks for your help!

HDD training speed

Hi, I'm trying to train the model.
My data is stored on an HDD and the script takes a long time to load data into memory; how can I speed it up?

Proposal list

Hi, in your proposal_list.txt, what does each line stand for?
For example:

1

video_validation_0000354
1
1
0
226
0 0.0000 0.0000 0.7439 0.7713
0 0.0000 0.0000 0.7533 0.7698
0 0.0000 0.0000 0.7259 0.7682
0 0.0000 0.0000 0.7125 0.7698
0 0.0000 0.0000 0.7408 0.7737
0 0.0000 0.0000 0.7408 0.7792

How can I get a proposal list like this? Can you release the code that produces the proposal list? Thank you so much! Looking forward to your reply.

Train and test video numbers in the THUMOS 2014 dataset

Hello @yjxiong, thanks again for your excellent work, but I am now confused about your thumos14_tag_test_normalized_proposal_list.txt file.

As far as I know, the THUMOS 2014 validation and test sets contain 1010 and 1574 untrimmed videos respectively, but only 20 action categories are involved and annotated temporally, so 200 validation videos and 213 test videos are used for the temporal action detection task. However, your thumos14_tag_test_normalized_proposal_list.txt contains proposal start and end times for all 1574 videos. Since it is impossible to train on videos without annotation information, did you really train and test on all videos (1010 and 1574), or did you only use the 200+213 annotated videos, with the other videos in thumos14_tag_test_normalized_proposal_list.txt not used at all?

Question about sample balance

In your paper, the ratio of positive, background, and incomplete proposals is 1:1:6. So when you train the action classifier, the background:foreground proposal ratio is 1:1, but there are 100 classes (in ActivityNet 1.2), which means the number of foreground samples per class is much smaller than the number of background samples. Can this imbalance be ignored? Would it cause any problems?

I tried training the SSN model without any modification of your code, and found that after several backward and update operations, the model tends to predict all proposals as background. How come?

Model size does not match

Hello,
I ran the code and then came across a size mismatch between the model and the checkpoint parameters:
RuntimeError: While copying the parameter named inception_4d_pool_proj_bn.running_var, whose dimensions in the model are torch.Size([128]) and whose dimensions in the checkpoint are torch.Size([1, 128]).

Performance on THUMOS'14 at IoU = 0.6, 0.7

Hi Yuanjun,
We want to cite your paper in our work, but we need the mAP on THUMOS14 at IoU = 0.6 and 0.7, which is not provided in your paper.
Do you have the results at hand for this setting?
Thanks!

Problem with the proposal_list

Hi @yjxiong, can you release your proposal_list? When I use my generated proposal_list file, testing always fails with a "math domain error", maybe because gt_size < prop_size in

def compute_regression_targets(self, gt_list, fg_thresh):

in ssn_dataset.py

IndexError: The advanced indexing objects could not be broadcast

When I train the model on ActivityNet 1.2, the following lines
train(train_loader, model, activity_criterion, completeness_criterion, regression_criterion, optimizer, epoch)

activity_out, activity_target,
completeness_out, completeness_target,
regression_out, regression_labels, regression_target = model(input_var, scaling_var, target_var,
reg_target_var, prop_type_var)

in ssn_train.py raise an error: IndexError: The advanced indexing objects could not be broadcast.
How can I fix this problem?

Would you share the detection result json file on ActivityNet?

Dear Author:

I am reading your papers:

  1. A Pursuit of Temporal Accuracy in General Activity Detection
  2. Temporal Action Detection with Structured Segment Networks

Thanks for your great works!

Currently, I am evaluating my detection results on one of the five subsets of the ActivityNet 1.2 validation set. However, no one has ever released results on these five subsets separately.

I am wondering whether you would share the detection result JSON files of these two works on the ActivityNet 1.2 or 1.3 validation set with me.

If yes, could you email the file to [email protected] or just paste it below? That would mean a lot!

Thanks a lot!

AN <class id, category name> mapping

Dear all,

I am working on ActivityNet v1.2 now and trying to map the class id used in SSN (indexed from 0 to 99) to its original category name. I am wondering whether anyone has such a mapping file for ActivityNet v1.2 and v1.3? Thanks much!

Best,
Zheng

Window scales for sliding window

Hi, I just read your papers and noticed that the method can also achieve good performance with proposals generated by sliding windows. In your paper you state: "we generate windows in 20 exponential scales starting from 0.3 second long". Could you provide the details of the 20 scales? Thanks a lot!

ssn_train.py

When I run: python ssn_train.py thumos14 RGB -b 16 --lr_steps 20 40 --epochs 45
something goes wrong and I don't know why. Has nobody else met this issue?
File "ssn_train.py", line 103
modality=args.modality, exclude_empty=True, **sampling_configs,
^

SyntaxError: invalid syntax

How to get temporal region proposals by myself?

Since the proposal files are provided, I don't need them to reproduce the results. However, I want to know how to train the actionness classifier that evaluates the actionness of snippets.
BTW, how is a snippet defined, i.e., how many frames are within a snippet? Thanks!

confused about ssn_op.py

class CompletenessLoss(torch.nn.Module):
    def __init__(self, ohem_ratio=0.17):
        super(CompletenessLoss, self).__init__()
        self.ohem_ratio = ohem_ratio

        self.sigmoid = nn.Sigmoid()

    def forward(self, pred, labels, sample_split, sample_group_size):
        pred_dim = pred.size()[1]
        pred = pred.view(-1, sample_group_size, pred_dim)
        labels = labels.view(-1, sample_group_size)

        pos_group_size = sample_split
        neg_group_size = sample_group_size - sample_split
        pos_prob = pred[:, :sample_split, :].contiguous().view(-1, pred_dim)
        neg_prob = pred[:, sample_split:, :].contiguous().view(-1, pred_dim)
        pos_ls = OHEMHingeLoss.apply(pos_prob, labels[:, :sample_split].contiguous().view(-1), 1,
                                     1.0, pos_group_size)
        neg_ls = OHEMHingeLoss.apply(neg_prob, labels[:, sample_split:].contiguous().view(-1), -1,
                                     self.ohem_ratio, neg_group_size)
        pos_cnt = pos_prob.size(0)
        neg_cnt = int(neg_prob.size()[0] * self.ohem_ratio)

        return pos_ls / float(pos_cnt + neg_cnt) + neg_ls / float(pos_cnt + neg_cnt)

Why is the sigmoid function not applied to pred?
pred is an fc layer output that is not normalized to [0, 1].

Or am I getting something wrong?
Thank you.

Coverage Threshold

During dataset creation, bg_coverage of the whole video is checked in:

if tag[i] == 0 and \
self.proposals[i].best_iou < bg_iou_thresh and \
self.proposals[i].coverage > bg_coverage_thresh:

Why is this done? Can I remove this condition?

My videos are very long, with lots of activity, so the background segments are quite short and most of the proposals fail this check.

Could you please release TSN features?

Could you please release a set of TSN features for a public dataset (e.g., THUMOS'14)?
Extracting these features is very time- and compute-consuming.
This would be very helpful to the community.
Thank you in advance!

Flow + RGB result

Do the results reported for 'Flow + RGB' come from a two-model ensemble?
