DCAN

The official implementation of the paper "DCAN: Improving Temporal Action Detection via Dual Context Aggregation".

News

(2022/09/01) Preliminary code of our end-to-end training framework BasicTAD has been released in this repo.

(2022/06/21) Code and models are released.

Abstract

Temporal action detection aims to locate the boundaries of actions in a video. Current methods based on boundary matching enumerate and calculate all possible boundary matchings to generate proposals. However, these methods neglect long-range context aggregation in boundary prediction. At the same time, due to the similar semantics of adjacent matchings, local semantic aggregation of densely generated matchings cannot improve semantic richness and discrimination. In this paper, we propose an end-to-end proposal generation method named Dual Context Aggregation Network (DCAN) that aggregates context on two levels, namely the boundary level and the proposal level, to generate high-quality action proposals and thereby improve the performance of temporal action detection. Specifically, we design Multi-Path Temporal Context Aggregation (MTCA) to achieve smooth context aggregation on the boundary level and precise evaluation of boundaries. For matching evaluation, Coarse-to-fine Matching (CFM) is designed to aggregate context on the proposal level and refine the matching map from coarse to fine. We conduct extensive experiments on ActivityNet v1.3 and THUMOS-14. DCAN obtains an average mAP of 35.39% on ActivityNet v1.3 and reaches 54.14% mAP at IoU 0.5 on THUMOS-14, which demonstrates that DCAN can generate high-quality proposals and achieve state-of-the-art performance.

Usage

Environment

We use Miniconda3 to manage our Python environments.

conda create -n dcan python=3.7
conda activate dcan
conda install matplotlib tqdm joblib h5py
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch

Data preparation

ActivityNet v1.3: We use the TSN features extracted by a two-stream network. The frame interval is set to 16. Using linear interpolation, each video feature sequence is rescaled to L = 100 snippets.
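The rescaling step can be sketched as follows (a minimal illustration; the function name and feature dimensions are placeholders, not the repository's actual preprocessing code):

```python
import torch
import torch.nn.functional as F

def rescale_feature(feature: torch.Tensor, target_len: int = 100) -> torch.Tensor:
    """Linearly interpolate a (T, C) snippet feature sequence to (target_len, C)."""
    # F.interpolate expects (N, C, T): add a batch dim and move channels first.
    x = feature.t().unsqueeze(0)                                      # (1, C, T)
    x = F.interpolate(x, size=target_len, mode="linear", align_corners=True)
    return x.squeeze(0).t()                                           # (target_len, C)

feat = torch.randn(137, 400)        # e.g. 137 snippets of 400-D TSN features
print(rescale_feature(feat).shape)  # torch.Size([100, 400])
```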

THUMOS-14: We use the TSN features extracted by the TSN network. The FPS of the features is consistent with the original videos. The features are stored in HDF5 format for fast reading.
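Reading such features with h5py might look like the following (the file name, video id, and array shape below are hypothetical; the actual layout follows the downloaded feature archive):

```python
import h5py
import numpy as np

# Write a dummy per-video feature purely to illustrate the HDF5 layout:
# one dataset per video id, each a (T, C) float array.
with h5py.File("features_demo.h5", "w") as f:
    f.create_dataset("video_test_0000004",
                     data=np.random.rand(930, 400).astype(np.float32))

# Reading a single video's features is a simple keyed lookup.
with h5py.File("features_demo.h5", "r") as f:
    feat = f["video_test_0000004"][()]   # (T, C) array, T frames at original FPS

print(feat.shape)  # (930, 400)
```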


The TSN features for both datasets are publicly available. We also provide our own download links for long-term availability.

Baidu Netdisk:

ActivityNet v1.3 and THUMOS-14 (extraction code: uvpk)

Zenodo:

ActivityNet v1.3, THUMOS-14


To use the downloaded video features, modify the feature_path attribute in each dataset's opt.py accordingly.
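For example, the relevant option might look like this (the attribute name comes from the repository; the path itself is a placeholder you replace with your local feature directory):

```python
# opt.py (illustrative fragment; one per dataset directory)
feature_path = "/path/to/your/tsn_features"
```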

Evaluate our model

Download the checkpoints from the release page.

For example, on THUMOS-14, run the following commands:

cd anet_thumos14
python test.py --checkpoint_path ./save/thumos_4_param.pth.tar

Then, you can get the evaluation results:

mAP at tIoU 0.1 is 0.7425103702614687
mAP at tIoU 0.2 is 0.7168985921400817
mAP at tIoU 0.3 is 0.6794231339444753
mAP at tIoU 0.4 is 0.6248513734821018
mAP at tIoU 0.5 is 0.5399065183118822
mAP at tIoU 0.6 is 0.4393354468419945
mAP at tIoU 0.7 is 0.3242030486843789
mAP at tIoU 0.8 is 0.1953019732952785
mAP at tIoU 0.9 is 0.06211617080697596
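THUMOS-14 results are commonly summarized as the average mAP over tIoU thresholds 0.3 to 0.7; a quick sketch of that average over the values above (the aggregation choice is a common convention, not something this repository's test script necessarily prints):

```python
# Per-tIoU mAP values copied from the evaluation output above.
maps = {
    0.3: 0.6794231339444753,
    0.4: 0.6248513734821018,
    0.5: 0.5399065183118822,
    0.6: 0.4393354468419945,
    0.7: 0.3242030486843789,
}
avg = sum(maps.values()) / len(maps)
print(f"average mAP (tIoU 0.3:0.7): {avg:.4f}")  # 0.5215
```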

For ActivityNet v1.3, the evaluation procedure is similar to THUMOS-14.

Training and Testing

For example, on THUMOS-14:

cd anet_thumos

Train the model using 4 GPUs (ids 0,1,2,3):

python train.py --gpus 0,1,2,3

Test the model at a specific epoch:

python test.py --checkpoint_path ./save/xxx-xx/ --test_epoch 4

Here, 'xxx-xx' is the training working directory under 'save', named by a timestamp, e.g. "./save/20220116-1417".

For ActivityNet v1.3, the training and testing procedure is similar to THUMOS-14.

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{2022dcan,
  author    = {Guo Chen and
               Yin{-}Dong Zheng and
               Limin Wang and
               Tong Lu},
  title     = {{DCAN:} Improving Temporal Action Detection via Dual Context Aggregation},
  booktitle = {Thirty-Sixth {AAAI} Conference on Artificial Intelligence, {AAAI}
               2022, Thirty-Fourth Conference on Innovative Applications of Artificial
               Intelligence, {IAAI} 2022, The Twelfth Symposium on Educational Advances
               in Artificial Intelligence, {EAAI} 2022, Virtual Event, February 22
               - March 1, 2022},
  pages     = {248--257},
  publisher = {{AAAI} Press},
  year      = {2022}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.
