
dmcl's Introduction

Overview

codebase/preproc/make_lists: scripts to produce the lists of train/val/test files. Corresponds to step 1 of the Data Pipeline section described below.
codebase/preproc/convert*: scripts to produce tfrecords. Corresponds to steps 2 and 3 of the Data Pipeline section described below.

codebase/utils.py: functions that deal with the infrastructure, such as building filepaths and reading lists of files. This file contains the paths to data and lists, which should be modified to match your setup.
codebase/restorers.py: functions to restore checkpoints.
codebase/parsers.py: functions to parse tfrecords, i.e. to read tfrecords and build training and testing batches.

nets/resnet_official/: forked from the official TensorFlow ResNet repo, modified to reflect the model proposed in A Closer Look at Spatiotemporal Convolutions for Action Recognition, CVPR 2018.

dmcl.py: implements the method presented in the paper.

Prerequisites

This work was developed using Python 3 and TensorFlow 1.12.

Data Pipeline

The input to this model consists of batches of frames from three modalities: RGB, optical flow, and depth. Each modality is associated with a different deep neural network. This is how we implemented the data pipeline for this experiment.

  1. Write a JSON file that contains three lists of video_ids: training, validation, and testing.

  2. Training tfrecords: write one tfrecord for each of the training video_ids.
    Each tfrecord, corresponding to a video_id, contains three lists: paths to all the RGB frames, paths to all the depth frames, and paths to all the optical flow frames.

  3. Test/val tfrecords: write ten tfrecords for each of the testing and validation video_ids.
    Each tfrecord contains three lists: paths to the RGB frames, paths to the depth frames, and paths to the optical flow frames.
    Each tfrecord corresponds to a clip of eight randomly sampled frames of a video. The indices are the same across modalities.
    These frames are sampled once when writing the tfrecords and then re-used for all the experiments. The final prediction for a validation/test video_id is the average of the predictions over the ten clips.

Steps 1, 2, and 3 are run only once.
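A minimal sketch of steps 1 and 3 above (the file name, the `random.sample` strategy, and the sorted-index convention are assumptions for illustration, not the repo's actual code):

```python
import json
import random

def write_split_file(train_ids, val_ids, test_ids, path="splits.json"):
    """Step 1: write one JSON file holding the three lists of video_ids."""
    with open(path, "w") as f:
        json.dump({"train": train_ids, "val": val_ids, "test": test_ids}, f)

def sample_eval_clips(num_frames, clips_per_video=10, clip_len=8, seed=0):
    """Step 3: sample each video's clip indices once; the same indices
    are then reused for all modalities and all experiments."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_frames), clip_len))
            for _ in range(clips_per_video)]

# Ten clips of eight frame indices each, shared across RGB/flow/depth.
clips = sample_eval_clips(num_frames=120)
```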

For training: dmcl.py reads the list of training video_ids produced in step 1. At each training step, it randomly samples batch_sz videos (default is 5), and from each of these videos a clip of eight frames is randomly sampled. The indices of the sampled frames are the same across modalities. Thus, one training batch is composed of batch_sz clips from different videos, each of eight frames, and each modality network is fed the corresponding modality batch of samples.
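The training-time sampling described above can be sketched as follows (the helper name and the sorted random-index clip construction are assumptions; the actual implementation lives in dmcl.py and codebase/parsers.py):

```python
import random

def sample_training_batch(video_frame_counts, batch_sz=5, clip_len=8, rng=None):
    """One training step: pick batch_sz videos, then one clip of clip_len
    frame indices per video. The same indices are used to look up the RGB,
    flow, and depth frame paths, so the three modality networks see
    temporally aligned inputs."""
    rng = rng or random.Random()
    video_ids = rng.sample(list(video_frame_counts), batch_sz)
    batch = {}
    for vid in video_ids:
        idx = sorted(rng.sample(range(video_frame_counts[vid]), clip_len))
        batch[vid] = idx  # shared across modalities
    return batch

counts = {f"v{i:03d}": 100 + i for i in range(20)}
batch = sample_training_batch(counts, rng=random.Random(1))
```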

For validation and testing: dmcl.py reads all the validation and testing video_ids produced in step 1. It then builds the paths to each of the ten clips per video produced in step 3. The frames of these clips are used as input to the corresponding modality network.
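The clip-level averaging that yields the final per-video prediction can be sketched as (the score format, one list of class scores per clip, is an assumption):

```python
def video_prediction(clip_scores):
    """Average per-clip class scores over a video's ten clips and
    return the predicted class index."""
    num_classes = len(clip_scores[0])
    mean = [sum(c[k] for c in clip_scores) / len(clip_scores)
            for k in range(num_classes)]
    return max(range(num_classes), key=mean.__getitem__)

# Three classes, ten clips: class 2 wins on average.
scores = [[0.1, 0.2, 0.7]] * 10
# video_prediction(scores) -> 2
```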

Suggestions to improve the usability of the data pipeline are very welcome.

Training

python dmcl.py --dset=nwucla --eval=cross_view --temp=2 

Other options are available; please check get_arguments() in codebase/utils.py.
Each experiment run creates two folders: one for logging, with a .txt file capturing the output, and one for saving checkpoints.
At the end of training, the model restores the checkpoint with the best validation accuracy seen during training and runs the test set.
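The best-checkpoint bookkeeping amounts to the following sketch (the checkpoint-naming scheme is an assumption; the actual restore logic lives in codebase/restorers.py):

```python
def best_checkpoint(val_accuracies):
    """Given {checkpoint_path: validation accuracy} collected during
    training, return the checkpoint to restore before running the
    test set."""
    return max(val_accuracies, key=val_accuracies.get)

accs = {"ckpt-1000": 0.81, "ckpt-2000": 0.88, "ckpt-3000": 0.85}
# best_checkpoint(accs) -> "ckpt-2000"
```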

Reference

DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition - PDF
Nuno Cruz Garcia, Sarah Adel Bargal, Vitaly Ablavsky, Pietro Morerio, Vittorio Murino, Stan Sclaroff

  @article{garcia2019dmcl,
    title={DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition},
    author={Garcia, Nuno C and Bargal, Sarah Adel and Ablavsky, Vitaly and Morerio, Pietro and Murino, Vittorio and Sclaroff, Stan},
    journal={arXiv preprint arXiv:1912.10982},
    year={2019}
  }

dmcl's People

Contributors

ncgarcia


dmcl's Issues

Request for report to DMCL performance on NTU RGB+D 120 cross-setup

Thanks for your great work; I have been following action recognition with
privileged modalities.

I am researching this topic through some ablation studies, especially results
evaluated on the NTU RGB+D 120 benchmark in cross-setup.

I would like to report the performance of your work evaluated on this benchmark
in cross-setup in my ongoing paper.

So, I would appreciate it if you could let me know the results of your work
on this evaluation protocol.

Thank you.

Request for a text file with the list of validation subjects

Thanks for your great work.
I am converting framework of DMCL to pytorch now. Almost done with this work, I figure out that the list of subject for validation was not reported in the paper. So, I would appreciate it if you could provide the text file about validation subjects list.

How to generate flow data?

There is no flow data in the NWUCLA dataset. Could you please provide the code to generate flow data that meets your format requirements?
