This repo holds the code of our solution for the Sign Spotting Challenge at ECCV (Multi-Shot Supervised Learning track).
Our team ranked 3rd in the final test phase.
Team leader: Xilin Chen
Team members: Yuecong Min, Peiqi Jiao, Aiming Hao
We provide features extracted from multiple modalities for sign spotting; with them, the spotting model can be trained in about ten minutes and achieves acceptable performance. To reproduce the results, you need:
Our solution is based on OpenCV and PyTorch; we provide the requirements for a conda environment:
conda create --name <env> --file requirements.txt
The extracted features and the trained model can be downloaded from Google Drive. After downloading the extracted features, unzip them into the dataset folder:
unzip "extracted features.zip" -d ./dataset/
The expected directory tree is as follows:
.
├── configs
│   ├── test
│   └── train
├── dataset
│   ├── data_preprocess.sh
│   └── MSSL_dataset
│       ├── final_train_input.txt
│       ├── train_input.txt
│       ├── valid_input.txt
│       ├── test_input.txt
│       ├── TRAIN
│       │   ├── MSSL_TRAIN_SET_GT.pkl
│       │   └── MSSL_TRAIN_SET_GT_TXT
│       ├── VALIDATION
│       │   ├── MSSL_VAL_SET_GT.pkl
│       │   └── MSSL_VAL_SET_GT_TXT
│       └── processed
│           ├── features
│           │   ├── flow
│           │   ├── mask_video
│           │   ├── skeleton
│           │   └── video
│           ├── test
│           │   ├── clipwise_label
│           │   └── framewise_label
│           ├── train
│           │   ├── clipwise_label
│           │   └── framewise_label
│           └── valid
│               ├── clipwise_label
│               └── framewise_label
├── weights
│   └── final_model.pth
└── submission
    ├── ref
    │   └── ground_truth.pkl
    └── res
        └── predictions.pkl
For evaluation with the provided model (./weights/final_model.pth), simply run:
python generate_predictions.py
The final predictions can be found in submission/prediction_validate/res/predictions.pkl.
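The exact structure of predictions.pkl is defined by the challenge toolkit; as a quick sanity check of the output, it can be loaded with pickle (illustrative only, the printed structure depends on the toolkit's format):

import pickle

# load the produced predictions and inspect their structure
with open("submission/prediction_validate/res/predictions.pkl", "rb") as f:
    preds = pickle.load(f)
print(type(preds))
# if the file is a dict keyed by video id, peek at a few entries
if isinstance(preds, dict):
    for key in list(preds)[:3]:
        print(key, preds[key])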
For training the final spotting model with the extracted features, simply run (it takes about 10 minutes):
python main.py --config ./configs/train/fusion_detector.yml
The feature extraction process generates cropped videos, optical flow, skeletons, and masked videos. To obtain the skeleton data, we adopt MediaPipe for pose and hand estimation, which should be installed first.
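The actual preprocessing is driven by data_preprocess.sh; as a minimal sketch of the skeleton step, MediaPipe's Holistic solution can be run per frame roughly as follows (the helper name and returned structure are illustrative, not the repo's actual interface):

import cv2
import mediapipe as mp

def extract_skeleton(video_path):
    # hypothetical helper: collect pose and hand landmarks for every frame
    landmarks = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.holistic.Holistic(static_image_mode=False) as holistic:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            landmarks.append((results.pose_landmarks,
                              results.left_hand_landmarks,
                              results.right_hand_landmarks))
    cap.release()
    return landmarks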
Download the dataset provided by the challenge, put it in ./dataset/, and then run the script:
cd dataset
bash data_preprocess.sh
The organization is expected as follows:
dataset
├── data_preprocess.sh
└── MSSL_dataset
    ├── final_train_input.txt
    ├── train_input.txt
    ├── valid_input.txt
    ├── test_input.txt
    ├── TRAIN
    ├── VALIDATION
    ├── MSSL_TEST_SET_VIDEOS
    └── processed
        ├── train_pose.pkl
        ├── valid_pose.pkl
        ├── test_pose.pkl
        ├── train
        │   ├── original_video
        │   ├── video
        │   ├── flow
        │   ├── pose
        │   ├── clipwise_label
        │   └── framewise_label
        ├── valid
        │   ├── original_video
        │   ├── video
        │   ├── flow
        │   ├── pose
        │   ├── clipwise_label
        │   └── framewise_label
        └── test
            ├── original_video
            ├── video
            ├── flow
            ├── pose
            ├── clipwise_label
            └── framewise_label
We adopt a two-round training scheme for feature extraction. In the first round, only a subset that contains clips of the query signs is built, to increase the discriminative ability of the backbone; in the second round, all clips are used for training.
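As an illustration of the first-round subset (the data structures below are hypothetical, not the repo's actual format), the idea is simply to keep the clips whose label belongs to the query vocabulary:

# illustrative only: keep clips labeled with a query sign
def build_query_subset(clips, query_signs):
    # clips: list of (clip_id, sign_label); query_signs: set of query labels
    return [(cid, lab) for cid, lab in clips if lab in query_signs]

clips = [("video0_clip3", 17), ("video0_clip4", 102), ("video1_clip0", 17)]
print(build_query_subset(clips, query_signs={17, 42}))
# -> [('video0_clip3', 17), ('video1_clip0', 17)]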
For the first-round training, run the commands:
python main.py --config ./configs/train/video_config.yml
python main.py --config ./configs/train/mask_video_config.yml
python main.py --config ./configs/train/skeleton_config.yml
Then select the best (validation) or last (test) weights for the second-round training by setting remove_bg=False and weights=<path_to_best_weight> in the config files, and run the above commands again.
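The edit can be made by hand; for several configs, a small script can patch them instead. The helper below is hypothetical and assumes remove_bg and weights are top-level keys in the YAML files:

import yaml

def make_second_round_config(src, dst, weight_path):
    # load a first-round config, point it at the chosen checkpoint,
    # disable background removal, and save it as the second-round config
    with open(src) as f:
        cfg = yaml.safe_load(f)
    cfg["remove_bg"] = False
    cfg["weights"] = weight_path
    with open(dst, "w") as f:
        yaml.safe_dump(cfg, f)

make_second_round_config("./configs/train/video_config.yml",
                         "./configs/train/video_config_round2.yml",
                         "./weights/video_round1_best.pth")  # hypothetical paths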
For feature extraction, modify the weight paths in feature_extraction.py, and run:
python feature_extraction.py
which will generate the features as shown in Step 1.
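The on-disk feature format is whatever feature_extraction.py writes under ./dataset/processed/features/; assuming, purely for illustration, one .npy array of shape (num_frames, feat_dim) per video, the features could be consumed like this (file name hypothetical):

import numpy as np

# load one video's extracted features for a given modality
feat = np.load("./dataset/processed/features/video/sample_video.npy")
print(feat.shape)  # e.g. (num_frames, feat_dim)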
This work builds on the following code and pretrained models:
I3D code and pretrained model
P3D code and pretrained model
ST-GCN
For more information, please contact Yuecong Min (yuecong.min [AT] vipl.ict.ac.cn)