
vidvrd-tracklets

Update

We proposed a new VidSGG framework, Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs, which has been accepted by CVPR 2022. The code is released here.

Update

We updated the code for training MEGA on the VidVRD dataset, including the following files:

  • MEGA/datasets/vidvrd-dataset/img_index
  • MEGA/mega_core/data/datasets/vidvrd.py
  • MEGA/tools/extract_coco_vidvrd.py

These files may help if you need to train MEGA by yourself.

VidOR-tracklets here

We won 1st place in the Video Relation Understanding (VRU) Grand Challenge at ACM Multimedia 2021. The corresponding technical report is available here, or see the arXiv version.

This repository contains code for generating Video Visual Relation Detection (VidVRD) tracklets based on MEGA and deepSORT. These tracklets are also suitable for the ACM MM Visual Relation Understanding (VRU) Grand Challenge (which is based on the VidOR dataset).

If you are only interested in the generated tracklets, you can ignore the code and download them directly from here.

Download generated tracklets directly

We release the object tracklets for the VidOR train/validation/test sets. You can download the tracklets here and place them in the following directory structure:

├── deepSORT
│   ├── ...
│   ├── tracking_results
│   │   ├── VidORtrain_freq1_m60s0.3_part01
│   │   ├── ...
│   │   ├── VidORtrain_freq1_m60s0.3_part14
│   │   ├── VidORval_freq1_m60s0.3
│   │   ├── VidORtest_freq1_m60s0.3
│   │   ├── readme.md
│   │   └── format_demo.py
│   └── ...
├── MEGA
│   ├── ... 
│   └── ...

Please refer to deepSORT/tracking_results/readme.md for more details.

Evaluate the tracklets mAP

Run python deepSORT/eval_traj_mAP.py to evaluate the tracklet mAP (you may need to change some args in deepSORT/eval_traj_mAP.py).
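
The evaluation follows the usual tracklet-mAP protocol: a predicted tracklet is matched to a ground-truth tracklet of the same category when their volume IoU (vIoU) exceeds a threshold, and AP is then computed per category. Below is a minimal sketch of vIoU only, assuming each tracklet is a dict mapping a frame index to a box [x1, y1, x2, y2]; the actual I/O, matching, and AP computation are handled by deepSORT/eval_traj_mAP.py.

    def box_iou(a, b):
        """IoU of two boxes given as [x1, y1, x2, y2]."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-8)

    def viou(traj_a, traj_b):
        """Volume IoU: per-frame IoU summed over the frames covered by both
        tracklets, normalized by the number of frames covered by either one."""
        union = set(traj_a) | set(traj_b)
        if not union:
            return 0.0
        inter = set(traj_a) & set(traj_b)
        return sum(box_iou(traj_a[f], traj_b[f]) for f in inter) / len(union)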

Generate object tracklets by yourself

The object tracklet generation pipeline consists of two main parts: MEGA (for video object detection) and deepSORT (for video object tracking).

Quick Start

  1. Install MEGA following the official instructions in MEGA/INSTALL.md (note that the folder paths may differ during installation).

    • If you have any trouble installing MEGA, you can clone the official MEGA repository, install it, and then replace the official mega.pytorch/mega_core with our modified MEGA/mega_core. Refer to MEGA/modification_details.md for the details of our modifications.
  2. Download the VidOR dataset and the pre-trained weights of MEGA. Place them in the following directory structure:

├── deepSORT/
│   ├── ...
├── MEGA/
│   ├── ... 
│   ├── datasets/
│   │   ├── COCOdataset/        # used for MEGA training
│   │   ├── COCOinVidOR/        # used for MEGA training
│   │   ├── vidor-dataset/
│   │   │   ├── annotation/
│   │   │   │   ├── training/
│   │   │   │   └── validation/
│   │   │   ├── img_index/ 
│   │   │   │   ├── VidORval_freq1_0024.txt
│   │   │   │   ├── ...
│   │   │   ├── val_frames/
│   │   │   │   ├── 0001_2793806282/
│   │   │   │   │   ├── 000000.JPEG
│   │   │   │   │   ├── ...
│   │   │   │   ├── ...
│   │   │   ├── val_videos/
│   │   │   │   ├── 0001/
│   │   │   │   │   ├── 2793806282.mp4
│   │   │   │   │   ├── ...
│   │   │   │   ├── ...
│   │   │   ├── train_frames/
│   │   │   ├── train_videos/
│   │   │   ├── test_frames/
│   │   │   ├── test_videos/
│   │   │   └── video2img_vidor.py
│   │   └── construct_img_idx.py
│   ├── training_dir/
│   │   ├── COCO34ORfreq32_4gpu/
│   │   │   ├── inference/
│   │   │   │   ├── VidORval_freq1_0024/
│   │   │   │   │   ├── predictions.pth
│   │   │   │   │   └── result.txt
│   │   │   │   ├── ...
│   │   │   └── model_0180000.pth
│   │   ├── ...
  3. Run python MEGA/datasets/vidor-dataset/video2img_vidor.py (note that you may need to change some args) to extract frames from the videos. This introduces a lot of data redundancy, but it is unavoidable because MEGA takes image data as input (a minimal sketch of this step is given after this list).

  4. Run python MEGA/datasets/construct_img_idx.py (note that you may need to change some args) to generate the img_index used in MEGA inference.

    • The generated .txt files will be saved in MEGA/datasets/vidor-dataset/img_index/. You can use VidORval_freq1_0024.txt as a demo for the following commands.
  5. Run the following command to detect frame-level object proposals with bbox features (RoI-pooled features).

    CUDA_VISIBLE_DEVICES=0   python  \
        MEGA/tools/test_net.py \
        --config-file MEGA/configs/MEGA/inference/VidORval_freq1_0024.yaml \
        MODEL.WEIGHT MEGA/training_dir/COCO34ORfreq32_4gpu/model_0180000.pth \
        OUTPUT_DIR MEGA/training_dir/COCO34ORfreq32_4gpu/inference
    
    • The above command will generate a predictions.pth file for this VidORval_freq1_0024 demo. We also release this predictions.pth here.

    • The config files for the VidOR train set are in MEGA/configs/MEGA/partxx.

    • The predictions.pth file contains frame-level box positions and features (RoI features) for each object. If you are familiar with MEGA or maskrcnn-benchmark, the RoI features can be accessed via roi_feats = boxlist.get_field("roi_feats") (see the sketch after this list).

  6. Run python MEGA/mega_boxfeatures/cvt_proposal_result.py (note that you may need to change some args) to convert predictions.pth to a .pkl file for the following deepSORT stage.

    • We also provide VidORval_freq1_0024.pkl here.
  7. Run python deepSORT/deepSORT_tracking_v2.py (note that you may need to change some args) to perform deepSORT tracking. The results will be saved in deepSORT/tracking_results/.
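
For reference, frame extraction in step 3 simply decodes every video such as val_videos/0001/2793806282.mp4 into val_frames/0001_2793806282/000000.JPEG, 000001.JPEG, and so on. The sketch below shows the idea with OpenCV; the released video2img_vidor.py is the authoritative version and exposes args for the different splits.

    import os
    import cv2  # opencv-python

    def extract_frames(video_path, out_dir):
        """Decode one .mp4 into out_dir/000000.JPEG, 000001.JPEG, ..."""
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imwrite(os.path.join(out_dir, "{:06d}.JPEG".format(idx)), frame)
            idx += 1
        cap.release()

    # e.g. val_videos/0001/2793806282.mp4 -> val_frames/0001_2793806282/
    video_root, frame_root = "val_videos", "val_frames"
    for group in sorted(os.listdir(video_root)):
        for fname in sorted(os.listdir(os.path.join(video_root, group))):
            vid = os.path.splitext(fname)[0]
            extract_frames(os.path.join(video_root, group, fname),
                           os.path.join(frame_root, group + "_" + vid))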
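
As noted in step 5, predictions.pth stores the per-frame detections as maskrcnn-benchmark/MEGA BoxList objects. A minimal sketch of inspecting it, assuming the standard inference output (a list of BoxLists, one per frame, in the order of the img_index .txt file); field names other than "roi_feats" may differ from those shown here.

    import torch

    # Path from the VidORval_freq1_0024 demo above; adjust OUTPUT_DIR to your setting.
    pred_path = ("MEGA/training_dir/COCO34ORfreq32_4gpu/inference/"
                 "VidORval_freq1_0024/predictions.pth")
    predictions = torch.load(pred_path, map_location="cpu")

    boxlist = predictions[0]                    # detections of the first indexed frame
    print(boxlist.fields())                     # e.g. ['scores', 'labels', 'roi_feats']
    boxes = boxlist.bbox                        # (N, 4) tensor of box coordinates
    roi_feats = boxlist.get_field("roi_feats")  # (N, C) RoI-pooled features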

Train MEGA for VidOR by yourself

  1. Download MS-COCO and organize it as shown in the directory structure above.

  2. Run python MEGA/tools/extract_coco.py to extract the COCO annotations for the categories shared with VidOR, which results in COCO_train_34classes.pkl and COCO_valmini_34classes.pkl (see the sketch after this list).

  3. Train MEGA with the following command:

    python -m torch.distributed.launch \
        --nproc_per_node=4 \
        tools/train_net.py \
        --master_port=$((RANDOM + 10000)) \
        --config-file MEGA/configs/MEGA/vidor_R_101_C4_MEGA_1x_4gpu.yaml \
        OUTPUT_DIR MEGA/training_dir/COCO34ORfreq32_4gpu
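
Conceptually, step 2 keeps only the COCO images and annotations whose categories overlap with VidOR's 34-class object vocabulary and dumps them to the two .pkl files. The sketch below illustrates that filtering with pycocotools; the class list and the record layout of the .pkl files are placeholders here, and the actual versions are defined by MEGA/tools/extract_coco.py.

    import pickle
    from pycocotools.coco import COCO

    # Placeholder: the real list of 34 categories shared with VidOR is defined
    # in MEGA/tools/extract_coco.py.
    VIDOR_34_CLASSES = ["person", "dog", "cat"]  # ... truncated

    def extract_subset(ann_file, out_pkl):
        coco = COCO(ann_file)
        cat_ids = coco.getCatIds(catNms=VIDOR_34_CLASSES)
        # Keep every image that contains at least one of the selected categories.
        img_ids = sorted({i for c in cat_ids for i in coco.getImgIds(catIds=[c])})
        records = []
        for img_id in img_ids:
            ann_ids = coco.getAnnIds(imgIds=[img_id], catIds=cat_ids, iscrowd=False)
            records.append({"image": coco.loadImgs(img_id)[0],
                            "annotations": coco.loadAnns(ann_ids)})
        with open(out_pkl, "wb") as f:
            pickle.dump(records, f)

    extract_subset("MEGA/datasets/COCOdataset/annotations/instances_train2017.json",
                   "COCO_train_34classes.pkl")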

Extract the bbox features based on the given bbox positions

  • Refer to MEGA/tools/extract_gt_boxfeatures.py.

  • Some notes

    The MEGA codebase is quite complex. When extracting the bbox features, I simply run MEGA inference to obtain the bbox positions and store the bbox features (RoI features) for each frame at the same time.

    Because of this complexity, although I wrote extract_gt_boxfeatures.py, I cannot guarantee that the features extracted for given bboxes are identical to the ones stored during MEGA inference; there is some random shuffling in MEGA's memory bank.

If our work is helpful for your research, please cite our publication:

@inproceedings{gao2021classification,
  title={Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs},
  author={Gao, Kaifeng and Chen, Long and Niu, Yulei and Shao, Jian and Xiao, Jun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}
