BoxMOTS

This is the official PyTorch implementation of our weakly supervised MOTS work: Towards High Quality Multi-Object Tracking and Segmentation without Mask Supervision. The project consists of four parts: the main model, the data association method, the optical flow model, and the shadow detection model.

Highlights

  • Box-supervised multi-object tracking and segmentation model. Only bounding box labels are used in the training stage.
  • Superior performance to previous works: a 12.4% improvement in sMOTSA, 7.3% in MOTSA, and 8.2% in MOTSP on the KITTI MOTS dataset.
  • Flexible modules. The optical flow and shadow detection models are used on demand, and each can be replaced by a more advanced optical flow or shadow detection model to achieve better performance.

Visualization Results

BoxMOTS visualization results on KITTI MOTS, BDD100K MOTS, MOSE (a VOS dataset), and YouTube-VIS 2019 (a VIS dataset), from top to bottom. For MOSE and YouTube-VIS 2019, the BoxMOTS model trained on KITTI MOTS is used to make predictions directly, without any training on these two datasets.

Abstract

Recent studies have shown the potential of weakly supervised multi-object tracking and segmentation, but the drawbacks of coarse pseudo mask labels and limited use of temporal information remain unresolved. To address these issues, we present a framework that directly uses box labels to supervise the segmentation network, without resorting to pseudo mask labels. In addition, we propose to fully exploit temporal information from two perspectives. First, we integrate optical flow-based pairwise consistency to ensure mask consistency across frames, thereby improving mask quality for segmentation. Second, we propose a temporally-adjacent-pair-based sampling strategy to adapt instance embedding learning for data association in tracking. We combine these techniques into an end-to-end deep model, named BoxMOTS, which requires only box annotations without mask supervision. Extensive experiments demonstrate that our model surpasses the current state of the art by a large margin and produces promising results on KITTI MOTS and BDD100K MOTS.
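
As a rough picture of the flow-based pairwise consistency idea, the sketch below warps the mask prediction of frame t+1 back to frame t with the optical flow from t to t+1 and penalizes disagreement. This is a minimal illustration, not the exact BoxMOTS loss; the function names (backward_warp, flow_consistency_loss) and the L1 penalty are assumptions.

# Minimal sketch (not the exact BoxMOTS loss): warp the mask prediction of frame t+1
# back to frame t using the flow from t to t+1, then penalize disagreement.
import torch
import torch.nn.functional as F

def backward_warp(feat_t1, flow_t_to_t1):
    # feat_t1:       (N, C, H, W) predictions at frame t+1
    # flow_t_to_t1:  (N, 2, H, W) flow from frame t to t+1, in pixels (x, y)
    n, _, h, w = feat_t1.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat_t1.dtype, device=feat_t1.device),
        torch.arange(w, dtype=feat_t1.dtype, device=feat_t1.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow_t_to_t1[:, 0]
    grid_y = ys.unsqueeze(0) + flow_t_to_t1[:, 1]
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(feat_t1, grid, align_corners=True)

def flow_consistency_loss(mask_prob_t, mask_prob_t1, flow_t_to_t1):
    # Encourage the mask at frame t to agree with the flow-warped mask from frame t+1.
    warped = backward_warp(mask_prob_t1, flow_t_to_t1)
    return F.l1_loss(mask_prob_t, warped)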

Main Model

The main model generates detection, segmentation, and object embedding results. This part is contained in the boxmots folder. Please see the README file in that folder for usage details.

Data Association Method

We use DeepSORT for data association, based on both motion and appearance information. This part is contained in the StrongSORT folder. Please see the README file in that folder for usage details.
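
As a simplified picture of how motion and appearance are combined in DeepSORT-style association, the sketch below builds a track-to-detection cost matrix from appearance (cosine) distances and Kalman-filter Mahalanobis distances and solves a linear assignment. It is an illustration only; the code in the StrongSORT folder additionally uses a matching cascade and its own gating and track management, and the weighting here is an assumption.

# Illustrative only: a simplified DeepSORT-style association step, not the code
# shipped in the StrongSORT folder.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(appearance_dist, motion_dist, motion_gate=9.4877, appearance_weight=0.98):
    # appearance_dist: (T, D) cosine distances between track and detection embeddings.
    # motion_dist:     (T, D) squared Mahalanobis distances from Kalman-predicted states.
    # Returns a list of (track_idx, detection_idx) matches.
    cost = appearance_weight * appearance_dist + (1.0 - appearance_weight) * motion_dist
    cost = cost.copy()
    cost[motion_dist > motion_gate] = 1e5  # gate out physically implausible pairs
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1e5]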

Optical Flow Model

We use the GMA method to generate optical flow results for the KITTI MOTS and BDD100K MOTS training sets. The optical flow results are used to train the main model. This part is contained in the GMA folder. Please see the README file in that folder for usage details.
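
Conceptually, this offline step amounts to running a pretrained flow network on every pair of consecutive training frames and caching the result for later use when training the main model. The sketch below illustrates that, assuming a generic flow_model callable and an .npy cache format; the actual GMA interface, scripts, and output format are documented in the GMA folder.

# Conceptual sketch of precomputing flow for consecutive frames; the real GMA
# scripts (and their model interface / output format) live in the GMA folder.
import os
import numpy as np
import torch
from PIL import Image

def load_image(path):
    # Load an RGB image as a float tensor of shape (1, 3, H, W) in [0, 1].
    arr = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)

@torch.no_grad()
def precompute_flow(flow_model, frame_paths, out_dir, device="cuda"):
    # flow_model: any callable mapping (img_t, img_t1) -> (1, 2, H, W) pixel offsets.
    os.makedirs(out_dir, exist_ok=True)
    for path_t, path_t1 in zip(frame_paths[:-1], frame_paths[1:]):
        flow = flow_model(load_image(path_t).to(device), load_image(path_t1).to(device))
        name = os.path.splitext(os.path.basename(path_t))[0] + ".npy"
        np.save(os.path.join(out_dir, name), flow[0].cpu().numpy())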

Shadow Detection Model

We use the SSIS method to detect shadows and remove them from the segmentation results of car-like objects. The shadow detection results are used during inference. This part is contained in the SSIS folder. Please see the README file in that folder for usage details.
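
In spirit, this post-processing removes shadow pixels from the predicted masks of car-like objects. The sketch below illustrates the idea under assumed mask and label formats (boolean arrays and a hypothetical CAR_LIKE_CLASSES set); the actual SSIS-based step in this repo may differ in its formats and heuristics.

# Illustration of the idea only; formats and class handling are assumptions,
# not the exact post-processing used with the SSIS folder.
import numpy as np

CAR_LIKE_CLASSES = {"car"}  # assumed class names that receive shadow removal

def remove_shadow(instance_mask, shadow_mask):
    # Both masks are boolean (H, W) arrays; drop pixels detected as shadow.
    return np.logical_and(instance_mask, np.logical_not(shadow_mask))

def postprocess(instances, shadow_mask):
    # instances: list of dicts with "category" (str) and "mask" ((H, W) bool array).
    for inst in instances:
        if inst["category"] in CAR_LIKE_CLASSES:
            inst["mask"] = remove_shadow(inst["mask"], shadow_mask)
    return instances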

TODO

  • Repo setup.
  • Add code of main model.
  • Add code of data association.
  • Add code of the optical flow model.
  • Add code of the shadow detection model.
  • Complete the full pipeline.

Citation

If you find this project helpful, feel free to cite our work.

@article{cheng2024towards,
  title={Towards High Quality Multi-Object Tracking and Segmentation without Mask Supervision},
  author={Cheng, Wensheng and Wu, Yi and Wu, Zhenyu and Ling, Haibin and Hua, Gang},
  journal={IEEE Transactions on Image Processing},
  year={2024},
  publisher={IEEE}
}

Acknowledgements

  • Thanks to AdelaiDet for the BoxInst implementation.
  • Thanks to StrongSORT for the DeepSORT implementation.
  • Thanks to GMA for the optical flow model.
  • Thanks to SSIS for the shadow detection model.
