BoxMOTS

This is the official PyTorch implementation of our weakly supervised MOTS work: Towards High Quality Multi-Object Tracking and Segmentation without Mask Supervision. The project consists of four parts: the main model, the data association method, the optical flow model, and the shadow detection model.

Highlights

  • Box-supervised multi-object tracking and segmentation model. Only bounding box labels are used in the training stage.
  • Superior performance to previous works: a 12.4% improvement in sMOTSA, 7.3% in MOTSA, and 8.2% in MOTSP on the KITTI MOTS dataset.
  • Flexible modules. The optical flow and shadow detection models are used on demand, and each can be replaced by a more advanced optical flow or shadow detection model to achieve better performance.

Visualization Results

BoxMOTS visualization results on KITTI MOTS, BDD100K MOTS, MOSE (a VOS dataset), and YouTube-VIS 2019 (a VIS dataset), from top to bottom. For MOSE and YouTube-VIS 2019, the BoxMOTS model trained on KITTI MOTS is used to make predictions directly, without any training on these two datasets.

Abstract

Recent studies have shown the potential of weakly supervised multi-object tracking and segmentation, but the drawbacks of coarse pseudo mask labels and limited use of temporal information remain unresolved. To address these issues, we present a framework that directly uses box labels to supervise the segmentation network, without resorting to pseudo mask labels. In addition, we propose to fully exploit temporal information from two perspectives. First, we integrate optical flow-based pairwise consistency to ensure mask consistency across frames, thereby improving mask quality for segmentation. Second, we propose a temporally-adjacent-pair-based sampling strategy to adapt instance embedding learning for data association in tracking. We combine these techniques into an end-to-end deep model, named BoxMOTS, which requires only box annotations without mask supervision. Extensive experiments demonstrate that our model surpasses the current state of the art by a large margin and produces promising results on KITTI MOTS and BDD100K MOTS.
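
As a rough picture of the flow-based pairwise consistency idea, the sketch below warps the mask prediction of frame t+1 back to frame t with the optical flow from t to t+1 and penalizes disagreement. This is a minimal illustration, not the exact BoxMOTS loss; the function names (backward_warp, flow_consistency_loss) and the L1 penalty are assumptions.

# Minimal sketch (not the exact BoxMOTS loss): warp the mask prediction of frame t+1
# back to frame t using the flow from t to t+1, then penalize disagreement.
import torch
import torch.nn.functional as F

def backward_warp(feat_t1, flow_t_to_t1):
    # feat_t1:       (N, C, H, W) predictions at frame t+1
    # flow_t_to_t1:  (N, 2, H, W) flow from frame t to t+1, in pixels (x, y)
    n, _, h, w = feat_t1.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat_t1.dtype, device=feat_t1.device),
        torch.arange(w, dtype=feat_t1.dtype, device=feat_t1.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow_t_to_t1[:, 0]
    grid_y = ys.unsqueeze(0) + flow_t_to_t1[:, 1]
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(feat_t1, grid, align_corners=True)

def flow_consistency_loss(mask_prob_t, mask_prob_t1, flow_t_to_t1):
    # Encourage the mask at frame t to agree with the flow-warped mask from frame t+1.
    warped = backward_warp(mask_prob_t1, flow_t_to_t1)
    return F.l1_loss(mask_prob_t, warped)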

Main Model

The main model generates detection, segmentation, and object embedding results. This part is contained in the boxmots folder. Please see the README file in that folder for usage details.

Data Association Method

We use DeepSORT for data association, based on both motion and appearance information. This part is contained in the StrongSORT folder. Please see the README file in that folder for usage details.
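
As a simplified picture of how motion and appearance are combined in DeepSORT-style association, the sketch below builds a track-to-detection cost matrix from appearance (cosine) distances and Kalman-filter Mahalanobis distances and solves a linear assignment. It is an illustration only; the code in the StrongSORT folder additionally uses a matching cascade and its own gating and track management, and the weighting here is an assumption.

# Illustrative only: a simplified DeepSORT-style association step, not the code
# shipped in the StrongSORT folder.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(appearance_dist, motion_dist, motion_gate=9.4877, appearance_weight=0.98):
    # appearance_dist: (T, D) cosine distances between track and detection embeddings.
    # motion_dist:     (T, D) squared Mahalanobis distances from Kalman-predicted states.
    # Returns a list of (track_idx, detection_idx) matches.
    cost = appearance_weight * appearance_dist + (1.0 - appearance_weight) * motion_dist
    cost = cost.copy()
    cost[motion_dist > motion_gate] = 1e5  # gate out physically implausible pairs
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1e5]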

Optical Flow Model

We use the GMA method to generate optical flow results for the KITTI MOTS and BDD100K MOTS training sets. The optical flow results are used to train the main model. This part is contained in the GMA folder. Please see the README file in that folder for usage details.
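
Conceptually, this offline step amounts to running a pretrained flow network on every pair of consecutive training frames and caching the result for later use when training the main model. The sketch below illustrates that, assuming a generic flow_model callable and an .npy cache format; the actual GMA interface, scripts, and output format are documented in the GMA folder.

# Conceptual sketch of precomputing flow for consecutive frames; the real GMA
# scripts (and their model interface / output format) live in the GMA folder.
import os
import numpy as np
import torch
from PIL import Image

def load_image(path):
    # Load an RGB image as a float tensor of shape (1, 3, H, W) in [0, 1].
    arr = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)

@torch.no_grad()
def precompute_flow(flow_model, frame_paths, out_dir, device="cuda"):
    # flow_model: any callable mapping (img_t, img_t1) -> (1, 2, H, W) pixel offsets.
    os.makedirs(out_dir, exist_ok=True)
    for path_t, path_t1 in zip(frame_paths[:-1], frame_paths[1:]):
        flow = flow_model(load_image(path_t).to(device), load_image(path_t1).to(device))
        name = os.path.splitext(os.path.basename(path_t))[0] + ".npy"
        np.save(os.path.join(out_dir, name), flow[0].cpu().numpy())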

Shadow Detection Model

We use the SSIS method to detect shadows and remove them from the segmentation results of car-like objects. The shadow detection results are used during inference. This part is contained in the SSIS folder. Please see the README file in that folder for usage details.
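
In spirit, this post-processing removes shadow pixels from the predicted masks of car-like objects. The sketch below illustrates the idea under assumed mask and label formats (boolean arrays and a hypothetical CAR_LIKE_CLASSES set); the actual SSIS-based step in this repo may differ in its formats and heuristics.

# Illustration of the idea only; formats and class handling are assumptions,
# not the exact post-processing used with the SSIS folder.
import numpy as np

CAR_LIKE_CLASSES = {"car"}  # assumed class names that receive shadow removal

def remove_shadow(instance_mask, shadow_mask):
    # Both masks are boolean (H, W) arrays; drop pixels detected as shadow.
    return np.logical_and(instance_mask, np.logical_not(shadow_mask))

def postprocess(instances, shadow_mask):
    # instances: list of dicts with "category" (str) and "mask" ((H, W) bool array).
    for inst in instances:
        if inst["category"] in CAR_LIKE_CLASSES:
            inst["mask"] = remove_shadow(inst["mask"], shadow_mask)
    return instances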

TODO

  • Repo setup.
  • Add code of main model.
  • Add code of data association.
  • Add code of the optical flow model.
  • Add code of the shadow detection model.
  • Complete the full pipeline.

Citation

If you find this project helpful, feel free to cite our work.

@article{cheng2024towards,
  title={Towards High Quality Multi-Object Tracking and Segmentation without Mask Supervision},
  author={Cheng, Wensheng and Wu, Yi and Wu, Zhenyu and Ling, Haibin and Hua, Gang},
  journal={IEEE Transactions on Image Processing},
  year={2024},
  publisher={IEEE}
}

Acknowledgements

  • Thanks to AdelaiDet for the BoxInst implementation.
  • Thanks to StrongSORT for the DeepSORT implementation.
  • Thanks to GMA for the optical flow model.
  • Thanks to SSIS for the shadow detection model.
