MVANet

The official repo of the CVPR 2024 paper (Highlight), Multi-view Aggregation Network for Dichotomous Image Segmentation

Introduction

Dichotomous Image Segmentation (DIS) has recently emerged as a task aimed at high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is balancing the semantic dispersion of high-resolution targets in a small receptive field against the loss of high-precision details in a large receptive field. Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete global localization and local refinement.

The human visual system captures regions of interest by observing them from multiple views. Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet), which unifies the feature fusion of the distant view and the close-up views into a single stream with one encoder-decoder structure. Specifically, we split the high-resolution input image from the original view into distant-view images with global information and close-up-view images with local details. Together, they constitute a set of complementary multi-view low-resolution input patches.

image
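The multi-view split described above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function name `split_views`, the 2x2 grid, and the block-averaging downsampling are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def split_views(image: np.ndarray, grid: int = 2):
    """Split an H x W x C image into one distant view and grid*grid close-up views.

    Assumptions (not taken from the repository): a square grid of
    non-overlapping patches, and a distant view produced by averaging
    each grid x grid block of pixels so it matches the patch resolution.
    """
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid

    # Distant view: downsample the whole image by block averaging.
    distant = image.reshape(ph, grid, pw, grid, c).mean(axis=(1, 3))

    # Close-up views: non-overlapping full-resolution crops, in row-major order.
    close_ups = [
        image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
        for i in range(grid)
        for j in range(grid)
    ]
    return distant, close_ups


if __name__ == "__main__":
    hi_res = np.random.rand(1024, 1024, 3).astype(np.float32)
    distant, close_ups = split_views(hi_res)
    print(distant.shape, len(close_ups), close_ups[0].shape)
```

In this sketch every view has the same spatial size, so the distant view and the close-up views could be stacked into one batch and passed through a single encoder.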

Moreover, two efficient transformer-based modules, the multi-view complementary localization module (MCLM) and the multi-view complementary refinement module (MCRM), are proposed to jointly capture the localization and restore the boundary details of the targets.

image

We achieve state-of-the-art performance on almost all metrics on the DIS benchmark dataset.

image

We have optimized the code and improved the inference speed, reaching 15.2 FPS.

image

Here are some of our visual results:

image

I. Requirements

  • python==3.7
  • torch==1.10.0
  • torchvision==0.11.0
  • mmcv-full==1.3.17
  • mmdet==2.17.0
  • mmengine==0.8.1
  • mmsegmentation==0.19.0
  • numpy
  • ttach
  • einops
  • timm
  • scipy
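Because several of these packages are pinned to exact versions, it can help to compare an environment against the pins before training. The following is a hedged sketch rather than part of the repository: the helper names (`base_version`, `find_mismatches`, `installed_versions`) are hypothetical, and treating a local build suffix such as `+cu113` as a match is an assumption.

```python
# Pinned versions copied from the list above; unpinned packages are omitted.
PINS = {
    "torch": "1.10.0",
    "torchvision": "0.11.0",
    "mmcv-full": "1.3.17",
    "mmdet": "2.17.0",
    "mmengine": "0.8.1",
    "mmsegmentation": "0.19.0",
}


def base_version(version: str) -> str:
    """Drop a local build suffix, so '1.10.0+cu113' compares equal to '1.10.0'."""
    return version.split("+", 1)[0]


def find_mismatches(pins, installed):
    """Return {package: (wanted, found)} for missing or differing packages."""
    problems = {}
    for name, wanted in pins.items():
        found = installed.get(name)
        if found is None or base_version(found) != wanted:
            problems[name] = (wanted, found)
    return problems


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    try:
        from importlib import metadata  # standard library from Python 3.8
    except ImportError:
        import importlib_metadata as metadata  # backport, needed on Python 3.7
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    mismatches = find_mismatches(PINS, installed_versions(PINS))
    for name, (wanted, found) in mismatches.items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the script prints one line per package that is missing or differs from the pin, and prints nothing when the environment matches.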

II. Training

  1. Download the pretrained model from Google Drive.
  2. Then start training by running:
python train.py

III. Testing

  1. Update the data path in the config file ./utils/config.py (lines 4-8)

  2. Replace the existing path with the path to your saved model in ./predict.py (line 14)

    You can also download our trained model from Google Drive.

  3. Start predicting by:

python predict.py
  4. Change the predicted map path in ./test.py (line 17) and start testing:
python test.py

You can get our prediction maps at Google Drive.
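
For a quick sanity check of downloaded or newly generated prediction maps, a simple per-image score can be computed against the ground-truth masks. The sketch below uses mean absolute error (MAE), a measure commonly reported for DIS. It is an illustration only: whether ./test.py computes MAE in exactly this way is an assumption, and the function name is hypothetical.

```python
import numpy as np


def mean_absolute_error(pred_u8: np.ndarray, gt_u8: np.ndarray) -> float:
    """MAE between an 8-bit prediction map and an 8-bit ground-truth mask.

    Both inputs are assumed to be single-channel arrays with values in
    [0, 255]; they are rescaled to [0, 1] before comparison.
    """
    if pred_u8.shape != gt_u8.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    pred = pred_u8.astype(np.float64) / 255.0
    gt = gt_u8.astype(np.float64) / 255.0
    return float(np.abs(pred - gt).mean())


if __name__ == "__main__":
    gt = np.array([[0, 255], [0, 0]], dtype=np.uint8)
    pred = np.array([[0, 255], [255, 0]], dtype=np.uint8)
    print(mean_absolute_error(pred, gt))
```

The arrays can be loaded with any image library. A prediction map saved at a different resolution would need to be resized to the ground-truth size first.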

To Do List

  • Release our camera-ready paper on arXiv (done)
  • Release our training code (done)
  • Release our model checkpoints (done)
  • Release our prediction maps (done)

Citations

@article{yu2024multi,
  title={Multi-view Aggregation Network for Dichotomous Image Segmentation},
  author={Yu, Qian and Zhao, Xiaoqi and Pang, Youwei and Zhang, Lihe and Lu, Huchuan},
  journal={arXiv preprint arXiv:2404.07445},
  year={2024}
}
