
InternImage


This repository is the official implementation of InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.

Paper | Blog in Chinese

News

  • Feb 28, 2023: InternImage is accepted to CVPR 2023!
  • Nov 18, 2022: 🚀 BEVFormer v2 with an InternImage-XL backbone achieves state-of-the-art performance of 63.4 NDS on the nuScenes camera-only benchmark.
  • Nov 10, 2022: 🚀🚀 InternImage-H achieves a new record of 65.4 mAP on COCO detection test-dev and 62.9 mIoU on ADE20K, outperforming previous models by a large margin.

Coming soon

  • Other downstream tasks.
  • TensorRT inference.
  • Classification code of the InternImage series.
  • InternImage-T/S/B/L/XL ImageNet-1k pretrained model.
  • InternImage-L/XL ImageNet-22k pretrained model.
  • InternImage-T/S/B/L/XL detection and instance segmentation model.
  • InternImage-T/S/B/L/XL semantic segmentation model.

Introduction

InternImage, initially described in an arXiv paper, is a general-purpose backbone for computer vision. It uses deformable convolution as its core operator to obtain large effective receptive fields, and it introduces adaptive spatial aggregation to reduce the strict inductive bias of conventional CNNs. This design makes it possible to learn stronger and more robust models with large-scale parameters from massive data.
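To make the operator concrete, here is a minimal pure-Python sketch of deformable aggregation at a single output location p0, loosely following the DCNv3 formulation in the paper: y(p0) = Σ_g Σ_k w_g · m_gk · x_g(p0 + p_k + Δp_gk), where p_k runs over a regular 3x3 grid, Δp_gk are learned offsets, and the modulation scalars m_gk are softmax-normalised over the K sampling points. This is not the repository's CUDA operator: the function names are hypothetical, each group is assumed to have a single channel with a scalar weight w_g, and the offsets and mask logits are passed in by hand instead of being predicted by a layer.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # one single-channel feature map, indexed [row][col]


def bilinear_sample(feat: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate feat at fractional (y, x); out-of-bounds neighbours count as zero."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * feat[yy][xx]
    return total


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


# Regular 3x3 sampling grid p_k around the output location p0 (index 4 is the centre).
GRID_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def deformable_aggregate(
    groups: Sequence[Grid],                             # x_g: one feature map per group (hypothetical layout)
    group_weights: Sequence[float],                     # w_g: weight shared by all points in group g
    offsets: Sequence[Sequence[Tuple[float, float]]],   # Δp_gk: (dy, dx) offset per group and grid point
    mask_logits: Sequence[Sequence[float]],             # pre-softmax modulation scores per group
    p0: Tuple[int, int],
) -> float:
    """Sum over groups g and points k of w_g * m_gk * x_g(p0 + p_k + Δp_gk)."""
    y = 0.0
    for g, feat in enumerate(groups):
        m = softmax(mask_logits[g])  # m_gk, normalised over the K sampling points of group g
        for k, (py, px) in enumerate(GRID_3X3):
            oy, ox = offsets[g][k]
            sample = bilinear_sample(feat, p0[0] + py + oy, p0[1] + px + ox)
            y += group_weights[g] * m[k] * sample
    return y


if __name__ == "__main__":
    ramp = [[r * 10.0 + c for c in range(5)] for r in range(5)]
    offs = [[(0.0, 0.0)] * 9]
    offs[0][4] = (0.0, 1.0)      # push the centre sample one column to the right
    logits = [[0.0] * 9]
    logits[0][4] = 100.0         # concentrate almost all modulation weight on the centre point
    print(deformable_aggregate([ramp], [1.0], offs, logits, (2, 2)))  # close to ramp[2][3]
```

With zero offsets and equal mask logits, each group degenerates into a 3x3 mean filter scaled by w_g; moving the offsets and sharpening the mask logits is what lets the sampling pattern adapt to the input.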

Main Results on ImageNet with Pretrained Models

ImageNet-1K and ImageNet-22K Pretrained InternImage Models

name | pretrain | resolution | acc@1 | #params | FLOPs | 22K model | 1K model
--- | --- | --- | --- | --- | --- | --- | ---
InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | - | ckpt, cfg
InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | - | ckpt, cfg
InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | - | ckpt, cfg
InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G | ckpt | ckpt, cfg
InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | ckpt | ckpt, cfg
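The acc@1 column is top-1 accuracy: the percentage of evaluation images whose highest-scoring class matches the ground-truth label. A minimal sketch of that metric, assuming scores arrive as plain Python lists rather than tensors; the helper name and the toy scores are made up for illustration and are not model outputs.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Percentage of samples whose highest-scoring class equals the ground-truth label."""
    if not labels or len(logits) != len(labels):
        raise ValueError("need one non-empty score vector per label")
    correct = 0
    for scores, label in zip(logits, labels):
        # argmax over classes; max() keeps the first index on ties
        pred = max(range(len(scores)), key=lambda c: scores[c])
        correct += int(pred == label)
    return 100.0 * correct / len(labels)


# Toy batch of 4 samples over 3 classes (made-up scores).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.9, 0.05, 0.05]]
toy_labels = [1, 0, 1, 0]
print(top1_accuracy(toy_scores, toy_labels))  # 75.0
```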

Main Results on Downstream Tasks

COCO Object Detection

backbone | method | schedule | box mAP | mask mAP | #params | FLOPs | Download
--- | --- | --- | --- | --- | --- | --- | ---
InternImage-T | Mask R-CNN | 1x | 47.2 | 42.5 | 49M | 270G | ckpt, cfg
InternImage-T | Mask R-CNN | 3x | 49.1 | 43.7 | 49M | 270G | ckpt, cfg
InternImage-S | Mask R-CNN | 1x | 47.8 | 43.3 | 69M | 340G | ckpt, cfg
InternImage-S | Mask R-CNN | 3x | 49.7 | 44.5 | 69M | 340G | ckpt, cfg
InternImage-B | Mask R-CNN | 1x | 48.8 | 44.0 | 115M | 501G | ckpt, cfg
InternImage-B | Mask R-CNN | 3x | 50.3 | 44.8 | 115M | 501G | ckpt, cfg
InternImage-L | Cascade Mask R-CNN | 1x | 54.9 | 47.7 | 277M | 1399G | ckpt, cfg
InternImage-L | Cascade Mask R-CNN | 3x | 56.1 | 48.5 | 277M | 1399G | ckpt, cfg
InternImage-XL | Cascade Mask R-CNN | 1x | 55.3 | 48.1 | 387M | 1782G | ckpt, cfg
InternImage-XL | Cascade Mask R-CNN | 3x | 56.2 | 48.8 | 387M | 1782G | ckpt, cfg
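A quick way to read the 1x and 3x rows is to compute how much box mAP the longer schedule buys for each backbone. The sketch below uses only the numbers in the COCO table above; the epoch counts are an assumption based on the MMDetection naming convention (1x = 12 epochs, 3x = 36 epochs), and the helper name is hypothetical.

```python
from typing import Dict, Tuple

# Box mAP for the 1x and 3x schedules, copied from the COCO table above.
BOX_MAP: Dict[str, Tuple[float, float]] = {
    "InternImage-T (Mask R-CNN)": (47.2, 49.1),
    "InternImage-S (Mask R-CNN)": (47.8, 49.7),
    "InternImage-B (Mask R-CNN)": (48.8, 50.3),
    "InternImage-L (Cascade Mask R-CNN)": (54.9, 56.1),
    "InternImage-XL (Cascade Mask R-CNN)": (55.3, 56.2),
}

# Assumption: MMDetection naming, where "1x" means 12 training epochs and "3x" means 36.
EPOCHS = {"1x": 12, "3x": 36}


def schedule_gain(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Box mAP gained by training with the 3x schedule instead of 1x, rounded to one decimal."""
    return {name: round(long - short, 1) for name, (short, long) in results.items()}


if __name__ == "__main__":
    extra_epochs = EPOCHS["3x"] - EPOCHS["1x"]
    for name, gain in schedule_gain(BOX_MAP).items():
        print(f"{name}: +{gain} box mAP for {extra_epochs} extra epochs")
```

On these numbers the gain shrinks from 1.9 points for InternImage-T to 0.9 for InternImage-XL, so in this table the longer schedule matters less for the larger backbones.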

ADE20K Semantic Segmentation

backbone | resolution | mIoU (single-scale) | mIoU (multi-scale) | #params | FLOPs | Download
--- | --- | --- | --- | --- | --- | ---
InternImage-T | 512x512 | 47.9 | 48.1 | 59M | 944G | ckpt, cfg
InternImage-S | 512x512 | 50.1 | 50.9 | 80M | 1017G | ckpt, cfg
InternImage-B | 512x512 | 50.8 | 51.3 | 128M | 1185G | ckpt, cfg
InternImage-L | 640x640 | 53.9 | 54.1 | 256M | 2526G | ckpt, cfg
InternImage-XL | 640x640 | 55.0 | 55.3 | 368M | 3142G | ckpt, cfg
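Both ADE20K accuracy columns are mIoU: the intersection over union of each class, averaged across classes. The multi-scale numbers typically come from averaging predictions over several rescaled (and often flipped) copies of each test image. Below is a minimal sketch of the metric itself, computed from a confusion matrix; the helper name is hypothetical, and skipping classes that appear in neither the ground truth nor the prediction is an assumption that mirrors a NaN-ignoring mean.

```python
from typing import List, Sequence


def mean_iou(conf: Sequence[Sequence[int]]) -> float:
    """mIoU (%) from a confusion matrix conf[gt][pred]; classes with an empty union are skipped."""
    n = len(conf)
    ious: List[float] = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # pixels of class c predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # pixels of other classes predicted as c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    if not ious:
        raise ValueError("confusion matrix is empty")
    return 100.0 * sum(ious) / len(ious)


# Two classes: IoU is 3/5 for class 0 and 5/7 for class 1.
print(round(mean_iou([[3, 1], [1, 5]]), 2))  # 65.71
```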

Main Results on Inference Speed

name | resolution | #params | FLOPs | FPS (batch size 1, TensorRT)
--- | --- | --- | --- | ---
InternImage-T | 224x224 | 30M | 5G | 156
InternImage-S | 224x224 | 50M | 8G | 129
InternImage-B | 224x224 | 97M | 16G | 116
InternImage-L | 384x384 | 223M | 108G | 56
InternImage-XL | 384x384 | 335M | 163G | 47
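Because throughput is measured at batch size 1, it converts directly into per-image latency, assuming images are processed strictly one after another: latency (ms) = 1000 / FPS. A small sketch using the numbers from the speed table above; the helper name is hypothetical.

```python
# Batch-1 TensorRT throughput copied from the speed table above (images per second).
FPS = {
    "InternImage-T": 156,
    "InternImage-S": 129,
    "InternImage-B": 116,
    "InternImage-L": 56,
    "InternImage-XL": 47,
}


def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds, assuming one image is in flight at a time."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    for name, fps in FPS.items():
        print(f"{name}: {latency_ms(fps):.1f} ms per image")
```

For example, InternImage-T's 156 FPS corresponds to roughly 6.4 ms per image, and InternImage-XL's 47 FPS to roughly 21.3 ms.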

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{wang2022internimage,
  title={InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions},
  author={Wang, Wenhai and Dai, Jifeng and Chen, Zhe and Huang, Zhenhang and Li, Zhiqi and Zhu, Xizhou and Hu, Xiaowei and Lu, Tong and Lu, Lewei and Li, Hongsheng and others},
  journal={arXiv preprint arXiv:2211.05778},
  year={2022}
}
