
CVPR 2022 Papers and Open-Source Projects (Papers with Code)

CVPR 2022 list of accepted paper IDs: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view

Note 1: Everyone is welcome to open an issue to share CVPR 2022 papers and open-source projects!

Note 2: For papers from previous years' top CV conferences, plus other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision

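If you want to keep up with the latest high-quality CV papers, open-source projects, and learning resources, scan the QR code to join the [CVer Academic Discussion Group]! Let's learn from each other and improve together~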

[CVPR 2022 Papers and Open-Source Code Directory]

Backbone

A ConvNet for the 2020s

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

MPViT: Multi-Path Vision Transformer for Dense Prediction

Mobile-Former: Bridging MobileNet and Transformer

MetaFormer is Actually What You Need for Vision

Shunted Self-Attention via Multi-Scale Token Aggregation

TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing

CLIP

HairCLIP: Design Your Hair by Text and Reference Image

PointCLIP: Point Cloud Understanding by CLIP

Blended Diffusion for Text-driven Editing of Natural Images

GAN

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Style Transformer for Image Inversion and Editing

NAS

β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

OCR

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

NeRF

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

Point-NeRF: Point-based Neural Radiance Fields

NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images

Urban Radiance Fields

Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation

HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video

Visual Transformer

Backbone

MPViT: Multi-Path Vision Transformer for Dense Prediction

MetaFormer is Actually What You Need for Vision

Mobile-Former: Bridging MobileNet and Transformer

Shunted Self-Attention via Multi-Scale Token Aggregation

Application

Language-based Video Editing via Multi-Modal Multi-Level Transformer

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Embracing Single Stride 3D Object Detector with Sparse Transformer

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Spatio-temporal Relation Modeling for Few-shot Action Recognition

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

GroupViT: Semantic Segmentation Emerges from Text Supervision

Restormer: Efficient Transformer for High-Resolution Image Restoration

Splicing ViT Features for Semantic Appearance Transfer

Self-supervised Video Transformer

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

Accelerating DETR Convergence via Semantic-Aligned Matching

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Style Transformer for Image Inversion and Editing

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Mask Transfiner for High-Quality Instance Segmentation

Language as Queries for Referring Video Object Segmentation

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

AdaMixer: A Fast-Converging Query-Based Object Detector

Omni-DETR: Omni-Supervised Object Detection with Transformers

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

Vision-Language

Conditional Prompt Learning for Vision-Language Models

Bridging Video-text Retrieval with Multiple Choice Question

Self-supervised Learning

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

Crafting Better Contrastive Views for Siamese Representation Learning

HCSC: Hierarchical Contrastive Selective Coding

Data Augmentation

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge

AlignMix: Improving representation by interpolating aligned features

Object Detection

BoxeR: Box-Attention for 2D and 3D Transformers

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Accelerating DETR Convergence via Semantic-Aligned Matching

Localization Distillation for Dense Object Detection

Focal and Global Knowledge Distillation for Detectors

A Dual Weighting Label Assignment Scheme for Object Detection

AdaMixer: A Fast-Converging Query-Based Object Detector

Omni-DETR: Omni-Supervised Object Detection with Transformers

Visual Tracking

Correlation-Aware Deep Tracking

TCTrack: Temporal Contexts for Aerial Tracking

Multi-Modal Object Tracking

Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline

Multi-Object Tracking

Learning of Global Objective for Network Flow in Multi-Object Tracking

Semantic Segmentation

Weakly Supervised Semantic Segmentation

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

Semi-Supervised Semantic Segmentation

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation

Domain-Adaptive Semantic Segmentation

Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation

Unsupervised Semantic Segmentation

GroupViT: Semantic Segmentation Emerges from Text Supervision

Instance Segmentation

BoxeR: Box-Attention for 2D and 3D Transformers

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Mask Transfiner for High-Quality Instance Segmentation

Self-Supervised Instance Segmentation

FreeSOLO: Learning to Segment Objects without Annotations

Video Instance Segmentation

Efficient Video Instance Segmentation via Tracklet Query and Proposal

Few-Shot Segmentation

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

Video Understanding

Self-supervised Video Transformer

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

Action Recognition

Spatio-temporal Relation Modeling for Few-shot Action Recognition

Action Detection

End-to-End Semi-Supervised Learning for Video Action Detection

Image Editing

Style Transformer for Image Inversion and Editing

Blended Diffusion for Text-driven Editing of Natural Images

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Low-level Vision

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

Restormer: Efficient Transformer for High-Resolution Image Restoration

Super-Resolution

Image Super-Resolution

Learning the Degradation Distribution for Blind Image Super-Resolution

Video Super-Resolution

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

Deblurring

Image Deblurring

Learning to Deblur using Light Field Generated and Real Defocus Images

3D Point Cloud

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

A Unified Query-based Paradigm for Point Cloud Understanding

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

PointCLIP: Point Cloud Understanding by CLIP

3D Object Detection

BoxeR: Box-Attention for 2D and 3D Transformers

Embracing Single Stride 3D Object Detector with Sparse Transformer

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

3D Semantic Segmentation

Scribble-Supervised LiDAR Semantic Segmentation

3D Object Tracking

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

PTTR: Relational 3D Point Cloud Object Tracking with Transformer

3D Human Pose Estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

3D Semantic Scene Completion

MonoScene: Monocular 3D Semantic Scene Completion

3D Reconstruction

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

Camouflaged Object Detection

Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection

Depth Estimation

Monocular Depth Estimation

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior

Stereo Matching

ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching

Lane Detection

Rethinking Efficient Lane Detection via Curve Modeling

Image Inpainting

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding

Image Retrieval

Correlation Verification for Image Retrieval

Face Recognition

AdaFace: Quality Adaptive Margin for Face Recognition

Crowd Counting

Leveraging Self-Supervision for Cross-Domain Crowd Counting

Medical Imaging

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification

Scene Graph Generation

SGTR: End-to-end Scene Graph Generation with Transformer

Referring Video Object Segmentation

Language as Queries for Referring Video Object Segmentation

ReSTR: Convolution-free Referring Image Segmentation Using Transformers

Style Transfer

StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions

Adversarial Examples

Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon

Weakly Supervised Object Localization

Weakly Supervised Object Localization as Domain Adaption

Radar Object Detection

Exploiting Temporal Relations on Radar Perception for Autonomous Driving

Hyperspectral Image Reconstruction

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Image Stitching

Deep Rectangling for Image Stitching: A Learning Baseline

Watermarking

Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings

Grounded Situation Recognition

Collaborative Transformers for Grounded Situation Recognition

Zero-shot Learning

Unseen Classes at a Later Time? No Problem

Datasets

It's About Time: Analog Clock Reading in the Wild

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

Kubric: A scalable dataset generator

Scribble-Supervised LiDAR Semantic Segmentation

Deep Rectangling for Image Stitching: A Learning Baseline

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Shape from Polarization for Complex Scenes in the Wild

Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline

New Tasks

Language-based Video Editing via Multi-Modal Multi-Level Transformer

It's About Time: Analog Clock Reading in the Wild

Splicing ViT Features for Semantic Appearance Transfer

Others

Kubric: A scalable dataset generator

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Balanced MSE for Imbalanced Visual Regression

SNUG: Self-Supervised Neural Dynamic Garments

Shape from Polarization for Complex Scenes in the Wild