cvpr-2024-highlight-oral's Introduction

CVPR-2024-Oral

Geometry

Rethinking Inductive Biases for Surface Normal Estimation

Scene Understanding

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

Reconstruction

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

Embodied AI

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Multi-Modal

Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

Video

FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring

CVPR-2024-Highlight

3D Representation (GS, NeRF)

HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

HashPoint: Accelerated Point Searching and Sampling for Neural Rendering

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Diffusion

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Representation

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

LLM

VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

3D Generation

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

2D Generation

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Digital Human

A 4D Dataset of Real-World Human Clothing With Semantic Annotations

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

3D Human Pose Perception from Egocentric Stereo Videos

Relightable and Animatable Neural Avatar from Sparse-View Video

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians

Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Human-Scene

Scaling Up Dynamic Human-Scene Interaction Modeling

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Multi-Modal

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Video

3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

SpatialTracker: Tracking Any 2D Pixels in 3D Space

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Putting the Object Back into Video Object Segmentation

Boosting Neural Representations for Videos with a Conditional Decoder

Enhancing Video Super-Resolution via Implicit Resampling-based Alignment

VTimeLLM: Empower LLM to Grasp Video Moments

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Image

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Novel View Synthesis

CoPoNeRF: Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs

SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image

Reconstruction

IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments

HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

Segmentation

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

GraCo: Granularity-Controllable Interactive Segmentation

OpenESS: Event-Based Semantic Scene Understanding with Open Vocabularies

Frequency-Adaptive Dilated Convolution for Semantic Segmentation

SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

Embodied AI

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

Pose Estimation

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Autonomous Driving

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Generalized Predictive Model for Autonomous Driving

Dynamic LiDAR Re-simulation using Compositional Neural Fields

SLAM

Gaussian Splatting SLAM

SfM

VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Camera Pose

Map-Relative Pose Regression for Visual Re-Localization

FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

Medical Image Analysis

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

Diversified and Personalized Multi-rater Medical Image Segmentation

Robotic Manipulation

Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation

Knowledge Distillation

Logit Standardization in Knowledge Distillation

Feature Matching

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

Stereo Matching

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Brain Decoding

MindBridge: A Cross-Subject Brain Decoding Framework

cvpr-2024-highlight-oral's People

Contributors

monad-cube
