Git Product home page Git Product logo

jianghaojun / awesome-3d-vision-and-language Goto Github PK

View Code? Open in Web Editor NEW
95.0 4.0 5.0 34 KB

A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.

License: MIT License

3d-deep-learning computer-vision multimodal-deep-learning natural-language-processing point-cloud visual-grounding awesome 3d-vision-and-language

awesome-3d-vision-and-language's Introduction

Awesome-3D-Vision-and-Language Awesome

A curated list of research papers in 3D visual grounding. (Contact: jhj20 at mails.tsinghua.edu.cn)

๐Ÿ’ฌ News

[2022/04/15]: Create this repository.
[2022/05/25]: Expend the scope to 3D-Vision-and-Language, e.g., 3D Visual Grounding, 3D Dense Caption and 3D Question Answering.

Table of Contents

3D Visual Grounding

3D VG Paper Roadmap (Chronological Order)

ECCV 2020

  1. Achlioptas, Panos, et al. ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes. ECCV 2020, Oral. [Paper] [Code] [Website]
  2. Chen, Dave Zhenyu, et al. ScanRefer 3D Object Localization in RGB-D Scans Using Natural Language. ECCV 2020. [Paper] [Code] [Website]

AAAI 2021

  1. Huang, Pin-Hao, et al. Text-guided graph neural networks for referring 3d instance segmentation. AAAI 2021. [Paper] [Code]

CVPR 2021

  1. Feng, Mingtao, et al. Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud. CVPR 2021. [Paper] [Code]
  2. Liu, Haolin, et al. Refer-It-in-RGBD: A Bottom-Up Approach for 3D Visual Grounding in RGBD Images. CVPR 2021. [Paper] [Code] [Website]

ICCV 2021

  1. Yang, Zhengyuan, et al. SAT: 2D Semantics Assisted Training for 3D Visual Grounding. ICCV 2021, Oral. [Paper] [Code]

    Personal Notes:

    • Use corresponding 2D image data(ROI feature, label, bbox coordinates and camera pose) to assist 3D grounding.
    • Very solid experiments.
  2. Yuan, Zhihao, et al. InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring . ICCV 2021. [Paper] [Code]

  3. Zhao, Lichen, et al. 3DVG-Transformer: Relation modeling for visual grounding on point clouds. ICCV 2021. [Paper] [Code]

    Personal Notes:

    • The novelty of this paper comes from the coordinate-guied contextual aggregation module.

ACM-MM 2021

  1. He, Dailan, et al. TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding. ACM-MM 2021. [Paper]

CVPR 2022

  1. Huang, Shijia, et al. Multi-View Transformer for 3D Visual Grounding. CVPR 2022. [Paper] [Code]

    Personal Notes:

    • Rotating the center xyz of objects to provide view-related positional information before going through a Tranformer decoder.
    • SOTA results on Nr3D and Sr3D, good reuslts on ScanRefer.
  2. Luo, Junyu, et al. 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection. CVPR 2022, Oral. [Paper] [Code]

    Personal Notes:

    • First single stage work in 3D Visual Grounding !!!
    • The general idea is similar to the iterative shrinking work in 2D Visual Grounding, but the design is more elegant.
  3. Cai, Daigang, et al. 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. CVPR 2022.

3D VG Datasets

  1. ReferIt3D(Nr3D, Sr3D/Sr3D+): Achlioptas, Panos, et al. ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes. ECCV 2020, Oral. [Paper] [Code] [Website] [Leaderboard]

    Dataset Statistics:

    • Natural Reference in 3D (Nr3D)
    • Spatial Reference in 3D (Sr3D)
  2. ScanRefer: Chen, Dave Zhenyu, et al. ScanRefer 3D Object Localization in RGB-D Scans Using Natural Language. ECCV 2020. [Paper] [Code] [Website] [Leaderboard]

    Dataset Statistics:

    • On average, there are 13.81 objects, 64.48 descriptions per scene, and 4.67 descriptions per object.
    • Average length of descriptions is 20.27. Frequency of object attributes: spatial language (98.7%), color (74.7%), shape terms (64.9%), and size information (14.2%).

3D VG Workshops

  1. CVPR 2021 1st Workshop on Language for 3D Scenes. [Website]

3D Question Answering

3D QA Paper Roadmap (Chronological Order)

CVPR 2022

  1. Azuma, Daichi, et al. ScanQA: 3D Question Answering for Spatial Scene Understanding. CVPR 2022. [Paper] [Code]

ICLR 2023

  1. Ma, Xiaojian and Yong, Silong, et al. SQA3D: Situated Question Answering in 3D Scenes. ICLR 2023. [Paper] [Data & Code]

3D QA Datasets

  1. ScanQA: Azuma, Daichi, et al. ScanQA: 3D Question Answering for Spatial Scene Understanding. CVPR 2022. [Paper] [Data Preparation]

  2. SQA3D: Ma, Xiaojian and Yong, Silong, et al. SQA3D: Situated Question Answering in 3D Scenes. ICLR 2023. [Paper] [Data & Code]

3D Dense Caption

Pending...

awesome-3d-vision-and-language's People

Contributors

jeasinema avatar jianghaojun avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

awesome-3d-vision-and-language's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.