Survey_EmbodiedAI

Paperlist

  • Simple but Effective: CLIP Embeddings for Embodied AI. 2022 CVPR
  • ZSON: Zero-shot Object-Goal Navigation using MultiModal Goal Embeddings. 2022
  • CLIP on Wheels: Zero-shot Object Navigation as Object Localization and Exploration. 2022
  • ViNG: Learning Open-World Navigation with Visual Goals. 2021 ICRA
  • Pre-Trained Language Models for Interactive Decision-Making. 2022
  • R3M: A Universal Visual Representation for Robot Manipulation. 2022 CoRL
  • BC-Z: Zero-shot Task Generalization with Robotic Imitation Learning. 2021 CoRL
  • Grounding Language with Visual Affordance over Unstructured Data. 2022
  • What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data. 2022
  • LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision and Action. 2022 CoRL
  • Visual Language Maps for Robot Navigation.
  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.
  • Open-vocabulary Queryable Scene Representations for Real World Planning
  • Language Models as Zero-shot Planners: Extracting Actionable Knowledge for Embodied Agents.
  • REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments. 2020 CVPR
  • ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
  • SQA3D: Situated Question Answering in 3D Scenes. 2023 ICLR
  • Episodic Transformer for Vision-and-Language Navigation. 2021 ICCV
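Several entries above (ZSON, CLIP on Wheels, "Simple but Effective: CLIP Embeddings for Embodied AI") share one core mechanism: scoring candidate observations against a goal by cosine similarity in a joint embedding space. A minimal sketch of that matching step, with stub vectors standing in for a real CLIP encoder (the vectors and function names below are illustrative, not from any of the papers):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def pick_goal_view(goal_embedding, view_embeddings):
    """Return the index of the view whose embedding best matches the goal.

    In ZSON / CLIP-on-Wheels-style systems, `goal_embedding` would come from
    CLIP's text (or image) encoder and `view_embeddings` from its image
    encoder; here they are plain lists of floats.
    """
    scores = [cosine_similarity(goal_embedding, v) for v in view_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: a "chair" goal scored against three candidate views.
goal = [0.9, 0.1, 0.0]
views = [[0.1, 0.9, 0.0],   # resembles a table
         [0.8, 0.2, 0.1],   # resembles a chair
         [0.0, 0.0, 1.0]]   # resembles a wall
print(pick_goal_view(goal, views))  # -> 1
```

The zero-shot property comes from the shared embedding space: swapping in a new goal category needs no retraining, only a new text embedding.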

Pre-Train for Cross-Modal Representation

  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers. 2019 EMNLP
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. 2019 NeurIPS
  • Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks.
  • Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
  • Cross-modal Map Learning for Vision and Language Navigation
  • Airbert: In-domain Pretraining for Vision-and-Language Navigation
  • Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models

LLM for Embodied AI

  • LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
  • LEBP - Language Expectation & Binding Policy: A Two-Stream Framework for Embodied Vision-and-Language Interaction Task Learning Agents
  • Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
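The papers in this section treat an LLM as a high-level planner: exemplar task/plan pairs are packed into a few-shot prompt, and the completion is parsed into executable subgoals. A minimal sketch of that prompt-assembly and parsing loop, assuming a simple numbered-step format (illustrative only, not the exact prompt from LLM-Planner):

```python
def build_planner_prompt(exemplars, instruction):
    """Assemble a few-shot planning prompt from (instruction, steps) pairs."""
    parts = []
    for ex_instr, steps in exemplars:
        parts.append(f"Task: {ex_instr}")
        parts.extend(f"{i + 1}. {s}" for i, s in enumerate(steps))
        parts.append("")  # blank line between exemplars
    parts.append(f"Task: {instruction}")
    parts.append("1.")  # cue the model to continue the numbered plan
    return "\n".join(parts)

def parse_plan(completion):
    """Split a completion like '1. go to sink\\n2. pick up mug' into steps."""
    steps = []
    for line in completion.splitlines():
        line = line.strip()
        if line and line[0].isdigit() and "." in line:
            steps.append(line.split(".", 1)[1].strip())
    return steps

exemplars = [("make coffee", ["go to counter", "pick up mug", "use machine"])]
prompt = build_planner_prompt(exemplars, "wash the mug")
print(parse_plan("1. go to sink\n2. pick up mug"))  # -> ['go to sink', 'pick up mug']
```

Grounding (as in LLM-Planner and SayCan) then filters or re-ranks the parsed steps against what the environment actually affords; that filtering step is omitted here.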

ImageGoal Navigation

  • Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning, 2017 ICRA
  • Semi-Parametric Topological Memory for Navigation, 2018 ICLR
  • Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks, 2019 CVPR
  • Neural Topological SLAM for Visual Navigation, 2020 CVPR
  • Visual Graph Memory with Unsupervised Representation for Visual Navigation, 2021 ICCV
  • No RL, No Simulation: Learning to Navigate without Navigating, 2021 NeurIPS
  • Topological Semantic Graph Memory for Image-Goal Navigation, 2022 CoRL
  • Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation, 2022 CVPR
  • Memory-Augmented Reinforcement Learning for Image-Goal Navigation, 2022 IROS
  • Last-Mile Embodied Visual Navigation, 2022 CoRL
  • ViNG: Learning Open-World Navigation with Visual Goals, 2021 ICRA
  • Lifelong Topological Visual Navigation, 2022 RA-L
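A recurring design in this section (Semi-Parametric Topological Memory, ViNG, Neural Topological SLAM, Topological Semantic Graph Memory) is a graph of stored observations connected by learned reachability, with navigation reduced to graph search plus local point-to-point control. A minimal sketch of the planning half, using BFS over an adjacency dict (node names and edges are toy placeholders; real systems add edges when a learned reachability estimator fires):

```python
from collections import deque

def plan_on_graph(edges, start, goal):
    """Shortest waypoint sequence on a topological memory graph via BFS.

    `edges` maps each node to its reachable neighbors; returns the node
    path from `start` to `goal`, or None if the goal is unreachable.
    """
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None

# Toy memory: four stored views with traversability edges.
memory = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(plan_on_graph(memory, "A", "D"))  # -> ['A', 'B', 'C', 'D']
```

A local policy then drives the agent between consecutive waypoints; the graph keeps long-horizon planning tractable without a metric map.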

Multi-Modal Manipulation

  • Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models. 2022 NeurIPS workshop
  • Scaling Robot Learning with Semantically Imagined Experience. 2023
  • Learning Universal Policies via Text-Guided Video Generation, 2023
  • Policy Adaptation from Foundation Model Feedback, 2023 CVPR
  • CLIPort: What and Where Pathways for Robotic Manipulation, 2021 CoRL
  • RT-1: Robotics Transformer for Real-World Control at Scale. 2022
  • Open-World Object Manipulation using Pre-trained Vision-Language Models. 2023
  • R3M: A Universal Visual Representation for Robot Manipulation. 2022 CoRL

Contributors

  • wzcai99
