
DriveLM: Driving with Graph Visual Question Answering

Download the dataset here (the official source for the Autonomous Driving Challenge 2024).

License: Apache 2.0 | arXiv | Hugging Face

Highlights

🔥 We instantiate datasets (DriveLM-Data) built upon nuScenes and CARLA, and propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.

๐Ÿ DriveLM will serve as a main track in the CVPR 2024 Autonomous Driving Challenge. For further details, please stay tuned!

Table of Contents

  1. Highlights
  2. Getting Started
  3. Current Endeavors and Future Directions
  4. News and TODO List
  5. DriveLM-Data
  6. License and Citation
  7. Other Resources

Getting Started

To get started with DriveLM:

Current Endeavors and Future Directions

  • The advent of GPT-style multimodal models in real-world applications motivates the study of the role of language in driving.
  • Dates below reflect arXiv submission dates.
  • If there is any missing work, please reach out to us!

DriveLM attempts to address some of the challenges faced by the community.

  • Lack of data: DriveLM-Data serves as a comprehensive benchmark for driving with language.
  • Embodiment: GVQA provides a potential direction for embodied applications of LLMs / VLMs.
  • Closed-loop: DriveLM-CARLA attempts to explore closed-loop planning with language.

News and TODO List

News

  • [2023/08/25] DriveLM-nuScenes demo released.
  • [2023/12/22] DriveLM-nuScenes full v1.0 and paper released.
  • [Early 2024] DriveLM-Agent inference code (planned).
  • Note: We plan to release simple, flexible training code that supports multi-view inputs as a starter kit for the AD challenge (stay tuned for details).

TODO List

  • DriveLM-Data
    • DriveLM-nuScenes
    • DriveLM-CARLA
  • DriveLM-Metrics
    • GPT-score
  • DriveLM-Agent
    • Inference code on DriveLM-nuScenes
    • Inference code on DriveLM-CARLA

DriveLM-Data

DriveLM-Data covers the Perception, Prediction, Planning, Behavior, and Motion tasks, with human-written reasoning logic connecting them. On DriveLM-Data, we propose the task of Graph VQA (GVQA).
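To make graph-structured QA more concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory representation: `QANode`, `answer_order`, and `context_for` are illustrative names, not part of DriveLM or the DriveLM-Data release format. Each question-answer pair belongs to one task stage and lists the questions it logically depends on, so a topological sort yields an order in which every question comes after its parents.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Stage names follow the task list above; this is NOT the DriveLM-Data schema.
STAGES = ("perception", "prediction", "planning", "behavior", "motion")


@dataclass
class QANode:
    qid: str
    stage: str
    question: str
    answer: str = ""
    parents: list[str] = field(default_factory=list)  # qids this QA depends on

    def __post_init__(self) -> None:
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage}")


def answer_order(nodes: dict[str, QANode]) -> list[str]:
    """Return qids so that every parent question precedes its children."""
    graph = {qid: set(node.parents) for qid, node in nodes.items()}
    return list(TopologicalSorter(graph).static_order())


def context_for(qid: str, nodes: dict[str, QANode]) -> list[str]:
    """Collect the parent Q/A pairs that could serve as context for qid."""
    return [
        f"Q: {nodes[p].question} A: {nodes[p].answer}"
        for p in nodes[qid].parents
    ]


# Hypothetical example: perception -> prediction -> planning.
nodes = {
    "p1": QANode("p1", "perception", "What are the important objects?",
                 "A pedestrian at the crosswalk."),
    "r1": QANode("r1", "prediction", "Will the pedestrian cross in front of the ego vehicle?",
                 "Yes.", parents=["p1"]),
    "l1": QANode("l1", "planning", "What should the ego vehicle do?",
                 "Slow down and yield.", parents=["p1", "r1"]),
}

print(answer_order(nodes))  # ['p1', 'r1', 'l1']
for line in context_for("l1", nodes):
    print(line)
```

Using `graphlib.TopologicalSorter` from the standard library (Python 3.9+) means a cyclic dependency raises `graphlib.CycleError` instead of silently producing an invalid order.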

📊 Comparison and Stats

DriveLM-Data is the first language-driving dataset facilitating the full stack of driving tasks with graph-structured logical dependencies.

Links to details about the GVQA task, dataset features, and annotation.

License and Citation

All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The language data is under CC BY-NC-SA 4.0. Other datasets (including nuScenes) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.

@article{sima2023drivelm,
  title={DriveLM: Driving with Graph Visual Question Answering},
  author={Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Luo, Ping and Geiger, Andreas and Li, Hongyang},
  journal={arXiv preprint arXiv:2312.14150},
  year={2023}
}
@misc{contributors2023drivelmrepo,
  title={DriveLM: Driving with Graph Visual Question Answering},
  author={DriveLM contributors},
  howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
  year={2023}
}

Other Resources

  • OpenDriveLab on Twitter

  • Autonomous Vision Group on Twitter
