
E3AD: An Ethical End-to-End Autonomous Driving Framework Leveraging Large Language Models

Anonymous Author(s),
Anonymous Institution Name
*Indicates Equal Contribution

Motivation

The transition to end-to-end systems in autonomous vehicle (AV) technology necessitates the integration of ethical principles to ensure societal trust and acceptance. This paper presents an innovative framework, E3AD, which leverages Large Language Models (LLMs) and multimodal deep learning techniques to embed ethical decision-making within end-to-end AV systems. Our approach employs LLMs for multimodal perception and command interpretation, generating trajectory planning candidates that are evaluated against ethical guidelines from the European Commission. The trajectory with the highest ethical score is selected, ensuring decisions that are both technically sound and ethically justified. To enhance transparency and trust, the AV communicates its chosen trajectory and the rationale behind it to passengers. Our contributions include:

  1. Demonstrating how ethical principles can be seamlessly integrated into AV decision-making using LLMs and deep learning.
  2. Enhancing the AV's ability to understand and respond to complex driving scenarios through chain-of-thought reasoning and multimodal data.
  3. Developing a user-friendly interface that provides clear explanations of AV actions to passengers, thereby building trust and ensuring informed decision-making.
  4. Introducing a novel dataset, DrivePilot, with multi-view inputs, high temporal dynamics, and enriched scene annotations to improve the training and evaluation of AV models.

The E3AD framework bridges the gap between ethical theory and practical application, advancing the development of AV systems that are intelligent, efficient, and ethically responsible.

Methods

The E3AD framework features a comprehensive pipeline designed to foster a human-centered, ethically guided navigation experience. This pipeline encompasses four pivotal steps: (1) Multimodal Input Fusion, (2) Visual Grounding, (3) Ethical Trajectory Planning, and (4) Linguistic Feedback. A minimal code sketch of the pipeline is given after the framework figure below.

  • Multimodal Input Fusion: The initial phase transforms raw sensor data from multi-view cameras and command inputs into highly representative vectors. This transformation is achieved through the integration of sophisticated vision, text, and semantic encoders.

  • Visual Grounding: The second stage employs a discriminative verification fusion mechanism that assigns inhibitory scores to filter out non-essential visual features. This innovative mechanism focuses on identifying irrelevant 3D regions within frontal-view images, based on the command, thereby ensuring that the attention is concentrated on the target object delineated by the command. Subsequently, a cross-modal decoder dynamically analyzes and weighs the data to pinpoint the target object that best corresponds to the given command.

  • Ethical Trajectory Planning: Utilizing the identified target object and the multimodal vectors, this step generates candidate trajectories. These trajectories undergo an ethical evaluation, informed by guidelines from the European Union Commission expert group, to select a pathway that fairly distributes risks among all road users.

  • Linguistic Feedback: The final step enhances human-AI interaction by providing a verbal response to passengers, thereby enriching the driving experience with responsive and context-aware communication.

Figure: Overall framework of E3AD. It is an ethical end-to-end autonomous driving framework and includes four steps: Multimodal Input Fusion, Visual Grounding, Ethical Trajectory Planning, and Linguistic Feedback.
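
To make the data flow concrete, below is a minimal, illustrative sketch of the four steps in plain PyTorch-style Python. All function and class names (fuse_multimodal_inputs, ground_target, plan_candidates, ethical_filter, linguistic_feedback) are hypothetical stand-ins for this sketch, not the actual implementation in this repository; the real encoders, grounding head, and planner are far more involved.

from dataclasses import dataclass
from typing import List

import torch


@dataclass
class Trajectory:
    waypoints: torch.Tensor        # (T, 2) future (x, y) positions
    ethical_score: float = 0.0


def fuse_multimodal_inputs(images: torch.Tensor, command: str) -> torch.Tensor:
    """Step 1: encode multi-view images and the command into one joint vector."""
    vision_feat = images.mean(dim=(0, 2, 3))      # toy pooling over views and pixels
    text_feat = torch.full((vision_feat.shape[0],), float(len(command)))
    return torch.cat([vision_feat, text_feat])


def ground_target(fused: torch.Tensor) -> torch.Tensor:
    """Step 2: visual grounding; return a toy 3D box (x, y, z, w, l, h)
    for the object referred to by the command."""
    return torch.zeros(6)


def plan_candidates(fused, target_box, num_candidates=3) -> List[Trajectory]:
    """Step 3a: generate candidate trajectories toward the grounded target."""
    return [Trajectory(waypoints=torch.randn(10, 2)) for _ in range(num_candidates)]


def ethical_filter(candidates: List[Trajectory]) -> Trajectory:
    """Step 3b: score each candidate against the ethical guidelines
    (legality, safety, fair risk distribution) and keep the best one.
    The random score here stands in for the LLM-based analysis."""
    for traj in candidates:
        traj.ethical_score = float(torch.rand(1))
    return max(candidates, key=lambda t: t.ethical_score)


def linguistic_feedback(chosen: Trajectory, command: str) -> str:
    """Step 4: explain the chosen plan to the passenger."""
    return (f"Executing '{command}': selected the trajectory with ethical "
            f"score {chosen.ethical_score:.2f}.")


if __name__ == "__main__":
    multiview = torch.randn(6, 3, 224, 224)       # six camera views
    command = "pull over behind the parked truck"
    fused = fuse_multimodal_inputs(multiview, command)
    target = ground_target(fused)
    best = ethical_filter(plan_candidates(fused, target))
    print(linguistic_feedback(best, command))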

Ethics Analysis and Linguistic Response

It is imperative that autonomous vehicles engage with other road users in an ethical manner. Our comprehensive ethical analysis framework is crucial for ensuring that selected trajectories meet rigorous standards of legality, safety, and equality. Leveraging the advanced context understanding capabilities of GPT-4V, we transform this ethical analysis framework into a prompt engineering process, enabling GPT-4V to perform ethical analysis and filtering.

Figure: Illustration of the Ethics Analysis and Linguistic Response. Based on multimodal inputs, the ethics analysis is formulated around three considerations: legality, safety, and equality. GPT-4V then generates a linguistic response for passengers, informing them of the selected plan and relevant recommendations.
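
As an illustration of this prompt-engineering step, the sketch below shows how the three considerations could be phrased for a vision-language model. The prompt wording, JSON scoring scheme, and helper names (ETHICS_PROMPT, build_messages) are assumptions for this example rather than the exact prompts used by E3AD; the message layout follows the OpenAI chat format for image inputs.

# Illustrative only: hypothetical prompt wording and helper names.
import base64
from typing import Dict, List

ETHICS_PROMPT = """You are the ethics module of an autonomous vehicle.
For each candidate trajectory below, reason step by step about:
1. Legality  - does it comply with traffic rules?
2. Safety    - what collision risk does it create for every road user?
3. Equality  - are risks distributed fairly, without privileging the ego vehicle?
Return JSON: {"scores": [...], "best_index": <int>, "passenger_message": "..."}"""


def build_messages(image_path: str, candidates: List[str]) -> List[Dict]:
    """Assemble a chat request carrying the frontal-view image and a textual
    description of each candidate trajectory."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    content = [
        {"type": "text",
         "text": ETHICS_PROMPT + "\nCandidates:\n" + "\n".join(candidates)},
        {"type": "image_url",
         "image_url": {"url": "data:image/jpeg;base64," + image_b64}},
    ]
    return [{"role": "user", "content": content}]

# The messages can then be sent with any OpenAI-compatible client, and the JSON
# reply parsed to pick the highest-scoring trajectory and read the passenger message.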

DrivePilot Dataset

Our study introduces the DrivePilot dataset, significantly advancing AV research. Building upon the nuScenes dataset, DrivePilot contains 11,959 natural language commands, 9,217 bird’s-eye view (BEV) images, and 55,302 multi-view camera images. These images are captured across diverse urban environments in Singapore and Boston, offering a comprehensive view of various driving conditions, weather scenarios, and times of day.

DrivePilot is pioneering in leveraging the linguistic capabilities of GPT-4V to generate detailed scene semantic annotations. As depicted in the figure below, we employ an innovative zero-shot Chain of Thought (CoT) prompting approach. This method guides GPT-4V through a progressive interpretation of traffic scenarios using step-by-step prompts. These prompts enable the model to learn context and infer meanings in traffic scenes without additional fine-tuning, categorizing scenes across 14 semantic dimensions, including weather, traffic light status, and emotional context. This approach sets new standards for depth and contextual richness in visual grounding datasets.

Figure: Illustration of the chain-of-thought prompting used in DrivePilot to generate semantic annotations for a given traffic scene.
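
The sketch below illustrates the flavor of such a step-by-step prompt. The step wording and the sample dimensions are assumptions made for this example; the full list of 14 dimensions and the exact prompts used to build DrivePilot may differ.

# Illustrative zero-shot chain-of-thought prompt for annotating one frame.
COT_STEPS = [
    "Step 1: Describe the overall scene (road type, surroundings).",
    "Step 2: State the weather and the time of day.",
    "Step 3: Report the traffic light status, if any lights are visible.",
    "Step 4: List nearby road users and their likely intentions.",
    "Step 5: Judge the emotional context / urgency of the situation.",
    "Step 6: Summarize the scene as JSON with one field per dimension.",
]

def build_cot_prompt(dimensions):
    """Compose the step-by-step instructions the model follows for one frame."""
    header = ("You are annotating a driving scene. Think step by step and "
              "fill in every semantic dimension: " + ", ".join(dimensions) + ".")
    return header + "\n" + "\n".join(COT_STEPS)

print(build_cot_prompt(["weather", "time of day", "traffic light status",
                        "road users", "emotional context"]))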

Comparison against state-of-the-art

Figures: comparison of E3AD against state-of-the-art methods.

How to Get Started

Create Environment

1. Creating the Conda Environment for E3AD:

For optimal use of E3AD, follow these setup guidelines:

conda create --name E3AD python=3.7
conda activate E3AD

2. Installing PyTorch:

Install PyTorch and associated libraries compatible with CUDA 11.7:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

3. Installing Additional Requirements:

Complete the environment setup by installing the necessary packages from requirements.txt:

pip install -r requirements.txt

Downloading the Dataset

Talk2Car Dataset

Experiments were conducted using the Talk2Car dataset. If you use this dataset in your work, please cite the original paper:

Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Luc Van Gool, Marie-Francine Moens:
Talk2Car: Taking Control of Your Self-Driving Car. EMNLP 2019

1. Activate the E3AD environment and install gdown for downloading the dataset:
conda activate E3AD
pip install gdown

2. Download the Talk2Car images:

gdown --id 1bhcdej7IFj5GqfvXGrHGPk2Knxe77pek

3. Unzip and organize the images:

unzip imgs.zip && mv imgs/ ./data/images
rm imgs.zip

Refcoco/Refcoco+/Refcocog Dataset

  1. Prepare the datasets with the download_data.sh script.

    bash data/download_data.sh --path ./data
  2. Download the dataset index files from Google Drive to the split/ folder and then extract them. We use the index files provided by VLTVG.

    cd split
    tar -xf data.tar

    The folder structure for these datasets is shown below.

    Dataset
    ├── data
    │   ├── Flickr30k
    │   │   ├── flickr30k-images
    │   ├── other
    │   │   ├── images
    │   ├── referit
    │   │   ├── images
    │   │   ├── masks
    ├── split
    │   ├── data
    │   │   ├── flickr
    │   │   ├── gref
    │   │   ├── gref_umd
    │   │   ├── referit
    │   │   ├── unc
    │   │   ├── unc+    
    

Train

Simply run the following command in your terminal:

bash talk2car/script/train.sh 

Visualization

Figure: qualitative visualization results of E3AD.

Leaderboard

One can find the current Talk2Car leaderboard here. Models on Talk2Car are evaluated by checking whether the Intersection over Union (IoU) of the predicted object bounding box and the ground-truth bounding box is above 0.5. This metric is referred to in several ways, e.g. IoU0.5 or AP50.
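
For reference, here is a minimal sketch of the IoU check behind this metric, for axis-aligned boxes given as (x1, y1, x2, y2); the benchmark's official evaluation script should be used for actual submissions.

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

pred, gt = (100, 80, 220, 200), (110, 90, 230, 210)
print(iou(pred, gt) > 0.5)   # True -> counted as a correct prediction (AP50)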

Model                     AP50 / IoU0.5   Code
STACK-NMN                 33.71
SCRC                      38.7
OSM                       35.31
Bi-Directional retr.      44.1
MAC                       50.51
MSRR                      60.04
VL-Bert (Base)            63.1            Code
AttnGrounder              63.3            Code
ASSMR                     66.0
CMSVG                     68.6            Code
Vilbert (Base)            68.9            Code
CMRT                      69.1
Sentence-BERT+FCOS3D      70.1
Stacked VLBert            71.0
FA                        73.51
