
drivelm's People

Contributors

chengenxie, chonghaosima, devlinyan, eltociear, faikit, hli2020, ilnehc, jeremyxu1998, jjxjiaxue, kashyap7x, kxstd, renzka


drivelm's Issues

Got error when running evaluation.py

When running evaluation.py, I encountered a TypeError raised from the multiprocessing result handler.

evaluation start!
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/Users/unizhuan/anaconda3/envs/llama_adapter_v2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/Users/unizhuan/anaconda3/envs/llama_adapter_v2/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/unizhuan/anaconda3/envs/llama_adapter_v2/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
    task = get()
  File "/Users/unizhuan/anaconda3/envs/llama_adapter_v2/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() takes 1 positional argument but 2 were given
Process SpawnPoolWorker-24:
Process SpawnPoolWorker-20:
Process SpawnPoolWorker-21:
Process SpawnPoolWorker-30:
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-22:
...

I'm wondering why this happens and how to solve it 🤔
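For context, one pattern that reproduces this exact TypeError: an exception raised inside a pool worker whose class takes extra required __init__ arguments (several openai.error classes do) cannot be re-pickled when the pool sends it back to the parent. A minimal sketch of a workaround, where evaluate_one is a hypothetical stand-in for the real per-sample call:

import multiprocessing as mp
import traceback

def evaluate_one(sample):
    # Hypothetical stand-in for the real per-sample scoring call.
    raise RuntimeError(f"simulated failure on sample {sample}")

def safe_worker(sample):
    # Only plain, picklable objects cross the process boundary, so the
    # parent's result handler can never choke on an unpicklable exception.
    try:
        return ("ok", evaluate_one(sample))
    except Exception:
        return ("error", traceback.format_exc())

if __name__ == "__main__":
    with mp.Pool(2) as pool:
        results = pool.map(safe_worker, range(4))
    for status, payload in results:
        # In this sketch every call fails, so payload is a traceback string.
        print(status, payload.strip().splitlines()[-1])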

Image in the llama sample is in the wrong folder?

Please check the following image. Maybe it should be in the folder CAM_BACK_RIGHT?
challenge/llama_adapter_v2_multimodal7b/data/nuscenes/samples/CAM_FRONT_RIGHT/n008-2018-09-18-13-10-39-0400__CAM_BACK_RIGHT__1537291002278113.jpg
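A quick way to confirm the mismatch is to compare the folder name with the camera tag embedded in the filename (the double-underscore naming convention is taken from the path above):

from pathlib import Path

p = Path("challenge/llama_adapter_v2_multimodal7b/data/nuscenes/samples/"
         "CAM_FRONT_RIGHT/n008-2018-09-18-13-10-39-0400__CAM_BACK_RIGHT__1537291002278113.jpg")
folder_cam = p.parent.name          # CAM_FRONT_RIGHT
file_cam = p.name.split("__")[1]    # CAM_BACK_RIGHT
print(folder_cam, file_cam, folder_cam == file_cam)  # -> ... False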

Rule-based generation

Hello,
Thank you for this work, it is very interesting and helpful.
I have carefully read the paper you put on arXiv, and there is no detailed explanation of the type of rules you used to build the dataset (particularly for nuScenes). Could you please provide some details or the implementation?
Best,
Nassim.

Wrong format of DriveLM-nuScenes version-1.1 val

During inference, the DriveLM-nuScenes version-1.1 val set should have the same format as test_llama.json.
But it seems that DriveLM-nuScenes version-1.1 val and test_eval.json share the same format.

So running my model's inference on DriveLM-nuScenes version-1.1 val led to a wrong-format error.


Inference baseline stuck at the end with no output.json

Running demo.py for inference gets stuck at the end, using python demo.py --llama_dir ./weights --checkpoint ./finetune_output/checkpoint-0.pth --data ../test_llama.json --output ../output.json --batch_size 8 --num_processes 8.
The tqdm bars end with 100%|████████| 461/461 [1:28:23<00:00, 11.50s/it] on both GPUs (2x 4090), but the code then stays stuck for a long time and never generates output.json. I tried adding a print in demo.py after the processes join (p.join()), but it never shows.

I have tried many times with different --batch_size and --num_processes values, but it still gets stuck there.
The test_llama.json comes from the DriveLM dataset; the small sample test_llama.json from the repo does not cause the hang.

Hope you can help!
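For reference, these symptoms (all workers hit 100%, then a permanent hang at p.join() with no output.json) match a well-known multiprocessing pitfall; I can't confirm it's what demo.py does, but: if child processes put large results on a Queue, the parent must drain the queue before joining, or join() blocks forever. A minimal sketch:

import multiprocessing as mp

def worker(q):
    # A large payload can fill the underlying pipe; the child then blocks in
    # its queue feeder thread until the parent reads, so join() never returns.
    q.put(["x"] * 1_000_000)

if __name__ == "__main__":
    q = mp.Queue()
    procs = [mp.Process(target=worker, args=(q,)) for _ in range(2)]
    for p in procs:
        p.start()
    results = [q.get() for _ in procs]  # drain BEFORE join to avoid the hang
    for p in procs:
        p.join()
    print(f"collected {len(results)} results")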

How do I get this 'v1_0_train_nus_llama.json'?

finetune_data_config.yaml references 'v1_0_train_nus_llama.json', but how do I get this file?

The following is my training script:

bash exps/finetune.sh /home/junho/workspace/LLaMa2-weight ./1bcbffc43484332672092e0024a8699a6eb5f558161aebf98a7c6b1db67224d1_LORA-BIAS-7B.pth ./finetune_data_config.yaml output/path

Output log:

[18:40:41.330088] Load checkpoint ./1bcbffc43484332672092e0024a8699a6eb5f558161aebf98a7c6b1db67224d1_LORA-BIAS-7B.pth
[18:40:41.330920] read dataset config from ./finetune_data_config.yaml
[18:40:41.331437] DATASET CONFIG:
[18:40:41.331459] {'META': ['v1_0_train_nus_llama.json']}
Traceback (most recent call last):
  File "main_finetune.py", line 205, in <module>
    main(args)
  File "main_finetune.py", line 141, in main
    dataset_train = FinetuneDataset(args.data_config, transform=transform_train,
  File "/home/junho/workspace/DriveLM/challenge/llama_adapter_v2_multimodal7b/data/dataset.py", line 52, in __init__
    meta_l = json.load(open(meta_path))
FileNotFoundError: [Errno 2] No such file or directory: 'v1_0_train_nus_llama.json'

(the same traceback is printed once per worker process)

Test data

It really confuses me: what is the difference between the datasets you mentioned in the baseline results?

  1. The zero-shot results of baseline on the sampled data
  2. The zero-shot results of baseline on the test data
  3. The dataset provided on the test server (DriveLM-nuScenes version-1.1 val)

Could you give detailed information about the differences between these datasets?

Thanks!

Strange results from the inference baseline

Following the steps in README.md, part of the final inference output looks like this:

    {
        "id": "f0f120e4d4b0441da90ec53b16ee169d_d9075c2a5f864a2b8abf41e703f4cf1c_3",
        "question": "<image>\nIs <c1,CAM_FRONT_LEFT,231.5,472.1> a traffic sign or a road barrier?",
        "gt_answer": "No.",
        "answer": "Response\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
    },
    {
        "id": "f0f120e4d4b0441da90ec53b16ee169d_d9075c2a5f864a2b8abf41e703f4cf1c_4",
        "question": "<image>\nWhat actions could the ego vehicle take based on <c1,CAM_FRONT_LEFT,231.5,472.1>? Why take this action and what's the probability?",
        "gt_answer": "The action is to keep going at the same speed, the reason is that there is no safety issue. The probability of this action is high.",
        "answer": "///\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Hinweis\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
    },
    {
        "id": "f0f120e4d4b0441da90ec53b16ee169d_d9075c2a5f864a2b8abf41e703f4cf1c_5",
        "question": "<image>\nWhat actions taken by the ego vehicle can lead to a collision with <c1,CAM_FRONT_LEFT,231.5,472.1>?",
        "gt_answer": "No such action will lead to a collision.",
        "answer": "Response\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
    },

The answers are meaningless. Has anyone encountered this situation? What could be the cause?

Some issues for ChatGPT api in evaluation.py

Hi organizers, thank you for your recent timely replies. We ran into an issue when running evaluation.py. Specifically, at line 27, "scores = self.chatgpt_eval.forward(answer, GT)", we sometimes receive an error from the OpenAI server: "openai.error.ServiceUnavailableError: The server is overloaded or not ready yet". I guess this issue is mainly due to the OpenAI server. However, if participants often run into this problem, it will cost extra time and expense to use the OpenAI API. We have tried slowing down the request rate (for example, sleeping for 1 second after each remote request), but a similar issue still exists. Are there any possible tips?
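One thing that may help (a sketch, not an official recommendation): wrap the remote call in an exponential-backoff retry instead of a fixed 1-second sleep, so transient overload errors are absorbed without hammering the server. call_with_backoff is a hypothetical helper:

import random
import time

def call_with_backoff(fn, max_retries=6, base_delay=1.0):
    # Retry a flaky remote call with exponential backoff plus jitter.
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as err:  # in practice, catch the specific openai error types
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0.0, 1.0)
            print(f"retry {attempt + 1}/{max_retries} in {delay:.1f}s: {err}")
            time.sleep(delay)

# e.g.: scores = call_with_backoff(lambda: self.chatgpt_eval.forward(answer, GT))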

Get "hashes" error when run "pip install -r requirements.txt" command

When I run pip install -r requirements.txt, I get the following error:
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them. torch==2.0.0+cu117 from https://download.pytorch.org/whl/cu117/torch-2.0.0%2Bcu117-cp38-cp38-linux_x86_64.whl#sha256=c4dbc3f7f3eff6576473c3711d5d99adaaef733490b39de4970980d6edf4f0c2 (from -r requirements.txt (line 2)): Expected sha256 c4dbc3f7f3eff6576473c3711d5d99adaaef733490b39de4970980d6edf4f0c2 Got 24f2904c7d84dc64995c74a9130dee6aa83486212c5a09b703e82a083ef67278

I have also tried pip cache purge and then pip install --no-cache-dir -r requirements.txt, but I get the same error. How can I fix this problem?

The release time of DriveLM-CARLA

Hi,

I greatly appreciate this impressive work.
I was wondering when the DriveLM-CARLA dataset will be released? I would be grateful to know the timeline.
Many Thanks.

Bests,
Yi

finetuning resource needed for challenge

Hello, I checked your contest docs, and they show that the VRAM required for inference and fine-tuning is 34/35 GB. My graphics card doesn't have that much memory; can I reduce the batch_size to lower the amount of video memory required?

Inquiry about Full or Extended Dataset Release Timeline

Hi!
I would like to extend my gratitude to your team for publishing this valuable demo dataset. May I inquire if there are plans to release a complete or more extensive version of the dataset in the near future? Do you have any plans to release this dataset before the upcoming CVPR conference? It would be extremely helpful for preparing related research work.

best

Unable to run finetuning

I am running

srun python -u -m torch.distributed.launch --master_port=1112 --nproc_per_node=2 --nodes=1 --use_env \
    main_finetune.py --data_config "$CONFIG" --batch_size 4 \
    --epochs 4 --warmup_epochs 1 --blr 10e-4 --weight_decay 0.02 \
    --llama_path "$LLAMA_PATH" \
    --output_dir "$OUTPUT_DIR" \
    --pretrained_path "$PRETRAINED_PATH" \
    &>> "$OUTPUT_DIR"/output.log &

and my output is

[W socket.cpp:426] [c10d] The server socket has failed to listen on [::]:38429 (errno: 98 - Address already in use).
[W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:38429 (errno: 98 - Address already in use).
[W socket.cpp:426] [c10d] The server socket has failed to bind to 0.0.0.0:38429 (errno: 98 - Address already in use).
[E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.
(the bind/listen failures above repeat once per launched task)
Traceback (most recent call last):
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 241, in launch_agent
    result = agent.run()
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 723, in run
    result = self._invoke_run(role)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 858, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 692, in _initialize_workers
    self._rendezvous(worker_group)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 546, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py", line 55, in next_rendezvous
    self._store = TCPStore(  # type: ignore[call-arg]
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:38429 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:38429 (errno: 98 - Address already in use).
srun: error: gpu06: tasks 0-4,6-11: Exited with exit code 1
Traceback (most recent call last):
  File "main_finetune.py", line 206, in <module>
    main(args)
  File "main_finetune.py", line 89, in main
    misc.init_distributed_mode(args)
  File "/mnt/data1/users/tianle/DriveLM/challenge/llama_adapter_v2_multimodal7b/util/misc.py", line 251, in init_distributed_mode
    torch.distributed.barrier()
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 3313, in barrier
    work = default_pg.barrier(opts=opts)
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1207, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Bootstrap : no socket interface found
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25585) of binary: /users/tianle/volatile/miniconda3/envs/llama_adapter_v2/bin/python
Traceback (most recent call last):
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
main_finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-16_21:19:21
  host      : gpu06.pri.barkla.alces.network
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 25585)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
srun: error: gpu06: task 5: Exited with exit code 1

Any idea what's happening here? I confirmed the port was not occupied by other jobs.

Can the baseline model be replaced?

In this competition, can we replace llama_adapter_v2 with another version of llama_adapter or a different LLM? I have the same question about the visual encoder. Or should we only consider fine-tuning techniques and other tricks to improve the model's performance?
This may be a very basic question; sorry, it's my first time participating in this kind of competition.

Inconsistent Annotations and Image Bounds

Description:

It has been observed that for the nuScenes dataset, all camera images (whether front, back, left, or right) have a fixed resolution of (1600, 900). However, there are instances where labels in the annotations exceed these bounds.

Examples:

  1. Key Frame with token 1537296390262404:

    • Annotation: "<c5,CAM_FRONT_RIGHT,2343.0,298.0>":
      This denotes a <moving> <car> positioned to the <front right> of the ego car.
    • Issue: The x-coordinate 2343.0 is clearly out of the 1600x900 bounds.
  2. Key Frame with token 1531883988362460:

    • Visual representation: (attached image: image_1531883988362460_with_points)
    • Annotations:
      • 1486.5, 320.5
      • 1544.0, 304.0
      • 1317.5, 310.0
    • Issue: Despite these coordinates suggesting that the objects should be located in the top-right corner of the image, the visual representation shows them in the center. Although their relative positions appear correct, there seems to be a mismatch in the coordinate system.

Concern:
There might be a misalignment or inconsistency in how the coordinates are being represented and annotated.
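As a quick sanity check, the sketch below flags annotation points that fall outside the image bounds (the <cN,CAMERA,x,y> tag format is assumed from the examples above):

import re

IMG_W, IMG_H = 1600, 900  # the fixed camera resolution noted above
TAG = re.compile(r"<c\d+,([A-Z_]+),([\d.]+),([\d.]+)>")  # assumed tag format

def out_of_bounds(text):
    # Return every <cN,CAMERA,x,y> annotation whose point lies outside the image.
    bad = []
    for cam, x, y in TAG.findall(text):
        x, y = float(x), float(y)
        if not (0.0 <= x < IMG_W and 0.0 <= y < IMG_H):
            bad.append((cam, x, y))
    return bad

print(out_of_bounds("<c5,CAM_FRONT_RIGHT,2343.0,298.0>"))
# -> [('CAM_FRONT_RIGHT', 2343.0, 298.0)]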

Dataset request

Thank you very much for your team's outstanding contribution to the field. However, I have tried three times to fill in the Google form to apply for the full LM annotations, but I didn't get any reply; I could only get the demo annotations. Can you help me? Thanks again.

Questions about GPU and VRAM usage for Baseline finetune.

Discussed in #56

Originally posted by piqiuni March 22, 2024
In the table in Finetune, it says that with batch size = 4 the required VRAM is 34 GB.
We are using 2x RTX 4090 for finetuning, but we can only run with batch size = 1, and the VRAM usage is about 20 GB per card. Why does it take so much VRAM?

Do you have any suggestions on GPU usage? Should we try to get more GPUs for training? Could you give some detailed suggestions on GPU types and counts?

Thanks for your suggestions! Forgive me for being a beginner~

The purpose of using key objects

Hi, thanks for your great work! What is the purpose of the definition of key objects? It seems that key objects appear in both the questions and the answers. Do you expect the definition of key objects to appear in the pre-prompt, or do you need post-processing to replace it with a standardized format?

[Bug] ValueError: 'Back up.' is not in list

Thanks for the excellent work and the autonomous-driving challenge. I hit a bug when running convert_data.py under "./challenge".

ValueError: 'Back up.' is not in list

I checked the function at line 7 of "challenge/convert_data.py", and there is no "Back up." option in its rule list. What should I do about it?

def rule_based1(question, answer):
    rule = ["Going ahead.", "Turn right.", "Turn left.", "Stopped."]
    question += f" Please select the correct answer from the following options: A. {rule[0]} B. {rule[1]} C. {rule[2]} D. {rule[3]}"
    idx = rule.index(answer)  # raises ValueError for any answer outside the list, e.g. "Back up."
    mapping = {0: "A", 1: "B", 2: "C", 3: "D"}
    return {"Q": question, "A": mapping[idx]}

dataloader example

Hi, I am new to this area. Could you provide an example .ipynb for a DriveLM dataloader? I don't know how to build the mapping from a frame name in the .json, like "4a0798f849ca477ab18009c3a20b7df2", to a filename like "n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915244512465".

Thank you
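For reference, here is roughly the lookup I mean (a sketch using the official nuscenes-devkit, assuming the frame names in the .json are nuScenes sample tokens; version and dataroot are placeholders for my setup):

from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-trainval", dataroot="data/nuscenes", verbose=False)

token = "4a0798f849ca477ab18009c3a20b7df2"  # frame name from the DriveLM .json
sample = nusc.get("sample", token)          # keyframe record
sd = nusc.get("sample_data", sample["data"]["CAM_FRONT"])
print(sd["filename"])  # e.g. samples/CAM_FRONT/n008-...__CAM_FRONT__1526915244512465.jpg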

Question about the baseline model

Thank you for the great work! I would like to ask a few questions about the baseline model (finetuned LLaMA Adapter V2).

  1. Is the result from a model trained on the full train split of DriveLM? In this context, what does "zero-shot result" mean?
  2. Are you planning to release the finetuned model?

Thank you!

evaluation error

Hi! For the challenge, when we run the evaluation using the example output.json and test_eval.json, the ChatGPT evaluation part encounters an error:
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/root/anaconda3/envs/llama_adapter_v2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/envs/llama_adapter_v2/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/envs/llama_adapter_v2/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
    task = get()
  File "/root/anaconda3/envs/llama_adapter_v2/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() takes 1 positional argument but 2 were given
Could you help us solve this problem?

about the validation set

Thanks for your great work! May I ask when you plan to release the validation set? Or can you provide the tokens of the validation samples?

Evaluation error on my model as it outputs <c3,CAM_FRONT,1100.0,500.0,50.0> (3-coordinate tags)

My model outputs a malformed instance such as <c3,CAM_FRONT,1100.0,500.0,50.0> (3 coordinates).

import re, numpy as np 
answer="<AI answer>"

answer_nums = re.findall(r'\d+\.\d+', answer)
print(answer_nums)
answer_nums = np.array([list(map(float, x.split()))[0] for x in answer_nums]).reshape(-1, 2)
print(answer_nums)

The above example script reproduces the extraction logic in evaluation.py:

answer_nums = re.findall(r'\d+\.\d+', answer)
GT_nums = re.findall(r'\d+\.\d+', GT)
# transform string into float
answer_nums = np.array([list(map(float, x.split()))[0] for x in answer_nums]).reshape(-1, 2)
GT_nums = np.array([list(map(float, x.split()))[0] for x in GT_nums]).reshape(-1, 2)

which seems to extract the coordinates.

If the answer is in the correct format (2 coordinates per object), the numbers parse fine.

answer = "There is a white truck to the front of the ego vehicle, a white sedan to the back of the ego vehicle, and a white sedan to the front of the ego vehicle. The IDs of these objects are <c1,CAM_FRONT,1000.0,500.0>, <c2,CAM_BACK,850.0,500.0>, and <c3,CAM_FRONT,1000.0,500.0>."
>> printed
['1000.0', '500.0', '850.0', '500.0', '1000.0', '500.0']
[[1000.  500.]
 [ 850.  500.]
 [1000.  500.]]

But if some object has 3 coordinates, the code fails:

answer = "There is a black sedan to the back of the ego vehicle, a black sedan to the front of the ego vehicle, a black sedan to the front of the ego vehicle, and a black sedan to the front of the ego vehicle. The IDs of these objects are <c1,CAM_BACK,1000.0,500.0>, <c2,CAM_FRONT,1000.0,500.0>, and <c3,CAM_FRONT,1100.0,500.0,50.0>."
>>
Traceback (most recent call last):
  File "evaluation.py", line 134, in <module>
    evaluation.set_graph(predict, GT)
  File "evaluation.py", line 82, in set_graph
    self.graph = self.match_result(answer, GT)
  File "evaluation.py", line 68, in match_result
    answer_nums = np.array([list(map(float, x.split()))[0] for x in answer_nums]).reshape(-1, 2)
ValueError: cannot reshape array of size 7 into shape (2)

I would fix the evaluation to extract only the first two numbers per <object> tag rather than applying the regexp to the whole answer; see the sketch below.
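A minimal sketch of that idea, keeping only the first two numbers inside each tag:

import re
import numpy as np

def extract_points(answer):
    # Take at most one (x, y) pair from each <...> tag, ignoring any extra
    # trailing numbers, so a malformed tag such as
    # <c3,CAM_FRONT,1100.0,500.0,50.0> no longer breaks the reshape.
    points = []
    for tag in re.findall(r"<[^>]*>", answer):
        nums = re.findall(r"\d+\.\d+", tag)
        if len(nums) >= 2:
            points.append([float(nums[0]), float(nums[1])])
    return np.array(points).reshape(-1, 2)

answer = "IDs: <c1,CAM_BACK,1000.0,500.0> and <c3,CAM_FRONT,1100.0,500.0,50.0>."
print(extract_points(answer))
# [[1000.  500.]
#  [1100.  500.]]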

My point is: am I allowed to modify evaluation.py to correct the model output format?
If so, how far am I generally allowed to go in correcting the AI output?
What would count as a violation of the challenge rules?

Got an error when trying to pretrain the model

Is the PretrainDataset class in dataset.py correct? It cannot be used to load data from test_llama.json. I read the code and found that the data formats expected by the PretrainDataset and FinetuneDataset classes are different.

Got wrong reply from ChatGPT

With the output.json and test_eval.json in this repo, we got a wrong reply from ChatGPT:

[{'role': 'system', 'content': 'an evaluator who rates my answer based on the correct answer'}, {'role': 'user', 'content': 'Rate my answer based on the correct answer out of 100, with higher scores indicating that the answer is closer to the correct answer, and you should be accurate to single digits like 62, 78, 41,etc. Output the number only

This is the correct answer: Firstly, notice <c3,CAM_FRONT_LEFT,1075.5,382.8>. The object is a traffic sign, so the ego vehicle should continue at the same speed. Secondly, notice <c2,CAM_FRONT,836.3,398.3>. The object is a traffic sign, so the ego vehicle should accelerate and continue ahead. Thirdly, notice <c1,CAM_BACK,991.7,603.0>. The object is stationary, so the ego vehicle should continue ahead at the same speed.

This is my answer: 1. The ego vehicle should notice the traffic light first, as it is the first object in the image. The traffic light is in the process of changing from red to green, indicating that the vehicle should proceed with caution.2. The ego vehicle should notice the truck next, as it is the second object in the image. The truck is stopped at the intersection, and the ego vehicle should be cautious when approaching the intersection to ensure it does not collide with the truck.3. The ego vehicle should notice the bus next, as it is the third object in the image. The bus is stopped at the intersection, and the ego vehicle should be cautious when approaching the intersection to ensure it does not collide with the bus.4. The ego vehicle should notice the traffic light next, as it is the fourth object in the image. The traffic light is in the process of changing from green to red, indicating that the vehicle should stop at the intersection.5. The ego vehicle should notice the truck next, as it is the fifth object in the image. The truck is stopped at the intersection, and the ego vehicle should be cautious when approaching the intersection to ensure it'}]

Sorry, but I cannot evaluate your answer based on the correct answer as the two responses are completely different.

Please help us figure out this problem.
Should we add something like "if the answer is completely different, give a '0' reply" to the prompt content?

Is it OK to use the nuScenes mini dataset?

Is it OK to use the nuScenes mini dataset, or does it have to be the full nuScenes dataset?
The full nuScenes data is about 500 GB, and the download often crashes at www.nuscenes.org/download.
I just want to explore some of the mini-set data in DriveLM.
Or can you tell me how to download the nuScenes dataset quickly?

Inference OutOfMemoryError with the same params as finetuning

I got torch.cuda.OutOfMemoryError in demo.py, even though the --batch_size and --num_processes params are the same as for finetuning (--batch_size 1 --num_processes 2 with 2x 4090).

Is that caused by the total VRAM usage (weights 13 GB + checkpoint-3.pth 14 GB + the rest) exceeding the 24 GB of a 4090?
How can I run the inference using the VRAM of two GPUs?
Or should we quantize the model to reduce VRAM usage so it runs on a single 4090?

Can you give us advice on dealing with the problem?
Thanks a lot!

How to run demo.py on multi-GPUs?

Hi, appreciate the great work!
I noticed that the demo.py used for inference only supports a single GPU, which makes it really slow to produce an output.json file. How can I run the inference on multiple GPUs? Do you have scripts for multi-GPU inference?

ValueError in evaluation.py: the answer from inference has the wrong number of coordinates

Running evaluation.py with error:

File "evaluation.py", line 93, in match_result
    answer_nums = np.array([list(map(float, x.split()))[0] for x in answer_nums]).reshape(-1, 2)
ValueError: cannot reshape array of size 13 into shape (2)

The correct answer format would be [880.0, 500.0, 1000.0, 500.0, 1000.0, 500.0], which reshapes to [[ 880. 500.], [1000. 500.], [1000. 500.]].
But my answer produced [1055.5, 510.0, 1055.5, 510.0, 1055.5, 510.0, 1055.5, 510.0, 1055.5, 510.0, 1055.5, 510.0, 1055.5], which has 13 numbers.

I have no idea why this happens.
Please help!

Recommended Experimental Settings

Hi DriveLM organizers, thank you for this hard and meaningful work! I'm wondering what the recommended experimental settings are for reimplementing this baseline (e.g., would 4x V100 (32 GB) be enough?).

Question About GVQA Data

Hi, it's great work for the development of AD with LLMs.
When I generated test.json by running extract_data.py, I found it was not structured like the GVQA format mentioned in the paper, and the training data doesn't contain the Context (C) QAs. So I want to know whether this is the final version of the training data or whether it will be updated.
Thanks!

About the image of demo data.

Hi,
Great work. I'd like to know which part of nuScenes you have used, and which images make up your demo data. Thanks!

What would be the perfect score for a perfect answer?

I used the following script to generate a perfect output:

import json

# Load the baseline's predictions...
with open('./output.json', 'r') as f:
    output = json.load(f)

# ...and copy each ground-truth answer into the answer field,
# so every prediction matches its reference exactly.
perfect_output = []
for sample in output:
    new_sample = {
        'id': sample['id'],
        'question': sample['question'],
        'gt_answer': sample['gt_answer'],
        'answer': sample['gt_answer'],
    }
    perfect_output.append(new_sample)

with open("./perfect_output.json", 'w') as f:
    json.dump(perfect_output, f, indent=4)

Then run:
python evaluation.py --root_path1 ./perfect_output.json --root_path2 ./test_eval.json

And it gives me:

accuracy:  1.0
chatgpt:  100.0
match:  100.0
language score:  {'val/Bleu_1': 0.9999999999920699, 'val/Bleu_2': 0.0009999999999940523, 'val/Bleu_3': 9.999999999947137e-05, 'val/Bleu_4': 3.162277660152707e-05, 'val/ROUGE_L': 1.0, 'val/CIDEr': 1.9156274954912038}
final score:  0.8961230436827525

Please note that I am not familiar with the language score metrics, but there seems to be something wrong with Bleu_2, Bleu_3, and Bleu_4: they seem far too low. (One plausible explanation: many ground-truth answers are a single word such as "No.", which contains no bigrams or trigrams at all, so the higher-order BLEU scores fall to the smoothing floor even for a verbatim match.)

Reasoning speed is too slow

In the code, llama is run once for every generated word; is this a normal phenomenon? By the way, will the final assessment use the validation set or the test set? Would it be possible to run the final evaluation on a sampled subset of the test set? I suspect that evaluating too many samples will bring a heavy computational burden.

Leveraging additional information from nuScenes

Hi there, we're considering whether the agent could benefit from more than just keyframe images to generate the right answers. Is it possible to use extra information from nuScenes in this task, such as consecutive frame images or radar points, which could be used to obtain more precise velocity information?
