
drivelm's People

Contributors

chengenxie, chonghaosima, devlinyan, eltociear, faikit, hli2020, ilnehc, jeremyxu1998, jjxjiaxue, kashyap7x, kxstd, renzka


drivelm's Issues

Got error when running evaluation.py

When running evaluation.py, I encountered a TypeError raised from the multiprocessing result handler.

evaluation start!
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/Users/unizhuan/anaconda3/envs/llama_adapter_v2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/Users/unizhuan/anaconda3/envs/llama_adapter_v2/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/unizhuan/anaconda3/envs/llama_adapter_v2/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
    task = get()
  File "/Users/unizhuan/anaconda3/envs/llama_adapter_v2/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() takes 1 positional argument but 2 were given
Process SpawnPoolWorker-24:
Process SpawnPoolWorker-20:
Process SpawnPoolWorker-21:
Process SpawnPoolWorker-30:
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-22:
...

I'm wondering why this happens and how to solve it 🤔
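For context, one pattern that reproduces this exact TypeError: an exception raised inside a pool worker whose class takes extra required __init__ arguments (several openai.error classes do) cannot be re-pickled when the pool sends it back to the parent. A minimal sketch of a workaround, where evaluate_one is a hypothetical stand-in for the real per-sample call:

import multiprocessing as mp
import traceback

def evaluate_one(sample):
    # Hypothetical stand-in for the real per-sample scoring call.
    raise RuntimeError(f"simulated failure on sample {sample}")

def safe_worker(sample):
    # Only plain, picklable objects cross the process boundary, so the
    # parent's result handler can never choke on an unpicklable exception.
    try:
        return ("ok", evaluate_one(sample))
    except Exception:
        return ("error", traceback.format_exc())

if __name__ == "__main__":
    with mp.Pool(2) as pool:
        results = pool.map(safe_worker, range(4))
    for status, payload in results:
        # In this sketch every call fails, so payload is a traceback string.
        print(status, payload.strip().splitlines()[-1])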

Image in the llama sample is in the wrong folder?

Please check the following image. Maybe it should be in the folder CAM_BACK_RIGHT?
challenge/llama_adapter_v2_multimodal7b/data/nuscenes/samples/CAM_FRONT_RIGHT/n008-2018-09-18-13-10-39-0400__CAM_BACK_RIGHT__1537291002278113.jpg
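A quick way to confirm the mismatch is to compare the folder name with the camera tag embedded in the filename (the double-underscore naming convention is taken from the path above):

from pathlib import Path

p = Path("challenge/llama_adapter_v2_multimodal7b/data/nuscenes/samples/"
         "CAM_FRONT_RIGHT/n008-2018-09-18-13-10-39-0400__CAM_BACK_RIGHT__1537291002278113.jpg")
folder_cam = p.parent.name          # CAM_FRONT_RIGHT
file_cam = p.name.split("__")[1]    # CAM_BACK_RIGHT
print(folder_cam, file_cam, folder_cam == file_cam)  # -> ... False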

Rule-based generation

Hello,
Thank you for this work, it is very interesting and helpful.
I have carefully read the paper you put on arXiv, and there is no detailed explanation of the type of rules you used to build the dataset (particularly for nuScenes). Could you please provide some details or the implementation?
Best,
Nassim.

Wrong format of DriveLM-nuScenes version-1.1 val

During inference, the DriveLM-nuScenes version-1.1 val set should have the same format as test_llama.json.
But it seems that DriveLM-nuScenes version-1.1 val and test_eval.json share the same format.

So running my model's inference on DriveLM-nuScenes version-1.1 val led to a wrong-format error.


Inference baseline stuck at the end with no output.json

Running demo.py for inference gets stuck at the end, using python demo.py --llama_dir ./weights --checkpoint ./finetune_output/checkpoint-0.pth --data ../test_llama.json --output ../output.json --batch_size 8 --num_processes 8.
The tqdm bars end with 100%|████████| 461/461 [1:28:23<00:00, 11.50s/it] on both GPUs (2x 4090), but the code then stays stuck for a long time and never generates output.json. I tried adding a print in demo.py after the processes join (p.join()), but it never shows.

I have tried many times with different --batch_size and --num_processes values, but it still gets stuck there.
The test_llama.json comes from the DriveLM dataset; the small sample test_llama.json from the repo does not cause the hang.

Hope you can help!
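For reference, these symptoms (all workers hit 100%, then a permanent hang at p.join() with no output.json) match a well-known multiprocessing pitfall; I can't confirm it's what demo.py does, but: if child processes put large results on a Queue, the parent must drain the queue before joining, or join() blocks forever. A minimal sketch:

import multiprocessing as mp

def worker(q):
    # A large payload can fill the underlying pipe; the child then blocks in
    # its queue feeder thread until the parent reads, so join() never returns.
    q.put(["x"] * 1_000_000)

if __name__ == "__main__":
    q = mp.Queue()
    procs = [mp.Process(target=worker, args=(q,)) for _ in range(2)]
    for p in procs:
        p.start()
    results = [q.get() for _ in procs]  # drain BEFORE join to avoid the hang
    for p in procs:
        p.join()
    print(f"collected {len(results)} results")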

How do I get this 'v1_0_train_nus_llama.json'?

finetune_data_config.yaml references 'v1_0_train_nus_llama.json', but how do I get this file?

The following is my training script:

bash exps/finetune.sh /home/junho/workspace/LLaMa2-weight ./1bcbffc43484332672092e0024a8699a6eb5f558161aebf98a7c6b1db67224d1_LORA-BIAS-7B.pth ./finetune_data_config.yaml output/path

Output log:

[18:40:41.330088] Load checkpoint ./1bcbffc43484332672092e0024a8699a6eb5f558161aebf98a7c6b1db67224d1_LORA-BIAS-7B.pth
[18:40:41.330920] read dataset config from ./finetune_data_config.yaml
[18:40:41.331437] DATASET CONFIG:
[18:40:41.331459] {'META': ['v1_0_train_nus_llama.json']}
Traceback (most recent call last):
  File "main_finetune.py", line 205, in <module>
    main(args)
  File "main_finetune.py", line 141, in main
    dataset_train = FinetuneDataset(args.data_config, transform=transform_train,
  File "/home/junho/workspace/DriveLM/challenge/llama_adapter_v2_multimodal7b/data/dataset.py", line 52, in __init__
    meta_l = json.load(open(meta_path))
FileNotFoundError: [Errno 2] No such file or directory: 'v1_0_train_nus_llama.json'

(the same traceback is printed once per worker process)

Test data

It really confuses me: what is the difference between the datasets you mentioned in the baseline results?

  1. The zero-shot results of baseline on the sampled data
  2. The zero-shot results of baseline on the test data
  3. The dataset provided on the test server (DriveLM-nuScenes version-1.1 val)

Could you give detailed information about the differences between these datasets?

Thanks!

Strange results from the inference baseline

Following the steps in README.md, part of the final inference output looks like this:

    {
        "id": "f0f120e4d4b0441da90ec53b16ee169d_d9075c2a5f864a2b8abf41e703f4cf1c_3",
        "question": "<image>\nIs <c1,CAM_FRONT_LEFT,231.5,472.1> a traffic sign or a road barrier?",
        "gt_answer": "No.",
        "answer": "Response\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
    },
    {
        "id": "f0f120e4d4b0441da90ec53b16ee169d_d9075c2a5f864a2b8abf41e703f4cf1c_4",
        "question": "<image>\nWhat actions could the ego vehicle take based on <c1,CAM_FRONT_LEFT,231.5,472.1>? Why take this action and what's the probability?",
        "gt_answer": "The action is to keep going at the same speed, the reason is that there is no safety issue. The probability of this action is high.",
        "answer": "///\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Hinweis\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
    },
    {
        "id": "f0f120e4d4b0441da90ec53b16ee169d_d9075c2a5f864a2b8abf41e703f4cf1c_5",
        "question": "<image>\nWhat actions taken by the ego vehicle can lead to a collision with <c1,CAM_FRONT_LEFT,231.5,472.1>?",
        "gt_answer": "No such action will lead to a collision.",
        "answer": "Response\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
    },

The answers are meaningless. Has anyone encountered this situation? What could be the cause?

Some issues for ChatGPT api in evaluation.py

Hi organizers, thank you for your recent timely replies. We ran into an issue when running evaluation.py. Specifically, at line 27, "scores = self.chatgpt_eval.forward(answer, GT)", we sometimes receive an error from the OpenAI server: "openai.error.ServiceUnavailableError: The server is overloaded or not ready yet". I guess this issue is mainly due to the OpenAI server. However, if participants often run into this problem, it will cost extra time and expense to use the OpenAI API. We have tried slowing down the request rate (for example, sleeping for 1 second after each remote request), but a similar issue still exists. Are there any possible tips?
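One thing that may help (a sketch, not an official recommendation): wrap the remote call in an exponential-backoff retry instead of a fixed 1-second sleep, so transient overload errors are absorbed without hammering the server. call_with_backoff is a hypothetical helper:

import random
import time

def call_with_backoff(fn, max_retries=6, base_delay=1.0):
    # Retry a flaky remote call with exponential backoff plus jitter.
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as err:  # in practice, catch the specific openai error types
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0.0, 1.0)
            print(f"retry {attempt + 1}/{max_retries} in {delay:.1f}s: {err}")
            time.sleep(delay)

# e.g.: scores = call_with_backoff(lambda: self.chatgpt_eval.forward(answer, GT))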

Get "hashes" error when run "pip install -r requirements.txt" command

When I run pip install -r requirements.txt, I get the following error:
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them. torch==2.0.0+cu117 from https://download.pytorch.org/whl/cu117/torch-2.0.0%2Bcu117-cp38-cp38-linux_x86_64.whl#sha256=c4dbc3f7f3eff6576473c3711d5d99adaaef733490b39de4970980d6edf4f0c2 (from -r requirements.txt (line 2)): Expected sha256 c4dbc3f7f3eff6576473c3711d5d99adaaef733490b39de4970980d6edf4f0c2 Got 24f2904c7d84dc64995c74a9130dee6aa83486212c5a09b703e82a083ef67278

I have also tried pip cache purge and then pip install --no-cache-dir -r requirements.txt, but I get the same error. How can I fix this problem?

The release time of DriveLM-CARLA

Hi,

I greatly appreciate this impressive work.
I was wondering when the DriveLM-CARLA dataset will be released? I would be grateful to know the timeline.
Many Thanks.

Bests,
Yi

finetuning resource needed for challenge

Hello, I checked your contest docs, and they show that the VRAM required for inference and fine-tuning is 34/35 GB. My graphics card doesn't have that much memory; can I reduce the batch_size to lower the amount of video memory required?

Inquiry about Full or Extended Dataset Release Timeline

Hi!
I would like to extend my gratitude to your team for publishing this valuable demo dataset. May I inquire if there are plans to release a complete or more extensive version of the dataset in the near future? Do you have any plans to release this dataset before the upcoming CVPR conference? It would be extremely helpful for preparing related research work.

best

Unable to run finetuning

I am running

srun python -u -m torch.distributed.launch --master_port=1112 --nproc_per_node=2 --nodes=1 --use_env \
    main_finetune.py --data_config "$CONFIG" --batch_size 4 \
    --epochs 4 --warmup_epochs 1 --blr 10e-4 --weight_decay 0.02 \
    --llama_path "$LLAMA_PATH" \
    --output_dir "$OUTPUT_DIR" \
    --pretrained_path "$PRETRAINED_PATH" \
    &>> "$OUTPUT_DIR"/output.log &

and my output is

[W socket.cpp:426] [c10d] The server socket has failed to listen on [::]:38429 (errno: 98 - Address already in use).
[W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:38429 (errno: 98 - Address already in use).
[W socket.cpp:426] [c10d] The server socket has failed to bind to 0.0.0.0:38429 (errno: 98 - Address already in use).
[E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.
(the bind/listen failures above repeat once per launched task)
Traceback (most recent call last):
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 241, in launch_agent
    result = agent.run()
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 723, in run
    result = self._invoke_run(role)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 858, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 692, in _initialize_workers
    self._rendezvous(worker_group)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 546, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py", line 55, in next_rendezvous
    self._store = TCPStore(  # type: ignore[call-arg]
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:38429 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:38429 (errno: 98 - Address already in use).
srun: error: gpu06: tasks 0-4,6-11: Exited with exit code 1
Traceback (most recent call last):
  File "main_finetune.py", line 206, in <module>
    main(args)
  File "main_finetune.py", line 89, in main
    misc.init_distributed_mode(args)
  File "/mnt/data1/users/tianle/DriveLM/challenge/llama_adapter_v2_multimodal7b/util/misc.py", line 251, in init_distributed_mode
    torch.distributed.barrier()
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 3313, in barrier
    work = default_pg.barrier(opts=opts)
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1207, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Bootstrap : no socket interface found
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25585) of binary: /users/tianle/volatile/miniconda3/envs/llama_adapter_v2/bin/python
Traceback (most recent call last):
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/users/tianle/volatile/miniconda3/envs/llama_adapter_v2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
main_finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-16_21:19:21
  host      : gpu06.pri.barkla.alces.network
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 25585)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
srun: error: gpu06: task 5: Exited with exit code 1

Any idea what's happening here? I confirmed the port was not occupied by other jobs.

Can the baseline model be replaced?

In this competition, can we replace llama_adapter_v2 with another version of llama_adapter or a different LLM? I have the same question about the visual encoder. Or should we only consider fine-tuning techniques and other tricks to improve the model's performance?
This may be a very basic question; sorry, it's my first time participating in this kind of competition.

Inconsistent Annotations and Image Bounds

Description:

It has been observed that for the nuScenes dataset, all camera images (whether front, back, left, or right) have a fixed resolution of (1600, 900). However, there are instances where labels in the annotations exceed these bounds.

Examples:

  1. Key Frame with token 1537296390262404:

    • Annotation: "<c5,CAM_FRONT_RIGHT,2343.0,298.0>":
      This denotes a <moving> <car> positioned to the <front right> of the ego car.
    • Issue: The x-coordinate 2343.0 is clearly out of the 1600x900 bounds.
  2. Key Frame with token 1531883988362460:

    • Visual representation: (attached image: image_1531883988362460_with_points)
    • Annotations:
      • 1486.5, 320.5
      • 1544.0, 304.0
      • 1317.5, 310.0
    • Issue: Despite these coordinates suggesting that the objects should be located in the top-right corner of the image, the visual representation shows them in the center. Although their relative positions appear correct, there seems to be a mismatch in the coordinate system.

Concern:
There might be a misalignment or inconsistency in how the coordinates are being represented and annotated.
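As a quick sanity check, the sketch below flags annotation points that fall outside the image bounds (the <cN,CAMERA,x,y> tag format is assumed from the examples above):

import re

IMG_W, IMG_H = 1600, 900  # the fixed camera resolution noted above
TAG = re.compile(r"<c\d+,([A-Z_]+),([\d.]+),([\d.]+)>")  # assumed tag format

def out_of_bounds(text):
    # Return every <cN,CAMERA,x,y> annotation whose point lies outside the image.
    bad = []
    for cam, x, y in TAG.findall(text):
        x, y = float(x), float(y)
        if not (0.0 <= x < IMG_W and 0.0 <= y < IMG_H):
            bad.append((cam, x, y))
    return bad

print(out_of_bounds("<c5,CAM_FRONT_RIGHT,2343.0,298.0>"))
# -> [('CAM_FRONT_RIGHT', 2343.0, 298.0)]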

Dataset request

Thank you very much for your team's outstanding contribution to the field. However, I have tried three times to fill in the Google form to apply for the full LM annotations, but I didn't get any reply; I could only get the demo annotations. Can you help me? Thanks again.

Questions about GPU and VRAM usage for Baseline finetune.

Discussed in #56

Originally posted by piqiuni March 22, 2024
In the table in Finetune, it says that with batch size = 4 the required VRAM is 34 GB.
We are using 2x RTX 4090 for finetuning, but we can only run with batch size = 1, and the VRAM usage is about 20 GB per card. Why does it take so much VRAM?

Do you have any suggestions on GPU usage? Should we try to get more GPUs for training? Could you give some detailed suggestions on GPU types and counts?

Thanks for your suggestions! Forgive me for being a beginner~

The purpose of using key objects

Hi, thanks for your great work! What is the purpose of the definition of key objects? It seems that key objects appear in both the questions and the answers. Do you expect the definition of key objects to appear in the pre-prompt, or do you need post-processing to replace it with a standardized format?

[Bug] ValueError: 'Back up.' is not in list

Thanks for the excellent work and the autonomous-driving challenge. I hit a bug when running convert_data.py under "./challenge".

ValueError: 'Back up.' is not in list

I checked the function at line 7 of "challenge/convert_data.py", and there is no "Back up." option in its rule list. What should I do about it?

def rule_based1(question, answer):
    rule = ["Going ahead.", "Turn right.", "Turn left.", "Stopped."]
    question += f" Please select the correct answer from the following options: A. {rule[0]} B. {rule[1]} C. {rule[2]} D. {rule[3]}"
    idx = rule.index(answer)  # raises ValueError for any answer outside the list, e.g. "Back up."
    mapping = {0: "A", 1: "B", 2: "C", 3: "D"}
    return {"Q": question, "A": mapping[idx]}

dataloader example

Hi, I am new to this area. Could you provide an example .ipynb for a DriveLM dataloader? I don't know how to build the mapping from a frame name in the .json, like "4a0798f849ca477ab18009c3a20b7df2", to a filename like "n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915244512465".

Thank you
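For reference, here is roughly the lookup I mean (a sketch using the official nuscenes-devkit, assuming the frame names in the .json are nuScenes sample tokens; version and dataroot are placeholders for my setup):

from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-trainval", dataroot="data/nuscenes", verbose=False)

token = "4a0798f849ca477ab18009c3a20b7df2"  # frame name from the DriveLM .json
sample = nusc.get("sample", token)          # keyframe record
sd = nusc.get("sample_data", sample["data"]["CAM_FRONT"])
print(sd["filename"])  # e.g. samples/CAM_FRONT/n008-...__CAM_FRONT__1526915244512465.jpg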

Question about the baseline model

Thank you for the great work! I would like to ask a few questions about the baseline model (finetuned LLaMA Adapter V2).

  1. Is the result from a model trained on the full train split of DriveLM? In this context, what does "zero-shot result" mean?
  2. Are you planning to release the finetuned model?

Thank you!

evaluation error

Hi! For the challenge, when we run the evaluation using the example output.json and test_eval.json, the ChatGPT evaluation part encounters an error:
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/root/anaconda3/envs/llama_adapter_v2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/envs/llama_adapter_v2/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/envs/llama_adapter_v2/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
    task = get()
  File "/root/anaconda3/envs/llama_adapter_v2/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() takes 1 positional argument but 2 were given
Could you help us solve this problem?

about the validation set

Thanks for your great work! May I ask when you plan to release the validation set? Or can you provide the tokens of the validation samples?

Evaluation error on my model as it outputs <c3,CAM_FRONT,1100.0,500.0,50.0> (3-coordinate tags)

My model outputs a malformed instance such as <c3,CAM_FRONT,1100.0,500.0,50.0> (3 coordinates).

import re, numpy as np 
answer="<AI answer>"

answer_nums = re.findall(r'\d+\.\d+', answer)
print(answer_nums)
answer_nums = np.array([list(map(float, x.split()))[0] for x in answer_nums]).reshape(-1, 2)
print(answer_nums)

The above example script reproduces the extraction logic in evaluation.py:

answer_nums = re.findall(r'\d+\.\d+', answer)
GT_nums = re.findall(r'\d+\.\d+', GT)
# transform string into float
answer_nums = np.array([list(map(float, x.split()))[0] for x in answer_nums]).reshape(-1, 2)
GT_nums = np.array([list(map(float, x.split()))[0] for x in GT_nums]).reshape(-1, 2)

which seems to extract the coordinates.

If the answer is in the correct format (2 coordinates per object), the numbers parse fine.

answer = "There is a white truck to the front of the ego vehicle, a white sedan to the back of the ego vehicle, and a white sedan to the front of the ego vehicle. The IDs of these objects are <c1,CAM_FRONT,1000.0,500.0>, <c2,CAM_BACK,850.0,500.0>, and <c3,CAM_FRONT,1000.0,500.0>."
>> printed
['1000.0', '500.0', '850.0', '500.0', '1000.0', '500.0']
[[1000.  500.]
 [ 850.  500.]
 [1000.  500.]]

But if some object has 3 coordinates, the code fails:

answer = "There is a black sedan to the back of the ego vehicle, a black sedan to the front of the ego vehicle, a black sedan to the front of the ego vehicle, and a black sedan to the front of the ego vehicle. The IDs of these objects are <c1,CAM_BACK,1000.0,500.0>, <c2,CAM_FRONT,1000.0,500.0>, and <c3,CAM_FRONT,1100.0,500.0,50.0>."
>>
Traceback (most recent call last):
  File "evaluation.py", line 134, in <module>
    evaluation.set_graph(predict, GT)
  File "evaluation.py", line 82, in set_graph
    self.graph = self.match_result(answer, GT)
  File "evaluation.py", line 68, in match_result
    answer_nums = np.array([list(map(float, x.split()))[0] for x in answer_nums]).reshape(-1, 2)
ValueError: cannot reshape array of size 7 into shape (2)

I would fix the evaluation to extract only the first two numbers per <object> tag rather than applying the regexp to the whole answer; see the sketch below.
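A minimal sketch of that idea, keeping only the first two numbers inside each tag:

import re
import numpy as np

def extract_points(answer):
    # Take at most one (x, y) pair from each <...> tag, ignoring any extra
    # trailing numbers, so a malformed tag such as
    # <c3,CAM_FRONT,1100.0,500.0,50.0> no longer breaks the reshape.
    points = []
    for tag in re.findall(r"<[^>]*>", answer):
        nums = re.findall(r"\d+\.\d+", tag)
        if len(nums) >= 2:
            points.append([float(nums[0]), float(nums[1])])
    return np.array(points).reshape(-1, 2)

answer = "IDs: <c1,CAM_BACK,1000.0,500.0> and <c3,CAM_FRONT,1100.0,500.0,50.0>."
print(extract_points(answer))
# [[1000.  500.]
#  [1100.  500.]]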

My point is: am I allowed to modify evaluation.py to correct the model output format?
If so, how far am I generally allowed to go in correcting the AI output?
What would count as a violation of the challenge rules?

Got an error when trying to pretrain the model

Is the PretrainDataset class in dataset.py correct? It cannot be used to load data from test_llama.json. I read the code and found that the data formats expected by the PretrainDataset and FinetuneDataset classes are different.

Got wrong reply from ChatGPT

With the output.json and test_eval.json in this repo, we got a wrong reply from ChatGPT:

[{'role': 'system', 'content': 'an evaluator who rates my answer based on the correct answer'}, {'role': 'user', 'content': 'Rate my answer based on the correct answer out of 100, with higher scores indicating that the answer is closer to the correct answer, and you should be accurate to single digits like 62, 78, 41,etc. Output the number only

This is the correct answer: Firstly, notice <c3,CAM_FRONT_LEFT,1075.5,382.8>. The object is a traffic sign, so the ego vehicle should continue at the same speed. Secondly, notice <c2,CAM_FRONT,836.3,398.3>. The object is a traffic sign, so the ego vehicle should accelerate and continue ahead. Thirdly, notice <c1,CAM_BACK,991.7,603.0>. The object is stationary, so the ego vehicle should continue ahead at the same speed.

This is my answer: 1. The ego vehicle should notice the traffic light first, as it is the first object in the image. The traffic light is in the process of changing from red to green, indicating that the vehicle should proceed with caution.2. The ego vehicle should notice the truck next, as it is the second object in the image. The truck is stopped at the intersection, and the ego vehicle should be cautious when approaching the intersection to ensure it does not collide with the truck.3. The ego vehicle should notice the bus next, as it is the third object in the image. The bus is stopped at the intersection, and the ego vehicle should be cautious when approaching the intersection to ensure it does not collide with the bus.4. The ego vehicle should notice the traffic light next, as it is the fourth object in the image. The traffic light is in the process of changing from green to red, indicating that the vehicle should stop at the intersection.5. The ego vehicle should notice the truck next, as it is the fifth object in the image. The truck is stopped at the intersection, and the ego vehicle should be cautious when approaching the intersection to ensure it'}]

Sorry, but I cannot evaluate your answer based on the correct answer as the two responses are completely different.

Please help us figure out this problem.
Should we add something like "if the answer is completely different, give a '0' reply" to the prompt content?

Is it OK to use the nuScenes mini dataset?

Is it OK to use the nuScenes mini dataset, or does it have to be the full nuScenes dataset?
The full nuScenes data is about 500 GB, and the download often crashes at www.nuscenes.org/download.
I just want to explore some of the mini-set data in DriveLM.
Or can you tell me how to download the nuScenes dataset quickly?

Inference OutOfMemoryError with the same params as finetuning

I got torch.cuda.OutOfMemoryError in demo.py, even though the --batch_size and --num_processes params are the same as for finetuning (--batch_size 1 --num_processes 2 with 2x 4090).

Is that caused by the total VRAM usage (weights 13 GB + checkpoint-3.pth 14 GB + the rest) exceeding the 24 GB of a 4090?
How can I run the inference using the VRAM of two GPUs?
Or should we quantize the model to reduce VRAM usage so it runs on a single 4090?

Can you give us advice on dealing with the problem?
Thanks a lot!

How to run demo.py on multi-GPUs?

Hi, appreciate the great work!
I noticed that the demo.py used for inference only supports a single GPU, which makes it really slow to produce an output.json file. How can I run the inference on multiple GPUs? Do you have scripts for multi-GPU inference?

ValueError in evaluation.py: the answer from inference has the wrong number of coordinates

Running evaluation.py with error:

File "evaluation.py", line 93, in match_result
    answer_nums = np.array([list(map(float, x.split()))[0] for x in answer_nums]).reshape(-1, 2)
ValueError: cannot reshape array of size 13 into shape (2)

The correct answer format would be [880.0, 500.0, 1000.0, 500.0, 1000.0, 500.0], which reshapes to [[ 880. 500.], [1000. 500.], [1000. 500.]].
But my answer produced [1055.5, 510.0, 1055.5, 510.0, 1055.5, 510.0, 1055.5, 510.0, 1055.5, 510.0, 1055.5, 510.0, 1055.5], which has 13 numbers.

I have no idea why this happens.
Please help!

Recommended Experimental Settings

Hi DriveLM organizers, thank you for this hard and meaningful work! I'm wondering what the recommended experimental settings are for reimplementing this baseline (e.g., would 4x V100 (32 GB) be enough?).

Question About GVQA Data

Hi, it's great work for the development of AD with LLMs.
When I generated test.json by running extract_data.py, I found it was not structured like the GVQA format mentioned in the paper, and the training data doesn't contain the Context (C) QAs. So I want to know whether this is the final version of the training data or whether it will be updated.
Thanks!

About the image of demo data.

Hi,
Great work. I'd like to know which part of nuScenes you have used, and which images make up your demo data. Thanks!

What would be the perfect score for a perfect answer?

I used the following script to generate a perfect output:

import json

# Load the baseline's predictions...
with open('./output.json', 'r') as f:
    output = json.load(f)

# ...and copy each ground-truth answer into the answer field,
# so every prediction matches its reference exactly.
perfect_output = []
for sample in output:
    new_sample = {
        'id': sample['id'],
        'question': sample['question'],
        'gt_answer': sample['gt_answer'],
        'answer': sample['gt_answer'],
    }
    perfect_output.append(new_sample)

with open("./perfect_output.json", 'w') as f:
    json.dump(perfect_output, f, indent=4)

Then run:
python evaluation.py --root_path1 ./perfect_output.json --root_path2 ./test_eval.json

And it gives me:

accuracy:  1.0
chatgpt:  100.0
match:  100.0
language score:  {'val/Bleu_1': 0.9999999999920699, 'val/Bleu_2': 0.0009999999999940523, 'val/Bleu_3': 9.999999999947137e-05, 'val/Bleu_4': 3.162277660152707e-05, 'val/ROUGE_L': 1.0, 'val/CIDEr': 1.9156274954912038}
final score:  0.8961230436827525

Please note that I am not familiar with the language score metrics, but there seems to be something wrong with Bleu_2, Bleu_3, and Bleu_4: they seem far too low. (One plausible explanation: many ground-truth answers are a single word such as "No.", which contains no bigrams or trigrams at all, so the higher-order BLEU scores fall to the smoothing floor even for a verbatim match.)

Reasoning speed is too slow

In the code, llama is run once for every generated word; is this a normal phenomenon? By the way, will the final assessment use the validation set or the test set? Would it be possible to run the final evaluation on a sampled subset of the test set? I suspect that evaluating too many samples will bring a heavy computational burden.

Leveraging additional information from nuScenes

Hi there, we're considering whether the agent could benefit from more than just keyframe images to generate the right answers. Is it possible to use extra information from nuScenes in this task, such as consecutive frame images or radar points, which could be used to obtain more precise velocity information?
