tsinghua-mars-lab / densetnt
License: MIT License
Thank you for your wonderful work!
I'm trying to reproduce your results on Argoverse dataset and encountering the following problem.
Let's say I have a dataset stored in folders /datasets/argoverse/train/data and /datasets/argoverse/val/data. I want to optimize minFDE, so I add the following line to my command: --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1
OUTPUT_DIR=models.densetnt.1; GPU_NUM=8; CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python src/run.py --argoverse --future_frame_num 30 --do_train --data_dir /datasets/argoverse/train/data --output_dir ${OUTPUT_DIR} --hidden_size 128 --train_batch_size 64 --sub_graph_batch_size 4096 --use_map --core_num 16 --use_centerline --distributed_training ${GPU_NUM} --other_params semantic_lane direction l1_loss goals_2D enhance_global_graph subdivide lazy_points new laneGCN point_sub_graph stage_one stage_one_dynamic=0.95 laneGCN-4 point_level point_level-4 point_level-4-3 complete_traj complete_traj-3 --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1 --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1
This command will fail due to this assertion https://github.com/Tsinghua-MARS-Lab/DenseTNT/blob/main/src/utils.py#L276 because the models.densetnt.1 folder is not created yet and my validation data is not in the val/data folder.
This brings me to three questions:
--do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1
and then run the command with it? What is the intended use case?
Thank you!
While a two-stage training schedule is described in the training details section of the paper, this repo uses end-to-end training.
Is there a reason for the difference?
I read the code on branch argoverse2, could you help me answer some questions about the code?
lane_type_to_int[LaneType.BIKE] = 2
lane_type_to_int[LaneType.BUS] = 3
vectors.append(vector) not in the end of this for loop?
loss[i] += F.nll_loss(pred_probs[i].unsqueeze(0), torch.tensor([argmin], device=device)), should the second parameter be torch.tensor([1], device=device)?
if len(focal_track.object_states) != 110:
    return None
because some object_states of focal_track is less than 110, or the code won't run.
I found that the best model performance is still obtained with the default setting from the paper (batch size 64, 16 epochs), even after increasing the batch size from 64 to 256, increasing the training epochs from 16 to 64, and switching to a cosine learning-rate schedule.
What puzzles me most is that the model does not get better with a larger batch size and more training epochs.
I would like to ask what training strategy I could refer to if I want to improve the model performance.
Thanks
I'm confused: the paper says DenseTNT is anchor-free, but in the code all the lane boundary points (goals_2D) are model inputs. Is this contradictory?
when I trained the Set Predictor, eval results are as follows:
other_errors {'stage_one_k': 3.0803698258030754, 'stage_one_recall': 0.9638398086076706, 'set_MR_pred': 0.0738903979863932, 'set_minFDE_pred': 1.3842987156892936}
{'minADE': 14.246385467933312, 'minFDE': 1.3842987156892916, 'MR': 0.0738903979863932}
ADE 14.316873007381048
DE@1 9.875993647573898
DE@2 19.429086244927365
DE@3 3.496420582410066
The minADE is clearly wrong. I visualized the predicted trajectories, and they are not right either. I think the complete traj module is not sufficiently trained, but I trained the model following your instructions, so where exactly is the problem?
Hi, @hangzhaomit ,
This work is great! Will the code be released? Or is this repository only for the website?
Thanks~
Thank you for your codes.
I notice that goals_2D_mlps does not receive gradients, even during training.
Is this expected? If so, why?
https://github.com/Tsinghua-MARS-Lab/DenseTNT/blob/main/src/modeling/decoder.py#L181-L184
Dear MARS Lab,
Thanks for sharing the excellent work!
By following your instruction, the evaluation result is successfully reproduced.
Just wanna ask, do you guys have a plan to upload the inference code?
Hi @GentleSmile,
In your paper, the sampling strategy is described as dense sampling of lines within 50 meters of the initial position of the target agent's trajectory. However, in your implementation, it seems that you first score the sparse goals and then densify the selected top 150 goals to generate the dense goals (DenseTNT/src/modeling/decoder.py, line 144 in 398b8ce).
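The two-step procedure described in this question (score sparse goals, keep the top 150, then densify around the kept ones) can be sketched roughly as below. This is only an illustration of the idea, not the repository's code: the function name densify_top_goals, the square grid, and the radius and step values are all assumptions made for the example; only the top-150 figure comes from the question.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def densify_top_goals(
    goals: List[Point],
    scores: List[float],
    top_k: int = 150,      # figure quoted in the question
    radius: float = 1.0,   # hypothetical half-width of the local grid, in meters
    step: float = 0.5,     # hypothetical grid spacing, in meters
) -> List[Point]:
    """Keep the top_k highest-scoring sparse goals and expand each one
    into a small square grid of dense goal candidates."""
    # Indices of the sparse goals, best score first, truncated to top_k.
    ranked = sorted(range(len(goals)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Offsets of the local grid, including the sparse goal itself at (0, 0).
    n = int(round(radius / step))
    offsets = [(i * step, j * step) for i in range(-n, n + 1) for j in range(-n, n + 1)]

    dense: List[Point] = []
    seen = set()
    for idx in ranked:
        gx, gy = goals[idx]
        for dx, dy in offsets:
            point = (round(gx + dx, 3), round(gy + dy, 3))
            if point not in seen:  # neighbouring grids may overlap
                seen.add(point)
                dense.append(point)
    return dense


sparse_goals = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
sparse_scores = [0.1, 0.7, 0.2]
dense_goals = densify_top_goals(sparse_goals, sparse_scores, top_k=1)
```

With top_k=1 only the goal at (10.0, 0.0) survives the first stage, so every dense candidate lies within one meter of it. Compared with sampling densely everywhere within 50 meters, this kind of coarse-to-fine scheme scores far fewer points, which may be why an implementation would choose it.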
Hi,
In the ICCV paper, I noticed that you sample dense goals along the road, as illustrated in your paper, but in this code repo you sample goals along the lane centerline, the same as vanilla TNT (DenseTNT/src/dataset_argoverse.py, lines 125 to 137 in 398b8ce).
The parameter include_beside of function get_subdivide_points() will produce a denser goal set, but it seems you are not using it.
Hello
Thank you for a great job!
I would like to use your code.
What is the license for this code?
Thanks for sharing this awesome code.
Are you also planning to update code for Waymo Motion Dataset ?
the winning model of the 2021 Waymo Motion Prediction Challenge
Hi,
Thank you for sharing the great work. I have some confusions about the code, and I put these questions in a PDF file, please check it out, thanks for your reply!
Some confusions about this code.pdf
Can you describe the Argoverse data structure in your train/data folder, for example the output of the command tree train/data?
Thanks
@GentleSmile
root@18dc3f8e2e1d:/workspace/wangs/DenseTNT# python src/run.py --argoverse --future_frame_num 30 --do_train --data_dir /workspace/datasets/Argoverse/train/data/ --output_dir models.densetnt.1 --hidden_size 128 --train_batch_size 64 --use_map --core_num 16 --use_centerline --distributed_training 8 --other_params semantic_lane direction l1_loss goals_2D enhance_global_graph subdivide goal_scoring laneGCN point_sub_graph lane_scoring complete_traj complete_traj-3
{'add_prefix': None, 'agent_type': None, 'argoverse': True, 'attention_decay': False, 'autoregression': None, 'core_num': 16, 'cuda_visible_device_num': None, 'data_dir': '/workspace/datasets/Argoverse/train/data/', 'data_dir_for_val': 'val/data/', 'debug': False, 'distributed_training': 8, 'do_eval': False, 'do_test': False, 'do_train': True, 'eval_batch_size': 64, 'eval_params': [], 'future_frame_num': 30, 'future_test_frame_num': 16, 'global_graph_depth': 1, 'gpu_split': 0, 'hidden_dropout_prob': 0.1, 'hidden_size': 128, 'initializer_range': 0.02, 'inter_agent_types': None, 'learning_rate': 0.001, 'log_dir': 'models.densetnt.1', 'lstm': False, 'master_port': '12355', 'max_distance': 50.0, 'method_span': [0, 1], 'mode_num': 6, 'model_recover_path': None, 'model_save_dir': 'models.densetnt.1/model_save', 'multi': None, 'nms_threshold': None, 'no_agents': False, 'no_cuda': False, 'no_sub_graph': False, 'not_use_api': False, 'num_train_epochs': 16.0, 'nuscenes': False, 'old_version': False, 'other_params': {'semantic_lane': True, 'direction': True, 'l1_loss': True, 'goals_2D': True, 'enhance_global_graph': True, 'subdivide': True, 'goal_scoring': True, 'laneGCN': True, 'point_sub_graph': True, 'lane_scoring': True, 'complete_traj': True, 'complete_traj-3': True}, 'output_dir': 'models.densetnt.1', 'placeholder': 0.0, 'reuse_temp_file': False, 'seed': 42, 'single_agent': True, 'stage_one_K': None, 'sub_graph_batch_size': 8000, 'sub_graph_depth': 3, 'temp_file_dir': 'models.densetnt.1/temp_file', 'train_batch_size': 64, 'train_extra': False, 'train_params': [], 'use_centerline': True, 'use_map': True, 'visualize': False, 'waymo': False, 'weight_decay': 0.01}
10/21/2022 01:57:04 - INFO - main - ***** args *****
output_dir models.densetnt.1
other_params ['semantic_lane', 'direction', 'l1_loss', 'goals_2D', 'enhance_global_graph', 'subdivide', 'goal_scoring', 'laneGCN', 'point_sub_graph', 'lane_scoring', 'complete_traj', 'complete_traj-3']
10/21/2022 01:57:11 - INFO - main - device: cuda
Loading dataset ['/workspace/datasets/Argoverse/train/data/']
/opt/conda/lib/python3.8/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.4)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
10/21/2022 01:57:12 - INFO - argoverse.data_loading.vector_map_loader - Loaded root: ArgoverseVectorMap
Running DDP on rank 3.
Running DDP on rank 5.
Running DDP on rank 1.
Running DDP on rank 7.
Running DDP on rank 0.
Running DDP on rank 6.
Running DDP on rank 4.
Running DDP on rank 2.
10/21/2022 01:57:13 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 6
10/21/2022 01:57:13 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 4
10/21/2022 01:57:13 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 2
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 3
10/21/2022 01:57:14 - INFO - argoverse.data_loading.vector_map_loader - Loaded root: ArgoverseVectorMap
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 5
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 1
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 0
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 7
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 7: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 4: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 5: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 6: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
['/workspace/datasets/Argoverse/train/data/129892.csv', '/workspace/datasets/Argoverse/train/data/179439.csv', '/workspace/datasets/Argoverse/train/data/153379.csv', '/workspace/datasets/Argoverse/train/data/11971.csv', '/workspace/datasets/Argoverse/train/data/181683.csv'] ['/workspace/datasets/Argoverse/train/data/209097.csv', '/workspace/datasets/Argoverse/train/data/102649.csv', '/workspace/datasets/Argoverse/train/data/186077.csv', '/workspace/datasets/Argoverse/train/data/74459.csv', '/workspace/datasets/Argoverse/train/data/89887.csv']
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [06:12<00:00, 552.14it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [00:07<00:00, 27049.96it/s]
valid data size is 205942
Dear author, thanks again for sharing the code.
Now I'm trying to train and test on Argoverse Prediction Dataset.
And I met an error at the beginning of training like below.
I'm not sure if it's related to GPU memory or batch_size. (I'm using a single Titan X Pascal.)
Can you give me some advice? Thank you !
['train/data/203014.csv', 'train/data/122663.csv', 'train/data/186083.csv', 'train/data/179329.csv', 'train/data/39652.csv'] ['train/data/1859.csv', 'train/data/99352.csv', 'train/data/31180.csv', 'train/data/175042.csv', 'train/data/79405.csv']
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [23:46<00:00, 144.36it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [00:07<00:00, 26514.47it/s]
valid data size is 205942
Traceback (most recent call last):
File "src/run.py", line 309, in <module>
main()
File "src/run.py", line 298, in main
run(args)
File "src/run.py", line 280, in run
while not spawn_context.join():
File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/jaehyeon/conda_ws/denseTNT/src/run.py", line 198, in demo_basic
model = VectorNet(args).to(rank)
File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 607, in to
return self._apply(convert)
File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 376, in _apply
param_applied = fn(param)
File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 605, in convert
return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
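One plausible reading of this traceback: the failing call is VectorNet(args).to(rank) in "Process 1", and the poster has a single GPU, so rank 1 would refer to a device that does not exist. That would explain "invalid device ordinal" without involving memory or batch size. A minimal sketch of a pre-flight check is shown below; check_world_size is a hypothetical helper, not part of the repository, and in practice the available count would come from torch.cuda.device_count().

```python
def check_world_size(requested_procs: int, available_gpus: int) -> None:
    """Fail early if more worker processes are requested than GPUs are visible.

    In a real script, available_gpus would typically be obtained with
    torch.cuda.device_count(); it is passed in here so the check itself
    has no dependencies.
    """
    if available_gpus < 1:
        raise RuntimeError("No CUDA device is visible to this process.")
    if requested_procs > available_gpus:
        raise ValueError(
            f"{requested_procs} processes requested but only {available_gpus} "
            f"GPU(s) visible; ranks {available_gpus}..{requested_procs - 1} "
            "would map to non-existent devices."
        )


# A single-GPU machine with a single process passes the check.
check_world_size(requested_procs=1, available_gpus=1)
```

If this reading is right, setting the value passed to --distributed_training to the number of GPUs actually present (1 in this case) should avoid the error, though the batch size may then need to be lowered to fit in memory.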
Hi authors,
Congratulations on your excellent work! Can you tell me the total number of model parameters used by your best-performed model on the Argoverse dataset?
Hello,
Thanks for sharing the great work.
Since you evaluated your model with Waymo data, I wonder how you could use Waymo dataloader which is in Tensorflow? Is there a way we can bypass that and still have a Pytorch code?
Thanks!
Your work has been a great help for a beginner, but I don't understand how to visualize the results obtained. Hope to get your reply, thank you again for your work.
Hello, the above is my command when running prediction, but the following problem always occurs. What is the reason?
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] File "/home/notebook/code/personal/waymo/DenseTNT/src/modeling/vectornet.py", line 32, in forward
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] hidden_states, lengths = utils.merge_tensors(input_list, device)
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] File "/home/notebook/code/personal/S9048717/waymo/DenseTNT/src/utils.py", line 808, in merge_tensors
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] res[i][:tensor.shape[0
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] RuntimeError: The expanded size of the tensor (128) must match the existing size (64) at non-singleton dimension 1. Target sizes: [19, 128]. Tensor sizes: [19, 64]
2022-07-08 14:44:26 2022-07-08 14:44:24.000 [INFO] [Driver] Task task-20220708144211-34515 run failed, Exit code : 1
Hello,
Thanks for sharing the great work.
I wonder how you use densetnt for INTERACTION dataset. Could you release the related part?
Thanks!
Dear authors, thank you for releasing this wonderful code. Great work!
I am trying to run your code for single-machine multi-GPUs. I read PyTorch's docs and tutorials for data parallel training, but did not come up with a method. I am wondering whether you could give me some hints. Thank you.
Thanks for sharing this great work. I don't understand why 'optimizer_2' is introduced for optimizing the complete trajectory decoder. Could you give some pointers? Thanks!
Dear authors,
Thanks for sharing this excellent work!
Can you share the log information about training loss? THX
Argoverse 2 dataset has been released. Have you ever tried to use DenseTNT on it? And what about the result?
Dear authors,
First of all, thank you soooooo much for sharing this terrific work!
I trained and validated the model using the command you kindly provided in the readme file. The performance on the validation split was ADE=0.79 and FDE=1.25 (FYI, I used the 8-GPU setting for training). I think the performance matches the "DenseTNT w/ 100ms optimization" mode in the paper. So I was wondering whether the command you provided is for the "DenseTNT w/ 100ms optimization" mode, and whether it is possible for us to train the model in the "DenseTNT w/ 100ms optimization (minFDE)" mode. If so, which parameters should we adjust in the training/testing command?
Hi! Thanks again for your great work!
I trained and evaluated your model, and the performance matches what you reported in your paper.
I'm wondering if you could suggest a way to visualize the predictions or other results, such as heatmaps or trajectory lines.
I'm a little confused: is there a function in your code for that, and how is it used?
Thanks for the great job.
I have some questions about the definition of the outcomes. What are stage_one_k and stage_one_recall in other_errors? What do FDE, ADE, DE@1, DE@2, and DE@3 mean? Do FDE and ADE in the outcomes represent minFDE1 and minADE1?
Thanks
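For reference, the usual definitions of the multi-modal forecasting metrics can be sketched as below. This is a generic illustration, assuming K predicted trajectories and a 2 m miss threshold; it does not claim to reproduce how this repository computes the values it prints, and the function names are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def displacement(p: Point, q: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def min_ade_fde_mr(
    pred_trajs: Sequence[Sequence[Point]],
    gt_traj: Sequence[Point],
    miss_threshold: float = 2.0,  # assumed threshold in meters
) -> Tuple[float, float, bool]:
    """Return (minADE, minFDE, missed) for one scenario.

    FDE: distance between the last predicted point and the last ground-truth point.
    ADE: mean distance over all time steps.
    The minimum is taken independently over the K predicted trajectories.
    """
    fdes: List[float] = [displacement(t[-1], gt_traj[-1]) for t in pred_trajs]
    ades: List[float] = [
        sum(displacement(p, g) for p, g in zip(t, gt_traj)) / len(gt_traj)
        for t in pred_trajs
    ]
    min_fde = min(fdes)
    min_ade = min(ades)
    missed = min_fde > miss_threshold  # averaged over scenarios, this gives MR
    return min_ade, min_fde, missed


gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
preds = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],  # only the endpoint is off, by 1 m
    [(1.0, 3.0), (2.0, 3.0), (3.0, 3.0)],  # every point is off by 3 m
]
min_ade, min_fde, missed = min_ade_fde_mr(preds, gt)
```

Note that some evaluation code reports, as minADE, the ADE of the trajectory with the best endpoint rather than the smallest ADE over all K trajectories; the two conventions can give different numbers.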
Hi,
Thank you for sharing the great work.
Since DenseTNT needs to sample final points from the centerline, how do you deal with the situation where pedestrians are not on a lane?
I wonder if you could give me some advice.
THX
Thank you for a great repo. Will the pretrained model be released somewhere? Also, how can we recreate the video demos shown on the website (https://tsinghua-mars-lab.github.io/DenseTNT/)?
When I follow the README.md to run it, an error pops up saying StopIteration. It happens at src/dataset_argoverse.py line 450:
root, dirs, cur_files = os.walk(each_dir).__next__()
StopIteration
How to fix it? Thanks!
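A likely cause, offered as a guess rather than a confirmed diagnosis: os.walk silently yields nothing when the directory does not exist or cannot be read, so asking the generator for its first item raises StopIteration. Checking the path given to --data_dir is therefore a good first step. The sketch below shows the behaviour and a more explicit way to fail; list_top_level_files is a hypothetical helper, not a function from the repository.

```python
import os
import tempfile
from typing import List


def list_top_level_files(each_dir: str) -> List[str]:
    """Return the file names directly inside each_dir.

    os.walk yields nothing for a missing or unreadable directory, so
    next() is given a default and the failure is reported explicitly
    instead of surfacing as a bare StopIteration.
    """
    first = next(os.walk(each_dir), None)
    if first is None:
        raise FileNotFoundError(f"Data directory not found or not readable: {each_dir}")
    _root, _dirs, cur_files = first
    return sorted(cur_files)


# Demonstration with a temporary directory holding one CSV file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "1.csv"), "w").close()
    found = list_top_level_files(tmp)
    missing_dir = os.path.join(tmp, "does_not_exist")
    try:
        list_top_level_files(missing_dir)
        missing_error = None
    except FileNotFoundError as exc:
        missing_error = exc
```

With this pattern, a wrong path produces an error message that names the directory, which is easier to act on than StopIteration.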
Thank you for releasing your code. I'm confused: the offline model used to produce pseudo-labels does not seem to work, or has your online or test mode not been released? It seems that only offline mode is supported now, right?
Thank you very much and I'm looking forward to your response.
I'm wondering how long it takes you to train the model.
I plan to train the model with two 2080 Ti GPUs, but it takes a long time (2 h per epoch), and GPU utilization always seems to be low. Is that a problem with the code?
Can we accelerate training by preprocessing the data?
Thank you for sharing the great work! I am confused about the second-stage loss function.
In "def goals_2D_per_example_calc_loss()", there is an nll_loss calculated from "scores" and "[mapping[i]['goals_2D_labels']]". The "scores" are for the dense goals, but "[mapping[i]['goals_2D_labels']]" seems to be an index into the sparse goals. Am I misunderstanding? Why does this work?
Hi,
Thank you for sharing a great work.
It would be great if you could provide some documentation about the meaning of the parameters, especially other_params, such as laneGCN-4.
Thank you!
Hello, sorry to bother you with a question. I reproduced your results experimentally, training first and then evaluating. But I found that the results after using do_eval (FDE 1.0513439117392331, MR 0.09578942034860154) are far better than those during training (FDE: 3.1969465177600322, MR(2m,4m,6m): (0.49478493944897106, 0.23618785871750297, 0.1306969923570714)). I found that running do_eval also runs a model recovery step ("recover --model"); is that the optimization model? If it is a model, I do not see any newly generated model.
Also, the test part does not seem to be run after optimization. Are the optimization and testing parts run together? (I just reproduced the results and did not study your code, so please forgive me if I misunderstood.) If you can answer my question, I will be very grateful.
Hi,
Thanks for your awesome work.
How long did you train DenseTNT on the Waymo dataset? And how many GPUs (type) did you use?
Thank you for sharing the great work! I have a confusion about the experiments:
The running results with this code seem far from the result in the paper. All params are default, after 16 training epochs:
loss=4.445
FDE: 3.1506308334590956
MR(2m,4m,6m): (0.4871615584819174, 0.2329565318727421, 0.13079283688769763)
What could be the problem?
The results predicted by the set predictor model are not good for val data
Thanks so much for the sharing and congratulations on your great work! I am looking for your suggestions on what to do with an issue I encountered recently. Thank you really much for your attention!
When I was running the training program, I found that it would sometimes hang at a random epoch (not the first epoch, and not at the end of an epoch). When it hung, GPU usage was 0% while GPU memory stayed at the same level, far from full. On the CPU side, memory was far from full and CPU usage was 0%. The program just stopped producing output, and training did not continue. The only way to exit was to stop it, and the traceback showed that it stopped at:
while not spawn_context.join():
In 'top', the program sometimes shows the status 'Sleeping' after hanging. I encountered the problem on different setups (Windows10+WSL+Docker(Ubuntu)+Nvidia RTX 3080; Ubuntu 20.04.4+Nvidia RTX A6000). I am looking forward to your suggestions on this issue.
Thanks again for your kind help and I am looking forward to learning more from you!
Dear authors,
Thanks for sharing this excellent work!
I noticed that the training command you provided is for offline optimization mode, could you please provide the command for the goal set predictor training (the second stage, online mode)?
Can you share your model weights or alternatively your final outputs?
Hello, could you share the source code of TNT: Target-driveN Trajectory Prediction?