tsinghua-mars-lab / densetnt Goto Github PK

License: MIT License

Python 90.64% Cython 9.36%

densetnt's Issues

MinFDE optimization error

Thank you for your wonderful work!

I'm trying to reproduce your results on the Argoverse dataset and have run into the following problem.
Let's say my dataset is stored in the folders /datasets/argoverse/train/data and /datasets/argoverse/val/data. I want to optimize minFDE, so I add the following to my command: --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1

OUTPUT_DIR=models.densetnt.1; GPU_NUM=8; CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python src/run.py --argoverse --future_frame_num 30 --do_train --data_dir /datasets/argoverse/train/data --output_dir ${OUTPUT_DIR} --hidden_size 128 --train_batch_size 64 --sub_graph_batch_size 4096 --use_map --core_num 16 --use_centerline --distributed_training ${GPU_NUM} --other_params semantic_lane direction l1_loss goals_2D enhance_global_graph subdivide lazy_points new laneGCN point_sub_graph stage_one stage_one_dynamic=0.95 laneGCN-4 point_level point_level-4 point_level-4-3 complete_traj complete_traj-3 --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1 --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1

This command fails on this assertion https://github.com/Tsinghua-MARS-Lab/DenseTNT/blob/main/src/utils.py#L276 because the models.densetnt.1 folder has not been created yet and my validation data is not in the val/data folder.

This brings me to three questions:

Which command should I use to reproduce the experiment?
Should I first run the command without --do_eval --eval_params optimization MRminFDE=0.0 cnt_sample=9 opti_time=0.1 and then run it again with those options? What is the intended use case?
Is there a way to specify folders for train and validation in one command? It seems that the path for validation is just hard-coded in the line https://github.com/Tsinghua-MARS-Lab/DenseTNT/blob/main/src/utils.py#L318

Thank you!

End-to-End training or Two-Stage training ?

While a two-stage training schedule is described in the training details section in the paper, the end-to-end training is used in this repo.

Any reasons for the difference?

Some questions about code details

I read the code on the argoverse2 branch. Could you help me with some questions about it?

dataset_argoverse.py L501: should the code be as follows?

lane_type_to_int[LaneType.BIKE] = 2 
lane_type_to_int[LaneType.BUS] = 3

dataset_argoverse.py L527: why is vectors.append(vector) not at the end of this for loop?
decoder.py L339: loss[i] += F.nll_loss(pred_probs[i].unsqueeze(0), torch.tensor([argmin], device=device)). Should the second parameter be torch.tensor([1], device=device)?
dataset_argoverse.py L428: the following should be added:

if len(focal_track.object_states) != 110:
    return None

because some focal_track.object_states have fewer than 110 entries; otherwise the code won't run.

The argoverse2 branch doesn't train the set_predictor. Have you tried training it? Could its performance be close to that of offline optimization?
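
For the F.nll_loss question above, a minimal sketch of what the call computes may help. It assumes pred_probs[i] holds log-probabilities over candidate trajectories and that argmin is the index of the candidate closest to the ground truth. The helpers log_softmax and nll_loss_single are hypothetical pure-Python stand-ins, not the repository's code.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def nll_loss_single(log_probs, target):
    """Negative log-likelihood for one example: -log_probs[target].

    This matches what torch.nn.functional.nll_loss returns for a batch
    of size 1 with the default mean reduction.
    """
    return -log_probs[target]

# Hypothetical scores for three candidate trajectories.
log_probs = log_softmax([2.0, 0.5, -1.0])
argmin = 0  # assumed: index of the candidate closest to the ground truth

loss_argmin = nll_loss_single(log_probs, argmin)  # target follows the best candidate
loss_fixed = nll_loss_single(log_probs, 1)        # target fixed to index 1
```

With the target set to argmin, the loss raises the probability of whichever candidate is closest to the ground truth. A constant target of 1 would always push probability toward the candidate at index 1, regardless of which one is closest.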

Some Training strategy

I increased the batch size from 64 to 256, increased the number of training epochs from 16 to 64, and switched to a cosine learning-rate schedule, but the best performance still comes from the paper's default setting (batch size 64, 16 epochs).

What training strategy would you recommend if I want to improve the model's performance?

It is puzzling that the model does not improve with a larger batch size and more training epochs.

Thanks

The doubt about anchor-free

I'm confused: the paper says DenseTNT is anchor-free, but in the code all the lane boundary points (goals_2D) are model inputs. Is this contradictory?

minADE is not right

When I trained the Set Predictor, the eval results were as follows:

other_errors {'stage_one_k': 3.0803698258030754, 'stage_one_recall': 0.9638398086076706, 'set_MR_pred': 0.0738903979863932, 'set_minFDE_pred': 1.3842987156892936}
{'minADE': 14.246385467933312, 'minFDE': 1.3842987156892916, 'MR': 0.0738903979863932}
ADE 14.316873007381048
DE@1 9.875993647573898
DE@2 19.429086244927365
DE@3 3.496420582410066

The minADE is clearly wrong. I visualized the predicted trajectories, and they are not right either. I suspect the complete traj module is not sufficiently trained, but I trained the model following your instructions. Where exactly is the problem?

about code release

Hi, @hangzhaomit ,

This work is great! Will the code be released, or is this repository only for the website?

Thanks~

goals_2D_mlps does not have gradients

Thank you for your code.
I notice that goals_2D_mlps does not receive gradients, even during training.
Is this expected? If so, why?

https://github.com/Tsinghua-MARS-Lab/DenseTNT/blob/main/src/modeling/decoder.py#L181-L184

arg.visualize

inference question

Dear MARS Lab,

Thanks for sharing the excellent work!
By following your instruction, the evaluation result is successfully reproduced.

Just wanna ask, do you guys have a plan to upload the inference code?

Would you release trained Waymo model??

How to convert model.bin to ONNX format?

Question About the dense goal sampling

Hi @GentleSmile,

In your paper, the sampling strategy uses densely sampled lines within 50 meters of the initial position of the target agent's trajectory. In your implementation, however, it seems that you first score the sparse goals and then densify the top 150 selected goals to generate the dense goals (DenseTNT/src/modeling/decoder.py, line 144 in 398b8ce):

goals_2D_new = utils.get_neighbour_points(goals_2D[topk_ids], topk_ids=topk_ids, mapping=mapping[i])

This differs from the illustration in the paper.
Is this true? If so, why was it changed?
I am looking forward to your reply. Thank you in advance
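
The two-step procedure described in this question (score the sparse goals, keep the top-k, then densify around them) can be sketched as follows. This is only an illustration under stated assumptions: densify_around is a hypothetical stand-in for utils.get_neighbour_points, whose actual behavior may differ, and the grid spacing, scores, and k are made up.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def densify_around(points, radius=1, step=1.0):
    """Hypothetical densification: a small grid of neighbours around each point.

    Duplicates are removed while preserving first-seen order.
    """
    seen = set()
    dense = []
    for x, y in points:
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                p = (round(x + dx * step, 3), round(y + dy * step, 3))
                if p not in seen:
                    seen.add(p)
                    dense.append(p)
    return dense

sparse_goals = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (15.0, 0.0)]
sparse_scores = [0.1, 0.7, 0.9, 0.2]  # illustrative scores only

# Stage one: keep the best-scoring sparse goals.
topk_ids = top_k_indices(sparse_scores, k=2)
# Stage two: generate dense goals only around the selected ones.
dense_goals = densify_around([sparse_goals[i] for i in topk_ids], radius=1, step=1.0)
```

Compared with sampling densely everywhere within a fixed radius, this kind of scheme only spends dense candidates near goals that already scored well.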

Question about goals sampling

Hi,

In the ICCV paper, I noticed that you sample dense goals along the road, as illustrated in your paper:

but in this code repo, you sample goals along the lane centerline, which is the same as in vanilla TNT:

DenseTNT/src/dataset_argoverse.py, lines 125 to 137 in 398b8ce:

for index_polygon, polygon in enumerate(polygons):
    for i, point in enumerate(polygon):
        hash = get_hash(point)
        if hash not in visit:
            visit[hash] = True
            points.append(point)

    if 'subdivide' in args.other_params:
        subdivide_points = get_subdivide_points(polygon)
        points.extend(subdivide_points)
        subdivide_points = get_subdivide_points(polygon, include_self=True)

mapping['goals_2D'] = np.array(points)

The parameter include_beside of function get_subdivide_points() will produce a denser goal set, but it seems you are not using it.
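
To make the quoted loop easier to follow, here is a minimal, self-contained sketch of the same pattern: collect de-duplicated polygon vertices and optionally add points between consecutive vertices. The helper subdivide_segment_points is a hypothetical simplification and is not the repository's get_subdivide_points; in particular, include_self and include_beside are not modeled.

```python
def subdivide_segment_points(polygon, parts=2):
    """Hypothetical helper: interior points splitting each segment into `parts` pieces."""
    extra = []
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:]):
        for j in range(1, parts):
            t = j / parts
            extra.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return extra

def collect_goals(polygons, subdivide=True):
    """Gather de-duplicated vertices, optionally adding subdivided points."""
    visit = set()
    points = []
    for polygon in polygons:
        for point in polygon:
            if point not in visit:
                visit.add(point)
                points.append(point)
        if subdivide:
            points.extend(subdivide_segment_points(polygon))
    return points

# Two polylines sharing the vertex (4.0, 0.0).
polygons = [[(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)], [(4.0, 0.0), (4.0, 2.0)]]
goals = collect_goals(polygons)
goals_without_subdivide = collect_goals(polygons, subdivide=False)
```

In this toy input the shared vertex is stored once, and subdivision adds one midpoint per segment, so the goal set grows only along the polylines themselves rather than to either side of them.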

License

Hello

Thank you for a great job！

I would like to use your code.
What is the license for this code?

Model for Waymo Motion Dataset ?

Thanks for sharing this awesome code.

Are you also planning to release the code for the Waymo Motion Dataset, i.e., the winning model of the 2021 Waymo Motion Prediction Challenge?

Some confusions about this code

Hi,
Thank you for sharing the great work. I have some questions about the code and have put them in a PDF file. Please take a look. Thanks for your reply!
Some confusions about this code.pdf

structure of train/data

Can you describe the Argoverse data structure in your train/data folder,
e.g., the output of the command tree train/data?
thanks
@GentleSmile

Multi-GPU training does not move on this interface

root@18dc3f8e2e1d:/workspace/wangs/DenseTNT# python src/run.py --argoverse --future_frame_num 30 --do_train --data_dir /workspace/datasets/Argoverse/train/data/ --output_dir models.densetnt.1 --hidden_size 128 --train_batch_size 64 --use_map --core_num 16 --use_centerline --distributed_training 8 --other_params semantic_lane direction l1_loss goals_2D enhance_global_graph subdivide goal_scoring laneGCN point_sub_graph lane_scoring complete_traj complete_traj-3
{'add_prefix': None, 'agent_type': None, 'argoverse': True, 'attention_decay': False, 'autoregression': None, 'core_num': 16, 'cuda_visible_device_num': None, 'data_dir': '/workspace/datasets/Argoverse/train/data/', 'data_dir_for_val': 'val/data/', 'debug': False, 'distributed_training': 8, 'do_eval': False, 'do_test': False, 'do_train': True, 'eval_batch_size': 64, 'eval_params': [], 'future_frame_num': 30, 'future_test_frame_num': 16, 'global_graph_depth': 1, 'gpu_split': 0, 'hidden_dropout_prob': 0.1, 'hidden_size': 128, 'initializer_range': 0.02, 'inter_agent_types': None, 'learning_rate': 0.001, 'log_dir': 'models.densetnt.1', 'lstm': False, 'master_port': '12355', 'max_distance': 50.0, 'method_span': [0, 1], 'mode_num': 6, 'model_recover_path': None, 'model_save_dir': 'models.densetnt.1/model_save', 'multi': None, 'nms_threshold': None, 'no_agents': False, 'no_cuda': False, 'no_sub_graph': False, 'not_use_api': False, 'num_train_epochs': 16.0, 'nuscenes': False, 'old_version': False, 'other_params': {'semantic_lane': True, 'direction': True, 'l1_loss': True, 'goals_2D': True, 'enhance_global_graph': True, 'subdivide': True, 'goal_scoring': True, 'laneGCN': True, 'point_sub_graph': True, 'lane_scoring': True, 'complete_traj': True, 'complete_traj-3': True}, 'output_dir': 'models.densetnt.1', 'placeholder': 0.0, 'reuse_temp_file': False, 'seed': 42, 'single_agent': True, 'stage_one_K': None, 'sub_graph_batch_size': 8000, 'sub_graph_depth': 3, 'temp_file_dir': 'models.densetnt.1/temp_file', 'train_batch_size': 64, 'train_extra': False, 'train_params': [], 'use_centerline': True, 'use_map': True, 'visualize': False, 'waymo': False, 'weight_decay': 0.01}

10/21/2022 01:57:04 - INFO - __main__ - ***** args *****
output_dir models.densetnt.1
other_params ['semantic_lane', 'direction', 'l1_loss', 'goals_2D', 'enhance_global_graph', 'subdivide', 'goal_scoring', 'laneGCN', 'point_sub_graph', 'lane_scoring', 'complete_traj', 'complete_traj-3']
10/21/2022 01:57:11 - INFO - __main__ - device: cuda
Loading dataset ['/workspace/datasets/Argoverse/train/data/']
/opt/conda/lib/python3.8/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.4)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
10/21/2022 01:57:12 - INFO - argoverse.data_loading.vector_map_loader - Loaded root: ArgoverseVectorMap
Running DDP on rank 3.
Running DDP on rank 5.
Running DDP on rank 1.
Running DDP on rank 7.
Running DDP on rank 0.
Running DDP on rank 6.
Running DDP on rank 4.
Running DDP on rank 2.
10/21/2022 01:57:13 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 6
10/21/2022 01:57:13 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 4
10/21/2022 01:57:13 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 2
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 3
10/21/2022 01:57:14 - INFO - argoverse.data_loading.vector_map_loader - Loaded root: ArgoverseVectorMap
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 5
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 1
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 0
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 7
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 7: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 4: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 5: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 6: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
10/21/2022 01:57:14 - INFO - torch.distributed.distributed_c10d - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
['/workspace/datasets/Argoverse/train/data/129892.csv', '/workspace/datasets/Argoverse/train/data/179439.csv', '/workspace/datasets/Argoverse/train/data/153379.csv', '/workspace/datasets/Argoverse/train/data/11971.csv', '/workspace/datasets/Argoverse/train/data/181683.csv'] ['/workspace/datasets/Argoverse/train/data/209097.csv', '/workspace/datasets/Argoverse/train/data/102649.csv', '/workspace/datasets/Argoverse/train/data/186077.csv', '/workspace/datasets/Argoverse/train/data/74459.csv', '/workspace/datasets/Argoverse/train/data/89887.csv']
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [06:12<00:00, 552.14it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [00:07<00:00, 27049.96it/s]
valid data size is 205942

CUDA error during training

Dear author, thanks again for sharing the code.

Now I'm trying to train and test on Argoverse Prediction Dataset.

And I met an error at the beginning of training like below.

I'm not sure whether it's related to GPU memory or batch_size. (I'm using a single Titan X Pascal.)

Can you give me some advice? Thank you !

['train/data/203014.csv', 'train/data/122663.csv', 'train/data/186083.csv', 'train/data/179329.csv', 'train/data/39652.csv'] ['train/data/1859.csv', 'train/data/99352.csv', 'train/data/31180.csv', 'train/data/175042.csv', 'train/data/79405.csv']
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [23:46<00:00, 144.36it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205942/205942 [00:07<00:00, 26514.47it/s]
valid data size is 205942
Traceback (most recent call last):
  File "src/run.py", line 309, in <module>
    main()
  File "src/run.py", line 298, in main
    run(args)
  File "src/run.py", line 280, in run
    while not spawn_context.join():
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/jaehyeon/conda_ws/denseTNT/src/run.py", line 198, in demo_basic
    model = VectorNet(args).to(rank)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 607, in to
    return self._apply(convert)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/home/jaehyeon/anaconda3/envs/denseTNT/lib/python3.7/site-packages/torch/nn/modules/module.py", line 605, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
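
The traceback shows process 1 failing in VectorNet(args).to(rank), and the poster mentions a single GPU, so one plausible cause is that more worker processes were spawned than there are visible GPUs. The sketch below shows an early check for that situation. It is an assumption-laden illustration, not the repository's code: check_world_size is a hypothetical helper, and the available count would normally come from torch.cuda.device_count(), which is passed in explicitly here so the sketch runs without PyTorch.

```python
def check_world_size(requested, available):
    """Raise early if more worker processes are requested than GPUs are visible.

    `available` would typically be torch.cuda.device_count(); it is a plain
    argument here so the sketch has no PyTorch dependency.
    """
    if requested < 1:
        raise ValueError("at least one process is required")
    if requested > available:
        raise ValueError(
            f"requested {requested} processes but only {available} GPU(s) visible; "
            f"ranks {available}..{requested - 1} would hit 'invalid device ordinal'"
        )
    return requested

# A single-GPU machine asked to run two ranks fails before any model is moved.
try:
    check_world_size(requested=2, available=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

If this is indeed the cause, matching the number of spawned processes to the number of visible GPUs should avoid the error. Whether that applies here depends on the command used, which is not shown in the issue.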

About model size

Hi authors,

Congratulations on your excellent work! Can you tell me the total number of model parameters used by your best-performed model on the Argoverse dataset?

How to use Pytorch with Waymo data

Hello,

Thanks for sharing the great work.
Since you evaluated your model with Waymo data, I wonder how you could use Waymo dataloader which is in Tensorflow? Is there a way we can bypass that and still have a Pytorch code?
Thanks!

how to visualization

Your work has been a great help for a beginner, but I don’t understand how to visualize the results obtained? Hope to get your reply, thank you again for your work.

do_eval

OUTPUT_DIR=models.densetnt.1; \
python src/run.py --argoverse --future_frame_num 30 \
--do_eval --data_dir ./train/data --output_dir ${OUTPUT_DIR} \
--hidden_size 128 --train_batch_size 64 --use_map \
--core_num 16 --use_centerline \
--eval_params optimization MRminFDE cnt_sample=9 opti_time=0.1

Hello, the above is my command when running prediction, but the following problem always occurs. What is the reason?

2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] File "/home/notebook/code/personal/waymo/DenseTNT/src/modeling/vectornet.py", line 32, in forward
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] hidden_states, lengths = utils.merge_tensors(input_list, device)
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] File "/home/notebook/code/personal/S9048717/waymo/DenseTNT/src/utils.py", line 808, in merge_tensors
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] res[i][:tensor.shape[0
2022-07-08 14:44:26 2022-07-08 14:44:23.000 [INFO] [Driver] RuntimeError: The expanded size of the tensor (128) must match the existing size (64) at non-singleton dimension 1. Target sizes: [19, 128]. Tensor sizes: [19, 64]
2022-07-08 14:44:26 2022-07-08 14:44:24.000 [INFO] [Driver] Task task-20220708144211-34515 run failed, Exit code : 1

INTERACTION dataset

Hello,
Thanks for sharing the great work.
I wonder how you use densetnt for INTERACTION dataset. Could you release the related part?
Thanks!

Running this code for single-machine multi-GPUs

Dear authors, thank you for releasing this wonderful code. Great work!

I am trying to run your code for single-machine multi-GPUs. I read PyTorch's docs and tutorials for data parallel training, but did not come up with a method. I am wondering whether you could give me some hints. Thank you.

Why use two optimizers?

Thanks for sharing this great work. I don't understand why 'optimizer_2' is introduced for optimizing the complete trajectory decoder. Could you give some pointers? Thanks!

About log for loss

Dear authors,

Thanks for sharing this excellent work!
Can you share the log information about training loss? THX

result on AV2

Argoverse 2 dataset has been released. Have you ever tried to use DenseTNT on it? And what about the result?

Possible performance matching issue, how to optimize minFDE?

Dear authors,

First of all, thank you soooooo much for sharing this terrific work!
I trained and validated the model using the command you kindly provided in the readme file. The performance on the validation split was ADE=0.79 and FDE=1.25 (FYI, I used an 8-GPU setting for training). I think the performance matches the "DenseTNT w/ 100ms optimization" mode in the paper. So I was wondering whether the command you provided is for the "DenseTNT w/ 100ms optimization" mode, and whether it is possible to train the model in the "DenseTNT w/ 100ms optimization (minFDE)" mode. If so, which parameters should we adjust in the training/testing command?

About Visualization

Hi! Thanks again for your great work!
I trained and evaluated your model, and the performance matches what you report in your paper.
I'm wondering if you could suggest a way to visualize the predictions or other results, such as heatmaps or trajectory lines.
I'm a little confused: is there a function in your code for that, and how do I use it?

Question about Outcomes

Thanks for the great job.
I have some questions about the definition of the outcomes. What are stage_one_k and stage_one_recall in other_errors? What do FDE, ADE, DE@1, DE@2, and DE@3 mean? Do FDE and ADE in the outcomes represent minFDE1 and minADE1?
Thanks
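
As background for the metric names in this question, here is a minimal sketch of the commonly used definitions of ADE, FDE, minADE, minFDE, and a miss at a 2 m threshold. It is not the repository's evaluation code, and it does not cover stage_one_k, stage_one_recall, or DE@t, whose definitions are not given here. All function names are hypothetical. Note also that some evaluation code reports minADE as the ADE of the trajectory with the smallest FDE rather than the minimum ADE over all candidates; the sketch uses the latter.

```python
import math

def displacement(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ade(pred, gt):
    """Average displacement error over all time steps of one trajectory."""
    return sum(displacement(p, q) for p, q in zip(pred, gt)) / len(gt)

def fde(pred, gt):
    """Final displacement error: distance at the last time step."""
    return displacement(pred[-1], gt[-1])

def evaluate(preds, gt, miss_threshold=2.0):
    """Best-of-K metrics for one agent given K candidate trajectories."""
    min_fde = min(fde(p, gt) for p in preds)
    min_ade = min(ade(p, gt) for p in preds)
    return {"minADE": min_ade, "minFDE": min_fde, "miss": min_fde > miss_threshold}

gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
preds = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],  # close to the ground truth
    [(1.0, 3.0), (2.0, 3.0), (3.0, 3.0)],  # offset by 3 m at every step
]
metrics = evaluate(preds, gt)
```

Averaging the miss flag over all agents gives the miss rate (MR). With a single candidate (K=1), minADE and minFDE reduce to plain ADE and FDE.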

pedestrians question

Hi,
Thank you for sharing the great work.
Since DenseTNT needs to sample final points from the centerline, how do you deal with pedestrians who are not on a lane?
I wonder if you could give me some advice.
THX

Pretrained Model + Demo

Thank you for a great repo. Will the pretrained model be released somewhere? Also, how can we recreate the video demos shown on the website (https://tsinghua-mars-lab.github.io/DenseTNT/)

training Error

When I follow the README.md to run it, a StopIteration error occurs. It happens at src/dataset_argoverse.py line 450:
root, dirs, cur_files = os.walk(each_dir).__next__()
StopIteration

How to fix it? Thanks!
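
One way this error can arise: os.walk() yields nothing when the directory does not exist, so taking its first item raises StopIteration. The sketch below reproduces that and shows a guarded variant with a clearer message. list_files is a hypothetical helper, not part of the repository, and the temporary directory only stands in for a data folder.

```python
import os
import tempfile

def list_files(each_dir):
    """Return the files directly inside each_dir, or raise a clear error.

    os.walk() yields nothing for a missing directory, so a bare next() on it
    raises StopIteration; passing a default to next() avoids that.
    """
    first = next(os.walk(each_dir), None)
    if first is None:
        raise FileNotFoundError(f"data directory not found or unreadable: {each_dir}")
    root, dirs, cur_files = first
    return sorted(cur_files)

with tempfile.TemporaryDirectory() as tmp:
    # An existing directory with one CSV file works as expected.
    open(os.path.join(tmp, "1.csv"), "w").close()
    found = list_files(tmp)

    # A missing directory now produces a descriptive error instead of StopIteration.
    missing = os.path.join(tmp, "no_such_dir")
    try:
        list_files(missing)
        missing_error = None
    except FileNotFoundError as exc:
        missing_error = str(exc)
```

If the error in the issue has this cause, checking that the path passed to --data_dir exists relative to the working directory would be the first thing to verify.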

Online Mode

Thank you for releasing your code. I'm confused: the offline model that produces pseudo-labels does not seem to work. Or is your online (test) mode not released? It seems that only offline mode is supported now, right?
Thank you very much and I'm looking forward to your response.

hardwares

I'm wondering how long it took you to train the model.
I plan to train the model with two 2080 Ti GPUs, but training takes a long time (2 h/epoch) and GPU utilization seems to be always low. Is that a problem with the code?
Can we speed up training by preprocessing the data?

Some questions about loss function.

Thank you for sharing the great work! I have a confusion about the second stage loss function.

In goals_2D_per_example_calc_loss(), there is an nll_loss calculated from "scores" and "[mapping[i]['goals_2D_labels']]". The "scores" are over the dense goals, but "[mapping[i]['goals_2D_labels']]" seems to index into the sparse goals. Am I misunderstanding? Why does this work?

Meaning of other_params

Hi,

Thank you for sharing a great work.
It would be great if you could provide some documentation about the meaning of the parameters, especially other_params, such as laneGCN-4.

Thank you!

Doubts in the evaluation and optimization section

Hello, I have a question. I reproduced your results by training first and then evaluating. However, the results from do_eval (FDE 1.0513439117392331, MR 0.09578942034860154) are far better than those reported during training (FDE: 3.1969465177600322, MR(2m,4m,6m): (0.49478493944897106, 0.23618785871750297, 0.1306969923570714)). I also noticed that running do_eval runs a model recovery step (recover --model). Is that the optimization model? If so, I cannot find the newly generated model.

Also, the test part does not seem to run after optimization. Are the optimization and testing parts run together? (I have only reproduced the results and have not studied your code, so please forgive me if I have misunderstood.) I would be very grateful if you could answer my question.

Training time on Waymo

Hi,

Thanks for your awesome work.

How long did you train DenseTNT on the Waymo dataset? And how many GPUs (type) did you use?

A problem about the experiments.

Thank you for sharing the great work! I have a confusion about the experiments:

The running results with this code seem far from the result in the paper. All params are default, after 16 training epochs:
loss=4.445
FDE: 3.1506308334590956
MR(2m,4m,6m): (0.4871615584819174, 0.2329565318727421, 0.13079283688769763)

What could be the problem?

The program is stuck

The progress bar will not be updated after arriving here
What could be the problem?

eval_batch_size

With a small eval_batch_size the eval results get worse, because global_graph() pads with zeros up to the maximum length in the batch in utils.merge_tensors(). After changing merge_tensors to pad to a fixed length, different eval_batch_size values give the same eval result.
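
The padding behavior described here can be illustrated with a small sketch. It uses plain Python lists rather than tensors, both helper names are hypothetical, and it only shows that the padded length of an example depends on what else is in the batch. Whether that changes model outputs depends on how the padding is masked downstream, which the sketch does not model.

```python
def pad_to_batch_max(sequences, pad=0.0):
    """Pad every sequence to the longest length in this batch."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad] * (max_len - len(s)) for s in sequences]

def pad_to_fixed(sequences, length, pad=0.0):
    """Pad every sequence to a fixed length, independent of the batch."""
    if any(len(s) > length for s in sequences):
        raise ValueError("sequence longer than fixed length")
    return [s + [pad] * (length - len(s)) for s in sequences]

short, long_ = [1.0, 2.0], [1.0, 2.0, 3.0, 4.0]

# Batch-max padding: the same example looks different depending on its batch.
alone = pad_to_batch_max([short])[0]
with_long = pad_to_batch_max([short, long_])[0]

# Fixed-length padding: the same example always looks the same.
fixed_alone = pad_to_fixed([short], 4)[0]
fixed_with_long = pad_to_fixed([short, long_], 4)[0]
```

The fixed length has to be at least the longest sequence in the whole dataset, which trades some extra memory and compute for batch-size-independent inputs.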

The results predicted by the set predictor model are not good for val data

Hung up at random epochs when training

Thanks so much for sharing, and congratulations on your great work! I am looking for suggestions on an issue I encountered recently. Thank you very much for your attention!

When running the training program, I found that it sometimes hangs at a random epoch (not the first epoch, and not at the end of an epoch). When it hangs, GPU usage is 0% while GPU memory stays at the same level, far from full. On the CPU side, memory is far from full and CPU usage is 0%. The program simply stops producing output and training does not continue. The only way to exit is to interrupt it, and the traceback shows that it stopped at:
while not spawn_context.join():
In 'top', the program sometimes shows the status 'Sleeping' after hanging. I encountered the problem on different setups (Windows 10 + WSL + Docker (Ubuntu) + Nvidia RTX 3080; Ubuntu 20.04.4 + Nvidia RTX A6000). I am looking forward to your suggestions on this issue.

Thanks again for your kind help and I am looking forward to learning more from you!
