stgat's Introduction

STGAT

STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction

Correction

Our statement about the Average Displacement Error (ADE) in the paper is wrong; the metric should be the RMSE / L2 distance (as in SocialAttention and SocialGan).
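
For reference, a minimal sketch (notation is ours, not from the paper) of the metric as computed in the SocialGan-style evaluation the correction refers to: the displacement error is the L2 distance between predicted and ground-truth positions, averaged over the predicted time steps for ADE and taken at the final step for FDE. In practice both quantities are further averaged over all pedestrians in the test set.

```latex
% ADE / FDE as L2 displacements (SocialGan-style evaluation)
\mathrm{ADE} = \frac{1}{T_{\mathrm{pred}}} \sum_{t=1}^{T_{\mathrm{pred}}}
  \left\lVert \hat{p}_t - p_t \right\rVert_2 ,
\qquad
\mathrm{FDE} = \left\lVert \hat{p}_{T_{\mathrm{pred}}} - p_{T_{\mathrm{pred}}} \right\rVert_2
```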

Requirements

  • Python 3
  • PyTorch (1.2)
  • Matplotlib

Datasets

All the data comes from the SGAN model without any further processing.

How to Run

  • First, cd STGAT
  • To train the model, run python train.py (see the code to understand all the arguments that can be passed to the command)
  • To evaluate the model, run python evaluate_model.py
  • Using the default parameters in the code, you can get most of the numerical results presented in the paper. But a reasonable attention visualization may require training for a longer time and tuning some parameters. For example, for the zara1 dataset with a pred_len of 8 time-steps, you can set num_epochs to 600 (line 36 in train.py) and the learning rate in step 3 to 1e-4 (line 180 in train.py).
  • The attachment folder contains the code that produces the attention figures presented in the paper.
  • Check out the issues of this repo to find out how to get better results on the ETH dataset.

Acknowledgments

All data and part of the code come from the SGAN model. If you find this code useful in your research, please also cite their paper.

If you have any questions, please contact [email protected], and if you find this repository useful for your research, please cite the following paper:

@InProceedings{Huang_2019_ICCV,
author = {Huang, Yingfan and Bi, Huikun and Li, Zhaoxin and Mao, Tianlu and Wang, Zhaoqi},
title = {STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}


stgat's Issues

training and evaluation

Hi, thanks again for sharing your code. My question is about the pretrained model stored in the folder after we train the model, and whether we can later evaluate on datasets other than the one we trained on last time, or whether there is only one best model that keeps only the most recently trained weights. For example, assume we first train the model on the Zara1 dataset and then train it on the ETH dataset: can we still evaluate the model on Zara1, or can the pretrained model be used for ETH only? That is, if the checkpoint contains only the last-trained model, we can only evaluate ETH, and if we want to evaluate Zara1, we have to train on Zara1 again? Thanks!

Cannot Reproduce Same Results on UNIV Dataset

Hi, I used the default configuration in your code to train and test on the UNIV dataset but cannot obtain the results mentioned in your paper (0.52/1.10). I can only get ~0.56/1.20, which is close to the reproduced results in CausalHTP (0.57/1.22). Can you provide your checkpoints and hyper-parameters for the UNIV dataset? Any help will be appreciated!

Some Questions about the paper

Hi, I just read your paper. One thing that confuses me is the description of the Graph Attention Layer.
In the paper it states that the input of this layer is h = {h_1, h_2, h_3, ..., h_N}. Should this be {m_1, m_2, ..., m_N}, i.e., the hidden states of the M-LSTM?
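
For context, a sketch of the standard graph attention layer the question refers to (notation follows the original GAT formulation; whether its inputs are the raw h_i or the M-LSTM hidden states m_i is exactly the point being asked about):

```latex
% Standard graph attention layer on inputs \{h_1, \dots, h_N\}
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\mathbf{W} h_i \,\Vert\, \mathbf{W} h_j\right]\right),
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},
\qquad
h_i' = \sigma\!\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, \mathbf{W} h_j\Big)
```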

issue on evaluate.py

Hi, thank you for sharing your code. I have trained the model, but I can't run evaluate.py. I get this error: cublas runtime error : library not initialized at /opt/conda/conda-bld/pytorch_1556653183467/work/aten/src/THC/THCGeneral.cpp:228. Can you help me? Thank you.

Initializing Training and Validation sets takes too long on own Dataset

Hi! I am trying to run this model on a dataset of my own for trajectory prediction. I have used the same format: frame id - person id - x - y.

It's taking around 24 hours to initialize the training dataset of around 5000 samples of 40 frames each. I understand that it would normally take longer since my data is considerably bigger, but is there a way I can speed up the initialization process? It takes more than a whole day for me to obtain the FDE and ADE results. Should I try transforming the data to numpy arrays or pandas dataframes?

issues on 'teacher force' implementation

Hi, thank you for sharing your code. I think your implementation is rather elegant, but there are still two issues in the 'teacher force' part that confuse me.

  1. In models.py line 260, 'output' is defined as:
    output = obs_traj_rel[-1]
    Considering that line 260 is only reached when 'training_step == 3', and that 'obs_traj_rel' contains coordinates for both the observation steps (length = 8) and the prediction steps (length = 12), I cannot understand why the last coordinate of the prediction time steps is used as the input of the first LSTM decoder cell (when 'teacher_force == False').

  2. In models.py line 262, 'input_t' in each iteration is defined as:
    input_t = obs_traj_rel[-self.pred_len :].chunk(
        obs_traj_rel[-self.pred_len :].size(0), dim=0)[i]
    Therefore, the value of 'input_t' is actually the ground truth of the i-th time step of the prediction phase. Furthermore, when 'teacher_force == True', it also serves as the input of the i-th time step. However, according to Deep Learning by Ian Goodfellow, the input of the i-th RNN unit should be the label of the (i-1)-th time step when teacher forcing is activated. Thus the implementation in this part is more like an autoencoder than teacher-forcing training.

Could you please offer some explanations on these questions? Thx a lot ;)
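
For context, a minimal sketch (not the repository's code; tensor names, shapes, and the decoder structure are assumptions) of the textbook teacher-forcing convention the second question refers to, where the input at prediction step i is the ground-truth label of step i-1 rather than of step i:

```python
import torch
import torch.nn as nn

def decode_with_teacher_forcing(cell, h, c, last_obs_rel, gt_rel, out_proj):
    """gt_rel: (pred_len, batch, 2) ground-truth relative displacements;
    last_obs_rel: (batch, 2) last observed displacement."""
    outputs = []
    input_t = last_obs_rel                # step 0 input: last *observed* step
    for i in range(gt_rel.size(0)):
        h, c = cell(input_t, (h, c))      # one LSTMCell step
        outputs.append(out_proj(h))       # prediction for step i
        input_t = gt_rel[i]               # teacher forcing: the label of step i
                                          # becomes the input of step i + 1
    return torch.stack(outputs, dim=0)

# Toy usage with made-up sizes:
batch, hidden, pred_len = 4, 32, 12
cell, out_proj = nn.LSTMCell(2, hidden), nn.Linear(hidden, 2)
h = torch.zeros(batch, hidden)
c = torch.zeros(batch, hidden)
pred = decode_with_teacher_forcing(
    cell, h, c, torch.zeros(batch, 2), torch.randn(pred_len, batch, 2), out_proj
)
print(pred.shape)  # torch.Size([12, 4, 2])
```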

running the code and pretrained model

Hi, thanks for the great work. I have just started running the code and was wondering if there is a pretrained model so I can just run the evaluation. Also, when I run train.py I get this error:

File "train.py", line 196
f"./checkpoint/checkpoint{epoch}.pth.tar",
^
SyntaxError: invalid syntax

Should I set any parameter or argument before running train.py?

Thanks!
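
Not an official answer, but for reference: the SyntaxError shown above is what an interpreter older than Python 3.6 raises on an f-string (PEP 498), so it usually means the script is being run with an older Python. A hedged illustration of an equivalent str.format() rewrite of that path (the variable name epoch is taken from the traceback):

```python
epoch = 1  # placeholder value for illustration

path_fstring = f"./checkpoint/checkpoint{epoch}.pth.tar"         # needs Python >= 3.6
path_format = "./checkpoint/checkpoint{}.pth.tar".format(epoch)  # works on any Python 3

assert path_fstring == path_format
```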

Not able to reproduce the numbers on the ETH dataset with the params given in issue #1

Given the hyperparameter settings for the 8-timestep and 12-timestep models in this issue, I don't get the exact numbers in the paper. Specifically, these are the test-set results I get:

| Forecasting setting | ADE (mine / paper) | FDE (mine / paper) | Epoch (mine / paper) |
|---|---|---|---|
| pred len 8, eth | 0.57 / 0.56 | 1.08 / 1.10 | 356 / 256 |
| pred len 12, eth | 0.74 / 0.65 | 1.18 / 1.12 | 62 / 54 |

These numbers, especially for the 12-step forecaster, are very off. I have tried training for longer but the errors don't improve after a point. Is the evaluation protocol in the evaluation script not correct? If not, can you verify what the val ADE and FDE values were during training at the epochs where the test-set numbers match those in the paper? Nevertheless, it would be helpful if you were able to release the pretrained models for all five datasets reported in the paper.

Thanks.

Can't find an embedding layer for the trajectory input.

Hello, thank you for sharing your code. I have two questions.
I just found that the input size of traj_lstm (the M-LSTM in the paper) is 2, which means there is no embedding; I also can't find an embedding layer in your code. Can you explain this?

reproduce the ADE/FDE for eth and zara1

Hi,

Thank you for providing the code.
I am trying to train the models for eth and zara1 with a prediction length of 12, but I can't achieve the ADE/FDE reported in the paper (eth: 0.65/1.12 and zara1: 0.34/0.69).
Can you share the parameters for these two datasets?
Thank you.

Output of different heads in multi-head attention is not averaged

In the adaptation of BatchMultiHeadAttention from xptree's implementation, the averaging over multiple heads is not implemented as in the original repository. That makes the following code, which squeezes the second dimension, impossible unless only a single head is used.

x, attn = gat_layer(x)
if i + 1 == self.n_layer:
    x = x.squeeze(dim=1)
else:
    x = F.elu(x.transpose(1, 2).contiguous().view(bs, n, -1))
    x = F.dropout(x, self.dropout, training=self.training)

Can you please explain whether I am missing something here?

@huang-xx Thank you for the code along with the paper.
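
For context, a minimal sketch (names and shapes are assumptions, not the repository's code) contrasting the two conventions discussed above: the original multi-head GAT averages the head outputs at the final layer, whereas squeeze(dim=1) only removes the head dimension when a single head is used; intermediate layers instead concatenate the heads along the feature dimension.

```python
import torch

bs, n_head, n, f_out = 4, 3, 10, 16    # hypothetical batch / head / node / feature sizes
x = torch.randn(bs, n_head, n, f_out)  # multi-head output of one GAT layer

x_avg = x.mean(dim=1)                  # final layer in the original repo: average heads -> (bs, n, f_out)
x_cat = x.transpose(1, 2).contiguous().view(bs, n, -1)  # hidden layers: concatenate heads -> (bs, n, n_head * f_out)

x_single = torch.randn(bs, 1, n, f_out)  # squeeze only works with a single head
x_sq = x_single.squeeze(dim=1)           # (bs, n, f_out)

print(x_avg.shape, x_cat.shape, x_sq.shape)
```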

Question about metrics

Hi! I just have a small confusion regarding the ADE metric.

If I train on one dataset of shape (200 000, 40, 5, 2), where I have 40 frames per video and each video contains 5 walking people, and I observe 20 frames and try to predict the next 20 frames for these 5 persons, I get ADE = 11 and FDE = 15.

Is my Average Displacement Error calculated per frame per person? Or should I divide by the number of frames (or persons)?
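
For reference, a minimal sketch (not the repository's evaluation code; names and shapes are assumptions) of the usual SocialGan-style convention, in which ADE averages the L2 error over both predicted frames and persons, and FDE averages the last-frame error over persons only:

```python
import torch

def ade_fde(pred, gt):
    """pred, gt: (pred_len, num_peds, 2) predicted and ground-truth positions."""
    dist = torch.norm(pred - gt, dim=-1)  # (pred_len, num_peds) per-step, per-person L2 error
    ade = dist.mean()                     # average over frames AND persons
    fde = dist[-1].mean()                 # final frame only, averaged over persons
    return ade.item(), fde.item()

# Toy usage matching the question's setup (20 predicted frames, 5 persons):
pred, gt = torch.randn(20, 5, 2), torch.randn(20, 5, 2)
print(ade_fde(pred, gt))
```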

Training loss in training steps 1 & 2

Hi! Thanks for sharing your work.
I don't understand the design where you use the observed data and the ground truth to train the LSTM in training steps 1 & 2; is it like an autoencoder? By the way, the loss mask is applied at the prediction phase.
Thanks for your help.

Cannot reproduce the results in the paper [ZARA2 ADE: 0.31 / FDE: 0.68]

2020-05-30 15:00:40,279:INFO: * ADE 0.606 FDE 1.352
2020-05-30 15:00:42,856:INFO: Epoch: [398][ 0/33] Loss 1.928555 (1.928555)
2020-05-30 15:01:08,946:INFO: Epoch: [398][10/33] Loss 0.362595 (1.410262)
2020-05-30 15:01:35,556:INFO: Epoch: [398][20/33] Loss 0.794038 (1.158781)
2020-05-30 15:02:01,813:INFO: Epoch: [398][30/33] Loss 0.603343 (1.070375)
2020-05-30 15:02:07,282:INFO: Test: [ 0/15] ADE 0.763601 (0.763601) FDE 1.685743 (1.685743)
2020-05-30 15:02:07,791:INFO: Test: [10/15] ADE 0.510709 (0.547694) FDE 1.102326 (1.213096)
2020-05-30 15:02:07,981:INFO: * ADE 0.541 FDE 1.198
2020-05-30 15:02:10,600:INFO: Epoch: [399][ 0/33] Loss 1.986611 (1.986611)
2020-05-30 15:02:36,563:INFO: Epoch: [399][10/33] Loss 0.361267 (1.415945)
2020-05-30 15:03:02,565:INFO: Epoch: [399][20/33] Loss 0.800303 (1.171037)
2020-05-30 15:03:28,411:INFO: Epoch: [399][30/33] Loss 0.601634 (1.086858)
2020-05-30 15:03:33,871:INFO: Test: [ 0/15] ADE 0.757095 (0.757095) FDE 1.652979 (1.652979)
2020-05-30 15:03:34,376:INFO: Test: [10/15] ADE 0.573862 (0.596124) FDE 1.194102 (1.289546)
2020-05-30 15:03:34,568:INFO: * ADE 0.598 FDE 1.294
2020-05-30 15:03:37,154:INFO: Epoch: [400][ 0/33] Loss 2.159152 (2.159152)
2020-05-30 15:04:03,336:INFO: Epoch: [400][10/33] Loss 0.350839 (1.431882)
2020-05-30 15:04:29,392:INFO: Epoch: [400][20/33] Loss 0.836604 (1.184262)
2020-05-30 15:04:55,025:INFO: Epoch: [400][30/33] Loss 0.775246 (1.098319)
2020-05-30 15:05:00,539:INFO: Test: [ 0/15] ADE 0.790602 (0.790602) FDE 1.762415 (1.762415)
2020-05-30 15:05:01,044:INFO: Test: [10/15] ADE 0.562749 (0.616284) FDE 1.221064 (1.385227)
2020-05-30 15:05:01,239:INFO: * ADE 0.618 FDE 1.392

training_step in train.py and models.py

Hi, thank you for uploading the algorithm along with the paper. I have just started working on it. I noticed that when training_step = 1 you read the output from traj_lstm, while for steps 2 and 3 you have different approaches. Can you please explain what exactly you intend to do there, since the behaviour keeps changing with the number of epochs?

Thank you

Training on the ETH dataset.

For the case where the prediction length is 8 time-steps, the result in the paper will appear at the 256th epoch. No need to change parameters.

For the case where the prediction length is 12 time-steps, do the following things:

  1. Change "lr" to 1e-5 in line 148 and 149 in train.py;
  2. Change value 150 to 30 in line 173 in train.py;
  3. Comment out the 175th and 176th line of code in train.py
  4. Change value 250 to 30 in line 178 in train.py.
  5. Change value 5e-3 to 1e-4 in line 180 in train.py.
  6. Set the 'seed' to 86 in line 20 in evaluate_model.py.

You will get the result at epoch 54.

Question about short test trajectories

Hi, my question is about trajectories with fewer than 20 frames (since we cannot observe 8 frames and predict 12 frames unless the test trajectory has at least 20 frames). I observe that all the test datasets more or less contain such trajectories. How did you run evaluation on these short trajectories? Did you remove them, or did you adjust the time horizon for these trajectories?
