Comments (8)

tiagoyukio12 commented on May 27, 2024

For clarification, the config/recurrent.yaml parameter output_sequence_length is used in the dts/examples/recurrent.py main function, on line 62:

X_train, y_train = get_rnn_inputs(train,
                                  window_size=params['input_sequence_length'],
                                  horizon=params['output_sequence_length'],
                                  shuffle=True,
                                  multivariate_output=True)

Here is a snippet of the get_rnn_inputs docstring (implemented in dts/utils/split.py):

"""
:param horizon: int
    Forecasting horizon, the number of future steps that have to be forecasted
"""

On line 119 of dts/utils/split.py, we can see that the targets list is built from slices of length horizon:

targets.append(
                X[i + window_size: i + window_size + horizon])

So there seems to be no conversion factor that turns the 24h forecast horizon reported in the paper into the 96 15-minute steps it would require: horizon is used directly as a number of raw time steps.
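
For concreteness, here is a minimal sketch of the same slicing logic (an illustrative re-implementation, not the repo's code). It shows that horizon is counted in raw 15-minute steps, so output_sequence_length: 96 corresponds to 24h while output_sequence_length: 24 corresponds to only 6h:

import numpy as np

def make_windows(X, window_size, horizon):
    # Illustrative sliding-window construction mirroring dts/utils/split.py:
    # each input is `window_size` consecutive steps, each target the next `horizon` steps.
    inputs, targets = [], []
    for i in range(len(X) - window_size - horizon + 1):
        inputs.append(X[i: i + window_size])
        targets.append(X[i + window_size: i + window_size + horizon])
    return np.asarray(inputs), np.asarray(targets)

# The UCI electricity data is sampled every 15 minutes, i.e. 4 steps per hour.
steps_per_hour = 4
print(96 / steps_per_hour)   # 24.0 -> horizon=96 spans 24h
print(24 / steps_per_hour)   # 6.0  -> horizon=24 spans only 6h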

I believe an erratum should be issued for the paper, clarifying the UCI dataset results use a 6h forecast horizon instead of 24h.

Your paper was extremely thorough and comprehensive, which is why I am using it as a benchmark; this GitHub issue is therefore important for the accuracy and integrity of my research.

I understand that you must be busy, but I would appreciate any assistance you could provide. Thank you for your time and for the effort of sharing your source code.

albertogaspar commented on May 27, 2024

Hi, I am sorry for the late reply. The UCI dataset results are correct and use a 24h forecast horizon (output_sequence_length: 96) with input_sequence_length: 384.
I see that in your config file you are using a single epoch, which makes the learning process too short. In the paper I used 200.

To show that the results in the paper were not obtained using a 6h forecast horizon instead of 24h, I ran a simple experiment: I used your settings with a (slightly) higher number of epochs (which is of course still not optimal):

train: False
dataset: 'uci'
exogenous: False
epochs: 10
batch_size: 1024
input_sequence_length: 96
output_sequence_length:  24
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True

obtaining the following results (the values in brackets are the results presented in the paper for GRU-MIMO with a 24h forecast horizon):

RMSE: 0.72 (0.75 ± 0.0)
MAE: 0.514 (0.52 ± 0.0)
NRMSE: 9.47 (9.83 ± 0.03)
R2: 0.333 (0.279 ± 0.004)

As you can see (and I encourage you to try it yourself), the results obtained for a 6h horizon are better than what is presented in the paper, even though the training of the model was cut short.

tiagoyukio12 commented on May 27, 2024

Thank you for your response and for clarifying the forecast horizon discrepancy. I appreciate your efforts in running the experiment with my configuration settings and providing the results.

However, despite your explanation and the additional experiment, I'm still unable to reproduce the exact results mentioned in the paper for the UCI dataset. Even with a training duration of 200 epochs, the obtained results differ slightly from the reported values.

Here are the results I obtained with the updated configuration:

  • RMSE: 0.67 ± 0.01
  • MAE: 0.51 ± 0.01
{'mse': 0.43373233, 'mae': 0.49384913, 'nrmse_a': 1.3187281, 'nrmse_b': 745.49725, 'nrmsd': 0.1212474, 'r2': 0.3009019, 'smape': 35.21651, 'mape': 72.9558}
{'mse': 0.45289233, 'mae': 0.51428014, 'nrmse_a': 1.295853, 'nrmse_b': 761.7854, 'nrmsd': 0.123896495, 'r2': 0.27001953, 'smape': 37.224, 'mape': 80.34695}
{'mse': 0.4351429, 'mae': 0.49716938, 'nrmse_a': 1.2940199, 'nrmse_b': 746.70856, 'nrmsd': 0.121444404, 'r2': 0.29862827, 'smape': 35.53708, 'mape': 71.81708}
{'mse': 0.49264866, 'mae': 0.54142165, 'nrmse_a': 1.2457576, 'nrmse_b': 794.51807, 'nrmsd': 0.12922013, 'r2': 0.20593947, 'smape': 38.69682, 'mape': 86.43704}
{'mse': 0.43954045, 'mae': 0.50468266, 'nrmse_a': 1.2942526, 'nrmse_b': 750.4721, 'nrmsd': 0.122056514, 'r2': 0.29154032, 'smape': 36.274437, 'mape': 76.50844}
{'mse': 0.4387224, 'mae': 0.5003349, 'nrmse_a': 1.3664513, 'nrmse_b': 749.7735, 'nrmsd': 0.12194287, 'r2': 0.29285884, 'smape': 35.66649, 'mape': 73.54895}
{'mse': 0.45094696, 'mae': 0.5106664, 'nrmse_a': 1.3728217, 'nrmse_b': 760.1476, 'nrmsd': 0.123630114, 'r2': 0.27315503, 'smape': 36.90767, 'mape': 79.35831}
{'mse': 0.46145782, 'mae': 0.52123094, 'nrmse_a': 1.2773947, 'nrmse_b': 768.9555, 'nrmsd': 0.12506263, 'r2': 0.25621337, 'smape': 37.673584, 'mape': 81.25483}
{'mse': 0.4475309, 'mae': 0.5054144, 'nrmse_a': 1.2984899, 'nrmse_b': 757.2628, 'nrmsd': 0.12316096, 'r2': 0.27866113, 'smape': 36.24459, 'mape': 75.25322}
{'mse': 0.4330459, 'mae': 0.5002167, 'nrmse_a': 1.2826424, 'nrmse_b': 744.9071, 'nrmsd': 0.12115142, 'r2': 0.3020084, 'smape': 36.32402, 'mape': 75.93276}
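
For reference, the summary figures above follow from the 10 raw runs, assuming RMSE is the square root of the reported mse and the ± values are standard deviations across runs:

import numpy as np

# mse and mae values copied from the 10 run dictionaries above
mse = [0.43373233, 0.45289233, 0.4351429, 0.49264866, 0.43954045,
       0.4387224, 0.45094696, 0.46145782, 0.4475309, 0.4330459]
mae = [0.49384913, 0.51428014, 0.49716938, 0.54142165, 0.50468266,
       0.5003349, 0.5106664, 0.52123094, 0.5054144, 0.5002167]

rmse = np.sqrt(mse)
print(f'RMSE: {rmse.mean():.2f} ± {rmse.std():.2f}')    # RMSE: 0.67 ± 0.01
print(f'MAE:  {np.mean(mae):.2f} ± {np.std(mae):.2f}')  # MAE:  0.51 ± 0.01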

These results were obtained after running the experiment 10 times using this configuration:

train: False
dataset: 'uci'
exogenous: False
epochs: 200
batch_size: 1024
input_sequence_length: 384
output_sequence_length:  96
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True

I obtained most of these hyperparameters from Table 4 of your paper for GRU-MIMO.
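
As a rough illustration only (a hedged Keras sketch, not the repository's actual model-building code), the GRU-MIMO architecture these hyperparameters describe amounts to a single 50-unit GRU layer feeding a dense head that emits all 96 output steps at once:

from tensorflow.keras import layers, models, optimizers, regularizers

def build_gru_mimo(input_len=384, output_len=96, units=50, l2=0.0005,
                   learning_rate=0.001, n_features=1):
    # Hypothetical sketch matching the config above: one GRU layer, no dropout,
    # L2 weight regularization, and a dense layer that predicts the whole
    # forecast horizon jointly (MIMO strategy).
    model = models.Sequential([
        layers.Input(shape=(input_len, n_features)),
        layers.GRU(units, kernel_regularizer=regularizers.l2(l2)),
        layers.Dense(output_len),
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=learning_rate), loss='mse')
    return model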

I couldn't find the batch_size and learning_rate in the paper, so I left them at the default values found in this repo.

Can you confirm if the paper really used batch_size = 1024 and learning_rate = 0.001 to generate the results?

albertogaspar commented on May 27, 2024

Yes, the batch size and the learning rate are correct. Note that, as written in the README file, the code changed slightly before being published, so some differences may be observed.

tiagoyukio12 commented on May 27, 2024

Thank you for your previous response. I have thoroughly reviewed the code and could not find any apparent errors or issues that could explain the deviations in the obtained results. I would appreciate it if you could provide more information about the changes made to the code before publication, as this would help me understand the potential factors contributing to the differences.
Alternatively, if possible, could you share with me the latest version of the code?
Thank you for your assistance, and I look forward to your response.

albertogaspar commented on May 27, 2024

The code in the repo is the latest version. The code for the experiments in the paper was refactored and then published here. This is why some differences can be observed.

tiagoyukio12 commented on May 27, 2024

I would really appreciate it if you could share the original code used in the experiments, so I can understand the observed differences.

albertogaspar commented on May 27, 2024

Unfortunately I only have the refactored code. I am really sorry for that.
