pytorch-forecasting's Introduction

PyTorch Forecasting

PyTorch Forecasting is a PyTorch-based package for forecasting time series with state-of-the-art network architectures. It provides a high-level API for training networks on pandas data frames and leverages PyTorch Lightning for scalable training on (multiple) GPUs, CPUs and for automatic logging.


Our article on Towards Data Science introduces the package and provides background information.

PyTorch Forecasting aims to ease state-of-the-art timeseries forecasting with neural networks for real-world cases and research alike. The goal is to provide a high-level API with maximum flexibility for professionals and reasonable defaults for beginners. Specifically, the package provides

  • A timeseries dataset class which abstracts handling variable transformations, missing values, randomized subsampling, multiple history lengths, etc.
  • A base model class which provides basic training of timeseries models along with logging in TensorBoard and generic visualizations such as actual vs. predicted values and dependency plots
  • Multiple neural network architectures for timeseries forecasting that have been enhanced for real-world deployment and come with in-built interpretation capabilities
  • Multi-horizon timeseries metrics
  • Hyperparameter tuning with optuna

The package is built on pytorch-lightning to allow training on CPUs, single and multiple GPUs out-of-the-box.

Installation

If you are working on Windows, you need to first install PyTorch with

pip install torch -f https://download.pytorch.org/whl/torch_stable.html

Otherwise, you can proceed with

pip install pytorch-forecasting

Alternatively, you can install the package via conda

conda install pytorch-forecasting "pytorch>=1.7" -c pytorch -c conda-forge

PyTorch Forecasting is now installed from the conda-forge channel while PyTorch is installed from the pytorch channel.

To use the MQF2 loss (multivariate quantile loss), also run pip install pytorch-forecasting[mqf2]

Documentation

Visit https://pytorch-forecasting.readthedocs.io to read the documentation with detailed tutorials.

Available models

The documentation provides a comparison of available models.

To implement new models or other custom components, see the How to implement new models tutorial. It covers basic as well as advanced architectures.

Usage example

Networks can be trained with the PyTorch Lightning Trainer on pandas DataFrames, which are first converted to a TimeSeriesDataSet.

# imports for training
import lightning.pytorch as pl
from lightning.pytorch.loggers import TensorBoardLogger
from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor
# import dataset, network to train and metric to optimize
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer, QuantileLoss
from lightning.pytorch.tuner import Tuner

# load data: this is a pandas DataFrame with at least a column for
# * the target (what you want to predict)
# * the timeseries ID (which should be a unique string to identify each timeseries)
# * the time of the observation (which should be a monotonically increasing integer)
data = ...

# define the dataset, i.e. add metadata to pandas dataframe for the model to understand it
max_encoder_length = 36
max_prediction_length = 6
training_cutoff = "YYYY-MM-DD"  # day for cutoff

training = TimeSeriesDataSet(
    data[lambda x: x.date <= training_cutoff],
    time_idx= ...,  # column name of time of observation
    target= ...,  # column name of target to predict
    group_ids=[ ... ],  # column name(s) for timeseries IDs
    max_encoder_length=max_encoder_length,  # how much history to use
    max_prediction_length=max_prediction_length,  # how far to predict into future
    # covariates static for a timeseries ID
    static_categoricals=[ ... ],
    static_reals=[ ... ],
    # covariates known and unknown in the future to inform prediction
    time_varying_known_categoricals=[ ... ],
    time_varying_known_reals=[ ... ],
    time_varying_unknown_categoricals=[ ... ],
    time_varying_unknown_reals=[ ... ],
)

# create validation dataset using the same normalization techniques as for the training dataset
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training.index.time.max() + 1, stop_randomization=True)

# convert datasets to dataloaders for training
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=2)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=2)

# create PyTorch Lightning Trainer with early stopping
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=1, verbose=False, mode="min")
lr_logger = LearningRateMonitor()
trainer = pl.Trainer(
    max_epochs=100,
    accelerator="auto",  # choose the available hardware automatically; on multiple GPUs, use strategy="ddp"
    gradient_clip_val=0.1,
    limit_train_batches=30,  # 30 batches per epoch
    callbacks=[lr_logger, early_stop_callback],
    logger=TensorBoardLogger("lightning_logs")
)

# define network to train - the architecture is mostly inferred from the dataset, so that only a few hyperparameters have to be set by the user
tft = TemporalFusionTransformer.from_dataset(
    # dataset
    training,
    # architecture hyperparameters
    hidden_size=32,
    attention_head_size=1,
    dropout=0.1,
    hidden_continuous_size=16,
    # loss metric to optimize
    loss=QuantileLoss(),
    # logging frequency
    log_interval=2,
    # optimizer parameters
    learning_rate=0.03,
    reduce_on_plateau_patience=4
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

# find the optimal learning rate
res = Tuner(trainer).lr_find(
    tft, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader, early_stop_threshold=1000.0, max_lr=0.3,
)
# and plot the result - always visually confirm that the suggested learning rate makes sense
print(f"suggested learning rate: {res.suggestion()}")
fig = res.plot(show=True, suggest=True)
fig.show()

# fit the model on the data - redefine the model with the correct learning rate if necessary
trainer.fit(
    tft, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader,
)
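
The data-loading comments in the usage example ask for an integer time index that increases monotonically. As a minimal sketch, assuming monthly observations, such an index could be derived from calendar dates as shown here. The helper name month_time_idx is hypothetical and not part of the package.

```python
from datetime import date


def month_time_idx(dates):
    """Map calendar dates to a zero-based month counter (hypothetical helper)."""
    # year * 12 + month grows by exactly 1 for each calendar month
    raw = [d.year * 12 + d.month for d in dates]
    start = min(raw)
    # shift so that the earliest observation gets index 0
    return [value - start for value in raw]


observations = [date(2019, 11, 1), date(2019, 12, 1), date(2020, 1, 1), date(2020, 3, 1)]
idx = month_time_idx(observations)
print(idx)
```

Note that the gap for the missing February is kept in the index, so the spacing between observations still reflects elapsed time.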

pytorch-forecasting's People

Contributors

abcnishant007, bendavidsteel, borda, chefpony, chenr86, christy, crarojasca, dehoyosb, dependabot-preview[bot], dependabot[bot], eavae, eliacus, github-actions[bot], jakef-bitweave, jakeforsey, jdb78, justinneumann, kigawas, lukemerrick, mikcnt, nsarang, pre-commit-ci[bot], rustyconover, seon82, snumumrik, stllfe, tklerx, tmct, vakker, veds12

pytorch-forecasting's Issues

Possible small fix for example ipynb

In https://github.com/jdb78/pytorch-forecasting/blob/master/docs/source/tutorials/stallion.ipynb cell 9, should the learning rate not be set to the result of the study in cell 8, or am I missing something?

# configure network and trainer
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
lr_logger = LearningRateLogger()  # log the learning rate
logger = TensorBoardLogger("lightning_logs")  # logging results to a tensorboard

trainer = pl.Trainer(
    max_epochs=30,
    gpus=0,
    weights_summary="top",
    gradient_clip_val=0.1,
    early_stop_callback=early_stop_callback,
    limit_train_batches=30,  # comment in for training, running validation every 30 batches
    # fast_dev_run=True,  # comment in to check that network or dataset has no serious bugs
    callbacks=[lr_logger],
    logger=logger,
)


tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=res.suggestion(), #<<<<<<<<<<<<<<<<<<<<<<<<<<<< this
    hidden_size=16,
    attention_head_size=1,
    dropout=0.1,
    hidden_continuous_size=8,
    output_size=7,  # 7 quantiles by default
    loss=QuantileLoss(),
    log_interval=30,  # uncomment for learning rate finder and otherwise, e.g. to 10 for logging every 10 batches
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

Error saving checkpoint

Saving latest checkpoint..

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-d89dc578d7fe> in <module>
      1 # fit network
      2 trainer.fit(
----> 3     tft, train_dataloader=train_dataloader, val_dataloaders=val_dataloader,
      4 )

~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
     46             if entering is not None:
     47                 self.state = entering
---> 48             result = fn(self, *args, **kwargs)
     49 
     50             # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted

~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
   1082             self.accelerator_backend = CPUBackend(self)
   1083             self.accelerator_backend.setup(model)
-> 1084             results = self.accelerator_backend.train(model)
   1085 
   1086         # on fit end callback

~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/accelerators/cpu_backend.py in train(self, model)
     37 
     38     def train(self, model):
---> 39         results = self.trainer.run_pretrain_routine(model)
     40         return results

~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
   1237 
   1238         # CORE TRAINING LOOP
-> 1239         self.train()
   1240 
   1241     def _run_sanity_check(self, ref_model, model):

~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in train(self)
    407                 if self.should_stop:
    408                     if (met_min_epochs and met_min_steps):
--> 409                         self.run_training_teardown()
    410                         return
    411                     else:

~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_teardown(self)
   1143             # model hooks
   1144             if self.is_function_implemented('on_train_end'):
-> 1145                 self.get_model().on_train_end()
   1146 
   1147         if self.logger is not None:

~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in on_train_end(self)
    583     def on_train_end(self):
    584         if self.log_interval(train=True) > 0:
--> 585             self._log_embeddings()
    586 
    587     def step(self, x, y, batch_idx, label="train"):

~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in _log_embeddings(self)
    868             labels = self.hparams.embedding_labels[name]
    869             self.logger.experiment.add_embedding(
--> 870                 emb.weight.data.cpu(), metadata=labels, tag=name, global_step=self.global_step
    871             )

~/anaconda3/envs/TF2/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py in add_embedding(self, mat, metadata, label_img, global_step, tag, metadata_header)
    786         save_path = os.path.join(self._get_file_writer().get_logdir(), subdir)
    787 
--> 788         fs = tf.io.gfile.get_filesystem(save_path)
    789         if fs.exists(save_path):
    790             if fs.isdir(save_path):

AttributeError: module 'tensorflow._api.v2.io.gfile' has no attribute 'get_filesystem'

Categorical encoding bug

  • PyTorch-Forecasting version: 0.5.2
  • PyTorch version: 1.6
  • Python version: 3.8.X
  • Operating System: Ubuntu 20.04

I think I've found a bug, or at least unexpected behavior.
I'm making predictions on a dataset with time-based features marked as categorical, namely month alongside day and year.
The dataset starts on 2020-01-01 and ends on 2020-08-30. The date is parsed into 'year', 'month', and 'day' columns for each row.
Depending on the date of the last record in the dataset (if I cut it for some reason), pytorch-forecasting throws an error that looks like:

Traceback:
File "XXX/venv/lib/python3.8/site-packages/pytorch_forecasting/data/encoders.py", line 105, in
encoded = [self.classes_[v] for v in y]
KeyError: '8'

After some experiments and reading stack traces, I found that this always happens when a value, for instance this month (8, August), appears in the full set but not in the training set. That is the case when max_prediction_length is larger than 31 (days), or when the last date and max_prediction_length combine unfavorably: with a last date of 2020-08-10 and a max_prediction_length of 20, the training set ends around 2020-07-20 and contains no month '8'.
In this case, going back to the line of code in the traceback, the value (8) is in np.unique(y) (the iterator), but not in self.classes_.

It seems that self.classes_ is built from the training set only, so calling TimeSeriesDataSet.from_dataset(trainingset, fullset, .....) raises this error for any additional categorical values that appear in the full dataset.

This makes the package hard to use in practice on any dataset with categorically encoded date/time features.

Shouldn't any previously unseen categorical value be put into a special 'average' bin and treated as the average of all known categories? As far as I remember, LightGBM exhibits this behavior for new categorical values.
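
The failure mode described in this issue can be reproduced without the library. The following is a minimal sketch using a hypothetical stand-in encoder: SketchLabelEncoder is not the package's class, and mapping unseen values to a reserved index is only one possible policy, not necessarily the one the maintainers adopted.

```python
class SketchLabelEncoder:
    """Hypothetical stand-in for a label encoder fitted on training data only."""

    def __init__(self, unknown_label=None):
        # unknown_label=None mimics the strict behavior reported in the issue
        self.unknown_label = unknown_label
        self.classes_ = {}

    def fit(self, values):
        self.classes_ = {}
        offset = 0
        if self.unknown_label is not None:
            # reserve index 0 for every value not seen during fitting
            self.classes_[self.unknown_label] = 0
            offset = 1
        for position, value in enumerate(sorted(set(values))):
            self.classes_[value] = position + offset
        return self

    def transform(self, values):
        if self.unknown_label is None:
            # raises KeyError for any value missing from the training data
            return [self.classes_[v] for v in values]
        fallback = self.classes_[self.unknown_label]
        return [self.classes_.get(v, fallback) for v in values]


# training data ends in July, so month "8" is never seen while fitting
train_months = [str(m) for m in range(1, 8)]

strict = SketchLabelEncoder().fit(train_months)
try:
    strict.transform(["8"])
    strict_error = None
except KeyError as err:
    strict_error = err

tolerant = SketchLabelEncoder(unknown_label="<unknown>").fit(train_months)
encoded = tolerant.transform(["7", "8"])
print(repr(strict_error), encoded)
```

The strict encoder fails on "8" exactly as in the traceback, while the tolerant variant sends it to the reserved index.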

TypeError: 'str' object cannot be interpreted as an integer

  • PyTorch-Forecasting version: 0.6.0
  • PyTorch version: 1.7.0+cu101
  • Python version: 3.6.9
  • Google Colab

Expected behavior

I'm trying to build an algorithmic trading system. In its first two months it has had only a 30%-40% win rate. It is not an ML system, but I would like to use PyTorch and the TFT model to predict a possible target for each item it finds, based on previous results.
My normalized data looks like this (I removed some feature columns for simplicity; the best_target column is my target).

Actual behavior

I got the following error and could not find its cause.

TypeError                                 Traceback (most recent call last)
<ipython-input-10-329c6f2b973b> in <module>()
     16     add_target_scales=True,
     17     add_encoder_length=True,
---> 18     allow_missings=True
     19 )
     20 

1 frames
/usr/local/lib/python3.6/dist-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
    809                     f"First 10 removed groups: {list(missing_groups.iloc[:10].to_dict(orient='index').values())}",
    810                 ),
--> 811                 UserWarning,
    812             )
    813         assert len(df_index) > 0, "filters should not remove entries"

TypeError: 'str' object cannot be interpreted as an integer

Code to reproduce the problem

!pip install pytorch-lightning
!pip install pytorch-forecasting
import warnings
from pathlib import Path
import pandas as pd
import numpy as np
import torch
import copy


import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from pytorch_lightning.loggers import TensorBoardLogger

from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer, Baseline
from pytorch_forecasting.data import GroupNormalizer

from pytorch_forecasting.metrics import PoissonLoss, QuantileLoss, SMAPE
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import optimize_hyperparameters

data = pd.read_csv("http://www.sharecsv.com/dl/a08efd0677d449542fd98e2b390101a1/file.csv")
data["directory"] = data.directory.astype('str').astype('category')
data["strategy"] = data.strategy.astype('str').astype('category')
data["order_type"] = data.order_type.astype('str').astype('category')
data["new_york"] = data.new_york.astype('str').astype('category')
data["london"] = data.london.astype('str').astype('category')
data["tokyo"] = data.tokyo.astype('str').astype('category')
data["sydney"] = data.sydney.astype('str').astype('category')
data["wellington"] = data.wellington.astype('str').astype('category')
data["singapore"] = data.singapore.astype('str').astype('category')
data["hong_kong"] = data.hong_kong.astype('str').astype('category')
data["shanghai"] = data.shanghai.astype('str').astype('category')
data["pair"] = data.pair.astype('str').astype('category')
data["currency1"] = data.currency1.astype('str').astype('category')
data["currency2"] = data.currency2.astype('str').astype('category')
data["date"] = pd.to_datetime(data.date, utc=True)

max_prediction_length = 6
max_encoder_length = 24
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="best_target",
    group_ids=["time_idx"],
    min_encoder_length=max_encoder_length // 2, 
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    time_varying_unknown_categoricals=[],
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
    allow_missings=True
)
# I got stuck at this point and never got as far as using TFT.

validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)

batch_size = 128 
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)

I have some questions:

  1. Is the TFT model a good fit for this problem?
  2. My system scans the market periodically and returns results, and I increase time_idx for every new batch of results. Is that the right approach?
  3. What is the reason for this error?
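
One thing worth checking in the reproduction code, offered as a guess at a contributing factor rather than a confirmed diagnosis, is that group_ids is set to the time index itself. Every group then covers a single time step, which is shorter than min_encoder_length plus min_prediction_length. The sketch below uses made-up rows and a hypothetical helper to show the check; the pair names are illustrative only.

```python
from collections import defaultdict


def time_span_per_group(rows, group_key, time_key="time_idx"):
    """Return how many time steps each group covers (hypothetical helper)."""
    times = defaultdict(set)
    for row in rows:
        times[row[group_key]].add(row[time_key])
    return {group: max(seen) - min(seen) + 1 for group, seen in times.items()}


# illustrative data: two instruments observed at 40 consecutive time steps
rows = [{"time_idx": t, "pair": p} for t in range(40) for p in ("EURUSD", "GBPJPY")]

# values taken from the reproduction code: min_encoder_length=24 // 2, min_prediction_length=1
min_required = 24 // 2 + 1

spans_by_time = time_span_per_group(rows, "time_idx")
spans_by_pair = time_span_per_group(rows, "pair")

too_short_by_time = [g for g, span in spans_by_time.items() if span < min_required]
too_short_by_pair = [g for g, span in spans_by_pair.items() if span < min_required]
print(len(too_short_by_time), len(too_short_by_pair))
```

Grouping by a column that identifies the series (here the hypothetical "pair") leaves every group long enough, whereas grouping by time_idx leaves none.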

Prediction on unseen data

Hi,
First of all, thank you for the time, work and commitment that went into this package. It is all good stuff. One thing I'm struggling with, though, is how to check predictions on data that was not seen by the trainer class (going by the documentation). I guess I should append it to the original data, but do you have any good practices you can share?
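
The appending idea from this question can be sketched in plain Python. This is only an illustration under assumptions: rows are dictionaries for a single series sorted by time_idx, the function name build_prediction_rows is hypothetical, and the resulting rows would still have to be turned into a dataset by whatever means the package provides.

```python
def build_prediction_rows(history, future, max_encoder_length):
    """Join recent history with new rows so time_idx continues without a gap."""
    # keep only as much history as the encoder can look back on
    context = history[-max_encoder_length:]
    next_idx = context[-1]["time_idx"] + 1
    # copy each new row and give it the next consecutive time index
    continued = [dict(row, time_idx=next_idx + i) for i, row in enumerate(future)]
    return context + continued


history = [{"time_idx": t, "target": float(t)} for t in range(50)]
future = [{"target": None} for _ in range(6)]  # target unknown for unseen data

rows = build_prediction_rows(history, future, max_encoder_length=36)
print(len(rows), rows[0]["time_idx"], rows[-1]["time_idx"])
```

The unknown targets are left as None here; how they should be filled before prediction depends on the dataset configuration.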

Make use of GluonTS's features

Overall Comments

First of all, I believe that this is a great initiative, especially that we have the TFT available + the Optuna optimisation! Great!

Description

Amazon's GluonTS automatically adds some features to its models, most notably the time features

  • it adds day-of-the-week, month-of-the-year and many more to the model, see here

But it also has some other features available, see here.

So my question is: are you planning to also add these to the Temporal Fusion Transformer? They would be great additions.
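
Until such features are built in, calendar features of this kind can be derived by hand before the data is passed to the dataset. The following is a minimal sketch using only the standard library; the function name and the choice of features are assumptions, and the values are emitted as strings on the assumption that categorical columns are expected as strings.

```python
from datetime import date


def calendar_features(day):
    """Derive simple categorical time features from a date (hypothetical helper)."""
    return {
        "day_of_week": str(day.weekday()),  # Monday is "0", Sunday is "6"
        "day_of_month": str(day.day),
        "month": str(day.month),
    }


features = calendar_features(date(2020, 8, 30))
print(features)
```

Such columns would then be listed under time_varying_known_categoricals, since calendar values are known for future dates as well.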

Validation Data is not generated properly for my dataset

Hi, I really appreciate your work on the TFT.

I am trying to use my own dataset, but there seems to be a bug that prevents the validation dataset from being loaded properly.
The train dataloader is fine, but the validation dataloader has only one batch and validation (the TimeSeriesDataSet) has only 1 entry.

Below is my complete code

data = load_csv()
data['date']= pd.to_datetime(data['date'])
data.reset_index(inplace=True, drop=True)
data.reset_index(inplace=True)
data.rename(columns={'index':'time_idx'}, inplace=True) # I use index as time_idx since my data is of minute frequency

validation_len = int(len(data) * 0.1)
training_cutoff = int(len(data)) - validation_len

max_encode_length = 36
max_prediction_length = 6

print('Len of training data is : ',len(data[:training_cutoff]))
print('Len of val data is : ',len(data[training_cutoff:]))
training = TimeSeriesDataSet(
    data[:training_cutoff],
    time_idx="time_idx",
    target="T",
    group_ids=["Symbol"],
    max_encoder_length=max_encode_length,
    max_prediction_length=max_prediction_length,
    static_categoricals=["Symbol"],
    static_reals=[],
    time_varying_known_categoricals=[
        "hour_of_day",
        "day_of_week",
    ],
    time_varying_known_reals=[
        "time_idx",
    ],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=["V1", "V2","V3", "T", "V4"],
    constant_fill_strategy={"T": 0},
    dropout_categoricals=[],
)
print('Max Prediction Index : ',training.index.time.max())
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training.index.time.max()+1)
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=1)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=1)

print(len(training), len(validation))
print(len(train_dataloader), len(val_dataloader))

This is what gets printed by the code :

Len of training data is : 25920
Len of val data is : 2880
Min Prediction Index Value is 25919

25920 1
202 1

You can see that the training dataset and its batches look fine, but the validation dataloader has only 1 batch and the validation dataset has length 1.

One more thing: if I use predict=False, the validation data is generated correctly, but another bug arises because of that.
If I use predict=True, only 1 batch and 1 sequence are produced.
If I use predict_mode=True on the training dataset, it also generates only 1 batch.
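
To make the difference between the two modes concrete, here is a simplified model of how many samples a single series yields. It is a sketch under assumptions (fixed encoder and prediction lengths, one series, no missing steps) and not the library's actual indexing code; the function name is hypothetical.

```python
def first_prediction_indices(series_length, encoder_length, prediction_length,
                             min_prediction_idx=0, predict=False):
    """List the first predicted time index of every window (simplified model)."""
    # a window needs a full encoder before it and a full horizon after it
    first = max(encoder_length, min_prediction_idx)
    last = series_length - prediction_length
    starts = list(range(first, last + 1))
    # in prediction mode only the most recent window of the series is kept
    return starts[-1:] if predict else starts


# numbers from the issue: 28800 rows, the last 2880 held out for validation
sliding = first_prediction_indices(28800, 36, 6, min_prediction_idx=25920)
last_only = first_prediction_indices(28800, 36, 6, min_prediction_idx=25920, predict=True)
print(len(sliding), len(last_only))
```

Under this model a single series in prediction mode always yields exactly one sample, which matches the single sequence reported with predict=True.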

Here is a sample of my CSV
sample_data.csv.zip

Please Help

KeyError: 'module name can\'t contain "."'

  • PyTorch-Forecasting version: 0.6.1
  • PyTorch version: 1.7.0+cu101
  • Python version: 3.6.9
  • Operating System: Windows

Expected behavior

I executed code with the intention of creating the TFT model object from a TimeSeriesDataset.

The expected result was a TFT model object that I would proceed to evaluate.

Actual behavior

The result was the following: KeyError: 'module name can't contain "."'

I'm not sure what causes it. I spent a while digging through the PyTorch source code, but the issue is rooted deep.
I'm not sure which module is being added, where I can see the module names, or why a module name would contain a period.
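
One possible explanation, offered as an assumption rather than a confirmed diagnosis: PyTorch refuses to register a submodule whose name contains a period, so if a model registers per-variable submodules under the dataset's column names, a column such as "price.open" could trigger this error. The sketch below finds such columns and proposes replacements; the helper and the column names are hypothetical.

```python
def dotted_column_renames(columns, replacement="_"):
    """Map every column name containing '.' to a dot-free name (hypothetical helper)."""
    return {name: name.replace(".", replacement) for name in columns if "." in name}


columns = ["time_idx", "price.open", "price.close", "volume"]
renames = dotted_column_renames(columns)
print(renames)
```

The resulting mapping could be applied to the data frame before the dataset is built, after checking that no two columns collapse to the same new name.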

Code to reproduce the problem

# configure network and trainer
pl.seed_everything(42)
trainer = pl.Trainer(
    gpus=0,
    # clipping gradients is a hyperparameter and important to prevent divergence
    # of the gradient for recurrent neural networks
    gradient_clip_val=0.1,
)


tft = TemporalFusionTransformer.from_dataset(
    training,
    # not meaningful for finding the learning rate but otherwise very important
    learning_rate=0.03,
    hidden_size=16,  # most important hyperparameter apart from learning rate
    # number of attention heads. Set to up to 4 for large datasets
    attention_head_size=1,
    dropout=0.1,  # between 0.1 and 0.3 are good values
    hidden_continuous_size=8,  # set to <= hidden_size
    output_size=7,  # 7 quantiles by default
    loss=QuantileLoss(),
    # reduce learning rate if no improvement in validation loss after x epochs
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

Here is the condensed traceback:
[image]

Expanded traceback:
[image]
[image]

Any help is greatly appreciated.

Question: Why does the error occur in the tutorial?

  • PyTorch-Forecasting version: 0.6.0
  • PyTorch version: 1.4.0
  • PyTorch-Lightning version: 1.0.6
  • Python version: 3.6.6
  • Operating System: Linux

Expected behavior

I executed the notebook from the website https://pytorch-forecasting.readthedocs.io/en/latest/tutorials/stallion.html.

When fitting the network, I get an unexpected error.

Code to reproduce the problem

When I run the code below,

# fit network
trainer.fit(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
)

the error message below was displayed.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-263e8be26564> in <module>
      3     tft,
      4     train_dataloader=train_dataloader,
----> 5     val_dataloaders=val_dataloader,
      6 )

/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    442         self.call_hook('on_fit_start')
    443 
--> 444         results = self.accelerator_backend.train()
    445         self.accelerator_backend.teardown()
    446 

/opt/conda/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in train(self)
     61 
     62         # train or test
---> 63         results = self.train_or_test()
     64         return results
     65 

/opt/conda/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py in train_or_test(self)
     72             results = self.trainer.run_test()
     73         else:
---> 74             results = self.trainer.train()
     75         return results
     76 

/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py in train(self)
    464 
    465     def train(self):
--> 466         self.run_sanity_check(self.get_model())
    467 
    468         self.checkpoint_connector.has_trained = False

/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
    656 
    657             # run eval step
--> 658             _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
    659 
    660             # allow no returns from eval

/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, test_mode, max_batches)
    576 
    577                 # lightning module methods
--> 578                 output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
    579                 output = self.evaluation_loop.evaluation_step_end(output)
    580 

/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py in evaluation_step(self, test_mode, batch, batch_idx, dataloader_idx)
    169             output = self.trainer.accelerator_backend.test_step(args)
    170         else:
--> 171             output = self.trainer.accelerator_backend.validation_step(args)
    172 
    173         # track batch size for weighted average

/opt/conda/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in validation_step(self, args)
     85                 output = self.__validation_step(args)
     86         else:
---> 87             output = self.__validation_step(args)
     88 
     89         return output

/opt/conda/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in __validation_step(self, args)
     93         batch = self.to_device(batch)
     94         args[0] = batch
---> 95         output = self.trainer.model.validation_step(*args)
     96         return output
     97 

/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/base_model.py in validation_step(self, batch, batch_idx)
    161     def validation_step(self, batch, batch_idx):
    162         x, y = batch
--> 163         log, _ = self.step(x, y, batch_idx, label="val")  # log loss
    164         self.log("val_loss", log["loss"], on_step=False, on_epoch=True, prog_bar=True)
    165         return log

/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in step(self, x, y, batch_idx, label)
    520         """
    521         # extract data and run model
--> 522         log, out = super().step(x, y, batch_idx, label=label)
    523         # calculate interpretations etc for latter logging
    524         if self.log_interval(label == "train") > 0:

/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/base_model.py in step(self, x, y, batch_idx, label, **kwargs)
    232         self._log_metrics(x, y, out, label=label)
    233         if self.log_interval(label == "train") > 0:
--> 234             self._log_prediction(x, out, batch_idx, label=label)
    235         log = {"loss": loss, "n_samples": x["decoder_lengths"].size(0)}
    236 

/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/base_model.py in _log_prediction(self, x, out, batch_idx, label)
    301                 log_indices = [0]
    302             for idx in log_indices:
--> 303                 fig = self.plot_prediction(x, out, idx=idx, add_loss_to_title=True)
    304                 tag = f"{label.capitalize()} prediction"
    305                 if label == "train":

/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in plot_prediction(self, x, out, idx, plot_attention, add_loss_to_title, show_future_observed, ax)
    669         # plot prediction as normal
    670         fig = super().plot_prediction(
--> 671             x, out, idx=idx, add_loss_to_title=add_loss_to_title, show_future_observed=show_future_observed, ax=ax
    672         )
    673 

/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/base_model.py in plot_prediction(self, x, out, idx, add_loss_to_title, show_future_observed, ax)
    389         for i in range(y_quantiles.shape[1] // 2):
    390             if len(x_pred) > 1:
--> 391                 ax.fill_between(x_pred, y_quantiles[:, i], y_quantiles[:, -i - 1], alpha=0.15, fc=pred_color)
    392             else:
    393                 quantiles = torch.tensor([[y_quantiles[0, i]], [y_quantiles[0, -i - 1]]])

/opt/conda/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1808                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1809                         RuntimeWarning, stacklevel=2)
-> 1810             return func(ax, *args, **kwargs)
   1811 
   1812         inner.__doc__ = _add_data_doc(inner.__doc__,

/opt/conda/lib/python3.6/site-packages/matplotlib/axes/_axes.py in fill_between(self, x, y1, y2, where, interpolate, step, **kwargs)
   5116             polys.append(X)
   5117 
-> 5118         collection = mcoll.PolyCollection(polys, **kwargs)
   5119 
   5120         # now update the datalim and autoscale

/opt/conda/lib/python3.6/site-packages/matplotlib/collections.py in __init__(self, verts, sizes, closed, **kwargs)
    931         %(Collection)s
    932         """
--> 933         Collection.__init__(self, **kwargs)
    934         self.set_sizes(sizes)
    935         self.set_verts(verts, closed)

/opt/conda/lib/python3.6/site-packages/matplotlib/collections.py in __init__(self, edgecolors, facecolors, linewidths, linestyles, capstyle, joinstyle, antialiaseds, offsets, transOffset, norm, cmap, pickradius, hatch, urls, offset_position, zorder, **kwargs)
    164 
    165         self._path_effects = None
--> 166         self.update(kwargs)
    167         self._paths = None
    168 

/opt/conda/lib/python3.6/site-packages/matplotlib/artist.py in update(self, props)
    914 
    915         with cbook._setattr_cm(self, eventson=False):
--> 916             ret = [_update_property(self, k, v) for k, v in props.items()]
    917 
    918         if len(ret):

/opt/conda/lib/python3.6/site-packages/matplotlib/artist.py in <listcomp>(.0)
    914 
    915         with cbook._setattr_cm(self, eventson=False):
--> 916             ret = [_update_property(self, k, v) for k, v in props.items()]
    917 
    918         if len(ret):

/opt/conda/lib/python3.6/site-packages/matplotlib/artist.py in _update_property(self, k, v)
    910                 func = getattr(self, 'set_' + k, None)
    911                 if not callable(func):
--> 912                     raise AttributeError('Unknown property %s' % k)
    913                 return func(v)
    914 

AttributeError: Unknown property fc

I didn't change the notebook.

Please tell me the cause of this error.

AttributeError: 'NoneType' object has no attribute 'item'

  • PyTorch-Forecasting version: 0.5.2
  • PyTorch version: 1.6.0
  • PyTorch lightning version: 1.0.4
  • Python version: 3.7
  • Operating System: Ubuntu 18.04.2

Expected behavior

In the stallion notebook, I executed the following code block:

# find optimal learning rate
res = trainer.tuner.lr_find(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
    max_lr=10.0,
    min_lr=1e-6,
)

print(f"suggested learning rate: {res.suggestion()}")
fig = res.plot(show=True, suggest=True)
fig.show()

and I expected to see the learning rate as well as the plot, as shown in the notebook.

Actual behavior

However, the result was

------------------------------------------------------------------------
AttributeError                         Traceback (most recent call last)
<ipython-input-8-a92b5627800b> in <module>
      5     val_dataloaders=val_dataloader,
      6     max_lr=10.0,
----> 7     min_lr=1e-6,
      8 )
      9 

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/tuner/tuning.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
    128             mode,
    129             early_stop_threshold,
--> 130             datamodule,
    131         )
    132 

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/tuner/lr_finder.py in lr_find(trainer, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
    173                 train_dataloader=train_dataloader,
    174                 val_dataloaders=val_dataloaders,
--> 175                 datamodule=datamodule)
    176 
    177     # Prompt if we stopped early

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    438         self.call_hook('on_fit_start')
    439 
--> 440         results = self.accelerator_backend.train()
    441         self.accelerator_backend.teardown()
    442 

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py in train(self)
     46 
     47         # train or test
---> 48         results = self.train_or_test()
     49         return results
     50 

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in train_or_test(self)
     66             results = self.trainer.run_test()
     67         else:
---> 68             results = self.trainer.train()
     69         return results
     70 

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in train(self)
    483 
    484                 # run train epoch
--> 485                 self.train_loop.run_training_epoch()
    486 
    487                 if self.max_steps and self.max_steps <= self.global_step:

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self)
    558             # hook
    559             # TODO: add outputs to batches
--> 560             self.on_train_batch_end(epoch_output, epoch_end_outputs, batch, batch_idx, dataloader_idx)
    561 
    562             # -----------------------------------------

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in on_train_batch_end(self, epoch_output, epoch_end_outputs, batch, batch_idx, dataloader_idx)
    248         # hook
    249         self.trainer.call_hook("on_batch_end")
--> 250         self.trainer.call_hook("on_train_batch_end", epoch_end_outputs, batch, batch_idx, dataloader_idx)
    251 
    252     def reset_train_val_dataloaders(self, model):

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in call_hook(self, hook_name, *args, **kwargs)
    823             if hasattr(self, hook_name):
    824                 trainer_hook = getattr(self, hook_name)
--> 825                 trainer_hook(*args, **kwargs)
    826 
    827             # next call hook in lightningModule

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py in on_train_batch_end(self, outputs, batch, batch_idx, dataloader_idx)
    145         """Called when the training batch ends."""
    146         for callback in self.callbacks:
--> 147             callback.on_train_batch_end(self, self.get_model(), outputs, batch, batch_idx, dataloader_idx)
    148 
    149     def on_validation_batch_start(self, batch, batch_idx, dataloader_idx):

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/tuner/lr_finder.py in on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx)
    399             self.progress_bar.update()
    400 
--> 401         current_loss = trainer.train_loop.running_loss.last().item()
    402         current_step = trainer.global_step + 1  # remove the +1 in 1.0
    403 

AttributeError: 'NoneType' object has no attribute 'item'

I think it has to do with my version of PyTorch Lightning, because the error seems to be raised inside pytorch_lightning. Which version of pytorch_lightning did you use in the notebook?

warnings.warn Type error: expected string or bytes-like object

  • PyTorch-Forecasting version: 0.6.0
  • PyTorch version: 1.6
  • Python version: 3.8
  • Operating System: Ubuntu 20.04

I updated PyTorch-Forecasting from 0.5.2 to 0.6.0.
It seems that part of your code in timeseries.py, around lines 804-812, has a mistake.
Here is how it looks:

warnings.warn(
    (
        "Min encoder length and/or min_prediction_idx and/or min prediction length is too large for "
        f"{len(missing_groups)} series/groups which therefore are not present in the dataset index. "
        "This means no predictions can be made for those series",
        f"First 10 removed groups: {list(missing_groups.iloc[:10].to_dict(orient='index').values())}",
    ),
    UserWarning,
)

which results in TypeError: expected string or bytes-like object

You have two redundant commas in the message part, which crash the code. It should look like this:

warnings.warn(
    (
        "Min encoder length and/or min_prediction_idx and/or min prediction length is too large for "
        f"{len(missing_groups)} series/groups which therefore are not present in the dataset index. "
        "This means no predictions can be made for those series"<NO_COMMA>
        f"First 10 removed groups: {list(missing_groups.iloc[:10].to_dict(orient='index').values())}"<NO_COMMA>
    ),
    UserWarning,
)

am I right?
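
The difference between the two versions can be checked in isolation. The following is a minimal sketch, not the library's code: `missing_groups` is a hypothetical plain list standing in for the DataFrame used in timeseries.py, and the message text is shortened. It shows that commas inside the parentheses build a tuple, while adjacent string literals are concatenated into one string.

```python
import warnings

# Hypothetical stand-in for the removed groups (the library uses a DataFrame here).
missing_groups = [{"group": "a"}, {"group": "b"}]

# With commas, the parenthesised expression is a tuple of two strings, not a message.
as_tuple = (
    f"{len(missing_groups)} series/groups are not present in the dataset index. "
    "This means no predictions can be made for those series",
    f"First 10 removed groups: {missing_groups[:10]}",
)

# Without commas, adjacent string literals are concatenated into a single str.
# The explicit ". " keeps the two sentences from running together.
as_string = (
    f"{len(missing_groups)} series/groups are not present in the dataset index. "
    "This means no predictions can be made for those series. "
    f"First 10 removed groups: {missing_groups[:10]}"
)

print(type(as_tuple).__name__, len(as_tuple))
print(type(as_string).__name__)

# The str version can be passed to warnings.warn as an ordinary message.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn(as_string, UserWarning)
```

Removing only the first comma would not be enough: a trailing comma after the last literal still produces a one-element tuple, so both commas have to go.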

ValueError in _construct_index when initialising TimeSeriesDataSet

I have been trying to test TFT for an extremely simple toy dataset, but always encounter a ValueError when initialising TimeSeriesDataSet.

I am trying to forecast a simple sine wave (again, this is just to get up and running); my DataFrame has two columns, time_idx (int 0 to 100), and price (float -1.0 to 1.0). The code for generating this dataset and initialising my dataset is as follows:

# Simply sample a sin wave
def sample_sin(samples_per_cycle, n_cycles, noise=None):
    sampling_gap = 2 * math.pi / samples_per_cycle
    xs = [sample * sampling_gap for sample in range(samples_per_cycle * n_cycles)]
    ys = [math.sin(x) + ((noise * random.random()) if noise is not None else 0) for x in xs]

    return xs, ys


# Save sampled sin wave as csv
def save_sin_dataset(filename, samples_per_cycle, n_cycles, noise=None):
    _, ys = sample_sin(samples_per_cycle, n_cycles, noise=noise)

    df = DataFrame({'price': ys})
    df.index.name = 'time_idx'
    df.to_csv(filename)

    return range(len(ys)), ys

# Load csv
df = pd.read_csv('sin.csv')

max_encode_length = 36
max_prediction_length = 6
training_cutoff = 90

training = TimeSeriesDataSet(
    df[:training_cutoff],
    time_idx="time_idx",
    group_ids=["price"],
    target="price",
    min_encoder_length=max_encode_length,
    max_encoder_length=max_encode_length,
    min_prediction_length=1,
    static_categoricals=[],
    static_reals=[],
    time_varying_known_categoricals=[],
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=[
        "price",
    ],
    target_normalizer=EncoderNormalizer(
        coerce_positive=1.0
    ),
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)

The error occurs at line 707 in timeseries.py:

df_index["count"] = (df_index["time_last"] - df_index["time_first"]).astype(int) + 1

Traceback:

Traceback (most recent call last):
  File "/Users/fraser/Documents/Personal Projects/Kontrary/forecasting.py", line 36, in <module>
    add_encoder_length=True,
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pytorch_forecasting/data/timeseries.py", line 284, in __init__
    self.index = self._construct_index(data, predict_mode=predict_mode)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pytorch_forecasting/data/timeseries.py", line 707, in _construct_index
    df_index["count"] = (df_index["time_last"] - df_index["time_first"]).astype(int) + 1
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 5546, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 595, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 406, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 595, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 966, in astype_nansafe
    raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer

When I debug, df_index_first and df_index_last contain NaN values, and I have no clue why; my DataFrame has no gaps or NaNs.

If someone could let me know what I'm doing wrong that would be great.
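
One thing that stands out in the snippet is `group_ids=["price"]`. Group ids are meant to identify which series a row belongs to, so using the float target as the group id puts nearly every row into its own one-row series. Whether that is the actual cause of the NaN values is not confirmed in this thread. The sketch below only shows how the same sine data could be written with a constant group column; the column name `series` and the value `"sine"` are assumptions made for illustration.

```python
import csv
import io
import math


def sine_rows(samples_per_cycle, n_cycles, group="sine"):
    """Return one row per time step, all tagged with the same group id."""
    step = 2 * math.pi / samples_per_cycle
    return [
        {"time_idx": i, "series": group, "price": math.sin(i * step)}
        for i in range(samples_per_cycle * n_cycles)
    ]


def rows_to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["time_idx", "series", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = sine_rows(samples_per_cycle=20, n_cycles=5)
csv_text = rows_to_csv(rows)
print(len(rows), {row["series"] for row in rows})
```

With a file written from `csv_text`, the dataset could then be built with `group_ids=["series"]` while `price` stays the target.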

Errors when running ar.py on Win10

Hi,
I appreciate your work to provide a package for TFT using PyTorch. I just successfully ran the Google AI Hub implementation and would be happy to use a PyTorch version, which seems to be somewhat easier.

But, so far I am not able to run even a simple example without problems on Windows 10 without GPU.
It would be great if you could provide an easy starting point into your development. It would help a lot to contribute to your project.

Installation:

I created an environment with Python version 3.7.8.
After pip install pytorch-forecasting I got this error:
ERROR: Could not find a version that satisfies the requirement torch<2.0,>=1.6 (from pytorch-forecasting) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch<2.0,>=1.6 (from pytorch-forecasting)

Nevertheless, when I install PyTorch first (conda install pytorch torchvision cpuonly -c pytorch), pytorch-forecasting installs without problems and the required versions are installed, among others:
optuna 2.0.0
pandas 1.1.0
pytorch-forecasting 0.2.0
pytorch-lightning 0.8.5
pytorch-ranger 0.1.1
scikit-learn 0.23.2
scipy 1.5.2
statsmodels 0.11.1
torch 1.6.0

Running an example

Your github README.md does not provide a directly executable example (data is missing). Thus I tried ar.py.
The first obstacle is that you need generate_ar_data from example/data/__init__.py, which is not installed via the pip install procedure above. A local copy in example_data.py with

#from data import generate_ar_data
from example_data import generate_ar_data

helped to overcome this.

TimeSeriesDataSet reported an error, which led to the following correction:

#data["static"] = 2
data["static"] = '2'   #must be string

pl.Trainer
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

TemporalFusionTransformer.from_dataset
Number of parameters in network: 76.6k

trainer.fit results in the following list:

n Name Type Params
0 loss QuantileLoss 0
1 input_embeddings ModuleDict 1
2 prescalers ModuleDict 256
3 static_variable_selection VariableSelectionNetwork 9 K
4 encoder_variable_selection VariableSelectionNetwork 8 K
5 decoder_variable_selection VariableSelectionNetwork 4 K
6 static_context_variable_selection GatedResidualNetwork 4 K
7 static_context_initial_hidden_lstm GatedResidualNetwork 4 K
8 static_context_initial_cell_lstm GatedResidualNetwork 4 K
9 static_context_enrichment GatedResidualNetwork 4 K
10 lstm_encoder LSTM 8 K
11 lstm_decoder LSTM 8 K
12 post_lstm_gate_encoder GatedLinearUnit 2 K
13 post_lstm_add_norm_encoder AddNorm 64
14 static_enrichment GatedResidualNetwork 5 K
15 multihead_attn InterpretableMultiHeadAttention 4 K
16 post_attn_gate_norm GateAddNorm 2 K
17 pos_wise_ff GatedResidualNetwork 4 K
18 pre_output_gate_norm GateAddNorm 2 K
19 output_layer Linear 99

Validation sanity check: 0it [00:00, ?it/s]

and the following RuntimeError:

RuntimeError Traceback (most recent call last)
in
12 torch.set_num_threads(10)
13 trainer.fit(
---> 14 tft, train_dataloader=train_dataloader, val_dataloaders=val_dataloader,
15 )

~\miniconda3\envs\tft\lib\site-packages\pytorch_lightning\trainer\trainer.py in fit(self, model, train_dataloader, val_dataloaders)
1042 self.optimizers, self.lr_schedulers, self.optimizer_frequencies = self.init_optimizers(model)
1043
-> 1044 results = self.run_pretrain_routine(model)
1045
1046 # callbacks

~\miniconda3\envs\tft\lib\site-packages\pytorch_lightning\trainer\trainer.py in run_pretrain_routine(self, model)
1194 self.val_dataloaders,
1195 max_batches,
-> 1196 False)
1197
1198 # allow no returns from eval

~\miniconda3\envs\tft\lib\site-packages\pytorch_lightning\trainer\evaluation_loop.py in _evaluate(self, model, dataloaders, max_batches, test_mode)
291 output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
292 else:
--> 293 output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
294
295 # on dp / ddp2 might still want to do something with the batch parts

~\miniconda3\envs\tft\lib\site-packages\pytorch_lightning\trainer\evaluation_loop.py in evaluation_forward(self, model, batch, batch_idx, dataloader_idx, test_mode)
468 output = model.test_step(*args)
469 else:
--> 470 output = model.validation_step(*args)
471
472 return output

~\miniconda3\envs\tft\lib\site-packages\pytorch_forecasting\models\base_model.py in validation_step(self, batch, batch_idx)
143 def validation_step(self, batch, batch_idx):
144 x, y = batch
--> 145 log, _ = self.step(x, y, batch_idx, label="val")
146 return log
147

~\miniconda3\envs\tft\lib\site-packages\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py in step(self, x, y, batch_idx, label)
572 # extract data and run model
573 y = rnn.pack_padded_sequence(y, lengths=x["decoder_lengths"], batch_first=True, enforce_sorted=False)
--> 574 log, out = super().step(x, y, batch_idx, label=label)
575 # calculate interpretations etc for latter logging
576 if self.log_interval(label == "train") > 0:

~\miniconda3\envs\tft\lib\site-packages\pytorch_forecasting\models\base_model.py in step(self, x, y, batch_idx, label)
185 loss = self.loss(prediction, y) * (1 + monotinicity_loss)
186 else:
--> 187 out = self(x)
188 out["prediction"] = self.transform_output(out)
189

~\miniconda3\envs\tft\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~\miniconda3\envs\tft\lib\site-packages\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py in forward(self, x)
445 )
446 else:
--> 447 input_vectors[name] = emb(x_cat[..., self.hparams.x_categoricals.index(name)])
448 input_vectors.update({name: x_cont[..., idx].unsqueeze(-1) for idx, name in enumerate(self.hparams.x_reals)})
449

~\miniconda3\envs\tft\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~\miniconda3\envs\tft\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
124 return F.embedding(
125 input, self.weight, self.padding_idx, self.max_norm,
--> 126 self.norm_type, self.scale_grad_by_freq, self.sparse)
127
128 def extra_repr(self) -> str:

~\miniconda3\envs\tft\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1815
1816

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

Best Model not provided

The AR example references a best model via a path on a personal computer. This best model is not provided, which causes errors when running the notebook.

Group ID specific information

Hello @jdb78 ,

If possible, could you point me in the right direction for the following goal? I can write additional code myself if needed.

I call the target values that belong to one unique combination of group_ids a "unique time-series".

My goal is to extract the predictions and the actual values of the target for each unique time-series, so that I can calculate sMAPE and make plots per time-series.
More detail below.

Currently I tested out TFT on the M5 kaggle dataset: https://www.kaggle.com/c/m5-forecasting-accuracy/data
The google colab notebook is here : https://colab.research.google.com/drive/1OSSf7qgOeyZRSUbGgBektHSPSZ1Yambp?usp=sharing

I used 'shop_id' as single group_id creating time-series for 10 unique groups. I have 19130 samples.

Validation dataset

We can make predictions and show them using the plot function (see code below).
This shows some samples of the validation dataset but group_id information can't be found.
My question is: How can val_dataloader be used to get the actual and predicted values of target for each unique time-series ?

# raw predictions are a dictionary from which all kind of information including quantiles can be extracted
raw_predictions, x = best_tft.predict(val_dataloader, mode="raw", return_x=True)

#we can see which information is in these raw predictions
display(raw_predictions.keys())
display(x.keys())

# calculate metric by which to display
predictions = best_tft.predict(val_dataloader)
mean_losses = SMAPE(reduction="none")(predictions, actuals).mean(1)
indices = mean_losses.argsort(descending=False)  # sort losses
for idx in range(10):  # plot 10 examples
    best_tft.plot_prediction(x, raw_predictions, idx=indices[idx], add_loss_to_title=SMAPE())

New data/Test-set

I have kept some data separate so that it is not included in the training procedure.
I can extract the real values of the target for each unique time-series.
My Question is: How can I extract predictions on the target for each unique time-series for new data ?
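
Once every prediction row can be mapped back to its group, the per-series metric is a plain grouping step. The sketch below uses plain Python and made-up numbers. It assumes that the group label of each sample is already known, which is exactly the part the question asks about and would have to come from the dataset. sMAPE is taken here as the mean of 2|p - a| / (|p| + |a|) over the horizon.

```python
from collections import defaultdict


def smape(predicted, actual):
    """Symmetric MAPE over one forecast horizon: mean of 2|p - a| / (|p| + |a|)."""
    terms = [
        2 * abs(p - a) / (abs(p) + abs(a)) if (abs(p) + abs(a)) > 0 else 0.0
        for p, a in zip(predicted, actual)
    ]
    return sum(terms) / len(terms)


def smape_by_group(groups, predictions, actuals):
    """Average the per-sample sMAPE within each group id."""
    per_group = defaultdict(list)
    for group, predicted, actual in zip(groups, predictions, actuals):
        per_group[group].append(smape(predicted, actual))
    return {group: sum(values) / len(values) for group, values in per_group.items()}


# Made-up example: three samples from two shops, horizon of two steps.
groups = ["shop_1", "shop_1", "shop_2"]
predictions = [[10.0, 10.0], [5.0, 5.0], [3.0, 1.0]]
actuals = [[10.0, 10.0], [5.0, 15.0], [1.0, 1.0]]

scores = smape_by_group(groups, predictions, actuals)
print(scores)
```

The same grouping works for new data, as long as the predictions are kept in the same order as the group labels.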

Tensor Dimension Error When Applying TFT to Multiple Groups in Own Data

Hi @jdb78,

After getting the TFT model working well for one group of data based on our last convo (and updating to the latest version of the library), I'm getting an odd tensor dimension error when I try to train my model across multiple groups on my data. Specifically on epoch 15 I get this error/trace:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-32-14fda4f79b4a> in <module>
      1 # Train model
----> 2 trainer.fit(
      3     tft,
      4     train_dataloader = train_dataloader,
      5     val_dataloaders = val_dataloader

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
     46             if entering is not None:
     47                 self.state = entering
---> 48             result = fn(self, *args, **kwargs)
     49 
     50             # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
   1071             self.accelerator_backend = GPUBackend(self)
   1072             model = self.accelerator_backend.setup(model)
-> 1073             results = self.accelerator_backend.train(model)
   1074 
   1075         elif self.use_tpu:

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_backend.py in train(self, model)
     49 
     50     def train(self, model):
---> 51         results = self.trainer.run_pretrain_routine(model)
     52         return results
     53 

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
   1237 
   1238         # CORE TRAINING LOOP
-> 1239         self.train()
   1240 
   1241     def _run_sanity_check(self, ref_model, model):

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py in train(self)
    392                 # RUN TNG EPOCH
    393                 # -----------------
--> 394                 self.run_training_epoch()
    395 
    396                 if self.max_steps and self.max_steps <= self.global_step:

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self)
    548 
    549         # process epoch outputs
--> 550         self.run_training_epoch_end(epoch_output, checkpoint_accumulator, early_stopping_accumulator, num_optimizers)
    551 
    552         # checkpoint callback

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch_end(self, epoch_output, checkpoint_accumulator, early_stopping_accumulator, num_optimizers)
    662             # run training_epoch_end
    663             # a list with a result per optimizer index
--> 664             epoch_output = model.training_epoch_end(epoch_output)
    665 
    666             if isinstance(epoch_output, Result):

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/base_model.py in training_epoch_end(self, outputs)
    133 
    134     def training_epoch_end(self, outputs):
--> 135         log, _ = self.epoch_end(outputs, label="train")
    136         return log
    137 

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in epoch_end(self, outputs, label)
    613         log, out = super().epoch_end(outputs, label=label)
    614         if self.log_interval(label == "train") > 0:
--> 615             self._log_interpretation(out, label=label)
    616         return log, out
    617 

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in _log_interpretation(self, outputs, label)
    820         """
    821         # extract interpretations
--> 822         interpretation = {
    823             name: torch.stack([x["interpretation"][name] for x in outputs]).sum(0)
    824             for name in outputs[0]["interpretation"].keys()

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in <dictcomp>(.0)
    821         # extract interpretations
    822         interpretation = {
--> 823             name: torch.stack([x["interpretation"][name] for x in outputs]).sum(0)
    824             for name in outputs[0]["interpretation"].keys()
    825         }

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 6 and 7 in dimension 1 at /tmp/pip-req-build-8yht7tdu/aten/src/THC/generic/THCTensorMath.cu:71

At first I thought this could be because not every state has the same amount of data available. For example, for select states, the number of days for which data exist are:

state
CA    218
FL    218
GA    218
NY    218
TX    218
WA    260

So I tried just getting the last 218 observations for each state and resetting time_idx, but the same tensor error was raised.

Right now I'm only including the state as a group variable and am doing univariate time series modeling, so no other data are in the TFT.

Also, the learning rate finder works, this is just the training that's failing. Really odd.

I've made my code available here: https://drive.google.com/file/d/1r1w2tHZJrr8iXVqw7U5_qL1x4iOUsauk/view?usp=sharing. The data I'm playing with are on COVID, and the notebook includes a pd.read_csv() call that reads in all the data for you from an online source, so you should be able to run it on your own without any issues.

Any thoughts? Again, I greatly appreciate your feedback as I'm new to deep learning on time series data.

Thanks in advance!

Best,
Alex

Prediction as a DataFrame

What would be the best way to get the tft prediction in the same format as my input dataframe?
Instead of just having tensors with predicted values and encoded group_ids I want to have a dataframe with my original group_ids (single value prediction or all quantiles).

I want to go from:
tensor([[0, 0], [0, 1]])
tensor([[234, 375], [73, 70]])

To:

region sku time_idx prediction
reg1 sku1 100 234
reg1 sku1 101 375
reg1 sku2 100 73
reg1 sku2 101 70

Thank you for a fantastic library!
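
The reshaping itself can be sketched independently of the library. The example below works on nested lists (for tensors, `tensor.tolist()` yields this form) and reproduces the table from the question. The code-to-label mappings in `decoders` and the first forecast `time_idx` per series are assumptions here; in practice they would have to be taken from the fitted dataset.

```python
def predictions_to_rows(group_codes, predictions, first_time_idx, decoders):
    """Flatten per-series forecasts into one row per (series, time step)."""
    rows = []
    for codes, forecast, start in zip(group_codes, predictions, first_time_idx):
        # Translate each encoded group id back to its original label.
        labels = {name: decoders[name][code] for name, code in zip(decoders, codes)}
        for step, value in enumerate(forecast):
            rows.append({**labels, "time_idx": start + step, "prediction": value})
    return rows


# Values from the question, as nested lists.
group_codes = [[0, 0], [0, 1]]
predictions = [[234, 375], [73, 70]]

# Assumptions: the code-to-label mappings and the first forecast time_idx per series.
decoders = {"region": {0: "reg1"}, "sku": {0: "sku1", 1: "sku2"}}
first_time_idx = [100, 100]

rows = predictions_to_rows(group_codes, predictions, first_time_idx, decoders)
for row in rows:
    print(row["region"], row["sku"], row["time_idx"], row["prediction"])
```

A list of dictionaries in this shape can be passed directly to `pd.DataFrame(rows)` to get the desired frame.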

Add Amazon's GluonTS features to TFT

Overall Comments

First of all, I believe that this is a great initiative, especially that we have the TFT available + the Optuna optimisation! Great!

Description

Amazon's GluonTS automatically adds some features to the models, most notably time features such as day-of-the-week, month-of-the-year and many more, see here. It also has some other features available, see here.

So my question is: are you planning to also add these to the Temporal Fusion Transformer? They would be great additions.
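
Until such features are built in, they could be derived by hand before the dataset is created. The following is a minimal sketch using only the standard library; the feature names are hypothetical. The values are returned as strings on the assumption that they would be passed as categorical columns, for example via `time_varying_known_categoricals`.

```python
from datetime import datetime, timedelta


def calendar_features(timestamp):
    """Derive simple calendar features as strings for use as categoricals."""
    return {
        "day_of_week": str(timestamp.weekday()),  # 0 = Monday ... 6 = Sunday
        "month": str(timestamp.month),
        "day_of_month": str(timestamp.day),
    }


# Three consecutive days starting on 2020-01-01.
start = datetime(2020, 1, 1)
rows = []
for offset in range(3):
    timestamp = start + timedelta(days=offset)
    rows.append({"time_idx": offset, **calendar_features(timestamp)})

for row in rows:
    print(row)
```

Because these features depend only on the calendar, they are known for future time steps as well, which is what makes them usable as known inputs.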

Trained Models attention only on recent data

  • PyTorch-Forecasting version: 0.6.0
  • PyTorch version: 1.6
  • Python version: 3.6
  • Operating System: Linux

I made a model to predict some stock market features but the results seem to be very weird. I used the "Demand forecasting with the Temporal Fusion Transformer" example as a base but my model seems to be off, despite having a larger size, more training, more data etc.

learning_rate=0.013489628825916528, #I got the learning rate by tuner.
hidden_size=64,
attention_head_size=16,
dropout=0.1,
hidden_continuous_size=16,
output_size=7,
loss=QuantileLoss(),
log_interval=10,
reduce_on_plateau_patience=4,

Here are some prediction examples:
[Images: three prediction plots]

Since I'm using Colab for training, I ran around 100 epochs, but the results were no good. My dataset has 488468 rows and 6 time_varying_unknown_reals. I really don't understand why the network is misbehaving at this point.

Quick Note:
I also calculated the "Actuals vs predictions by variables" and the results in that part seem to be a little more promising but ended up confusing me even more.
[Images: three "Actuals vs predictions by variables" plots]

question: how to load data from timeseries of multiple devices

I have a data set which consists of many multivariate time series (i.e. time series with > 1 value per timestamp originating from many IoT devices).

How can I load such a dataset to pytorch using your https://pytorch-forecasting.readthedocs.io/en/latest/data.html data loader - or do I need to implement my own? I need to ensure that the data is interpreted in the right way to allow the LSTM to learn patterns from an individual time-series / window and include information from multiple devices / time windows in a batch.

I would want to use it for an LSTM-autoencoder to perform anomaly detection.



      import pandas as pd
      from pandas import Timestamp
      df = pd.DataFrame({'hour': {0: Timestamp('2020-01-01 00:00:00'), 1: Timestamp('2020-01-01 00:00:00'), 2: Timestamp('2020-01-01 00:00:00'), 3: Timestamp('2020-01-01 00:00:00'), 4: Timestamp('2020-01-01 00:00:00'), 5: Timestamp('2020-01-01 01:00:00'), 6: Timestamp('2020-01-01 01:00:00'), 7: Timestamp('2020-01-01 01:00:00'), 8: Timestamp('2020-01-01 01:00:00'), 9: Timestamp('2020-01-01 01:00:00')}, 'metrik_0': {0: 2.020883621337143, 1: 2.808770093182167, 2: 2.5267618429653402, 3: 3.2709845883575346, 4: 3.7984105853602235, 5: 4.0385160093937795, 6: 4.643267594258785, 7: 1.3012379179114388, 8: 3.509304898336378, 9: 2.8664748765561208}, 'metrik_1': {0: 4.580434685779621, 1: 2.933188328317023, 2: 3.999229120882797, 3: 2.9099857745449706, 4: 4.6302055552849, 5: 4.012670194672169, 6: 3.697352153313931, 7: 4.855210603371005, 8: 2.2197913449032254, 9: 2.393605868973481}, 'metrik_2': {0: 3.680527279150989, 1: 2.511065648719921, 2: 3.8350007982479113, 3: 2.4063786290320333, 4: 3.231433617897482, 5: 3.8505378854180115, 6: 5.359150077287063, 7: 2.8966469424805386, 8: 4.554080028058399, 9: 3.3319064764061914}, 'cohort_id': {0: 1, 1: 2, 2: 1, 3: 2, 4: 2, 5: 1, 6: 2, 7: 2, 8: 1, 9: 2}, 'device_id': {0: 1, 1: 3, 2: 4, 3: 2, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1, 9: 5}})
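
For a dataset class that takes `time_idx` and `group_ids`, as TimeSeriesDataSet does elsewhere in this thread, each device would be one group and the timestamp would have to become an integer step. The sketch below shows that preparation with the standard library only. It uses a few records shaped like the frame above, with rounded values and a single metric; the helper name `add_time_idx` is hypothetical.

```python
from datetime import datetime

# A few records shaped like the DataFrame above (one metric kept for brevity).
records = [
    {"hour": datetime(2020, 1, 1, 0), "device_id": 1, "metrik_0": 2.02},
    {"hour": datetime(2020, 1, 1, 0), "device_id": 3, "metrik_0": 2.81},
    {"hour": datetime(2020, 1, 1, 1), "device_id": 3, "metrik_0": 4.64},
    {"hour": datetime(2020, 1, 1, 1), "device_id": 1, "metrik_0": 3.51},
]


def add_time_idx(rows, time_key="hour", group_key="device_id"):
    """Add an integer hourly time_idx and a string group id, sorted per device."""
    origin = min(row[time_key] for row in rows)
    prepared = []
    for row in rows:
        hours = int((row[time_key] - origin).total_seconds() // 3600)
        prepared.append({**row, "time_idx": hours, group_key: str(row[group_key])})
    return sorted(prepared, key=lambda row: (row[group_key], row["time_idx"]))


prepared = add_time_idx(records)
for row in prepared:
    print(row["device_id"], row["time_idx"], row["metrik_0"])
```

After this step every device forms its own series with consecutive integer steps, and the metric columns can be treated as the time-varying values of that series.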


N-Beats on M4 Dataset (with NAN)

Hey,
I'm trying to run N-Beats from the tutorial on M4 dataset.

For example, using the hourly data:

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ... V952 V953 V954 V955 V956 V957 V958 V959 V960 V961
H1 605.0 586.0 586.0 559.0 511.0 443.0 422.0 395.0 382.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
H2 3124.0 2990.0 2862.0 2809.0 2544.0 2201.0 1996.0 1861.0 1735.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

I've melted the dataset into the TimeSeriesDataSet format and followed the tutorial.
The difference between the M4 data and the toy dataset in the tutorial is that the series are of unequal length, so there are NaNs at the end. I've used the allow_missings=True option.

The baseline predictions are filled with NaNs, which causes a warning and gives a very high SMAPE loss.

baseline_predictions
>>> tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...

SMAPE()(baseline_predictions, actuals)
>>>UserWarning: Loss is not finite. Resetting it to 1e9
  warnings.warn("Loss is not finite. Resetting it to 1e9")

The above is "solved" by dropping NAs from the dataset, after which the net trains as expected. But this leaves me with very little data.
How can I work around the NAs instead?

  • PyTorch-Forecasting version: 0.5.2
  • PyTorch Lightning version: 1.0.3
  • Python version: 3.8
  • Operating System: Ubuntu 20.04
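
One way to keep the padding out of the dataset without discarding observed values is to drop only the NaN rows after melting, so each series keeps its own length. A minimal pandas sketch, assuming the first column (V1) holds the series id and that NaNs occur only as trailing padding; the toy values and the `series`, `time_idx` and `value` column names are illustrative and not taken from the M4 files:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the wide layout: one row per series, padded with NaN.
wide = pd.DataFrame(
    {
        "V1": ["H1", "H2"],
        "V2": [605.0, 3124.0],
        "V3": [586.0, 2990.0],
        "V4": [586.0, np.nan],  # H2 is shorter than H1
    }
)

# Melt to long format: one row per (series, step).
long_df = wide.melt(id_vars="V1", var_name="step", value_name="value")
long_df = long_df.rename(columns={"V1": "series"})

# Turn "V2", "V3", ... into an integer time index starting at 0.
long_df["time_idx"] = long_df["step"].str[1:].astype(int) - 2

# Drop only the padded rows; every observed value is kept.
long_df = (
    long_df.dropna(subset=["value"])
    .sort_values(["series", "time_idx"])
    .reset_index(drop=True)
)

print(long_df.groupby("series").size())
```

If NaNs can also occur in the middle of a series, this would create gaps in `time_idx`, which is a different situation from trailing padding.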

Forecasting for categorical target parameter in TimeSeriesDataSet?

Hello @jdb78,

Is it possible to use the forecasting package to predict categorical time series? I would like to use the package for a multiclass classification problem, so the target parameter in TimeSeriesDataSet is categorical rather than numeric.

If this is not supported right now, can you estimate whether this feature could be integrated, and where in the code you see the biggest changes that would have to be made?

Issues running TDS Stallion Example with W&B logger

I'm seeing issues trying to run the W&B logger when replicating the Towards Data Science example with the Stallion dataset (to be fair, switching to TensorBoard made it fail too, although for a completely different-sounding reason oddly). When I try to train the model, I get a single graphical output (attached) and then it errors out. I'm using:

- pytorch=1.4.0
- pytorch-forecasting=0.4.1
- pytorch-lightning=0.9.0
- wandb=0.10.4

I know the requirements.txt indicates (py)torch >= 1.6, but I can't get conda to find a good solution for that in my dependency tree, and this seems to be a logger issue anyhow. Here's the full traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-562ea3edbba3> in <module>
     25     tft,
     26     train_dataloader=train_dataloader,
---> 27     val_dataloaders=val_dataloader
     28 )

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
     46             if entering is not None:
     47                 self.state = entering
---> 48             result = fn(self, *args, **kwargs)
     49 
     50             # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
   1082             self.accelerator_backend = CPUBackend(self)
   1083             self.accelerator_backend.setup(model)
-> 1084             results = self.accelerator_backend.train(model)
   1085 
   1086         # on fit end callback

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/accelerators/cpu_backend.py in train(self, model)
     37 
     38     def train(self, model):
---> 39         results = self.trainer.run_pretrain_routine(model)
     40         return results

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
   1222 
   1223         # run a few val batches before training starts
-> 1224         self._run_sanity_check(ref_model, model)
   1225 
   1226         # clear cache before training

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in _run_sanity_check(self, ref_model, model)
   1255             num_loaders = len(self.val_dataloaders)
   1256             max_batches = [self.num_sanity_val_steps] * num_loaders
-> 1257             eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
   1258 
   1259             # allow no returns from eval

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py in _evaluate(self, model, dataloaders, max_batches, test_mode)
    331                         output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
    332                 else:
--> 333                     output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
    334 
    335                 is_result_obj = isinstance(output, Result)

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py in evaluation_forward(self, model, batch, batch_idx, dataloader_idx, test_mode)
    685             output = model.test_step(*args)
    686         else:
--> 687             output = model.validation_step(*args)
    688 
    689         return output

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in validation_step(self, batch, batch_idx)
    138     def validation_step(self, batch, batch_idx):
    139         x, y = batch
--> 140         log, _ = self.step(x, y, batch_idx, label="val")
    141         return log
    142 

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in step(self, x, y, batch_idx, label)
    595         """
    596         # extract data and run model
--> 597         log, out = super().step(x, y, batch_idx, label=label)
    598         # calculate interpretations etc for latter logging
    599         if self.log_interval(label == "train") > 0:

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in step(self, x, y, batch_idx, label)
    223             log["loss"] = loss
    224         if self.log_interval(label == "train") > 0:
--> 225             self._log_prediction(x, out, batch_idx, label=label)
    226         return log, out
    227 

/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in _log_prediction(self, x, out, batch_idx, label)
    281                 else:
    282                     tag += f" of item {idx} in batch {batch_idx}"
--> 283                 self.logger.experiment.add_figure(
    284                     tag,
    285                     fig,

AttributeError: 'Run' object has no attribute 'add_figure'

It seems as though pytorch_forecasting is assuming that all loggers have the same add_figure() method, but clearly that's not the case in this version of W&B/pytorch-lightning. Any thoughts on a way to rectify this? I'd also be game for a workaround to disable the default figure generation during training, although it is very nice to get those super-informative figures, so I'd rather get them working if I could!

[Attached figure: TFT_Fig]
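
The traceback above fails because the W&B `Run` object has no `add_figure` method. One generic workaround is to check for the method before calling it. The sketch below uses stand-in classes rather than the real logger objects; `log_figure_if_supported` and both experiment classes are hypothetical names for illustration and are not part of pytorch_forecasting, pytorch-lightning or wandb:

```python
class TensorBoardLikeExperiment:
    """Stand-in for an experiment object that supports add_figure()."""

    def __init__(self):
        self.figures = []

    def add_figure(self, tag, figure, global_step=None):
        self.figures.append((tag, global_step))


class RunLikeExperiment:
    """Stand-in for an experiment object without add_figure()."""


def log_figure_if_supported(experiment, tag, figure, global_step=None):
    """Log the figure only if the backend offers add_figure(); return True if logged."""
    add_figure = getattr(experiment, "add_figure", None)
    if callable(add_figure):
        add_figure(tag, figure, global_step=global_step)
        return True
    return False


tb = TensorBoardLikeExperiment()
logged_tb = log_figure_if_supported(tb, "prediction", figure=object(), global_step=0)
logged_run = log_figure_if_supported(RunLikeExperiment(), "prediction", figure=object())
print(logged_tb, logged_run)
```

With a guard like this, training would continue on loggers that cannot store figures, at the cost of silently skipping the plots there.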

Ability to do multi-target forecasting?

I noticed that the TimeSeriesDataSet class is designed to only look at one column for the target variable. In my use case, I'm trying to forecast traffic patterns and, as such, will need to forecast X and Y simultaneously. Given that these are expected to be somewhat correlated (as the presence of a road, for example, isn't equally probable at all latitudes and longitudes), I don't think it would work to build two separate models for forecasting each in isolation. Is there a theoretical limitation for the models currently included in the package that makes it impossible to have more than one target variable?

validation dataset containing categorical value not in training dataset

  • PyTorch-Forecasting version: latest
  • PyTorch version: 1.7
  • Python version: 3.7
  • Operating System: debian

Hi. One of my columns has the unique values [c0, c1, c2, ..., c15, c16, h1, h2]. These values are categorical. I am trying to predict the rows denoted by h using the rows denoted by c. As the h values are not included in training, an error occurs when creating a validation set. I currently have a dataset excluding this column. Is there any good way to include this column's values?
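
Before deciding how to encode such a column, it can help to list exactly which validation categories never appear in training. A minimal pandas sketch; the `kind` column, the toy values and the "unknown" placeholder label are assumptions for illustration, and whether a model can make use of a placeholder category depends on how the encoder is configured:

```python
import pandas as pd

train = pd.DataFrame({"kind": ["c0", "c1", "c2", "c1"]})
validation = pd.DataFrame({"kind": ["c1", "h1", "h2"]})

# Categories the training data knows about.
known = set(train["kind"])

# Categories that appear only in the validation data.
unseen = sorted(set(validation["kind"]) - known)

# Map unseen categories to a single placeholder label (hypothetical choice).
validation["kind_encoded"] = validation["kind"].where(
    validation["kind"].isin(known), "unknown"
)

print(unseen)
print(validation)
```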

KeyError: 'module name can\'t contain "."'

While following the tutorial with my own dataset, an error occurred.

The baseline model was OK, but the TFT is not working.

code

pl.seed_everything(42)
trainer = pl.Trainer(
    gpus=0,
    # clipping gradients is a hyperparameter and important to prevent divergence
    # of the gradient for recurrent neural networks
    gradient_clip_val=0.1,
)

tft = TemporalFusionTransformer.from_dataset(
    training,
    # not meaningful for finding the learning rate but otherwise very important
    learning_rate=0.03,
    hidden_size=16,  # most important hyperparameter apart from learning rate
    # number of attention heads. Set to up to 4 for large datasets
    attention_head_size=1,
    dropout=0.1,  # between 0.1 and 0.3 are good values
    hidden_continuous_size=8,  # set to <= hidden_size
    output_size=7,  # 7 quantiles by default
    loss=QuantileLoss(),
    # reduce learning rate if no improvement in validation loss after x epochs
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

error


KeyError Traceback (most recent call last)
in
24 loss=QuantileLoss(),
25 # reduce learning rate if no improvement in validation loss after x epochs
---> 26 reduce_on_plateau_patience=4, ## patience after which learning rate is reduced by a factor of 10
27 )
28 print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

/kaggle/working/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in from_dataset(cls, dataset, allowed_encoder_known_variable_names, **kwargs)
341 # create class and return
342 return super().from_dataset(
--> 343 dataset, allowed_encoder_known_variable_names=allowed_encoder_known_variable_names, **new_kwargs
344 )
345

/kaggle/working/pytorch_forecasting/models/base_model.py in from_dataset(cls, dataset, allowed_encoder_known_variable_names, **kwargs)
887 )
888 new_kwargs.update(kwargs)
--> 889 return super().from_dataset(dataset, **new_kwargs)
890
891 def calculate_prediction_actual_by_variable(

/kaggle/working/pytorch_forecasting/models/base_model.py in from_dataset(cls, dataset, **kwargs)
534 if "output_transformer" not in kwargs:
535 kwargs["output_transformer"] = dataset.target_normalizer
--> 536 net = cls(**kwargs)
537 net.dataset_parameters = dataset.get_parameters()
538 return net

/kaggle/working/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in __init__(self, hidden_size, lstm_layers, dropout, output_size, loss, attention_head_size, max_encoder_length, static_categoricals, static_reals, time_varying_categoricals_encoder, time_varying_categoricals_decoder, categorical_groups, time_varying_reals_encoder, time_varying_reals_decoder, x_reals, x_categoricals, hidden_continuous_size, hidden_continuous_sizes, embedding_sizes, embedding_paddings, embedding_labels, learning_rate, log_interval, log_val_interval, log_gradient_flow, reduce_on_plateau_patience, monotone_constaints, share_single_variable_networks, logging_metrics, **kwargs)
144 embedding_paddings=self.hparams.embedding_paddings,
145 x_categoricals=self.hparams.x_categoricals,
--> 146 max_embedding_size=self.hparams.hidden_size,
147 )
148

/kaggle/working/pytorch_forecasting/models/nn/embeddings.py in __init__(self, embedding_sizes, categorical_groups, embedding_paddings, x_categoricals, max_embedding_size)
43 self.x_categoricals = x_categoricals
44
---> 45 self.init_embeddings()
46
47 def init_embeddings(self):

/kaggle/working/pytorch_forecasting/models/nn/embeddings.py in init_embeddings(self)
66 self.embedding_sizes[name][0],
67 embedding_size,
---> 68 padding_idx=padding_idx,
69 )
70

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py in __setitem__(self, key, module)
285
286 def __setitem__(self, key: str, module: Module) -> None:
--> 287 self.add_module(key, module)
288
289 def __delitem__(self, key: str) -> None:

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in add_module(self, name, module)
345 raise KeyError("attribute '{}' already exists".format(name))
346 elif '.' in name:
--> 347 raise KeyError("module name can't contain \".\"")
348 elif name == '':
349 raise KeyError("module name can't be empty string \"\"")

KeyError: 'module name can\'t contain "."'
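
The traceback ends in add_module, which rejects any module name containing a ".", and the name is passed down from init_embeddings. A plausible cause, stated here as an assumption, is a categorical column whose name contains a dot. A minimal pandas sketch of renaming such columns before building the dataset; the column names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"store.id": ["a", "b"], "price.usd": [1.0, 2.0], "volume": [10, 20]}
)

# Column names that could not be used as module names.
offending = [name for name in df.columns if "." in name]

# Replace the dots with underscores; other columns are left untouched.
df = df.rename(columns={name: name.replace(".", "_") for name in offending})

print(offending)
print(list(df.columns))
```

Any lists of variable names passed to the dataset afterwards would need to use the renamed columns as well.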

question : dataloader size?

  • PyTorch-Forecasting version: latest
  • PyTorch version: 1.7
  • Python version: 3.7
  • Operating System: debian

Hi. I thought TimeSeriesDataSet automatically groups the input data by group_id. In the Stallion data there are 351 unique groups, yet there are 20651 items in train_dataloader (when I set batch size = 1). What causes this difference?

And how is the size of the validation set determined? I checked to_dataloader but couldn't find the answer.
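
One possible explanation, offered as an assumption rather than a description of the library's internals: each item is a window (encoder part plus prediction part) cut from a series, not a whole series, so one group contributes many items. The sketch below counts fixed-length sliding windows for a few made-up series lengths; `count_windows` is a hypothetical helper, and the real count would also depend on settings such as minimum encoder and prediction lengths:

```python
def count_windows(series_length, encoder_length, prediction_length):
    """Number of fixed-length sliding windows that fit into one series."""
    window = encoder_length + prediction_length
    return max(series_length - window + 1, 0)


# Made-up series lengths: two long groups and one that is too short.
series_lengths = [60, 60, 20]

n_samples = sum(count_windows(n, 24, 6) for n in series_lengths)
print(n_samples)
```

Three groups yield far more than three samples here, which mirrors the gap between the number of groups and the number of dataloader items.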

Errors

Hello, I'm running stallion.py but I'm getting a few problems:
Using gpu = 1 in the trainer returns

models/temporal_fusion_transformer/__init__.py", line 793, in _log_interpretation dim=0

Using only the CPU it works, but it returns this error:

AttributeError: module 'tensorflow._api.v1.io.gfile' has no attribute 'get_filesystem'
This is apparently due to an incompatibility between PyTorch and TensorBoard, but I managed to install pytorch-forecasting without problems, so could you tell me which versions of PyTorch/TensorBoard you are using?

Thanks

question: TFT for time series classification

Hi,
I would like to use the temporal fusion transformer(TFT) for time-series-classification. For our illustrative example we have 2 target classes (True, False) for the cancellation of a membership.

Illustrative example:
Suppose you have an e-commerce shop, and you want to find out whether a customer will cancel his premium membership (like Amazon-prime) based on his shopping behavior.

Parameters for TimeSeriesDataSet:
target = ['cancellation']
group_ids = ['customerID']
static_categoricals = ['zip', 'gender']
time_varying_known_reals = ['time_idx']
time_varying_unknown_categoricals = ['shopping_event']
time_varying_unknown_reals = ['age_at_shopping_event']
max_prediction_length=1

Is the TFT suitable for this kind of task?
What should the target vector look like, e.g.

  • Method 1
    - always True for customers who cancel the membership?
    - always False for customers who don't cancel the membership?
  • Method 2
    - False until the customer cancels the membership. Then always True?
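
The two labelling schemes above can be made concrete with a small pandas sketch. It assumes a hypothetical 0/1 column `cancelled_now` that marks the time step at which the cancellation happens; the data and column names are illustrative only:

```python
import pandas as pd

events = pd.DataFrame(
    {
        "customerID": ["a", "a", "a", "b", "b", "b"],
        "time_idx": [0, 1, 2, 0, 1, 2],
        # 1 at the step where the customer cancels, else 0 (hypothetical column)
        "cancelled_now": [0, 1, 0, 0, 0, 0],
    }
).sort_values(["customerID", "time_idx"])

per_customer = events.groupby("customerID")["cancelled_now"]

# Method 1: the same label on every row of a customer.
events["target_method_1"] = per_customer.transform("max").astype(bool)

# Method 2: False until the cancellation step, True from then on.
events["target_method_2"] = per_customer.cummax().astype(bool)

print(events)
```

Method 1 exposes the final outcome on every row, including rows before the cancellation, which is worth keeping in mind when choosing between the two.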

I have already implemented a BinaryCrossEntropyLoss metric analogous to the CrossEntropyLoss.
If you like, I can make a merge request for the BinaryCrossEntropyLoss.

Thanks for your ideas

Question about forecasting ahead to days outside training/validation set

Hello! First of all, thank you for this wonderful and easy to use library. It's been a joy to work with, and I am really impressed with all of the features you've so kindly added to it for things like easy visualization and evaluation of feature importance!

I am new to deep learning for time series (working on graphs and NLP has been my focus area), and was wondering how I can get the TFT model to generate forecasts ahead to days that are not in the training/validation data? For example, in your example with the stallion data, you produce graphs showing predicted vs observed on the validation set, but how can I get the model to show me predicted trends for data beyond this (e.g., if one wanted to use the model to predict what volume would be in the future)?

Thanks so much for your insight and advice! Really appreciate it!
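
A common pattern for forecasting past the end of the data is to append placeholder rows for the future time steps to the most recent history, so the model has an encoder window and a decoder window to work with. The pandas sketch below only builds such a frame; the column names, lengths and values are assumptions, and the copied `volume` values in the future rows are placeholders rather than known data:

```python
import pandas as pd

max_encoder_length = 3
max_prediction_length = 2

history = pd.DataFrame(
    {
        "group": ["a"] * 4 + ["b"] * 4,
        "time_idx": list(range(4)) * 2,
        "volume": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    }
)

last_idx = history["time_idx"].max()

# Most recent history rows: these would feed the encoder.
encoder_data = history[history["time_idx"] > last_idx - max_encoder_length]

# Future rows: repeat the last known row per group and shift the time index.
last_rows = history[history["time_idx"] == last_idx]
decoder_data = pd.concat(
    [last_rows.assign(time_idx=last_idx + step) for step in range(1, max_prediction_length + 1)],
    ignore_index=True,
)

new_prediction_data = pd.concat([encoder_data, decoder_data], ignore_index=True)
print(new_prediction_data)
```

Covariates that are known in advance (calendar features, planned prices and so on) would need real future values in the decoder rows instead of copies.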

Hyperparameter optimization does not return learning rate

Presently, the hyperparameter optimization returns the best parameters for

{'gradient_clip_val': 1.028043566346387,
 'hidden_size': 176,
 'dropout': 0.22095396475678628,
 'hidden_continuous_size': 36,
 'attention_head_size': 1}

even when use_learning_rate_finder=True. I expect this could be fixed easily just by adding it to the optuna parameters to track.

Construction of TimeSeriesDataSet for interrupted sequences and variable sequence sampling frequency

Hi,
I have a question on the TimeSeriesDataSet class. In my case, I have a time series sequence with a sampling frequency of 15 minutes, and one target variable. Unfortunately, the sequence is extensively interrupted (too extensively to impute the np.nan's with values) at several points in the sequence; e.g. it looks like:

                           target  input_0
2004-10-31 23:00:00+00:00  90      3.3
2004-10-31 23:15:00+00:00  91.5    3.3
.....                      91.5    3.3
2004-11-30 23:15:00+00:00  91.5    3.3
2004-11-30 23:30:00+00:00  np.nan  np.nan
....                       np.nan  np.nan
2004-12-01 23:30:00+00:00  89.5    3.25
2004-12-01 23:45:00+00:00  86      3.2

How to handle this? From the documentation, I have understood that I could handle this by assigning to each uninterrupted chunk of the time series a different 'timeseries' number:

                           target  input_0  timeseries
2004-10-31 23:00:00+00:00  90      3.3      0
2004-10-31 23:15:00+00:00  91.5    3.3      0
.....                      91.5    3.3      0
2004-11-30 23:15:00+00:00  91.5    3.3      0
2004-12-01 23:30:00+00:00  89.5    3.25     1
2004-12-01 23:45:00+00:00  86      3.2      1

Is this correct?
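
The chunk numbering described above can be derived automatically from the timestamps. A minimal pandas sketch, assuming the NaN rows have already been dropped and that any jump larger than the 15-minute sampling step starts a new chunk; the `time_idx` column is an added assumption, not something from the question:

```python
import pandas as pd

timestamps = pd.to_datetime(
    [
        "2004-10-31 23:00",
        "2004-10-31 23:15",
        "2004-10-31 23:30",
        "2004-12-01 23:30",
        "2004-12-01 23:45",
    ],
    utc=True,
)
df = pd.DataFrame({"target": [90.0, 91.5, 91.5, 89.5, 86.0]}, index=timestamps)

step = pd.Timedelta(minutes=15)

# True wherever the distance to the previous row exceeds one sampling step.
starts_new_chunk = df.index.to_series().diff() > step

# Running count of chunk starts gives one id per uninterrupted chunk.
df["timeseries"] = starts_new_chunk.cumsum()

# Consecutive integer time index inside each chunk.
df["time_idx"] = df.groupby("timeseries").cumcount()

print(df)
```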

In addition, I understand that to make predictions for 6 points in the future based on 24 points in the past I need to set max_prediction_length = 6 and max_encoder_length = 24 (is this correct?). However, suppose that I want to predict 6 points in the future, with a frequency of 30 minutes; using 24 points in the past with a different sampling frequency (say an hour). How to achieve this? Should I make one dataframe with many 'time series', where each time series only has 30 points (the 24 first points the input sampled every hour and the final 6 points to predict which are sampled every 30 minutes)? Would that work?

Many thanks in any case!
Tomas

Error passing CUDA tensor to nn.utils.rnn.pack_padded_sequence

  • PyTorch-Forecasting version: v0.5.3
  • PyTorch version: 1.7.0
  • Python version: 3.7.9
  • Operating System: Ubuntu 20.04.1 LTS

Expected behavior

I executed the code to find the optimal learning rate or to fit the network, and expected to get the results shown at pytorch-forecasting.readthedocs.io. The only difference was gpus=1 in the pl.Trainer parameters.

# configure network and trainer
pl.seed_everything(42)
trainer = pl.Trainer(
    gpus=1,
    # clipping gradients is a hyperparameter and important to prevent divergence
    # of the gradient for recurrent neural networks
    gradient_clip_val=0.1,
)


tft = TemporalFusionTransformer.from_dataset(
    training,
    # not meaningful for finding the learning rate but otherwise very important
    learning_rate=0.03,
    hidden_size=16,  # most important hyperparameter apart from learning rate
    # number of attention heads. Set to up to 4 for large datasets
    attention_head_size=1,
    dropout=0.1,  # between 0.1 and 0.3 are good values
    hidden_continuous_size=8,  # set to <= hidden_size
    output_size=7,  # 7 quantiles by default
    loss=QuantileLoss(),
    # reduce learning rate if no improvement in validation loss after x epochs
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

# find optimal learning rate
res = trainer.tuner.lr_find(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
    max_lr=10.0,
    min_lr=1e-6,
)

print(f"suggested learning rate: {res.suggestion()}")
fig = res.plot(show=True, suggest=True)
fig.show()

Actual behavior

However, it raises a RuntimeError like the one below:

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-a92b5627800b> in <module>
      5     val_dataloaders=val_dataloader,
      6     max_lr=10.0,
----> 7     min_lr=1e-6,
      8 )
      9 

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/tuner/tuning.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
    128             mode,
    129             early_stop_threshold,
--> 130             datamodule,
    131         )
    132 

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/tuner/lr_finder.py in lr_find(trainer, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
    173                 train_dataloader=train_dataloader,
    174                 val_dataloaders=val_dataloaders,
--> 175                 datamodule=datamodule)
    176 
    177     # Prompt if we stopped early

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    437         self.call_hook('on_fit_start')
    438 
--> 439         results = self.accelerator_backend.train()
    440         self.accelerator_backend.teardown()
    441 

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in train(self)
     52 
     53         # train or test
---> 54         results = self.train_or_test()
     55         return results
     56 

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in train_or_test(self)
     64             results = self.trainer.run_test()
     65         else:
---> 66             results = self.trainer.train()
     67         return results
     68 

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in train(self)
    459 
    460     def train(self):
--> 461         self.run_sanity_check(self.get_model())
    462 
    463         # enable train mode

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
    645 
    646             # run eval step
--> 647             _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
    648 
    649             # allow no returns from eval

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, test_mode, max_batches)
    565 
    566                 # lightning module methods
--> 567                 output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
    568                 output = self.evaluation_loop.evaluation_step_end(output)
    569 

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py in evaluation_step(self, test_mode, batch, batch_idx, dataloader_idx)
    169             output = self.trainer.accelerator_backend.test_step(args)
    170         else:
--> 171             output = self.trainer.accelerator_backend.validation_step(args)
    172 
    173         # track batch size for weighted average

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in validation_step(self, args)
     76                 output = self.__validation_step(args)
     77         else:
---> 78             output = self.__validation_step(args)
     79 
     80         return output

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in __validation_step(self, args)
     84         batch = self.to_device(batch)
     85         args[0] = batch
---> 86         output = self.trainer.model.validation_step(*args)
     87         return output
     88 

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in validation_step(self, batch, batch_idx)
    138     def validation_step(self, batch, batch_idx):
    139         x, y = batch
--> 140         log, _ = self.step(x, y, batch_idx, label="val")  # log loss
    141         self.log("val_loss", log["loss"], on_step=False, on_epoch=True, prog_bar=True)
    142         return log

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in step(self, x, y, batch_idx, label)
    566         """
    567         # extract data and run model
--> 568         log, out = super().step(x, y, batch_idx, label=label)
    569         # calculate interpretations etc for latter logging
    570         if self.log_interval(label == "train") > 0:

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in step(self, x, y, batch_idx, label)
    194             loss = loss * (1 + monotinicity_loss)
    195         else:
--> 196             out = self(x)
    197             out["prediction"] = self.transform_output(out)
    198 

~/repo/emart-promo/env/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in forward(self, x)
    489         encoder_output, (hidden, cell) = self.lstm_encoder(
    490             rnn.pack_padded_sequence(
--> 491                 embeddings_varying_encoder, lstm_encoder_lengths, enforce_sorted=False, batch_first=True
    492             ),
    493             (input_hidden, input_cell),

~/repo/emart-promo/env/lib/python3.7/site-packages/torch/nn/utils/rnn.py in pack_padded_sequence(input, lengths, batch_first, enforce_sorted)
    242 
    243     data, batch_sizes = \
--> 244         _VF._pack_padded_sequence(input, lengths, batch_first)
    245     return _packed_sequence_init(data, batch_sizes, sorted_indices, None)
    246 

RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
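
The message says that `lengths` must be a CPU tensor even when the input lives on the GPU. A minimal sketch of the usual workaround, moving only the lengths to the CPU before packing; the toy tensors are assumptions and this is not the library's own fix:

```python
import torch
from torch.nn.utils import rnn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy batch: 3 sequences, padded to 5 steps, 4 features each.
batch = torch.randn(3, 5, 4).to(device)
lengths = torch.tensor([5, 3, 2]).to(device)

# The data may stay on the GPU, but the lengths are moved to the CPU.
packed = rnn.pack_padded_sequence(
    batch, lengths.cpu(), batch_first=True, enforce_sorted=False
)

# Round trip back to a padded tensor to check that nothing was lost.
padded, out_lengths = rnn.pad_packed_sequence(packed, batch_first=True)
print(padded.shape, out_lengths)
```

The sketch falls back to the CPU when no GPU is available, so it runs either way.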

Seems related to these issues:

'TimeSeriesDataSet' object has no attribute 'args'

When I try to create a TimeSeriesDataSet for validation it shows me the error in the title. I tried:

validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training.index.time.max() + 1, stop_randomization=True)

The training dataset is created successfully.
Any ideas what could be wrong?

This is Awesome!!

Hello Jan! I just read the article about this package. I was thrilled to find that it had the temporal fusion transformer, and also that it works with pytorch-lightning! I came straight to the repo to check the code and the implementation, only to find out that you improved a lot upon the code I had built in my initial implementation of the TFT! I'm just ecstatic! I would love to start contributing to this repo, adding more recipes from simpler PyTorch architectures, and also working on the implementation of newer ones!

I am a firm believer that the advances that have happened in image and NLP can happen in time series too. We just need a package like this that helps unify usage for everybody and solves the common issues that come up when implementing time series models!

For my part, I'm going to start testing the models, because I'm currently working on a time series problem that I'm trying to solve with neural networks. I was thinking of continuing the work I started on the TFT, but I think this implementation is already way better!

Let's keep in touch!

Handling of look-ahead bias

Hi, can I please ask how you are handling the potential for look-ahead bias in the scaling of the features etc? This seems to be a common problem in timeseries prediction. I did search the docs but couldn't find any such information. Many thanks.
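
For context, the standard way to avoid look-ahead bias in scaling is to compute the scaling statistics on the training period only and reuse them for later data. The sketch below illustrates that general technique with plain Python; it makes no claim about how the package handles this, and the values and cutoff are made up:

```python
from statistics import mean, pstdev

values = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
cutoff = 4  # first index that belongs to the validation period

# Fit the scaler on the training period only.
train = values[:cutoff]
mu, sigma = mean(train), pstdev(train)

# Apply the same statistics to all data, including the later period.
scaled = [(v - mu) / sigma for v in values]

print(scaled)
```

Had the mean and standard deviation been computed on all six values, the two large later observations would have leaked into the scaling of the training rows.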

AssertionError: Time difference between steps has been idenfied as larger than 1 - set allow_missings=True

  • PyTorch-Forecasting version: 0.5.2
  • PyTorch version: 1.6.0
  • PyTorch lightning version: 1.0.4
  • Python version: 3.7
  • Operating System: Ubuntu 18.04.2

Expected behavior

My data looks something like this. As you can see, there is no data for some dates. So what I did was to find unique dates and sort them. Then I just find the time index based on the date's position in the sorted date list.

sorted_dates = sorted(list(df.index.unique()))

DATE_TO_INDEX = {date: i for i, date in enumerate(sorted_dates)}

df['time_idx'] = df.index.map(lambda date: map_date_index(date, min_date))

And this is what I got afterward
When I executed code

# categories have to be strings
df['month'] = df.index.month.astype(str).astype('category')
df['weekday'] = df.weekday.astype(str).astype('category')
df['sentiment_binary'] = df.sentiment_binary.astype(str).astype('category')
df['Instrument'] = df.Instrument.astype(str).astype('category')
df['log_close'] = np.log(df.Close + 1e-8)

# Add holidays
us_holidays = holidays.UnitedStates()
df['holiday'] = df.index.map(lambda date: us_holidays[date] if date in us_holidays else '-').astype('category')

# average close per instrument for each time index
df['avg_close_by_instrument'] = df.groupby(['time_idx', 'Instrument'], observed=True).Close.transform('mean')

df.reset_index(inplace=True)
df.rename(columns={'index': 'published'}, inplace=True)

train_percentage = 0.1
train_size = int((1 - train_percentage) * len(df))

train = df.iloc[0:train_size]
test = df.iloc[train_size:] 

max_prediction_length = test['time_idx'].min() 
max_encoder_length = 24
training_cutoff = df["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    df[lambda x: x.time_idx <= training_cutoff],
    time_idx='time_idx', 
    target='Close',
    group_ids=['Instrument'],
   )

I got this error

------------------------------------------------------------------------
AssertionError                         Traceback (most recent call last)
<ipython-input-216-a26698f913ea> in <module>
      7     time_idx='time_idx',
      8     target='Close',
----> 9     group_ids=['Instrument'],
     10    )

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, dropout_categoricals, constant_fill_strategy, allow_missings, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
    282 
    283         # create index
--> 284         self.index = self._construct_index(data, predict_mode=predict_mode)
    285 
    286         # convert to torch tensor for high performance data loading later

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
    718             assert (
    719                 self.allow_missings
--> 720             ), "Time difference between steps has been idenfied as larger than 1 - set allow_missings=True"
    721 
    722         df_index["index_end"], missing_sequences = _find_end_indices(

AssertionError: Time difference between steps has been idenfied as larger than 1 - set allow_missings=True

I have also tried to extract time_idx based on the difference between the date and the min date

def map_date_index(date, min_date):
    return (date - min_date).days
min_date = df.index.min()

df['time_idx'] = df.index.map(lambda date: map_date_index(date, min_date))

but I encountered the same error. I wonder what could be the reason for this?
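
A likely reason, stated as an assumption about the check rather than a confirmed diagnosis: even when the global time index has no holes, an individual group can still skip steps, for example an Instrument with no rows on some dates. The pandas sketch below builds a dense global index and then looks at the step size inside each group; the toy data is made up:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Instrument": ["A", "A", "A", "B", "B"],
        "date": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-03"]
        ),
    }
)

# Dense global time index: position of each date among the sorted unique dates.
codes, _ = pd.factorize(df["date"], sort=True)
df["time_idx"] = codes

# Step size between consecutive rows inside each group.
df = df.sort_values(["Instrument", "time_idx"])
step = df.groupby("Instrument")["time_idx"].diff()

# Groups where at least one step is larger than 1.
max_step = step.groupby(df["Instrument"]).max()
groups_with_gaps = max_step[max_step > 1].index.tolist()

print(groups_with_gaps)
```

Here the global index 0, 1, 2 is complete, yet group B jumps from 0 to 2, so a per-group check would still report a gap.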

Reduction to practice of N-BEATS

  • PyTorch-Forecasting version: 0.5.1
  • PyTorch version: 1.6.0+cu101
  • Python version: 3.6.6
  • Operating System: Windows 7

First of all, amazing package!
Looks like tons of work.
Thanks so much.

I have spent a few days now getting to know it better, reading the documentation, and also running in debug to better understand what is going on.

I am trying to run N-BEATS on new data, and have a few questions regarding it:
1.a. Once I have a pre-trained model, what is the simplest way to predict on new data?
1.b. What is the minimal length of the data required?
1.c. What would be the behavior if the data is longer? Will the model "cut" just the required last samples and use them to predict?

  2. I need to train my model on several series of different lengths.
    Building on top of the N-BEATS tutorial, I wrote a modified data generator in which I replaced the end of generate_ar_data:

    # insert into dataframe
    data = (
        pd.DataFrame(series)
        .stack()
        .reset_index()
        .rename(columns={"level_0": "series", "level_1": "time_idx", 0: "value"})
    )

with the following:
    # convert to dataframe, where the various series have different lengths
    data = pd.DataFrame()
    for k in range(series.shape[0]):
        truncate = np.random.randint(0, 20)
        if truncate > 0:
            truncated_data = series[k, :-truncate]
        else:
            truncated_data = series[k, :]
        new_df = pd.DataFrame({'series': k, 'time_idx': np.arange(len(truncated_data)), 'value': truncated_data})
        data = pd.concat([data, new_df], axis=0)
    data.reset_index(drop=True, inplace=True)
    return data

I executed synthetic_data_tutorial, and got the following exception:
ValueError: Min encoder length and/or min prediction length is too large for 8 series/group

After some digging, I found out that it crashed in:
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff+1)

Digging deeper, I found out that when the series (meaning the different group id's) have different lengths, then the longest one will determine the parameters influencing the sequence length, start and end (in the df_index).

Consequently, after the "filter too short sequences" section, some group id's are being filtered out (although they have sufficient length for prediction).

I managed to circumvent this issue after making the following changes:
In the __init__() of timeseries.py:
I replaced:
    if min_prediction_idx is not None:
        data = data[lambda x: data[self.time_idx] >= self.min_prediction_idx - self.max_encoder_length]  # before my fix, this was the only line in the clause

with:
    delta_per_group = data.groupby(self.group_ids)["time_idx"].max().max() - data.groupby('series')["time_idx"].max()
    inds_to_keep = np.zeros(shape=(data.shape[0],)).astype(bool)
    for k in delta_per_group.index:
        inds_to_keep = np.logical_or(inds_to_keep, np.logical_and(data[self.time_idx] >= self.min_prediction_idx - self.max_encoder_length - delta_per_group[k], np.squeeze(data[self.group_ids] == k)))
    data = data[inds_to_keep]

and in _construct_index() in the section of "#filter too short sequences":
I replaced:
(x["sequence_length"] + x["time"] >= self.min_prediction_idx + self.min_prediction_length)
with:
(x["sequence_length"] + x["time"] >= self.min_prediction_length + self.min_prediction_idx - (df_index['time_last'].max() - df_index['time_last']))

df_index now looks exactly as I thought it should, and indeed it passed:
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
and also:
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)
but it crashed in the next line:

Traceback (most recent call last):
  File "D:\Users\Lihu\Dropbox\Projects\Maytronics\pytorch_prediction\venv\lib\site-packages\IPython\core\interactiveshell.py", line 2731, in safe_execfile
    self.compile if shell_futures else None)
  File "D:\pytorch_prediction\venv\lib\site-packages\IPython\utils\py3compat.py", line 168, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "D:\pytorch_prediction\synthetic_data_tutorial.py", line 63, in <module>
    actuals = torch.cat([y for x, y in iter(val_dataloader)])
  File "D:\pytorch_prediction\synthetic_data_tutorial.py", line 63, in <listcomp>
    actuals = torch.cat([y for x, y in iter(val_dataloader)])
  File "D:\pytorch_prediction\venv\lib\site-packages\torch\utils\data\dataloader.py", line 363, in __next__
    data = self._next_data()
  File "D:\pytorch_prediction\venv\lib\site-packages\torch\utils\data\dataloader.py", line 403, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "D:\pytorch_prediction\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\pytorch_prediction\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\pytorch_prediction\venv\lib\site-packages\pytorch_forecasting\data\timeseries.py", line 932, in __getitem__
    ), "Decoder length should be at least minimum prediction length"
AssertionError: Decoder length should be at least minimum prediction length

I'd definitely appreciate some help from someone who knows the code much better than me :)
Thanks,
Lihu

Bug .get_parameters()

  • PyTorch-Forecasting version: 0.5.2
  • PyTorch version: 1.7.0
  • Python version: 3.7
  • Operating System: macOS 10.15.7

Expected behavior

df_train is a pandas dataframe with columns 'timestamp', 'user_id', 'content_id', 'content_type_id', 'prior_question_elapsed_time', 'prior_question_had_explanation_False', 'prior_question_had_explanation_True', 'question_cluster', 'answered_correctly'
a minimal version of the code is:

from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer
max_prediction_length = 1  # forecast 1 question
max_encoder_length = 5  # use 5 questions of history

training = TimeSeriesDataSet(data=df_train,
                             time_idx="timestamp",
                             target="answered_correctly",
                             group_ids=["user_id"],
                             min_encoder_length=0,  # allow predictions without history
                             max_encoder_length=max_encoder_length,
                             min_prediction_length=1,
                             max_prediction_length=max_prediction_length,
                             static_categoricals=["user_id"],
                             allow_missings=True,
                             static_reals=[],
                             time_varying_known_categoricals=[],
                             time_varying_known_reals=[],
                             time_varying_unknown_categoricals=[],
                             time_varying_unknown_reals=['content_id',
                                                         'content_type_id',
                                                         'prior_question_elapsed_time',
                                                         'prior_question_had_explanation_False',
                                                         'prior_question_had_explanation_True',
                                                         'question_cluster']
                             )
print(training.get_parameters())

I would expect a dict with the parameters to be printed to standard output.

Actual behavior

However, I get the following:

Traceback (most recent call last):
  File "/Users/silvio/Desktop/working/train_TS_nn.py", line 144, in <module>
    print(training.get_parameters())
  File "/Users/silvio/Desktop/working/venv/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py", line 615, in get_parameters
    name: getattr(self, name) for name in inspect.signature(self.__class__).parameters.keys() if name != "data"
  File "/Users/silvio/Desktop/working/venv/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py", line 615, in <dictcomp>
    name: getattr(self, name) for name in inspect.signature(self.__class__).parameters.keys() if name != "data"
AttributeError: 'TimeSeriesDataSet' object has no attribute 'args'

How to format a pandas dataframe

Hi @jdb78 !! love the library :).

I noticed the time series in the nbeats example are stacked on top of each other.

How do I format a pandas data frame to look the same?
Would I need to normalise my data first?

import yfinance as yf
data = yf.download("SPY IBM AMZN AAPL", start="2017-01-01", end="2017-04-30")

data['Close']
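
The N-BEATS example data is in long format: one row per series and time step, with the columns "series", "time_idx" and "value" that appear in the generate_ar_data snippet quoted in an earlier issue. A minimal sketch of reshaping a wide frame such as data['Close'] into that layout follows. The prices are invented stand-ins so the snippet runs without a download, and the column names are assumptions carried over from that tutorial snippet.

```python
import pandas as pd

# Stand-in for data['Close']: one column per ticker, one row per date (made-up values)
dates = pd.bdate_range("2017-01-02", periods=4)
close = pd.DataFrame(
    {"SPY": [1.0, 2.0, 3.0, 4.0], "IBM": [5.0, 6.0, 7.0, 8.0]}, index=dates
)
close.index.name = "Date"

# Wide -> long: stack the ticker columns on top of each other
long_df = (
    close.reset_index()
    .melt(id_vars="Date", var_name="series", value_name="value")
    .dropna(subset=["value"])
)

# Integer time index shared by all series: rank of the date in the sorted calendar
long_df["time_idx"] = pd.factorize(long_df["Date"], sort=True)[0]
long_df = long_df.sort_values(["series", "time_idx"]).reset_index(drop=True)
print(long_df)
```

On the normalisation question, the TimeSeriesDataSet constructor accepts a target_normalizer argument (it is visible in the __init__ signature quoted in a traceback above), so scaling may not need to happen in pandas beforehand. Treat that as a pointer to check in the documentation rather than a confirmed answer.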

AssertionError: filters should not remove entries

  • PyTorch-Forecasting version: 0.5.3
  • PyTorch version: 1.7.0+cu101
  • Python version: 3.6.9
  • Operating System: Windows

Expected behavior

I ran the code to create a TimeSeriesDataSet and expected the code to create the object in order to move on to the validation split.

Actual behavior

  • The result was this error "AssertionError: filters should not remove entries"
  • It has to do with the creation of an index of samples.
  • My id column is a list of property id's.
  • Could it have something to do with the formatting of our dataset that we might have to change?
  • The screenshot below is a result from running the following line of code: data.groupby('property_id', observed=True).head()

image

Code to reproduce the problem

max_prediction_length = 6
max_encoder_length = 3914
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx = col_def['time_idx'],
    target = col_def['target'],  # Value
    group_ids = col_def['group_ids'],  # the error stems from _construct_index in timeseries.py
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals = col_def['static_categoricals'],
    static_reals = col_def['static_reals'],
    time_varying_known_categoricals = col_def['time_varying_known_categoricals'],
    variable_groups = {},  # group of categorical variables can be treated as one variable
    time_varying_known_reals = col_def['time_varying_known_reals'],
    time_varying_unknown_categoricals = [],
    time_varying_unknown_reals = col_def['time_varying_unknown_reals'],
    target_normalizer=GroupNormalizer(
        groups = col_def['group_ids'], coerce_positive=1.0
    ),  # use softplus with beta=1.0 and normalize by group
    add_relative_time_idx = True,
    add_target_scales = True,
    add_encoder_length = True,
)

Is there any other code you might need to see to be able to better understand where the issue might stem from?


image
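
A guess at the cause, not a confirmed diagnosis: with max_encoder_length = 3914, the call above sets min_encoder_length to 1957, so every sample needs at least 1957 + 1 consecutive time steps. If no property has that much history before training_cutoff, the index of samples could end up empty. The hypothetical helper below lists groups whose time span is shorter than a required length. The toy column values are made up, and the span calculation assumes there are no gaps inside a group.

```python
import pandas as pd


def groups_shorter_than(data, group_col, time_col, required_length):
    """Return the span (in time steps) of every group shorter than required_length."""
    span = data.groupby(group_col, observed=True)[time_col].agg(["min", "max"])
    length = span["max"] - span["min"] + 1  # assumes no missing steps inside a group
    return length[length < required_length]


# Made-up example: group "a" spans 10 steps, group "b" only 3
toy = pd.DataFrame(
    {
        "property_id": ["a"] * 10 + ["b"] * 3,
        "time_idx": list(range(10)) + list(range(3)),
    }
)
min_encoder_length, min_prediction_length = 4, 1
short = groups_shorter_than(
    toy, "property_id", "time_idx", min_encoder_length + min_prediction_length
)
print(short)
```

If every group shows up in the result on the real data, lowering min_encoder_length would be the first thing to try.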

Tensorflow issue in Stallion Notebook

Stallion Notebook does not finish running. I believe it has to do with having some specific version of tensorflow.

I'm using:

pytorch-lightning==0.9.0
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorflow-estimator==2.0.1
tensorflow-gpu==2.3.0
tensorflow-gpu-estimator==2.3.0

Here's the traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-d89dc578d7fe> in <module>
      1 # fit network
      2 trainer.fit(
----> 3     tft, train_dataloader=train_dataloader, val_dataloaders=val_dataloader,
      4 )

~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\states.py in wrapped_fn(self, *args, **kwargs)
     46             if entering is not None:
     47                 self.state = entering
---> 48             result = fn(self, *args, **kwargs)
     49 
     50             # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted

~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
   1082             self.accelerator_backend = CPUBackend(self)
   1083             self.accelerator_backend.setup(model)
-> 1084             results = self.accelerator_backend.train(model)
   1085 
   1086         # on fit end callback

~\Anaconda3\lib\site-packages\pytorch_lightning\accelerators\cpu_backend.py in train(self, model)
     37 
     38     def train(self, model):
---> 39         results = self.trainer.run_pretrain_routine(model)
     40         return results

~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\trainer.py in run_pretrain_routine(self, model)
   1237 
   1238         # CORE TRAINING LOOP
-> 1239         self.train()
   1240 
   1241     def _run_sanity_check(self, ref_model, model):

~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\training_loop.py in train(self)
    407                 if self.should_stop:
    408                     if (met_min_epochs and met_min_steps):
--> 409                         self.run_training_teardown()
    410                         return
    411                     else:

~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\training_loop.py in run_training_teardown(self)
   1143             # model hooks
   1144             if self.is_function_implemented('on_train_end'):
-> 1145                 self.get_model().on_train_end()
   1146 
   1147         if self.logger is not None:

~\Documents\GitHub\pytorch-forecasting\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py in on_train_end(self)
    584     def on_train_end(self):
    585         if self.log_interval(train=True) > 0:
--> 586             self._log_embeddings()
    587 
    588     def step(self, x, y, batch_idx, label="train"):

~\Documents\GitHub\pytorch-forecasting\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py in _log_embeddings(self)
    868             labels = self.hparams.embedding_labels[name]
    869             self.logger.experiment.add_embedding(
--> 870                 emb.weight.data.cpu(), metadata=labels, tag=name, global_step=self.global_step
    871             )

~\Anaconda3\lib\site-packages\torch\utils\tensorboard\writer.py in add_embedding(self, mat, metadata, label_img, global_step, tag, metadata_header)
    786         save_path = os.path.join(self._get_file_writer().get_logdir(), subdir)
    787 
--> 788         fs = tf.io.gfile.get_filesystem(save_path)
    789         if fs.exists(save_path):
    790             if fs.isdir(save_path):

AttributeError: module 'tensorflow._api.v2.io.gfile' has no attribute 'get_filesystem'
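
A workaround commonly reported for this AttributeError is to point tf.io.gfile at the stub that ships with tensorboard, so that add_embedding finds get_filesystem. The sketch below wraps that idea in a hypothetical helper, patch_gfile, and exercises it on stand-in objects so it runs without either package. The real call is shown only as a comment and assumes that both tensorflow and tensorboard can be imported in the environment.

```python
from types import SimpleNamespace


def patch_gfile(tf_module, replacement_gfile):
    """Replace tf_module.io.gfile when it lacks get_filesystem.

    Returns True if the replacement was applied, False if nothing was needed.
    """
    if hasattr(tf_module.io.gfile, "get_filesystem"):
        return False
    tf_module.io.gfile = replacement_gfile
    return True


# In a real session (assumption: both packages are installed):
#   import tensorflow as tf
#   import tensorboard as tb
#   patch_gfile(tf, tb.compat.tensorflow_stub.io.gfile)

# Stand-in objects, purely for illustration
fake_tf = SimpleNamespace(io=SimpleNamespace(gfile=SimpleNamespace()))
stub_gfile = SimpleNamespace(get_filesystem=lambda path: "stub-fs")

first = patch_gfile(fake_tf, stub_gfile)   # gfile had no get_filesystem, so it is swapped
second = patch_gfile(fake_tf, stub_gfile)  # already patched, nothing to do
```

If tensorflow is not otherwise needed in the environment, removing it so that tensorboard falls back to its own stub is another option people report, though that has not been verified against the versions listed above.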

Questions about TimeSeriesDataSet structure and sampling

First of all, this library is fantastic- thanks so much for the work!

I am attempting to train TFT on a very large time series dataset, but am finding working with the TimeSeriesDataSet fairly confusing with the current documentation; it would be great if anyone can clarify the following:

  • Should group_ids identify samples of time series or whole time series? If I have some time series (call it 'PRICE') with a length of 1,000 but a sample length of 100 for my model, should I create e.g. 10 group identifiers ('PRICE0-PRICE9') to specify training examples, or should I keep it as a single series?

  • How are training examples sampled from a TimeSeriesDataSet? Assuming group_ids identify whole time series, how are training examples sampled from a series? Are random blocks taken? Are they taken in order? This is not very clear.

  • Are samples aligned by time_idx? If I have multiple time series which should be aligned (e.g. prices of multiple securities), will they be aligned by time_idx when sampled? If so, what happens when points are missing?

  • How are validation examples sampled?

  • Is there support for very large datasets? The dataset I am working with is >100GB of time series data; loading it all into a DataFrame isn't possible. Is there support for building custom DataLoaders which can load batches at train time?

I would be happy to update documentation if these are answered, thanks a lot!

N-beats error

Hi ! love the look of the package! looks amazing!

I am getting an error for the nbeats example.

load data
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
Number of parameters in network: 1859.9k

  | Name            | Type       | Params
0 | loss            | SMAPE      | 0
1 | logging_metrics | ModuleList | 0
2 | net_blocks      | ModuleList | 1 M
Validation sanity check: 0it [00:00, ?it/s]
/home/andrewcz/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 4 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)

TypeError Traceback (most recent call last)
in
84 # net.hparams.learning_rate = res.suggestion()
85
---> 86 trainer.fit(
87 net,
88 train_dataloader=train_dataloader,

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
438 self.call_hook('on_fit_start')
439
--> 440 results = self.accelerator_backend.train()
441 self.accelerator_backend.teardown()
442

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py in train(self)
46
47 # train or test
---> 48 results = self.train_or_test()
49 return results
50

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py in train_or_test(self)
64 results = self.trainer.run_test()
65 else:
---> 66 results = self.trainer.train()
67 return results
68

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in train(self)
460
461 def train(self):
--> 462 self.run_sanity_check(self.get_model())
463
464 # enable train mode

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
646
647 # run eval step
--> 648 _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
649
650 # allow no returns from eval

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, test_mode, max_batches)
554 dl_max_batches = self.evaluation_loop.max_batches[dataloader_idx]
555
--> 556 for batch_idx, batch in enumerate(dataloader):
557 if batch is None:
558 continue

~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
343
344 def __next__(self):
--> 345 data = self._next_data()
346 self._num_yielded += 1
347 if self._dataset_kind == _DatasetKind.Iterable and \

~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
854 else:
855 del self._task_info[idx]
--> 856 return self._process_data(data)
857
858 def _try_put_index(self):

~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
879 self._try_put_index()
880 if isinstance(data, ExceptionWrapper):
--> 881 data.reraise()
882 return data
883

~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/_utils.py in reraise(self)
392 # (https://bugs.python.org/issue2651), so we work around it.
393 msg = KeyErrorMessage(msg)
--> 394 raise self.exc_type(msg)

TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/andrewcz/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/andrewcz/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/andrewcz/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/andrewcz/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py", line 968, in __getitem__
self.target_normalizer.fit(target[:encoder_length])
TypeError: only integer tensors of a single element can be converted to an index

Implement New Model: Graph Deep Factors for Forecasting

Description

The following is taken from Graph Deep Factors for Forecasting:

Deep probabilistic forecasting techniques have recently been proposed for modeling large collections of time-series. However, these techniques explicitly assume either complete independence (local model) or complete dependence (global model) between time-series in the collection. This corresponds to the two extreme cases where every time-series is disconnected from every other time-series in the collection or likewise, that every time-series is related to every other time-series resulting in a completely connected graph. In this work, we propose a deep hybrid probabilistic graph-based forecasting framework called Graph Deep Factors (GraphDF) that goes beyond these two extremes by allowing nodes and their time-series to be connected to others in an arbitrary fashion. GraphDF is a hybrid forecasting framework that consists of a relational global and relational local model. In particular, we propose a relational global model that learns complex non-linear time-series patterns globally using the structure of the graph to improve both forecasting accuracy and computational efficiency. Similarly, instead of modeling every time-series independently, we learn a relational local model that not only considers its individual time-series but also the time-series of nodes that are connected in the graph.

The idea is to have a global-local model that explicitly considers the local pattern of each time series, which is in contrast to purely global models, such as DeepAR, MQRNN, etc.

image

Expand example logging for multiple logging platforms

Following up on the discussion from issue #79, there is a need to extend the logging capabilities for keeping track of training figures (e.g. showing attention, forecast quantiles, etc.) for loggers beyond tensorboard (e.g. W&B). The self.logger.experiment.add_figure() lines that exist in models.base_model seem to be the root of the issue, as not every logger platform has an add_figure() method for its experiment (or Run in W&B's case) objects.

This is a to-do item for now. It can currently be circumvented (at least when using W&B) by setting log_interval=-1 when instantiating the TemporalFusionTransformer object.
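
One direction this could take, sketched under assumptions rather than taken from the code base: check whether the logger's experiment object exposes add_figure() before calling it, and skip the figure otherwise. The helper name log_figure_if_supported and the two dummy experiment classes are hypothetical. Only the add_figure(tag, figure, global_step=...) call mirrors the tensorboard SummaryWriter method.

```python
def log_figure_if_supported(experiment, tag, figure, global_step=0):
    """Call experiment.add_figure if it exists; return whether the figure was logged."""
    add_figure = getattr(experiment, "add_figure", None)
    if not callable(add_figure):
        return False  # e.g. an experiment object without a tensorboard-style API
    add_figure(tag, figure, global_step=global_step)
    return True


class TensorboardLikeExperiment:
    """Dummy stand-in that records calls the way a SummaryWriter would receive them."""

    def __init__(self):
        self.calls = []

    def add_figure(self, tag, figure, global_step=None):
        self.calls.append((tag, figure, global_step))


class OtherExperiment:
    """Dummy stand-in for an experiment object that has no add_figure method."""


tb_like = TensorboardLikeExperiment()
logged = log_figure_if_supported(tb_like, "attention", "figure-object", global_step=3)
skipped = log_figure_if_supported(OtherExperiment(), "attention", "figure-object")
```

A fuller solution would route the figure to each platform's own image-logging call instead of skipping it, which is where per-logger knowledge would be needed.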

Error when Using Distributed GPU Processing

When I initialize my TFT trainer to use multiple GPUs

# Configure network and trainer
pl.seed_everything(407)
trainer = pl.Trainer(
    gpus = [0],
    gradient_clip_val = 0.1  # hyperparam to prevent gradient divergence for RNNs
)

tft = TemporalFusionTransformer.from_dataset(
    training,
    # not meaningful for finding the learning rate but otherwise very important
    learning_rate = 0.03,
    hidden_size = 16,  # most important hyperparameter apart from learning rate
    # number of attention heads. Set to up to 4 for large datasets
    attention_head_size = 1,
    dropout = 0.1,  # between 0.1 and 0.3 are good values
    hidden_continuous_size = 8,  # set to <= hidden_size
    output_size = 7,  # 7 quantiles by default
    loss = QuantileLoss(),
    # reduce learning rate if no improvement in validation loss after x epochs
    reduce_on_plateau_patience = 4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

The library is able to recognize that I used both GPUs

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [0,1]
Number of parameters in network: 23.4k

However, when I try to find the optimal learning rate

# Find optimal learning rate
res = trainer.lr_find(
    tft,
    train_dataloader = train_dataloader,
    val_dataloaders = val_dataloader,
    max_lr = 10.,
    min_lr = 1e-6,
)

print(f"Suggested learning rate: {res.suggestion()}")
fig = res.plot(show = True, suggest = True)
fig.show()

I get an AttributeError: Can't pickle local object '_apply_to_outputs.<locals>.decorator_fn.<locals>.new_func' error with the following trace:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-29-01060df08a43> in <module>
      1 # Find optimal learning rate
----> 2 res = trainer.lr_find(
      3     tft,
      4     train_dataloader = train_dataloader,
      5     val_dataloaders = val_dataloader,

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/lr_finder.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold)
    198 
    199         # Fit, lr & loss logged in callback
--> 200         self.fit(model,
    201                  train_dataloader=train_dataloader,
    202                  val_dataloaders=val_dataloaders)

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
     46             if entering is not None:
     47                 self.state = entering
---> 48             result = fn(self, *args, **kwargs)
     49 
     50             # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
   1050             self.accelerator_backend = DDPSpawnBackend(self)
   1051             self.accelerator_backend.setup()
-> 1052             self.accelerator_backend.train(model, nprocs=self.num_processes)
   1053             results = self.accelerator_backend.teardown(model)
   1054 

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_spawn_backend.py in train(self, model, nprocs)
     41 
     42     def train(self, model, nprocs):
---> 43         mp.spawn(self.ddp_train, nprocs=nprocs, args=(self.mp_queue, model,))
     44 
     45     def teardown(self, model):

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/torch/multiprocessing/spawn.py in spawn(fn, args, nprocs, join, daemon)
    160             daemon=daemon,
    161         )
--> 162         process.start()
    163         error_queues.append(error_queue)
    164         processes.append(process)

~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/process.py in start(self)
    119                'daemonic processes are not allowed to have children'
    120         _cleanup()
--> 121         self._popen = self._Popen(self)
    122         self._sentinel = self._popen.sentinel
    123         # Avoid a refcycle if the target function holds an indirect

~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/context.py in _Popen(process_obj)
    282         def _Popen(process_obj):
    283             from .popen_spawn_posix import Popen
--> 284             return Popen(process_obj)
    285 
    286     class ForkServerProcess(process.BaseProcess):

~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/popen_spawn_posix.py in __init__(self, process_obj)
     30     def __init__(self, process_obj):
     31         self._fds = []
---> 32         super().__init__(process_obj)
     33 
     34     def duplicate_for_child(self, fd):

~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/popen_fork.py in __init__(self, process_obj)
     17         self.returncode = None
     18         self.finalizer = None
---> 19         self._launch(process_obj)
     20 
     21     def duplicate_for_child(self, fd):

~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/popen_spawn_posix.py in _launch(self, process_obj)
     45         try:
     46             reduction.dump(prep_data, fp)
---> 47             reduction.dump(process_obj, fp)
     48         finally:
     49             set_spawning_popen(None)

~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

AttributeError: Can't pickle local object '_apply_to_outputs.<locals>.decorator_fn.<locals>.new_func'

Any idea what may be triggering this? My guess is that because I'm not distributing across multiple machines, the pickle is getting messed up. That's fine and just indicates I misunderstood that setting for distributed_backend, but moving on, I hit errors with the other distributed_backend settings as well.
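
Some background on the first error that holds regardless of the number of machines: the trace goes through mp.spawn, which starts each worker by pickling the process object (the reduction.dump(process_obj, fp) frame). Python's pickle stores functions by their importable name, so a function defined inside another function cannot be pickled. The sketch below reproduces that in isolation. The names decorator_fn and new_func are borrowed from the error message, and which object in the model or trainer actually holds such a function is not established here.

```python
import pickle


def decorator_fn(func):
    def new_func(*args, **kwargs):  # defined inside another function: a "local object"
        return func(*args, **kwargs)

    return new_func


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, TypeError, pickle.PicklingError) as err:
        print(f"cannot pickle: {err}")
        return False


wrapped = decorator_fn(len)

builtin_ok = can_pickle(len)      # importable by name, so it pickles
wrapped_ok = can_pickle(wrapped)  # local function, so pickling fails
```

Backends that fork or launch separate scripts instead of spawning do not need to pickle the model this way, which may be why the failure is specific to this setting.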

Following https://pytorch-lightning.readthedocs.io/en/latest/multi_gpu.html#distributed-modes, when I hard-code distributed_backend to ddp2, I get this trace

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp2_backend.py in _resolve_task_idx(self)
     52             try:
---> 53                 self.task_idx = int(os.environ['LOCAL_RANK'])
     54             except Exception as e:

~/anaconda3/envs/forecasting/lib/python3.8/os.py in __getitem__(self, key)
    674             # raise KeyError with the original key value
--> 675             raise KeyError(key) from None
    676         return self.decodevalue(value)

KeyError: 'LOCAL_RANK'

During handling of the above exception, another exception occurred:

MisconfigurationException                 Traceback (most recent call last)
<ipython-input-29-01060df08a43> in <module>
      1 # Find optimal learning rate
----> 2 res = trainer.lr_find(
      3     tft,
      4     train_dataloader = train_dataloader,
      5     val_dataloaders = val_dataloader,

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/lr_finder.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold)
    198 
    199         # Fit, lr & loss logged in callback
--> 200         self.fit(model,
    201                  train_dataloader=train_dataloader,
    202                  val_dataloaders=val_dataloaders)

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
     46             if entering is not None:
     47                 self.state = entering
---> 48             result = fn(self, *args, **kwargs)
     49 
     50             # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
   1033         if self.use_ddp2:
   1034             self.accelerator_backend = DDP2Backend(self)
-> 1035             self.accelerator_backend.setup()
   1036             self.accelerator_backend.train(model)
   1037 

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp2_backend.py in setup(self)
     43 
     44     def setup(self):
---> 45         self._resolve_task_idx()
     46 
     47     def _resolve_task_idx(self):

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp2_backend.py in _resolve_task_idx(self)
     54             except Exception as e:
     55                 m = 'ddp2 only works in SLURM or via torchelastic with the WORLD_SIZE, LOCAL_RANK, GROUP_RANK flags'
---> 56                 raise MisconfigurationException(m)
     57 
     58     def train(self, model):

MisconfigurationException: ddp2 only works in SLURM or via torchelastic with the WORLD_SIZE, LOCAL_RANK, GROUP_RANK flags

and when I hard-code distributed_backend to dp (which is what I would expect to work most readily), I get

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-01060df08a43> in <module>
      1 # Find optimal learning rate
----> 2 res = trainer.lr_find(
      3     tft,
      4     train_dataloader = train_dataloader,
      5     val_dataloaders = val_dataloader,

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/lr_finder.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold)
    198 
    199         # Fit, lr & loss logged in callback
--> 200         self.fit(model,
    201                  train_dataloader=train_dataloader,
    202                  val_dataloaders=val_dataloaders)

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
     46             if entering is not None:
     47                 self.state = entering
---> 48             result = fn(self, *args, **kwargs)
     49 
     50             # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
   1062             self.accelerator_backend = DataParallelBackend(self)
   1063             self.accelerator_backend.setup(model)
-> 1064             results = self.accelerator_backend.train()
   1065             self.accelerator_backend.teardown()
   1066 

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/dp_backend.py in train(self)
     95     def train(self):
     96         model = self.trainer.model
---> 97         results = self.trainer.run_pretrain_routine(model)
     98         return results
     99 

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
   1222 
   1223         # run a few val batches before training starts
-> 1224         self._run_sanity_check(ref_model, model)
   1225 
   1226         # clear cache before training

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in _run_sanity_check(self, ref_model, model)
   1255             num_loaders = len(self.val_dataloaders)
   1256             max_batches = [self.num_sanity_val_steps] * num_loaders
-> 1257             eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
   1258 
   1259             # allow no returns from eval

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py in _evaluate(self, model, dataloaders, max_batches, test_mode)
    394         # ---------------------
    395         using_eval_result = len(outputs) > 0 and len(outputs[0]) > 0 and isinstance(outputs[0][0], EvalResult)
--> 396         eval_results = self.__run_eval_epoch_end(test_mode, outputs, dataloaders, using_eval_result)
    397 
    398         # log callback metrics

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py in __run_eval_epoch_end(self, test_mode, outputs, dataloaders, using_eval_result)
    488                     eval_results = self.__gather_epoch_end_eval_results(outputs)
    489 
--> 490                 eval_results = model.validation_epoch_end(eval_results)
    491                 user_reduced = True
    492 

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/base_model.py in validation_epoch_end(self, outputs)
    142 
    143     def validation_epoch_end(self, outputs):
--> 144         log, _ = self.epoch_end(outputs, label="val")
    145         return log
    146 

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in epoch_end(self, outputs, label)
    611         run at epoch end for training or validation
    612         """
--> 613         log, out = super().epoch_end(outputs, label=label)
    614         if self.log_interval(label == "train") > 0:
    615             self._log_interpretation(out, label=label)

~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/base_model.py in epoch_end(self, outputs, label)
    245             outputs = [out["callback_metrics"] for out in outputs]
    246         # log average loss and metrics
--> 247         n_samples = sum([x["n_samples"] for x in outputs])
    248         avg_loss = torch.stack([x[f"{label}_loss"] * x["n_samples"] / n_samples for x in outputs]).sum()
    249         log_keys = outputs[0]["log"].keys()

TypeError: unsupported operand type(s) for +: 'int' and 'list'
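
The message on the last line can be reproduced in plain Python without any GPU. The sketch below rests on an assumption the traceback does not confirm: that under dp each step's "n_samples" arrives as a list (for example one entry per GPU) rather than a single int. The outputs data and the total_samples helper are hypothetical and are not part of pytorch_forecasting.

```python
# Hypothetical per-step outputs. The list-valued "n_samples" is an assumption
# about what dp hands to epoch_end; it is not taken from the library source.
outputs = [{"n_samples": [32, 32]}, {"n_samples": [32, 16]}]


def total_samples(outputs):
    """Sum n_samples whether each entry is a plain int or a list of ints."""
    total = 0
    for x in outputs:
        n = x["n_samples"]
        total += sum(n) if isinstance(n, (list, tuple)) else n
    return total


# Same expression as line 247 of base_model.py in the traceback:
# sum() starts from the int 0 and then tries 0 + [32, 32].
try:
    sum([x["n_samples"] for x in outputs])
except TypeError as err:
    message = str(err)

print(message)
print(total_samples(outputs))
```

If the assumption holds, flattening the per-device values before summing, as the hypothetical helper does, would avoid the error. Whether that is the right fix inside the library is a separate question.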

When I use ddp (as recommended for PyTorch, given the speedup), the pipeline freezes: running watch nvidia-smi in a terminal shows no GPU utilization and no memory being allocated for processing.

This error is thrown using the same setup as I had in #85, which I got working on a single GPU. Now that I'm doing multivariate time series across all 50 states, I'd really like to use both of my GPUs to speed up the runtime.

Thanks!
