
n-hits's People

Contributors

azulgarza, cchallu, kdgutier



n-hits's Issues

Clarification regarding implementation

Hi! I have some queries about the following line of code - it seems to make your implemented model somewhat different from what you specified in your paper:

forecast = insample_y[:, -1:] # Level with Naive1

My understanding is that this line of code specifies that the final forecast also includes the first value of the lookback window, meaning you are predicting the "change in value" rather than the actual time series value. Is there any reason for doing this? Thank you for your time!
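My reading of that line, as a minimal sketch (placeholder names, not the repo's actual loop): the value taken from the lookback window acts as a naive level, and each block's output is added on top of it, so the network effectively learns the offset from that level.

import torch

def nbeats_style_forward(insample_y: torch.Tensor, blocks) -> torch.Tensor:
    # Sketch: start from a naive level, then add each block's forecast on top.
    forecast = insample_y[:, -1:]                     # Naive1 level, shape (batch, 1)
    residual = insample_y
    for block in blocks:                              # each block returns (backcast, forecast)
        backcast, block_forecast = block(residual)
        residual = residual - backcast                # residual stacking on the input side
        forecast = forecast + block_forecast          # accumulate on top of the level
    return forecast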

About your data processing + final result

In your code, it seems you normalized the original data, but when you calculated the MSE and MAE you didn't transform them back to the original scale. Are the MSE and MAE wrong in that case?
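A small runnable sketch of what I mean (the train statistics and series below are placeholders, not the repo's data): with z-score normalization, MAE scales by the train std and MSE by its square, so metrics on the normalized scale differ from metrics on the original scale.

import numpy as np

rng = np.random.default_rng(0)
train_mean, train_std = 10.0, 2.0                        # placeholder train statistics
y_true_norm = rng.normal(size=96)                        # placeholder normalized targets
y_pred_norm = y_true_norm + rng.normal(scale=0.1, size=96)

def mse_mae(y_true, y_pred):
    err = y_true - y_pred
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))

# Metrics on the normalized scale.
print(mse_mae(y_true_norm, y_pred_norm))

# Metrics on the original scale: invert the z-score before computing them.
y_true = y_true_norm * train_std + train_mean
y_pred = y_pred_norm * train_std + train_mean
print(mse_mae(y_true, y_pred))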

different settings (nhits vs. autoformer)

Hi! Thank you for sharing your source code.

I have some questions about the settings of NHITS and Autoformer.

I think there might be some unfair comparisons in your Table 2, because you compare against Autoformer's reported results while using different settings for the NHITS model.

Q1: the length of the history window
You use 5*args.horizon for NHITS, but for Autoformer a shorter length is used (1*args.horizon). Here args.horizon = 96.

When using a history length of 5*96, your reported result for ECL-96 is 0.147 (I can reproduce this by re-running your released code). Autoformer's reported result is 0.201 (using only a 96-length window).

I tried some experiments and got the following results:

Using the same setting for NHITS (a 96-length window), the result for ECL-96 is MSE: 0.1902 / MAE: 0.2739.

It seems the length of the history window is an important hyperparameter.

By the way, using a 5*96-length window for the NBeats model, I get a much better result for ECL-96: MSE: 0.1340 / MAE: 0.2311.

Q2: the split of the train/val/test sets
you use masks (train_mask_df, valid_mask_df, test_mask_df) to indicate the parts of train/valid/test.
However, in Autoformer's setting (see https://github.com/thuml/Autoformer), the borders are:

border1s = [0, num_train - self.seq_len, len(df_raw) - num_test - self.seq_len]
border2s = [num_train, num_train + num_vali, len(df_raw)]

Here, it seems you did not use the overlapping part, i.e. [num_train - self.seq_len, num_train + num_vali].

So my question is whether the same number of test samples is used for evaluation.
If not, I think it might be unfair to directly compare against Autoformer's results in your Table 2.
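To make the question concrete, here is a rough sketch of how I would count test windows under the two schemes (the total length and split ratio below are assumptions for illustration, not values taken from either repo):

def n_windows(span_len: int, seq_len: int, pred_len: int) -> int:
    # Number of (input, target) window pairs that fit in a contiguous span.
    return max(span_len - seq_len - pred_len + 1, 0)

n_total = 26304                      # assumed series length
num_test = int(n_total * 0.2)
seq_len = pred_len = 96

# Autoformer-style border: the test span starts seq_len points early, so the
# first forecast window begins right at the start of the last num_test points.
print(n_windows(num_test + seq_len, seq_len, pred_len))

# A mask covering only the last num_test points (no overlap with validation)
# yields seq_len fewer test windows.
print(n_windows(num_test, seq_len, pred_len))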

Follow up to "change-in-level" forecast

Hi @cchallu, this is a follow-up to issue #8. I was also wondering whether it would be better to use the last value of the lookback window instead of the first value, mainly because the first value is sometimes masked?

Clarification regarding data normalization

Hello,

I was trying to run N-HiTS with my own data using the shared Colab.

I tried to normalize the original ETTm2 dataset and compared it with the data used in your N-HiTS model.

The size of df_train is 46641, and I followed the information given in Section 4.1: each set is normalized with the train data mean and standard deviation.

def normalize(df_csv, df_train):
    result = df_csv.copy()
    columns_names = list(df_csv.columns)
    for feature_name in columns_names[1:]:
        result[feature_name] = (df_csv[feature_name] - df_train[feature_name].mean()) / df_train[feature_name].std()
    return result

My function returns different results compared to yours:
date HUFL
2016-07-01 00:00:00 0.126520
2016-07-01 00:15:00 -0.023339
2016-07-01 00:30:00 -0.098268
2016-07-01 00:45:00 -0.431177
2016-07-01 01:00:00 -0.231432
Name: HUFL, dtype: float64

and yours:
unique_id | ds | y
HUFL | 2016-07-01 00:00:00 | -0.041413
HUFL | 2016-07-01 00:15:00 | -0.185467
HUFL | 2016-07-01 00:30:00 | -0.257495
HUFL | 2016-07-01 00:45:00 | -0.577510
HUFL | 2016-07-01 01:00:00 | -0.385501

Can you please tell me more about the data normalization process?
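For comparison, this is how I would normalize in the long ['unique_id', 'ds', 'y'] format the library uses; the train cutoff argument and per-series grouping here are my assumptions, not the repo's code:

import pandas as pd

def normalize_long(Y_df: pd.DataFrame, train_end: str) -> pd.DataFrame:
    # Z-score each series using only statistics computed on the training span.
    train = Y_df[Y_df['ds'] <= train_end]
    stats = train.groupby('unique_id')['y'].agg(['mean', 'std']).reset_index()
    out = Y_df.merge(stats, on='unique_id', how='left')
    out['y'] = (out['y'] - out['mean']) / out['std']
    return out.drop(columns=['mean', 'std'])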

Thanks and regards,

Sophie

Reproducing Results

Hello,

I downloaded the repository to my computer and tried to reproduce the results published in the paper for the traffic dataset with a prediction window of length 96. I ran the code with the following args:

--hyperopt_max_evals 10 --experiment_id run_1

But the results were 0.504 for the MSE and 0.311 for the MAE, which is significantly worse than I expected. Is there anything else that needs to be done before running the code and training the model in order to reproduce the results?

Thanks in advance!

I can't see the model's details in the code

I want to see the model's details in the code, but I found that PyTorch Lightning can't be debugged in PyCharm; it just runs. How can I see how the training data flows through the model? It would help me understand the model better. Thank you.
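A minimal sketch of one way to inspect the data flow without stepping through the Trainer, assuming a standard LightningModule (the model and datamodule names are placeholders, not repo objects):

import torch

def add_shape_hooks(model: torch.nn.Module):
    # Print every submodule's output shape during a forward pass.
    handles = []
    for name, module in model.named_modules():
        def hook(mod, inputs, output, name=name):
            if isinstance(output, torch.Tensor):
                print(f"{name}: {tuple(output.shape)}")
        handles.append(module.register_forward_hook(hook))
    return handles  # call handle.remove() on each when done

# Alternatively, run a single batch outside the Trainer so breakpoints work:
#   batch = next(iter(datamodule.train_dataloader()))
#   model.training_step(batch, batch_idx=0)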


Question on n_time_in

Hi,

Thank you for publishing your code, and thanks for your interesting paper. I am now trying to use your code, but I am not sure whether I need to update the hyperopt space for n_time_in.

The current setting in nhits_multivariate is 'n_time_in': hp.choice('n_time_in', [5*args.horizon]), which results in 960 inputs for a horizon length of 192. I was wondering whether this is the value used in your experiments, or should I change it to 96?

Thanks

How to make multivariate forecasting?

Hi, I read the code in n_hits_multivariate.py but got confused about the way the datasets are loaded. In tsdataset.py, the DataFrame Y_df is described as 'Target time series with columns ['unique_id', 'ds', 'y']'. Taking the ETT dataset as an example, there is one column 'date' and 7 other columns, one per variable. I currently view the 'unique_id' part as the default index of the pandas DataFrame, so what are 'ds' and 'y'? What's more, it seems that N-HiTS works in a univariate way in the following line:

forecast = insample_y[:, -1:] # Level with Naive1

That confuses me: how does N-HiTS make multivariate predictions? By individually yielding a prediction for each univariate series?
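My current understanding of the loading, written as a sketch (the file name and column names come from the ETT data; the actual loader code may differ):

import pandas as pd

# Wide ETT frame: one 'date' column plus 7 variable columns (HUFL, ..., OT).
df = pd.read_csv('ETTm2.csv')

# Melt into the long format ['unique_id', 'ds', 'y'], so each variable becomes
# its own univariate series identified by 'unique_id'.
Y_df = df.melt(id_vars=['date'], var_name='unique_id', value_name='y')
Y_df = Y_df.rename(columns={'date': 'ds'})[['unique_id', 'ds', 'y']]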

About training procedures and doc

Update: Additional questions

  • Your data pipeline seems quite non-traditional to me. At each training step, you randomly sample 256 windows from one time series as model input, and a training epoch finishes after each series has been sampled once. I understand that it's a univariate model, but I don't see why you leave it to chance to cover the entire training span.

  • I tried an ablation feeding the data in a multivariate fashion, i.e. the input is a history of all variables and windows are rolled along the time dimension, learning (N, S) -> (N, T) where N == num_series. The result was bad on the traffic dataset. Could you help explain?

  • The paper says that the learning rate is halved three times across the training procedure. However, the pl_module is mis-configured: the default lr_scheduler interval is 'epoch' (ref. https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#configure-optimizers), which means that you actually kept training with the initial learning rate until the end (see the sketch after this list).

  • You chose 1000 training steps, which is conservative considering your data feeding; for example, each time series is covered at most twice on the traffic dataset. Training for more steps slightly improved over your reported results, at least on the traffic dataset.

I hope these could help improve your model (of course the metrics presented are already impressive enough :).
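Regarding the scheduler point above, here is a sketch of the fix I have in mind (my assumption of the intended behaviour, not the repo's code): stepping the scheduler per optimizer step instead of per epoch.

import torch

def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)   # placeholder lr
    # Halve the lr three times across ~1000 training steps.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=250, gamma=0.5)
    return {
        'optimizer': optimizer,
        # 'interval': 'step' makes Lightning call scheduler.step() every training
        # step; the default 'epoch' only steps it at the end of each epoch.
        'lr_scheduler': {'scheduler': scheduler, 'interval': 'step'},
    }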

===============================================
Thank you for this amazing work. I found these typos and doc issues:

"""
N-HiTS model.
Parameters
----------
n_time_in: int
Multiplier to get insample size.
Insample size = n_time_in * output_size

  1. While documented as a multiplier, n_time_in is actually the final lookback period.

for i in range(len(stack_types)):
    #print(f'| -- Stack {stack_types[i]} (#{i})')
    for block_id in range(n_blocks[i]):

  2. n_layers in nhits_multivariate.py should be [ 3*[2] ] rather than 9, since the elements are indexed across the 3 stacks.

  3. loss_hypar should be an int like 7 or 24, judging from its context.

  4. There is bypassed logic for exogenous variables in the NHITS model. I wonder whether it can be put to work now?

Is backcast interpolated?

def forward(self, theta: t.Tensor, insample_x_t: t.Tensor, outsample_x_t: t.Tensor) -> Tuple[t.Tensor, t.Tensor]:
    backcast = theta[:, :self.backcast_size]
    knots = theta[:, self.backcast_size:]
    if self.interpolation_mode=='nearest':
        knots = knots[:,None,:]
        forecast = F.interpolate(knots, size=self.forecast_size, mode=self.interpolation_mode)
        forecast = forecast[:,0,:]
    elif self.interpolation_mode=='linear':
        knots = knots[:,None,:]
        forecast = F.interpolate(knots, size=self.forecast_size, mode=self.interpolation_mode) #, align_corners=True)
        forecast = forecast[:,0,:]
    elif 'cubic' in self.interpolation_mode:

n_theta = (n_time_in + max(n_time_out//n_freq_downsample[i], 1))
basis = IdentityBasis(backcast_size=n_time_in,
                      forecast_size=n_time_out,
                      interpolation_mode=interpolation_mode)

output_layer = [nn.Linear(in_features=n_theta_hidden[-1], out_features=n_theta)]
layers = hidden_layers + output_layer

According to these code blocks, it seems that interpolation is used only for synthesizing the forecast, while the backcast is produced directly by the MLP (the first n_time_in entries of theta). But Eq. 3 in Section 3.3 of your paper states that the forecast and backcast are interpolated in a similar way. Is there any reason behind this discrepancy?
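For reference, this is what I would have expected from my reading of Eq. 3, i.e. a backcast that is also built from its own set of knots and interpolated; the knot split below is purely my assumption and mirrors the quoted forward method, it is not code from the repo:

# Hypothetical symmetric variant (my assumption): split theta into backcast and
# forecast knots and interpolate both, instead of taking the backcast directly.
backcast_knots = theta[:, None, :n_backcast_knots]   # n_backcast_knots is hypothetical
forecast_knots = theta[:, None, n_backcast_knots:]
backcast = F.interpolate(backcast_knots, size=self.backcast_size,
                         mode=self.interpolation_mode)[:, 0, :]
forecast = F.interpolate(forecast_knots, size=self.forecast_size,
                         mode=self.interpolation_mode)[:, 0, :]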

Thank you for your time!
