
Comments (12)

kdgutier commented on September 3, 2024

I suggest you move to the NeuralForecast library and try running the N-HiTS example.

from esrnn_torch.

kdgutier commented on September 3, 2024

Hey @gsamaras

Try doing this:

X_df['ds'] = X_df['time'].values
Y_df['ds'] = Y_df['time'].values
del X_df['time'], Y_df['time']

Also, from your example it is a bit difficult to know what format your time is in. You might want to take a look at the pd.to_datetime converter or use the pd.date_range() function. Take a look at this StackOverflow post.
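To make that concrete, here is a minimal sketch of the pd.to_datetime conversion; the frame and its values are hypothetical, with column names taken from this thread:

```python
import pandas as pd

# Hypothetical frame whose 'time' column holds strings (names mirror the thread).
Y_df = pd.DataFrame({
    "time": ["2020-05-13 08:45:57", "2020-05-13 08:46:58"],
    "y": [1575.652, 1575.652],
})

# Parse the strings into datetime64[ns] and store them under the expected 'ds' name.
Y_df["ds"] = pd.to_datetime(Y_df["time"])
del Y_df["time"]

print(Y_df["ds"].dtype)  # datetime64[ns]
```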

By the way, I recommend using the NeuralForecast library, as we are migrating our attention to that repository.

gsamaras commented on September 3, 2024

@kdgutier my problem is not how to code it, but the logic. I had figured out the time column, but what about the others? I mean, I want to predict the usage column of my data; I should somehow inject it into X_df, right?

Yes, that was my next question since I saw it in another issue, thanks. But since I opened the issue here we can continue the discussion here if you like; otherwise I can migrate it.

kdgutier commented on September 3, 2024

Here is the Google Colab N-HiTS example.

You should rename your 'usage' column to 'y'.
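The rename, together with the long format the library expects (one unique_id / ds / y row per observation), can be sketched like this; the data and the "series_1" id are illustrative:

```python
import pandas as pd

# Hypothetical raw data; column names come from this thread's example.
df = pd.DataFrame({
    "time": pd.to_datetime(["2020-05-13 08:45:57", "2020-05-13 08:46:58"]),
    "usage": [1575.652, 1527.7666],
})

# Long format expected by the library: unique_id / ds / y.
Y_df = df.rename(columns={"time": "ds", "usage": "y"})
Y_df["unique_id"] = "series_1"  # a single series, so one constant id
Y_df = Y_df[["unique_id", "ds", "y"]]

print(list(Y_df.columns))  # ['unique_id', 'ds', 'y']
```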

gsamaras commented on September 3, 2024

Is it really as simple as:

X_df['ds'] = X_df['time'].values
y_df['ds'] = y_df['time'].values
X_df['x'] = X_df['bw'].values
y_df['y'] = y_df['bw'].values
X_df['unique_id']='dummy'
y_df['unique_id']='dummy'

# same for test data

@kdgutier?

Training completes, but when I try to predict:

y_hat_df = model.predict(X_test_df)

I get:

ValueError: You are trying to merge on float64 and datetime64[ns] columns. If you wish to proceed you should use pd.concat

which I think happens because of some incompatibility between the time and ds columns, but that is perhaps another issue.

kdgutier commented on September 3, 2024

I recommend converting your 'time' column to a date stamp using pd.to_datetime; a lot of methods in the NeuralForecast library rely on you sending a datetime-formatted 'ds'.

The line X_df['x'] = X_df['bw'].values will cause leakage.
The dataset already builds autoregressive features by default if you send only Y_df.
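A minimal sketch of what avoiding that leakage looks like: keep the target only in Y_df, and never copy it into X_df as a feature. The data, the 'series_1' id, and the decision to keep an identifiers-only X_df are illustrative assumptions:

```python
import pandas as pd

# Hypothetical series; 'bw' is the target column named in this thread.
df = pd.DataFrame({
    "ds": pd.date_range("2020-05-13", periods=4, freq="min"),
    "bw": [1575.6, 1527.7, 1477.8, 1675.4],
})

# The target lives only in Y_df; copying it into X_df as 'x' would leak it.
Y_df = df.rename(columns={"bw": "y"})
Y_df["unique_id"] = "series_1"

# X_df, if needed at all, carries only identifiers -- no target values.
X_df = Y_df[["unique_id", "ds"]].copy()

print(list(X_df.columns))  # ['unique_id', 'ds']
```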

gsamaras commented on September 3, 2024

@kdgutier apologies for the late response, because of the weekend. OK, I think we are very close, but now predict crashes with batch size (=32). Here is the situation:

y = df.pop('y')
X = df
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
y_train = y_train.to_frame()
X_train['unique_id']='dummy'
y_train['unique_id']='dummy'
y_train['ds'] = X_train['ds']

print(X_train)
print(y_train)
# same for test data
...

which outputs:

       x                            ds unique_id
0     fe 2020-05-13 08:45:57.228000000     dummy
1     fe 2020-05-13 08:46:58.343000064     dummy
2     fe 2020-05-13 08:47:59.299000064     dummy
3     fe 2020-05-13 08:49:00.236000000     dummy
4     fe 2020-05-13 08:50:01.188999936     dummy
...   ..                           ...       ...
6887  fe 2020-05-18 06:05:54.928000000     dummy
6888  fe 2020-05-18 06:06:55.985999872     dummy
6889  fe 2020-05-18 06:07:57.731000064     dummy
6890  fe 2020-05-18 06:08:58.804999936     dummy
6891  fe 2020-05-18 06:09:59.864000000     dummy

[6892 rows x 3 columns]
              y unique_id                            ds
0     1575.6520     dummy 2020-05-13 08:45:57.228000000
1     1575.6520     dummy 2020-05-13 08:46:58.343000064
2     1527.7666     dummy 2020-05-13 08:47:59.299000064
3     1527.7666     dummy 2020-05-13 08:49:00.236000000
4     1477.7880     dummy 2020-05-13 08:50:01.188999936
...         ...       ...                           ...
6887  1675.4131     dummy 2020-05-18 06:05:54.928000000
6888  1641.9484     dummy 2020-05-18 06:06:55.985999872
6889  1646.2307     dummy 2020-05-18 06:07:57.731000064
6890  1646.2307     dummy 2020-05-18 06:08:58.804999936
6891  1650.9961     dummy 2020-05-18 06:09:59.864000000

[6892 rows x 3 columns]
       x                            ds unique_id
6892  fe 2020-05-18 06:11:00.937999872     dummy
6893  fe 2020-05-18 06:12:02.014000128     dummy
6894  fe 2020-05-18 06:13:03.060000000     dummy
6895  fe 2020-05-18 06:14:04.118000128     dummy
6896  fe 2020-05-18 06:15:05.411000064     dummy
...   ..                           ...       ...
8610  fe 2020-05-19 11:28:28.334000128     dummy
8611  fe 2020-05-19 11:29:29.504000000     dummy
8612  fe 2020-05-19 11:30:30.544000000     dummy
8613  fe 2020-05-19 11:31:31.724000000     dummy
8614  fe 2020-05-19 11:32:32.780000000     dummy

[1723 rows x 3 columns]
              y unique_id                            ds
6892  1652.7509     dummy 2020-05-18 06:11:00.937999872
6893  1616.0997     dummy 2020-05-18 06:12:02.014000128
6894  1616.0997     dummy 2020-05-18 06:13:03.060000000
6895  1725.8965     dummy 2020-05-18 06:14:04.118000128
6896  1790.9973     dummy 2020-05-18 06:15:05.411000064
...         ...       ...                           ...
8610  1007.4689     dummy 2020-05-19 11:28:28.334000128
8611  1020.8758     dummy 2020-05-19 11:29:29.504000000
8612  1020.8758     dummy 2020-05-19 11:30:30.544000000
8613  1059.2924     dummy 2020-05-19 11:31:31.724000000
8614  1025.6858     dummy 2020-05-19 11:32:32.780000000

[1723 rows x 3 columns]

Then I train exactly as explained in the Medium post:

model = ESRNN(max_epochs=3, freq_of_test=1, batch_size=32, ...)
model.fit(X_train, y_train)

and predict like this:

y_hat = model.predict(X_test)

which crashes:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
[<ipython-input-58-01b48d2fa9d2>](https://localhost:8080/#) in <module>()
      5 print(y_test.shape)
      6 
----> 7 y_hat = model.predict(X_test)
      8 
      9 # Evaluate predictions

1 frames
[/usr/local/lib/python3.7/dist-packages/ESRNN/utils/data.py](https://localhost:8080/#) in update_batch_size(self, new_batch_size)
     84   def update_batch_size(self, new_batch_size):
     85     self.batch_size = new_batch_size
---> 86     assert self.batch_size <= self.n_series
     87     self.n_batches = int(np.ceil(self.n_series / self.batch_size))
     88 

AssertionError:

What am I missing? Test size (1723) is greater than batch size (32).

kdgutier commented on September 3, 2024

The dataloader takes into account the number of distinct 'unique_id' values; you have all of them set to 'dummy' in your example.
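That is exactly what the failing assertion checks: n_series is the count of distinct ids, not the number of rows. A small reproduction of the mismatch (the frame below just mimics the shapes from this thread):

```python
import pandas as pd

# The loader's n_series is the count of distinct 'unique_id' values,
# not the number of rows -- so 1723 rows under one id is still one series.
X_test = pd.DataFrame({
    "unique_id": ["dummy"] * 1723,
    "ds": pd.date_range("2020-05-18", periods=1723, freq="min"),
})

n_series = X_test["unique_id"].nunique()
batch_size = 32
print(n_series)  # 1, so the check `batch_size <= n_series` fails
```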

gsamaras commented on September 3, 2024

Hmm to be honest I don't really understand how the unique_id applies in my case, where I want to do univariate time series forecasting on y.

How should I solve this, @kdgutier? It's not clear whether I would need as many unique_ids as the batch size, and if so, whether they should be balanced across the dataset. I am really lost here.

kdgutier commented on September 3, 2024

The maximum batch_size that the dataloader can use here is 1, because the ESRNN is a recurrent network:
it needs to sequentially visit all the observations of your series.

gsamaras commented on September 3, 2024

Does that mean that I am constrained to using a batch size of 1?

kdgutier commented on September 3, 2024

If you are using a window-based model like N-BEATS, N-HiTS, or any other MLP-based model, you can use a bigger batch_size. If you are using a pure RNN model, by construction the number of series you can batch is one,
unless you do some special work on the dataloader/series preprocessing.
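The intuition behind that distinction can be sketched in plain numpy: a window-based model slices one series into many (input, horizon) windows, so even a single series yields enough samples to fill a batch of 32. The series, input_size, and horizon below are illustrative, not the library's defaults:

```python
import numpy as np

# One univariate series with 100 observations.
series = np.arange(100, dtype=float)
input_size, horizon = 24, 12
window = input_size + horizon  # each training sample spans 36 points

# Sliding windows: each row is one training sample for a window-based model.
windows = np.stack(
    [series[i : i + window] for i in range(len(series) - window + 1)]
)
print(windows.shape)  # (65, 36): 65 samples from a single series

# 65 samples easily fill a batch_size of 32, whereas a pure RNN that walks
# the series sequentially sees only one sample per series.
```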
