Tiny Dataset error while fitting (statsforecast, closed)

santoshpal3004 commented on June 1, 2024

Tiny Dataset error while fitting

Comments (14)

jmoralez commented on June 1, 2024

Hey @santoshpal3004, it seems this error is raised when a series has 3 or fewer samples, since the theta model can't be trained on so few. You can specify a fallback model in the constructor, and that model is trained instead in cases like this.

santoshpal3004 commented on June 1, 2024

Hey @jmoralez, the dataframe I am using here has about 30,000 rows. I also tried keeping HoltWinters as a fallback model, but I still get a similar-looking error, this time from "ets.py". Here is the code snippet; it would be really helpful if you could point out any correction that has to be made.

# imports assumed from the class names used below;
# DOT is taken to be an alias for DynamicOptimizedTheta
from statsforecast import StatsForecast
from statsforecast.models import (
    AutoARIMA, AutoETS, AutoTheta, HistoricAverage, HoltWinters, SeasonalNaive,
    DynamicOptimizedTheta as DOT,
)

try:
    models = [HoltWinters(season_length=12, error_type='A'),
              SeasonalNaive(season_length=12),
              HistoricAverage(),
              DOT(season_length=12, decomposition_type='additive'),
              AutoTheta(season_length=12),
              AutoARIMA(season_length=12),
              AutoETS(season_length=12)
             ]
    # instantiate the model
    model = StatsForecast(models=models,
                          freq='M',
                          n_jobs=-1,  # this comma was missing in the original snippet
                          fallback_model=HoltWinters(season_length=12, error_type='A')
                         )

    # train model, like in sklearn
    model.fit(df=X_train_agg.head(1000))

# this circumvents the error we get with autoarima
# try again without autoarima in the list of models
except ZeroDivisionError:
    models = [HoltWinters(season_length=12, error_type='A'),
              SeasonalNaive(season_length=12),
              HistoricAverage(),
              DOT(season_length=12, decomposition_type='additive'),
              AutoTheta(season_length=12),
              AutoETS(season_length=12)
             ]

    model = StatsForecast(models=models,
                          freq='M',
                          n_jobs=-1,
                          # fallback_model=HoltWinters(season_length=12, error_type='A')
                         )

    model.fit(X_train_agg)

jmoralez commented on June 1, 2024

The size in the error refers to the size of a single series (unique_id). So if you run, for example, df['unique_id'].value_counts(), you should see some series with 3 or fewer values, which are the problematic ones. In that case the only viable fallback is the Naive model, so you could try that one.

santoshpal3004 commented on June 1, 2024

So just to clarify, "unique_id" is the index and not a column. X_train_agg.index.nunique() returns 511.

jmoralez commented on June 1, 2024

You should set it as a column; we're deprecating passing it as the index. Also, the problem isn't how many unique series you have but how long each one is, so running value_counts on the unique_ids is what will tell you their sizes.

santoshpal3004 commented on June 1, 2024

Ok, point noted @jmoralez, but the issue still persists even after keeping only the series with more than 3 records. Here is the code snippet:

value_counts = new_df['unique_id'].value_counts()
valid_indices = value_counts[value_counts >= 5].index
filtered_df = new_df[new_df['unique_id'].isin(valid_indices)]
filtered_df['unique_id'].value_counts()

**OUTPUT:**
HK                   74
HK/Pharma/HP1_S01    74
HK/Pharma/HP1_J05    74
HK/Pharma/HP1_J02    74
HK/Pharma/HP1_J01    74
                     ..
HK/CHC/TD1_C07        6
HK/CHC/TD1_A10        6
HK/CHC/HP1_A10        5
HK/CHC/DP1_A10        5
HK/Pharma/HP1_P02     5
Name: unique_id, Length: 482, dtype: int64

**ERROR AFTER USING THIS df:**
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/core.py", line 73, in fit
    fm[i, i_model] = new_model.fit(y=y, X=X)
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/models.py", line 553, in fit
    self.model_ = ets_f(
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/ets.py", line 1235, in ets_f
    raise NotImplementedError("tiny datasets")
NotImplementedError: tiny datasets
"""

The above exception was the direct cause of the following exception:

NotImplementedError                       Traceback (most recent call last)
Cell In[50], line 57
     50     model = StatsForecast(models = models, 
     51                           freq='M', 
     52                           n_jobs=-1
     53                           #fallback_model= HoltWinters(season_length=12, error_type='A')
     54                          )
     56     #train model, like in sklearn
---> 57     model.fit(df=filtered_df.head(1000))
     59 #this circumvents the error we get with autoarima
     60 #try again without autoarima in the list of models
     61 except ZeroDivisionError:

File /opt/conda/lib/python3.10/site-packages/statsforecast/core.py:581, in _StatsForecast.fit(self, df, sort_df)
    579     self.fitted_ = self.ga.fit(models=self.models)
    580 else:
--> 581     self.fitted_ = self._fit_parallel()
    582 return self

File /opt/conda/lib/python3.10/site-packages/statsforecast/core.py:940, in _StatsForecast._fit_parallel(self)
    938         future = executor.apply_async(ga.fit, (self.models,))
    939         futures.append(future)
--> 940     fm = np.vstack([f.get() for f in futures])
    941 return fm

File /opt/conda/lib/python3.10/site-packages/statsforecast/core.py:940, in <listcomp>(.0)
    938         future = executor.apply_async(ga.fit, (self.models,))
    939         futures.append(future)
--> 940     fm = np.vstack([f.get() for f in futures])
    941 return fm

File /opt/conda/lib/python3.10/multiprocessing/pool.py:774, in ApplyResult.get(self, timeout)
    772     return self._value
    773 else:
--> 774     raise self._value

NotImplementedError: tiny datasets

jmoralez commented on June 1, 2024

The ETS model requires more than 3 samples. The easiest fix is providing a fallback model like Naive or HistoricAverage.

santoshpal3004 commented on June 1, 2024

As indicated in the previous response, I have a substantial amount of data at my disposal. However, it continues to yield the same error. Furthermore, the inclusion of a fallback model does not serve its intended purpose if I have certainty that the other models will unquestionably encounter failures. PFA the code snippet:

value_counts = new_df['unique_id'].value_counts()
valid_indices = value_counts[value_counts >= 5].index
filtered_df = new_df[new_df['unique_id'].isin(valid_indices)]
filtered_df['unique_id'].value_counts()

OUTPUT:
HK                   74
HK/Pharma/HP1_S01    74
HK/Pharma/HP1_J05    74
HK/Pharma/HP1_J02    74
HK/Pharma/HP1_J01    74
                     ..
HK/CHC/TD1_C07        6
HK/CHC/TD1_A10        6
HK/CHC/HP1_A10        5
HK/CHC/DP1_A10        5
HK/Pharma/HP1_P02     5
Name: unique_id, Length: 482, dtype: int64

jmoralez commented on June 1, 2024

statsforecast trains the models per series, so what matters is not how much data you have in total but how many samples each series has. You have some series with 5 samples, for which an ETS model can't be trained, so it will fail in those cases. If you specify a fallback model, those series will be forecasted with the fallback whenever a more complex model fails (ETS in this case).
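
The per-series behaviour described above can be illustrated with a small pure-Python sketch. It is not statsforecast's implementation: `fit_picky`, `fit_naive` and `fit_per_series` are hypothetical stand-ins, and the 6-sample threshold is only an assumed example of a model that rejects short series.

```python
def fit_picky(y):
    """Hypothetical stand-in for a model that refuses very short series."""
    if len(y) <= 6:
        raise NotImplementedError("tiny datasets")
    return ("picky", sum(y) / len(y))


def fit_naive(y):
    """Hypothetical stand-in for a naive model: remember the last value."""
    return ("naive", y[-1])


def fit_per_series(series, primary, fallback=None):
    """Fit each series independently; use the fallback only where primary fails."""
    results = {}
    for uid, y in series.items():
        try:
            results[uid] = primary(y)
        except Exception:
            if fallback is None:
                raise  # without a fallback, one short series aborts the whole fit
            results[uid] = fallback(y)
    return results


series = {"long": list(range(1, 25)), "short": [3.0, 4.0, 5.0]}
fitted = fit_per_series(series, fit_picky, fallback=fit_naive)
print(fitted["long"][0], fitted["short"][0])
```

The total number of rows never enters the logic: only the length of each individual series decides whether the primary model or the fallback is used.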

santoshpal3004 commented on June 1, 2024

What is the minimum number of samples each series should have for it to be trained using ETS based models?

jmoralez commented on June 1, 2024

Here's the relevant part of the code

statsforecast/statsforecast/ets.py, lines 1226 to 1236 in be9db75:

    npars = 2  # alpha + l0
    if trendtype in ["A", "M"]:
        npars += 2  # beta + b0
    if seasontype in ["A", "M"]:
        npars += 2  # gamma + s
    if damped is not None:
        npars += damped
    # ses for non-optimized tiny datasets
    if n <= npars + 4:
        # we need HoltWintersZZ function
        raise NotImplementedError("tiny datasets")

The defaults are model='ZZZ' and damped=None, so if you keep those you need at least 7 samples. Keep in mind that even if it trains, the model probably won't be very good with so few samples, since they don't even cover one seasonal period (12).
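
The arithmetic behind that threshold can be sketched by mirroring the quoted check in a standalone function. The helper names `ets_param_count` and `min_samples` are hypothetical and not part of statsforecast; the second call assumes a model with additive trend and additive seasonality, which is only an illustration of how the requirement grows.

```python
def ets_param_count(trendtype="Z", seasontype="Z", damped=None):
    """Count parameters the same way as the quoted ets.py snippet."""
    npars = 2  # alpha + l0
    if trendtype in ("A", "M"):
        npars += 2  # beta + b0
    if seasontype in ("A", "M"):
        npars += 2  # gamma + s
    if damped is not None:
        npars += damped  # True counts as 1
    return npars


def min_samples(trendtype="Z", seasontype="Z", damped=None):
    """Smallest n that passes the check `n <= npars + 4`, i.e. npars + 5."""
    return ets_param_count(trendtype, seasontype, damped) + 5


print(min_samples())          # defaults ('ZZZ', damped=None)
print(min_samples("A", "A"))  # additive trend and additive seasonality
```

Under this check, a series of 5 or 6 samples is rejected even with the defaults, and explicitly requesting trend and seasonal components raises the minimum further. Passing the check only avoids the "tiny datasets" error; it does not guarantee that a model can actually be fitted.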

santoshpal3004 commented on June 1, 2024

Ok, but I recently tried keeping at least 18 samples per series and was still not able to fit any models. The error I get looks like this:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/core.py", line 73, in fit
    fm[i, i_model] = new_model.fit(y=y, X=X)
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/models.py", line 553, in fit
    self.model_ = ets_f(
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/ets.py", line 1300, in ets_f
    raise Exception("no model able to be fitted")
Exception: no model able to be fitted

jmoralez commented on June 1, 2024

Can you try keeping at least two seasonal periods (24)?

github-actions commented on June 1, 2024

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.
