
Comments (14)

jmoralez commented on June 1, 2024

Hey @santoshpal3004, it seems this error is raised when a series has 3 or fewer samples, since the theta model can't be trained on so few. You can specify a fallback model in the constructor, and that model is trained instead in cases like this.

from statsforecast.

santoshpal3004 commented on June 1, 2024

Hey @jmoralez, the dataframe I am using here has about 30,000 rows. I also tried keeping HoltWinters as a fallback model, but I still get a similar-looking error, this time from "ets.py". Here is the code snippet; it would be really helpful if you could point out any correction that has to be made.

try:
    models = [HoltWinters(season_length=12, error_type='A'),
              SeasonalNaive(season_length=12),
              HistoricAverage(),
              DOT(season_length=12, decomposition_type='additive'),
              AutoTheta(season_length=12),
              AutoARIMA(season_length=12),
              AutoETS(season_length=12)
             ]
    # instantiate the model
    model = StatsForecast(models = models,
                          freq='M',
                          n_jobs=-1,  # comma was missing here, which is a SyntaxError
                          fallback_model= HoltWinters(season_length=12, error_type='A')
                         )

    #train model, like in sklearn
    model.fit(df=X_train_agg.head(1000))

#this circumvents the error we get with autoarima
#try again without autoarima in the list of models
except ZeroDivisionError:
    models = [HoltWinters(season_length=12, error_type='A'),
              SeasonalNaive(season_length=12),
              HistoricAverage(),
              DOT(season_length=12, decomposition_type='additive'),
              AutoTheta(season_length=12),
              AutoETS(season_length=12)
             ]

    model = StatsForecast(models = models, 
                      freq='M', 
                      n_jobs=-1,
                      #fallback_model= HoltWinters(season_length=12, error_type='A')
                     )

    model.fit(X_train_agg)


jmoralez commented on June 1, 2024

The size in the error refers to the size of a single series (unique_id). So if you run, for example, df['unique_id'].value_counts(), you should see some series with 3 or fewer values; those are the problematic ones. In that case the only viable fallback is the Naive model, so you could try that one.
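A minimal sketch of that check, assuming a long-format dataframe with `unique_id` as a regular column. The toy data, the `make_series` helper and the `min_len` threshold are made up for illustration and are not part of statsforecast.

```python
import pandas as pd


def make_series(uid, n):
    """Build a toy monthly series with n observations (illustrative helper)."""
    return pd.DataFrame({
        "unique_id": uid,
        "ds": pd.date_range("2020-01-01", periods=n, freq="MS"),
        "y": range(n),
    })


# Three series of very different lengths stacked in long format.
df = pd.concat(
    [make_series("a", 30), make_series("b", 3), make_series("c", 7)],
    ignore_index=True,
)

# Number of observations per series, which is what the error is about.
sizes = df["unique_id"].value_counts()

min_len = 4  # hypothetical threshold; choose it to suit the models you fit
too_short = sizes[sizes < min_len]

# Keep only the series that meet the threshold.
keep_ids = sizes[sizes >= min_len].index
filtered = df[df["unique_id"].isin(keep_ids)]

print(too_short)
```

Inspecting `too_short` shows which series would trip the error, independent of how many rows the dataframe has in total.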


santoshpal3004 commented on June 1, 2024

So just to clarify "unique_id" is the index and not a column. X_train_agg.index.nunique() yields 511 rows.


jmoralez commented on June 1, 2024

You should set it as a column, we're deprecating passing it as the index. Also the problem isn't how many unique series you have, but their sizes, so running value_counts on the unique_ids is what will tell you their sizes.


santoshpal3004 commented on June 1, 2024

OK, point noted @jmoralez, but the issue still persists even after keeping only the series whose value_counts is more than 3. Please find the code snippet below:

value_counts = new_df['unique_id'].value_counts()
valid_indices = value_counts[value_counts >= 5].index
filtered_df = new_df[new_df['unique_id'].isin(valid_indices)]
filtered_df['unique_id'].value_counts()

**OUTPUT:**
HK                   74
HK/Pharma/HP1_S01    74
HK/Pharma/HP1_J05    74
HK/Pharma/HP1_J02    74
HK/Pharma/HP1_J01    74
                     ..
HK/CHC/TD1_C07        6
HK/CHC/TD1_A10        6
HK/CHC/HP1_A10        5
HK/CHC/DP1_A10        5
HK/Pharma/HP1_P02     5
Name: unique_id, Length: 482, dtype: int64

**ERROR AFTER USING THIS df:**
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/core.py", line 73, in fit
    fm[i, i_model] = new_model.fit(y=y, X=X)
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/models.py", line 553, in fit
    self.model_ = ets_f(
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/ets.py", line 1235, in ets_f
    raise NotImplementedError("tiny datasets")
NotImplementedError: tiny datasets
"""

The above exception was the direct cause of the following exception:

NotImplementedError                       Traceback (most recent call last)
Cell In[50], line 57
     50     model = StatsForecast(models = models, 
     51                           freq='M', 
     52                           n_jobs=-1
     53                           #fallback_model= HoltWinters(season_length=12, error_type='A')
     54                          )
     56     #train model, like in sklearn
---> 57     model.fit(df=filtered_df.head(1000))
     59 #this circumvents the error we get with autoarima
     60 #try again without autoarima in the list of models
     61 except ZeroDivisionError:

File /opt/conda/lib/python3.10/site-packages/statsforecast/core.py:581, in _StatsForecast.fit(self, df, sort_df)
    579     self.fitted_ = self.ga.fit(models=self.models)
    580 else:
--> 581     self.fitted_ = self._fit_parallel()
    582 return self

File /opt/conda/lib/python3.10/site-packages/statsforecast/core.py:940, in _StatsForecast._fit_parallel(self)
    938         future = executor.apply_async(ga.fit, (self.models,))
    939         futures.append(future)
--> 940     fm = np.vstack([f.get() for f in futures])
    941 return fm

File /opt/conda/lib/python3.10/site-packages/statsforecast/core.py:940, in <listcomp>(.0)
    938         future = executor.apply_async(ga.fit, (self.models,))
    939         futures.append(future)
--> 940     fm = np.vstack([f.get() for f in futures])
    941 return fm

File /opt/conda/lib/python3.10/multiprocessing/pool.py:774, in ApplyResult.get(self, timeout)
    772     return self._value
    773 else:
--> 774     raise self._value

NotImplementedError: tiny datasets


jmoralez commented on June 1, 2024

The ETS model requires more than 3 samples. The easiest fix is providing a fallback model like Naive or HistoricAverage.


santoshpal3004 commented on June 1, 2024

As indicated in the previous response, I have a substantial amount of data at my disposal. However, it continues to yield the same error. Furthermore, the inclusion of a fallback model does not serve its intended purpose if I have certainty that the other models will unquestionably encounter failures. PFA the code snippet:

value_counts = new_df['unique_id'].value_counts()
valid_indices = value_counts[value_counts >= 5].index
filtered_df = new_df[new_df['unique_id'].isin(valid_indices)]
filtered_df['unique_id'].value_counts()

OUTPUT:
HK                   74
HK/Pharma/HP1_S01    74
HK/Pharma/HP1_J05    74
HK/Pharma/HP1_J02    74
HK/Pharma/HP1_J01    74
                     ..
HK/CHC/TD1_C07        6
HK/CHC/TD1_A10        6
HK/CHC/HP1_A10        5
HK/CHC/DP1_A10        5
HK/Pharma/HP1_P02     5
Name: unique_id, Length: 482, dtype: int64


jmoralez commented on June 1, 2024

statsforecast trains the models per series, so what matters is not how much data you have in total but how many samples each series has. You have some series with 5 samples, for which an ETS model can't be trained, so in those cases it will fail. If you specify a fallback model, those series will be forecasted using the fallback whenever a more complex model fails (ETS in this case).
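The per-series fallback idea can be pictured with a small library-free sketch. This is not statsforecast's implementation; `picky_model`, `naive_model` and `forecast_with_fallback` are hypothetical stand-ins that only illustrate the control flow of "try the complex model on each series, use the fallback when it raises".

```python
def picky_model(y):
    """Stand-in for a complex model: refuses short series, else forecasts the mean."""
    if len(y) <= 6:
        raise NotImplementedError("tiny datasets")
    return sum(y) / len(y)


def naive_model(y):
    """Stand-in fallback: repeat the last observed value."""
    return y[-1]


def forecast_with_fallback(series, model, fallback):
    """Forecast each series independently, switching to the fallback on failure."""
    results = {}
    for uid, y in series.items():
        try:
            results[uid] = (model.__name__, model(y))
        except Exception:
            # Only this series uses the fallback; the others are unaffected.
            results[uid] = (fallback.__name__, fallback(y))
    return results


series = {
    "long": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "short": [5.0, 6.0, 7.0, 8.0, 9.0],
}
results = forecast_with_fallback(series, picky_model, naive_model)
print(results)
```

The point of the sketch is that the total amount of data is irrelevant: the "short" series fails on its own and is the only one handled by the fallback.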


santoshpal3004 commented on June 1, 2024

What is the minimum number of samples each series should have for it to be trained using ETS based models?


jmoralez commented on June 1, 2024

Here's the relevant part of the code

npars = 2  # alpha + l0
if trendtype in ["A", "M"]:
    npars += 2  # beta + b0
if seasontype in ["A", "M"]:
    npars += 2  # gamma + s
if damped is not None:
    npars += damped
# ses for non-optimized tiny datasets
if n <= npars + 4:
    # we need HoltWintersZZ function
    raise NotImplementedError("tiny datasets")

The defaults are model='ZZZ' and damped=None so if you keep those you need at least 7 samples. Keep in mind that even though it may train it probably won't be very good with that few samples, since it doesn't even cover one seasonal period (12).
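As a worked version of that arithmetic, the sketch below wraps the quoted check in a hypothetical helper, `min_samples_for_ets`, which is not part of statsforecast. It only mirrors the snippet above: the check raises when `n <= npars + 4`, so the smallest length that passes is `npars + 5`. Passing this check does not guarantee that a model can actually be fitted.

```python
def min_samples_for_ets(trendtype="Z", seasontype="Z", damped=None):
    """Smallest series length that passes the quoted 'tiny datasets' check."""
    npars = 2  # alpha + l0
    if trendtype in ("A", "M"):
        npars += 2  # beta + b0
    if seasontype in ("A", "M"):
        npars += 2  # gamma + s
    if damped is not None:
        npars += damped  # a boolean counts as 0 or 1 extra parameter
    # The check raises when n <= npars + 4, so n must be at least npars + 5.
    return npars + 5


print(min_samples_for_ets())                 # defaults: model='ZZZ', damped=None
print(min_samples_for_ets("A", "N"))         # additive trend, no seasonality
print(min_samples_for_ets("A", "A", True))   # additive trend and season, damped
```

With the defaults this gives 7, matching the figure above; explicitly additive trend and seasonality with damping raises the floor to 12.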


santoshpal3004 commented on June 1, 2024

OK, but I recently tried keeping 18 samples per series and still was not able to fit any models. The error I get looks like this:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/core.py", line 73, in fit
    fm[i, i_model] = new_model.fit(y=y, X=X)
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/models.py", line 553, in fit
    self.model_ = ets_f(
  File "/opt/conda/lib/python3.10/site-packages/statsforecast/ets.py", line 1300, in ets_f
    raise Exception("no model able to be fitted")
Exception: no model able to be fitted


jmoralez commented on June 1, 2024

Can you try keeping at least two seasonal periods (24)?


github-actions commented on June 1, 2024

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.
