First of all, thx for the great tool ^^ here's the code that produce


RuntimeError: the model must be trained first about adtk HOT 6 CLOSED

arundo commented on August 23, 2024

RuntimeError: the model must be trained first


Comments (6)

tailaiw commented on August 23, 2024

@FGG100y That's strange. I ran the following code which I believe is equivalent to what you described above, and it didn't give me any error.

Which version of ADTK are you using?

import numpy as np
import pandas as pd

from adtk.detector import ThresholdAD
from adtk.detector import QuantileAD
from adtk.detector import InterQuartileRangeAD
from adtk.detector import PersistAD
from adtk.detector import LevelShiftAD
from adtk.detector import VolatilityShiftAD
from adtk.detector import SeasonalAD
from adtk.detector import AutoregressionAD
from adtk.data import split_train_test


def get_detector(adname="ThresholdAD"):
    detectors = {"ThresholdAD": ThresholdAD,
                 "QuantileAD": QuantileAD,
                 "InterQuartileRangeAD": InterQuartileRangeAD,
                 "PersistAD": PersistAD,
                 "LevelShiftAD": LevelShiftAD,
                 "VolatilityShiftAD": VolatilityShiftAD,
                 "SeasonalAD": SeasonalAD,
                 "AutoregressionAD": AutoregressionAD,
                 }
    return detectors.get(adname)

def ad_detector(dname, train_data=None, test_data=None, **kwargs):
    Ad = get_detector(dname)
    ad = Ad(**kwargs)
    train_anoms = ad.fit_detect(train_data)
    test_anoms = ad.detect(test_data)
    return train_anoms, test_anoms

data = pd.Series(np.sin(np.arange(100)), index=pd.date_range(start="2020-02-02", periods=100, freq="D"))

s_train, s_test = split_train_test(data, mode=3, n_splits=2)
train_anoms, test_anoms = [], []
for train, test in zip(s_train, s_test):
    train_anom, test_anom = ad_detector(dname='SeasonalAD',
                                        train_data=data,
                                        test_data=data.squeeze(),
                                        c=1, side='both')
    # collect the results
    train_anoms.append(train_anom)
    test_anoms.append(test_anom)


FGG100y commented on August 23, 2024

@tailaiw Thank you for your reply. The adtk version is 0.5.2.
I ran your code with the same synthetic data, and it reported no error, so I believe something is wrong with my data.
This is how I preprocess the data:

from numpy import nan as NaN  # assumed import; the original snippet does not show where NaN comes from

# Set extreme values (below the 1% or above the 99% quantile) to NaN, then replace the NaNs with the median.
# If the NaNs are not replaced, adtk reports "NaNs between valid values were not allowed".
col = fname.split('_')[-1]  # column name taken from the file name (fname is defined elsewhere)
quantiles = data.quantile([0.01, 0.99]).values.flatten()
q_low, q_high = quantiles[0], quantiles[1]
data[data[col] < q_low] = NaN
data[data[col] > q_high] = NaN
data = data.replace(NaN, data.median())

The split-train-test timeseries:

Am I missing something in the ADTK docs, or is there something wrong with the data?


FGG100y commented on August 23, 2024

@tailaiw
And this was the data that I used in this case:
ts_debug.txt


tailaiw commented on August 23, 2024

@FGG100y
It looks like the problem is related to the fact that your input is a DataFrame instead of a Series object. I will look into this. It is probably a bug. Thanks for catching this!

I noticed your data is univariate. So until we fix the problem, you can put your data in a Series instead of a single-column DataFrame. I replaced the synthetic data with your data (i.e. replaced the data generation line with data = pd.read_csv("./ts_debug.txt", parse_dates=True, squeeze=True, index_col=0)), and it returns no error. If I load the data with the option squeeze=False, i.e. into a DataFrame, it hits the RuntimeError you mentioned.
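
For readers on newer pandas releases, where the squeeze argument of read_csv is no longer available (it was deprecated in pandas 1.4 and removed in 2.0), a minimal sketch of the same workaround is to load the file as a DataFrame and squeeze it afterwards. The CSV text and the column names "time" and "value" below are made up for illustration; they are not the contents of ts_debug.txt.

```python
import io

import pandas as pd

# Hypothetical stand-in for a single-column time series file.
csv_text = "time,value\n2020-02-02,1.0\n2020-02-03,2.0\n2020-02-04,3.0\n"

# Loading without squeezing gives a single-column DataFrame.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

# Squeezing along the column axis turns the single column into a Series.
s = df.squeeze("columns")

print(type(df).__name__, type(s).__name__)
```

Whichever form is chosen, the point of the workaround is that the training and the testing data should both be Series.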


FGG100y commented on August 23, 2024

@tailaiw
I followed your suggestion ^^ and it solved my problem.
Thanks a lot.


tailaiw commented on August 23, 2024

@FGG100y We dug into the problem you reported, and here is what we found:

ADTK contains univariate models and multivariate models. The models you were using are all univariate models. By design, if a univariate model is applied to a DataFrame, it treats each column of the DataFrame as an independent time series.

If a model is trained with a Series and is applied to a DataFrame, ADTK will apply the model to each column independently and return a concatenated DataFrame as output.

If a model is trained with a DataFrame (say with columns "A", "B", and "C"), what happens on the backend is that ADTK trains 3 models, one per column. If the model object is then applied to a DataFrame with the same column names, ADTK will match the trained models with the columns automatically. We found this design convenient for the case where a certain type of model is applied to a large number of time series.

If a model trained with a DataFrame is applied to a Series or to a DataFrame with different column names, ADTK will throw an error because it cannot match the trained models to the input. This is what caused the error you encountered (note that your training data is a DataFrame while your testing data is a Series because you used the squeeze method).

This logic was not well tested or documented, and the error message was misleading. Thanks to your issue, we noticed this problem and fixed it. We just released a patch v0.5.4 to address this issue. If you see anything missed, please feel free to reopen this issue.
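
The matching rules described above can be illustrated with a small toy model. This is only a sketch of the described behavior, not ADTK's implementation: the class ToyThresholdModel, its "flag values above the training maximum" rule, the exact column check, and the error type and messages are all assumptions made for illustration.

```python
import pandas as pd


class ToyThresholdModel:
    """Hypothetical univariate model (NOT ADTK code).

    It flags values larger than the maximum seen during training.
    """

    def __init__(self):
        # A float when trained on a Series, a dict of floats when trained on a DataFrame.
        self._fitted = None

    def fit(self, data):
        if isinstance(data, pd.Series):
            self._fitted = float(data.max())
        else:
            # One independent model per column, keyed by column name.
            self._fitted = {col: float(data[col].max()) for col in data.columns}
        return self

    def detect(self, data):
        if self._fitted is None:
            raise RuntimeError("The model must be trained first.")
        if isinstance(self._fitted, dict):
            # Trained on a DataFrame: the input must be a DataFrame with known columns.
            if isinstance(data, pd.Series) or not set(data.columns) <= set(self._fitted):
                raise ValueError("Input does not match the columns used for training.")
            return pd.DataFrame({col: data[col] > self._fitted[col] for col in data.columns})
        # Trained on a Series: apply the single model to a Series or to every column.
        return data > self._fitted


train_df = pd.DataFrame({"A": [1.0, 2.0], "B": [10.0, 20.0]})
test_df = pd.DataFrame({"A": [3.0, 1.0], "B": [5.0, 30.0]})

# DataFrame -> DataFrame with the same column names: models are matched by name.
model = ToyThresholdModel().fit(train_df)
print(model.detect(test_df))

# DataFrame -> Series: no matching column can be found, so an error is raised.
try:
    model.detect(test_df["A"])
except ValueError as err:
    print("error:", err)
```

In this sketch the mismatch case raises a dedicated error instead of the "must be trained first" message, which is the kind of distinction the misleading message in the original report lacked.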

