Comments (6)
@FGG100y That's strange. I ran the following code which I believe is equivalent to what you described above, and it didn't give me any error.
Which version of ADTK are you using?
import numpy as np
import pandas as pd
from adtk.detector import ThresholdAD
from adtk.detector import QuantileAD
from adtk.detector import InterQuartileRangeAD
from adtk.detector import PersistAD
from adtk.detector import LevelShiftAD
from adtk.detector import VolatilityShiftAD
from adtk.detector import SeasonalAD
from adtk.detector import AutoregressionAD
from adtk.data import split_train_test

def get_detector(adname="ThresholdAD"):
    # map a detector name to its ADTK class
    detectors = {"ThresholdAD": ThresholdAD,
                 "QuantileAD": QuantileAD,
                 "InterQuartileRangeAD": InterQuartileRangeAD,
                 "PersistAD": PersistAD,
                 "LevelShiftAD": LevelShiftAD,
                 "VolatilityShiftAD": VolatilityShiftAD,
                 "SeasonalAD": SeasonalAD,
                 "AutoregressionAD": AutoregressionAD,
                 }
    return detectors.get(adname)

def ad_detector(dname, train_data=None, test_data=None, **kwargs):
    # fit the named detector on train_data, then apply it to test_data
    Ad = get_detector(dname)
    ad = Ad(**kwargs)
    train_anoms = ad.fit_detect(train_data)
    test_anoms = ad.detect(test_data)
    return train_anoms, test_anoms

data = pd.Series(np.sin(np.arange(100)), index=pd.date_range(start="2020-02-02", periods=100, freq="D"))
s_train, s_test = split_train_test(data, mode=3, n_splits=2)
train_anoms, test_anoms = [], []
for train, test in zip(s_train, s_test):
    train_anom, test_anom = ad_detector(dname='SeasonalAD',
                                        train_data=data,
                                        test_data=data.squeeze(),
                                        c=1, side='both')
    # collect the results
    train_anoms.append(train_anom)
    test_anoms.append(test_anom)
@tailaiw Thank you for your reply. I am using adtk version 0.5.2.
I ran your code with the same synthetic data and it reported no error, so I believe something is wrong with my data.
This is how I preprocessed the data:
# replace extreme values (outside the 1st-99th percentile range) with NaN,
# then fill the NaNs with the median
# if the NaNs are not replaced, adtk reports "NaNs between valid values were not allowed"
quantiles = data.quantile([0.01, 0.99]).values.flatten()
q_high, q_low = quantiles[1], quantiles[0]
data[data[fname.split('_')[-1]] < q_low] = np.nan
data[data[fname.split('_')[-1]] > q_high] = np.nan
data = data.replace(np.nan, data.median())
The split-train-test timeseries:
Am I missing something in the ADTK docs, or is there something wrong with the data?
@tailaiw
And this was the data that I used in this case:
ts_debug.txt
@FGG100y
It looks like the problem is related to the fact that your input is a DataFrame instead of a Series object. I will look into this. It is probably a bug. Thanks for catching this!
I noticed your data is univariate. So until we fix the problem, you can put your data in a Series instead of a single-column DataFrame. I replaced the synthetic data with your data, i.e. I replaced the data-generation line with data = pd.read_csv("./ts_debug.txt", parse_dates=True, squeeze=True, index_col=0), and it returns no error. If I load the data with the option squeeze=False, i.e. into a DataFrame, it hits the RuntimeError you mentioned.
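For readers on newer pandas: the squeeze keyword of pd.read_csv was deprecated in pandas 1.4 and removed in 2.0, so the workaround above has to be written with DataFrame.squeeze("columns") instead. A minimal sketch, assuming a two-column CSV; the inline text and the column names time and value are made up for illustration and are not the contents of ts_debug.txt:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file such as ts_debug.txt.
csv_text = "time,value\n2020-02-02,0.0\n2020-02-03,0.8\n2020-02-04,0.9\n"

# Without squeezing, read_csv returns a single-column DataFrame.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# squeeze("columns") collapses the single column into a Series,
# which is the input type the workaround above relies on.
s = df.squeeze("columns")

print(type(df).__name__, type(s).__name__)
```

The resulting Series keeps the datetime index and takes the column name as its name.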
@tailaiw
I did as you suggested, and that solved my problem.
Thanks a lot.
@FGG100y We dug into the problem you mentioned and found the following:
ADTK contains univariate models and multivariate models. The models you were using are all univariate models. By design, if a univariate model is applied to a DataFrame, it treats each column of the DataFrame as an independent time series.
If a model is trained with a Series and is applied to a DataFrame, ADTK will apply the model to each column independently and return a concatenated DataFrame as output.
If a model is trained with a DataFrame (say with columns "A", "B", and "C"), what happens on the backend is that ADTK trains 3 models, one per column. If the model object is then applied to a DataFrame with the same column names, ADTK will match the trained models with the columns automatically. We found this design convenient for the case where a certain type of model is applied to a large number of time series.
If a model trained with a DataFrame is applied to a Series or to a DataFrame with different column names, ADTK will throw an error because it cannot find a match. This is what caused the error you encountered (note that your training data is a DataFrame while your testing data is a Series, because you used the squeeze method).
This logic was not well tested or documented, and the error message was misleading. Thanks to your issue, we noticed this problem and fixed it. We just released a patch v0.5.4 to address this issue. If you see anything missed, please feel free to reopen this issue.
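The dispatch rules described above can be illustrated with a small sketch. PerColumnThreshold is a hypothetical stand-in written only for this illustration, with a mean-plus-two-standard-deviations rule as its "model"; it is not ADTK code, and the exact column check and error message are assumptions rather than ADTK's actual behavior.

```python
import pandas as pd


class PerColumnThreshold:
    """Hypothetical univariate detector used to mimic the dispatch rules."""

    def fit(self, data):
        if isinstance(data, pd.Series):
            # Trained on a Series: a single model, reusable on any column.
            self._models = data.mean() + 2 * data.std()
        else:
            # Trained on a DataFrame: one model per column, keyed by name.
            self._models = {
                col: data[col].mean() + 2 * data[col].std()
                for col in data.columns
            }
        return self

    def detect(self, data):
        if not isinstance(self._models, dict):
            # Single model: works on a Series, or on every column of a
            # DataFrame independently (comparison broadcasts per column).
            return data > self._models
        # Per-column models: the input must be a DataFrame whose column
        # names match the ones seen during training (assumed rule).
        if isinstance(data, pd.Series) or set(data.columns) != set(self._models):
            raise ValueError("input does not match the columns used for training")
        return pd.DataFrame(
            {col: data[col] > self._models[col] for col in data.columns}
        )


df = pd.DataFrame(
    {
        "A": [0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 5.0],
        "B": [1.0, 1.1, 1.0, 1.1, 1.0, 1.1, 1.0],
    }
)

# Trained on a DataFrame and applied to matching columns: models are paired by name.
flags = PerColumnThreshold().fit(df).detect(df)

# Trained on a DataFrame but applied to a Series: no matching model can be found.
try:
    PerColumnThreshold().fit(df).detect(df["A"])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Trained on a Series and applied to a DataFrame: the one model is used per column.
series_flags = PerColumnThreshold().fit(df["A"]).detect(df)
```

In this sketch the mismatch case mirrors the situation in this thread: training on a DataFrame and then detecting on a squeezed Series fails, while keeping both inputs as Series (or both as DataFrames with the same columns) works.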