
couta's Introduction

COUTA - time series anomaly detection

Implementation of "Calibrated One-class classification-based Unsupervised Time series Anomaly detection" (COUTA for short).
The full paper is available on arXiv (arXiv:2207.12201).
Please consider citing our paper if you use this repository. 😉

@article{xu2022deep,
  title={Calibrated One-class Classification for Unsupervised Time Series Anomaly Detection},
  author={Xu, Hongzuo and Wang, Yijie and Jian, Songlei and Liao, Qing and Wang, Yongjun and Pang, Guansong},
  journal={arXiv preprint arXiv:2207.12201},
  year={2022}
}

Environment

Main packages

torch==1.10.1+cu113  
numpy==1.20.3  
pandas==1.3.3  
scipy==1.4.1  
scikit-learn==1.1.1  

We provide a requirements.txt in our repository.

Takeaways

APIs

COUTA provides easy-to-use APIs in the sklearn/PyOD style: first, instantiate the model class with its parameters.

from src.algorithms.couta_algo import COUTA
model_configs = {'sequence_length': 50, 'stride': 1}
model = COUTA(**model_configs)

Then, the instantiated model can be used to fit and predict data. Please use pandas DataFrames as input.

model.fit(train_df)
score_dic = model.predict(test_df)
score = score_dic['score_t']

We use a dictionary as the prediction output for consistency with an existing evaluation benchmark for time series anomaly detection.
score_t is a vector of anomaly scores, one per time observation in the testing DataFrame; a higher value indicates a higher likelihood of being an anomaly.
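Because higher scores mean more anomalous, the score vector can be fed directly into standard metrics or thresholded into binary labels. The sketch below uses dummy arrays as stand-ins for score_dic['score_t'] and ground-truth labels (both are assumptions for illustration; in practice they come from model.predict(test_df) and the test set), and the ROC AUC / top-quantile thresholding shown here is one common evaluation choice, not necessarily the paper's protocol.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Dummy stand-ins for score_dic['score_t'] and ground-truth labels;
# real values come from model.predict(test_df) and the labeled test set.
rng = np.random.default_rng(0)
score_t = rng.normal(size=1000)
labels = (score_t > 1.0).astype(int)  # pretend high-score points are true anomalies

# Higher score = more anomalous, so ROC AUC can be computed directly
auc = roc_auc_score(labels, score_t)

# One simple way to binarize: flag the top 5% of scores as anomalies
threshold = np.quantile(score_t, 0.95)
pred = (score_t >= threshold).astype(int)
```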

Model save and load

By passing the save_model_path parameter during training, the model will be saved to this path.

from src.algorithms.couta_algo import COUTA
path = 'saved_models/couta.pth'
model_configs = {'sequence_length': 50, 'stride': 1, 'save_model_path': path}
model = COUTA(**model_configs)
model.fit(train_df)

Then, COUTA can be used without re-fitting.

from src.algorithms.couta_algo import COUTA
path = 'saved_models/couta.pth'
model_configs = {'load_model_path': path}
model = COUTA(**model_configs)
model.predict(test_df)

Datasets used in our paper

  • Due to license issues with these datasets, we provide download links here. We also offer the preprocessing script data_preprocessing.ipynb. You can easily generate processed datasets that can be directly fed into our pipeline by downloading the original data and running this notebook.

The used datasets can be downloaded from:

Reproduction of experiment results

Experiments on effectiveness (4.2)

After preparing the datasets, you can use main.py to run COUTA on different time series datasets. We use six datasets in our paper; --data can be chosen from [ASD, SMD, SWaT, WaQ, Epilepsy, DSADS].

For example, run COUTA on the ASD dataset by

python main.py --data ASD --algo COUTA

or you can directly use script_effectiveness.sh

Generalization test (4.3)

We include the synthetic datasets used in the paper in data_processed/

python main_showcase.py --type point
python main_showcase.py --type pattern

Two anomaly-score .npy files are generated; you can use experiment_generalization_ability.ipynb to visualize the data and our results.
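If you want to inspect the generated score files outside the notebook, they can be loaded with plain NumPy. The file name below is a placeholder (the actual names are determined by main_showcase.py), so this sketch first writes a dummy file with the same shape of data to stay self-contained.

```python
import numpy as np

# Placeholder for one of the generated score files; the real files
# (with their actual names) are produced by main_showcase.py.
rng = np.random.default_rng(0)
np.save('scores_point.npy', rng.normal(size=500))

scores = np.load('scores_point.npy')           # one score per time observation
top_idx = np.argsort(scores)[-10:]             # indices of the 10 highest scores
```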

Robustness (4.4)

Use src/experiments/data_contaminated_generator_dsads.py and src/experiments/data_contaminated_generator_ep.py to generate datasets with various contamination ratios.
Use main.py to run COUTA on these datasets, or directly execute script_robustness.sh.

Ablation study (4.5)

Change the --algo argument to COUTA_wto_umc, COUTA_wto_nac, or Canonical, e.g.,

python main.py --algo COUTA_wto_umc --data ASD

Running script_effectiveness.sh also produces detection results for the ablated variants.

Others

For the sensitivity test (4.6), please adjust the parameters in the YAML file.
For the scalability test (4.7), the produced result files also contain execution times.

Competing methods

All of the anomaly detectors in our paper are implemented in Python. We list their publicly available implementations below.

couta's People

Contributors

jdk-21, xuhongzuo


couta's Issues

Draw the loss curve

@xuhongzuo: I want to use pd.DataFrame to save loss, loss_oc, and val_loss at each epoch, and export them as a CSV file. Could you add code for this?
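A minimal sketch of what was asked, assuming the per-epoch values have been collected into lists inside the training loop (the numbers and the output file name below are purely illustrative; COUTA's fit() would need a small modification to record them):

```python
import pandas as pd

# Hypothetical per-epoch histories; in COUTA these would be appended
# each epoch inside fit() rather than hard-coded like this.
history = {
    'epoch': [1, 2, 3],
    'loss': [0.92, 0.71, 0.55],
    'loss_oc': [0.40, 0.33, 0.28],
    'val_loss': [0.88, 0.74, 0.60],
}
df = pd.DataFrame(history)
df.to_csv('loss_history.csv', index=False)  # plot later, e.g. with matplotlib
```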

train

Hi, I would like to ask how to retrain this model. I tried to modify the code, but no matter how I modified it, the results were always the same as in the paper and did not change.

question about data perturbation

Hello, @xuhongzuo. This is great work. I am interested in data perturbation (also called data augmentation in other papers). Data augmentation is popular and often used in contrastive learning, and I noticed it appears in recent time series anomaly detection work such as COCA and TFAD. In the paper, you use several simple transformations to generate abnormal samples. Where is the code for this processing?

Saving model

Hello,
First, thank you for your work. I've been using USAD so far, but your solution seems to give similar or even better results on my data while being easier to tune and train.

I'm not really familiar with models built this way, so I'm somewhat stuck on one point; if you could guide me I'd be grateful:
I trained my model on colab, but need to make it available on another computer (inference on CPU only) for experimentation purpose.

Here is the struggle: I can't find the proper way to save the model for use elsewhere.

What I tried:

  • dill/pickle/torch.save on the trained model object, rearranging functions, etc., but there's always at least one dependency that cannot be interpreted. I know it's not best practice to do this anyway.
  • Just saving the 'net' state_dict, then recreating the COUTA instance and loading the dict, but too many things are done in fit(), so I can't reload the weights if the model hasn't been fitted first.

Any help will be appreciated, thank you in advance
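One approach that may work here is saving only the network weights with torch.save and rebuilding the same architecture on the target machine, loading on CPU via map_location. The sketch below is only an illustration under assumptions: it uses a toy nn.Linear as a stand-in for COUTA's internal 'net' (mentioned above), and the real architecture and attribute layout must match the repository's code.

```python
import torch
import torch.nn as nn

# Toy stand-in for the trained network inside a COUTA model;
# the actual architecture must be reconstructed identically on the target machine.
net = nn.Linear(4, 1)

# Save only the weights; this avoids pickling the whole model object
torch.save(net.state_dict(), 'net_weights.pth')

# On the CPU-only machine: rebuild the same architecture, then load the weights
net2 = nn.Linear(4, 1)
state = torch.load('net_weights.pth', map_location='cpu')
net2.load_state_dict(state)
```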

train

Hi, I have a problem during training. When I removed backpropagation and parameter updates during training, the model was still able to train and produced results similar to the paper.
