
ts-ad-datasets's Introduction

Public Datasets for Time Series Anomaly Detection

Time Series Anomaly Detection Datasets

Here I summarize some publicly available datasets for time series anomaly detection.

1. Outlier Detection DataSets (ODDS)

The ODDS webpage is here. Note that the collection contains not only time series but also other data types (videos, texts, and graphs).

2. Kaggle Credit Card Fraud Detection DataSet (CCFD)

The main page is here. The dataset contains credit card transactions made by European cardholders in September 2013. For privacy and security reasons, what we see is the result of a PCA transformation rather than the original features.

3. Yahoo Time Series Anomaly Detection Benchmark

Request access to this dataset here.

The dataset contains 4 folders: A1, A2, A3, and A4.

A1Benchmark is based on the real production traffic to some of the Yahoo! properties. The other 3 benchmarks are based on synthetic time series. The A2 and A3 benchmarks include outliers, while the A4 benchmark includes change-point anomalies. The benchmarks based on real data have the property and geo information removed. Fields in each data file are delimited with commas (",").
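Since the files are plain comma-delimited text, they can be read with Python's standard `csv` module. The sketch below is only an illustration: the column names `value` and `is_anomaly` and the three sample rows are assumptions made up for this example, so check the header of the files you actually receive and adjust the arguments.

```python
import csv
import io

def read_labeled_series(handle, value_col="value", label_col="is_anomaly"):
    """Read a comma-delimited file with a header row into two parallel lists.

    The default column names are assumptions for illustration only.
    """
    values, labels = [], []
    for row in csv.DictReader(handle):  # default delimiter is ","
        values.append(float(row[value_col]))
        labels.append(int(row[label_col]))
    return values, labels

# Made-up sample standing in for one benchmark file.
sample = io.StringIO("timestamp,value,is_anomaly\n1,0.5,0\n2,0.7,0\n3,9.1,1\n")
series_values, series_labels = read_labeled_series(sample)
print(series_values)  # [0.5, 0.7, 9.1]
print(series_labels)  # [0, 0, 1]
```

For a real file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` sample.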

4. Numenta Anomaly Benchmark (NAB)

Description of NAB can be found here.

Dataset repository is here.

5. Secure Water Treatment (SWaT) Dataset

Multivariate time series datasets collected by “iTrust, Centre for Research in Cyber Security, Singapore University of Technology and Design”. See website here to request access to the dataset and check usage requirements.

6. Water Distribution (WADI) Dataset

Also collected by “iTrust, Centre for Research in Cyber Security, Singapore University of Technology and Design”. See the website here to request access to the dataset (it can be requested together with SWaT) and check the usage requirements.

7. Server Machine Dataset (SMD)

The dataset is released here as part of the authors' repository for their KDD 2019 paper "Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network".

8. UCR Time Series Anomaly Archive

Contains over 250 datasets. The link to download the dataset is here.

The maintainers of the archive also recommend reading the following papers before using the dataset: "The UEA multivariate time series classification archive, 2018" and "Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress".

9. Soil Moisture Active Passive (SMAP) Satellite Dataset

Dataset webpage is here. Check the dataset description here.

wget https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip

cd data && wget https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv

10. Mars Science Laboratory (MSL) Curiosity Rover Dataset

Dataset webpage is here. The MSL data comes from the same archive as SMAP, so the download commands are the same as above.

wget https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip

cd data && wget https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv
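Once `labeled_anomalies.csv` is downloaded, the anomaly intervals for each channel usually need to be expanded into point-wise labels. The following is a minimal sketch of that step. It assumes each row stores its intervals as a string such as `"[[2, 4], [7, 8]]"` together with the channel length, and it treats the end index as inclusive. Both are assumptions to verify against the file itself, and the function name is hypothetical.

```python
import ast

def sequences_to_labels(anomaly_sequences, num_values):
    """Expand a string of [start, end] index pairs into a 0/1 label list.

    Assumes end indices are inclusive; indices past the end are clipped.
    """
    point_labels = [0] * num_values
    for start, end in ast.literal_eval(anomaly_sequences):
        last = min(end, num_values - 1)
        for i in range(start, last + 1):
            point_labels[i] = 1
    return point_labels

# Made-up interval string and length, for illustration only.
channel_labels = sequences_to_labels("[[2, 4], [7, 8]]", 10)
print(channel_labels)  # [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
```

`ast.literal_eval` is used instead of `eval` so that only plain Python literals in the CSV cell are parsed.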

11. Skoltech Anomaly Benchmark (SKAB)

Dataset repo is here.

12. Artificial Intelligence for IT Operations (AIOps) Challenge Datasets

These datasets are maintained by the Netman Lab at Tsinghua University. Their group's GitHub profile can be found here.

The KPI dataset from their 2018 challenge is here, and the 2020 data is here.

13. Pooled Server Metric (PSM) Dataset

This dataset was collected by eBay and released here, in the repository for RANSynCoders, an anomaly detection model they proposed.

14. PhysioNet Open Access Databases

Check the PhysioNet Data webpage here. These datasets are all medicine-related.

One of these datasets, the MIT-BIH Supraventricular Arrhythmia Database, was used in the VLDB 2022 paper "TranAD: deep transformer networks for anomaly detection in multivariate time series data".

15. Datasets Related to Power Systems from IEEE Dataport

a) CYBER-PHYSICAL DATASET OF HARDWARE-IN-THE-LOOP CYBER-PHYSICAL POWER SYSTEMS TESTBED UNDER MITM ATTACKS

Dataset main page is here.

This dataset was collected by performing different Man-in-the-Middle (MiTM) attacks on the synthetic cyber-physical electric grid in the RESLab Testbed at Texas A&M University, US.

b) DATASET OF PORT SCANNING ATTACKS ON EMULATION TESTBED AND HARDWARE-IN-THE-LOOP TESTBED

Dataset main page is here.

The dataset was generated by performing four scenarios of port scanning attacks on an 8-substation supervisory control and data acquisition (SCADA) system in three different environments: minimega at Sandia National Lab (SNL), the Common Open Research Emulator (CORE) at Texas A&M University, and the hardware-in-the-loop RESLab Testbed at Texas A&M University.

c) ICS DATASET FOR SMART GRID ANOMALY DETECTION

Dataset main page is here. The dataset contains both normal traffic and communication with anomalies (cyber attacks, link failure, etc.).

16. Water Quality Dataset at GECCO 2018 Challenge

Download the dataset here.

17. Application Server Dataset (ASD)

The dataset can be found here, within the code repository of a KDD 2021 paper.

Time Series Classification Datasets That Could Potentially Be Used for Anomaly Detection

Another common approach I see is to use time series classification datasets for anomaly detection: you can preprocess a dataset by selecting one or a few minority classes and labeling them as anomalies.
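The relabeling step described above could be sketched as follows. This is only one possible reading of it: the function name, the `n_anomaly_classes` parameter, and the toy labels are hypothetical, and picking the rarest classes by raw count is an assumption. In practice you may prefer to choose the anomaly classes by hand.

```python
from collections import Counter

def relabel_minority_as_anomaly(labels, n_anomaly_classes=1):
    """Turn class labels into binary labels (1 = anomaly, 0 = normal).

    The n_anomaly_classes rarest classes are treated as anomalies.
    """
    counts = Counter(labels)
    # Sort classes from rarest to most frequent; ties are broken by name.
    ordered = sorted(counts, key=lambda c: (counts[c], str(c)))
    anomaly_classes = set(ordered[:n_anomaly_classes])
    binary = [1 if y in anomaly_classes else 0 for y in labels]
    return binary, anomaly_classes

# Toy labels standing in for the class column of a classification dataset.
class_labels = ["walk"] * 6 + ["run"] * 3 + ["fall"] * 1
binary_labels, chosen = relabel_minority_as_anomaly(class_labels)
print(chosen)              # {'fall'}
print(sum(binary_labels))  # 1
```

With `n_anomaly_classes=2` the same call would mark both `"fall"` and `"run"` as anomalies, leaving `"walk"` as the normal class.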

1. UCI Machine Learning Repository Dataset - Time Series Classification

Look for time series datasets for classification tasks on the UCI repo webpage here.

2. UEA & UCR Time Series Classification Repository

Dataset mainpage is here.

3. Industrial Control System (ICS) Cyber Attack Datasets

Dataset webpage is here.

4. Ausgrid Solar Home Electricity Dataset

The dataset main page is here. The dataset providers have published a paper describing it, "Residential load and rooftop PV generation: an Australian distribution network dataset". There is also a GitHub repo that analyzes the dataset's characteristics, and a paper that uses the dataset for anomaly detection, "Anomaly Detection in Smart Meter Data for Preventing Potential Smart Grid Imbalance", available here.
