
Anomaly Detection Learning Resources


Outlier detection (also known as anomaly detection) is an exciting yet challenging field that aims to identify outlying data objects. It has proven critical in many applications, such as credit card fraud analytics, network intrusion detection, and mechanical unit defect detection.

In this repository, you can find:

  1. Books & Academic Papers
  2. Learning Materials, e.g., online courses and videos
  3. Outlier Datasets
  4. Open-source Libraries & Demo Codes
  5. Paper Downloader (under development): a Python script to download open access papers listed in this repository.

More items will be added to the repository. Feel free to suggest other key resources by opening an issue, submitting a pull request, or emailing me ([email protected]). Enjoy reading!




1. Books & Tutorials

1.1. Books

Outlier Analysis by Charu Aggarwal: Classic textbook covering most outlier analysis techniques. A must-read for anyone working on outlier detection. [Preview.pdf]

Outlier Ensembles: An Introduction by Charu Aggarwal and Saket Sathe: A great introductory book on ensemble learning for outlier analysis.

Data Mining: Concepts and Techniques (3rd ed.) by Jiawei Han, Micheline Kamber, and Jian Pei: Chapter 12 covers the key points of outlier detection. [Google Search]

1.2. Tutorials

| Tutorial Title | Venue | Year | Ref | Materials |
|---|---|---|---|---|
| Outlier detection techniques | ACM SIGKDD | 2010 | [1] | [PDF] |
| Anomaly Detection: A Tutorial | ICDM | 2011 | [2] | [PDF] |
| Data mining for anomaly detection | PKDD | 2008 | [3] | [See Video] |

2. Courses/Seminars/Videos

Coursera Introduction to Anomaly Detection (by IBM): [See Video]

Coursera Real-Time Cyber Threat Detection and Mitigation partly covers the topic: [See Video]

Coursera Machine Learning by Andrew Ng also partly covers the topic.

Udemy Outlier Detection Algorithms in Data Mining and Data Science: [See Video]

Stanford Data Mining for Cyber Security also covers some anomaly detection techniques: [See Video]


3. Toolbox & Datasets

3.1. Multivariate Data

[Python] Python Outlier Detection (PyOD): PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. It contains more than 20 detection algorithms, including emerging deep learning models and outlier ensembles.

[Python] Scikit-learn Novelty and Outlier Detection: supports popular algorithms such as LOF, Isolation Forest, and One-Class SVM.
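
Isolation Forest, one of the algorithms mentioned above, rests on a simple observation: outliers can be separated from the rest of the data with fewer random axis-parallel splits, so their average path length in a random tree is shorter (Liu et al., 2008 [13]). The following is a minimal, dependency-free sketch of that idea, assuming small in-memory data given as equal-length tuples of floats with at least two points. The function names are illustrative; for real work, scikit-learn's `IsolationForest` class is the practical choice.

```python
import math
import random


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) approximated via Euler's constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree using a random feature and a random split value per node."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all points share this value, so this sketch stops splitting here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) to account for unresolved leaves."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Score 2^(-E[h(x)] / c(psi)) for every point; higher means more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))  # sub-sample size per tree (needs psi >= 2)
    max_depth = math.ceil(math.log2(psi))  # height limit used in the original paper
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return [
        2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / average_path_length(psi))
        for x in data
    ]
```

Scores fall in (0, 1), and points scoring well above 0.5 are the outlier candidates. For example, on a 5 x 5 grid of points in the unit square plus one extra point at (8, 8), the extra point receives the highest score.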

[Matlab] Anomaly Detection Toolbox - Beta: A collection of popular outlier detection algorithms in Matlab.

[Java] ELKI: Environment for Developing KDD-Applications Supported by Index-Structures: ELKI is an open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection.

[Java] RapidMiner Anomaly Detection Extension: The Anomaly Detection Extension for RapidMiner comprises the best-known unsupervised anomaly detection algorithms and assigns an individual anomaly score to each row of an example set. It finds data that differ significantly from the norm without requiring labels.

[R] outliers package: A collection of tests commonly used for identifying outliers in R.
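
Many classical univariate outlier tests score each value by how far it lies from a central value, measured in units of spread. Below is a minimal Python sketch of one robust variant, the modified z-score based on the median and the median absolute deviation (MAD). It is not a port of the R package; the function names are illustrative, and the 3.5 cutoff is a commonly used default for this score rather than anything the package prescribes.

```python
import statistics


def modified_z_scores(values):
    """Robust z-scores: distance from the median in units of the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust scaling is undefined for this sample")
    # 0.6745 rescales the MAD so it matches the standard deviation for normal data
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return the values whose absolute modified z-score exceeds the threshold."""
    return [v for v, z in zip(values, modified_z_scores(values)) if abs(z) > threshold]
```

Using the median and MAD instead of the mean and standard deviation keeps a single extreme value from inflating the spread and masking itself.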

3.2. Time Series Outlier Detection

[Python] datastream.io: An open-source framework for real-time anomaly detection using Python, Elasticsearch and Kibana.

[Python] skyline: Skyline is a near-real-time anomaly detection system.
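
Near-real-time detectors like these score each incoming observation against recent history. Below is a minimal sketch of one common building block, a sliding-window three-sigma rule. The class name, window size, and `k` are illustrative assumptions, and this is not code from datastream.io or skyline.

```python
import statistics
from collections import deque


class SlidingWindowDetector:
    """Flags a value lying more than k standard deviations from the mean of the last `window` values."""

    def __init__(self, window=30, k=3.0):
        self.history = deque(maxlen=window)  # oldest values drop out automatically
        self.k = k

    def update(self, value):
        """Score one new observation, then add it to the window; returns True if anomalous."""
        is_anomaly = False
        if len(self.history) >= 2:
            mean = statistics.fmean(self.history)
            std = statistics.pstdev(self.history)
            if std > 0 and abs(value - mean) > self.k * std:
                is_anomaly = True
        self.history.append(value)
        return is_anomaly
```

In this sketch, flagged values still enter the window, which inflates the spread for the next few observations. Excluding flagged values from the window is an equally reasonable design choice.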

[Python] banpei: Banpei is a Python package for anomaly detection.

[R] AnomalyDetection: AnomalyDetection is an open-source R package for detecting anomalies that is statistically robust in the presence of seasonality and an underlying trend.
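
The AnomalyDetection package's own method is Seasonal Hybrid ESD (S-H-ESD). The sketch below is a deliberately simplified stand-in that illustrates only the core idea of removing a seasonal profile before applying a robust test. It assumes a known period and at least a few full cycles of data, it does not model trend, and the function name is hypothetical.

```python
import statistics


def seasonal_anomalies(series, period, threshold=3.5):
    """Indices whose deseasonalized residual has a large robust (median/MAD) z-score."""
    # Seasonal profile: the median of all observations that share the same phase.
    profile = [statistics.median(series[phase::period]) for phase in range(period)]
    residuals = [x - profile[i % period] for i, x in enumerate(series)]
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    if mad == 0:
        raise ValueError("MAD of residuals is zero; robust scaling is undefined")
    return [
        i for i, r in enumerate(residuals)
        if abs(0.6745 * (r - med) / mad) > threshold
    ]
```

Because both the seasonal profile and the residual scale are medians, a single spike shifts neither of them much. A plain mean-based version would let the spike distort both.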

3.3. Datasets

ELKI Outlier Datasets: https://elki-project.github.io/datasets/outlier

Outlier Detection DataSets (ODDS): http://odds.cs.stonybrook.edu/#table1

Unsupervised Anomaly Detection Dataverse: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OPQMVF

Anomaly Detection Meta-Analysis Benchmarks: https://ir.library.oregonstate.edu/concern/datasets/47429f155


4. Papers

4.1. Overview & Survey Papers

| Paper Title | Year | Ref | Materials |
|---|---|---|---|
| Anomaly detection: A survey | 2009 | [4] | [PDF] |
| A survey of outlier detection methodologies | 2004 | [5] | [PDF] |
| On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study | 2016 | [6] | [HTML], [SLIDES] |
| Outlier detection: applications and techniques | 2012 | [7] | [PDF] |
| A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data | 2016 | [8] | [PDF] |
| Research Issues in Outlier Detection | 2019 | [9] | [HTML] |
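
Evaluation studies such as [6] compare unsupervised detectors with ranking measures such as ROC AUC and precision at n. Below is a minimal sketch of these two measures. It assumes scores where higher means more outlying and binary labels where 1 marks a true outlier. Here n defaults to the number of true outliers, which is one common choice. The quadratic pairwise loop keeps the definition visible but is only suitable for small data.

```python
def roc_auc(scores, labels):
    """Probability that a random true outlier outscores a random inlier; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_n(scores, labels, n=None):
    """Fraction of true outliers among the n top-ranked points."""
    if n is None:
        n = sum(labels)  # default: n equals the number of true outliers
    if n <= 0:
        raise ValueError("n must be positive")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:n]) / n
```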

4.2. Key Algorithms

| Abbreviation | Paper Title | Year | Ref | Materials |
|---|---|---|---|---|
| kNN | Efficient algorithms for mining outliers from large data sets | 2000 | [10] | [PDF] |
| kNN | Fast outlier detection in high dimensional spaces | 2002 | [11] | [HTML] |
| LOF | LOF: identifying density-based local outliers | 2000 | [12] | [PDF] |
| IForest | Isolation forest | 2008 | [13] | [PDF] |
| OCSVM | Time-series novelty detection using one-class support vector machines | 2003 | [14] | [PDF] |
| AutoEncoder Ensemble | Outlier detection with autoencoder ensembles | 2017 | [15] | [PDF] |
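
The kNN and LOF entries above can be sketched in a few lines of brute-force Python. The kNN score is either the distance to the k-th nearest neighbour [10] or the sum of distances to the k nearest neighbours [11]. LOF [12] compares a point's local reachability density with that of its neighbours. This is a minimal O(n²) sketch that assumes small data, k smaller than the number of points, and no duplicate points. Unlike the original LOF definition, it ignores ties at the k-distance, and the function names are illustrative.

```python
import math


def k_nearest(data, i, k):
    """(distance, index) pairs of the k nearest neighbours of data[i], excluding i itself."""
    dists = sorted((math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i)
    return dists[:k]


def knn_scores(data, k, aggregate="kth"):
    """Distance-based outlier scores: k-th neighbour distance or sum of the k distances."""
    scores = []
    for i in range(len(data)):
        d = [dist for dist, _ in k_nearest(data, i, k)]
        if aggregate == "kth":
            scores.append(d[-1])
        elif aggregate == "sum":
            scores.append(sum(d))
        else:
            raise ValueError(f"unknown aggregate: {aggregate}")
    return scores


def lof_scores(data, k):
    """Local Outlier Factor; values well above 1 mean lower density than the neighbours."""
    neighbours = [k_nearest(data, i, k) for i in range(len(data))]
    k_distance = [nb[-1][0] for nb in neighbours]

    def local_reachability_density(i):
        # reach-dist_k(i, o) = max(k-distance(o), d(i, o)) for each neighbour o
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(len(data))]
    # LOF(i) = average over neighbours o of lrd(o) / lrd(i)
    return [sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(len(data))]
```

The kNN score is global: it measures absolute isolation. LOF is local: a point in a sparse region is not penalized if its neighbours are equally sparse.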

4.3. Graph & Network Outlier Detection

| Paper Title | Year | Ref | Materials |
|---|---|---|---|
| Graph based anomaly detection and description: a survey | 2015 | [16] | [PDF] |
| Anomaly detection in dynamic networks: a survey | 2015 | [17] | [PDF] |

4.4. Time Series Outlier Detection

Gupta, M., Gao, J., Aggarwal, C.C. and Han, J., 2014. Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering, 26(9), pp.2250-2267. [PDF]

4.5. Feature Selection in Outlier Detection

Pang, G., Cao, L., Chen, L. and Liu, H., 2016, December. Unsupervised feature selection for outlier detection by modelling hierarchical value-feature couplings. In Data Mining (ICDM), 2016 IEEE 16th International Conference on (pp. 410-419). IEEE. [PDF]

Pang, G., Cao, L., Chen, L. and Liu, H., 2017, August. Learning homophily couplings from non-iid data for joint feature selection and noise-resilient outlier detection. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 2585-2591). AAAI Press. [PDF]

4.6. High-dimensional & Subspace Outliers

Zimek, A., Schubert, E. and Kriegel, H.P., 2012. A survey on unsupervised outlier detection in high‐dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5), pp.363-387. [Downloadable Link]

Pang, G., Cao, L., Chen, L. and Liu, H., 2018. Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). [PDF]

4.7. Outlier Ensembles

| Paper Title | Year | Ref | Materials |
|---|---|---|---|
| Outlier ensembles: position paper | 2013 | [18] | [PDF] |
| Ensembles for unsupervised outlier detection: challenges and research questions a position paper | 2014 | [19] | [PDF] |
| An Unsupervised Boosting Strategy for Outlier Detection Ensembles | 2018 | [20] | [HTML] |
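
A recurring step in outlier ensembles is putting the scores of different detectors on a common scale before combining them. Averaging and taking the maximum are two common combination functions in this literature. Below is a minimal sketch, assuming every detector returns one score per point with higher meaning more outlying. Z-score normalization is one choice among several, and the function names are illustrative.

```python
import statistics


def standardize(scores):
    """Z-score normalization so that detectors with different score ranges are comparable."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)  # a constant detector carries no ranking information
    return [(s - mu) / sigma for s in scores]


def combine(score_lists, method="average"):
    """Combine per-point scores from several detectors after normalization."""
    normalized = [standardize(s) for s in score_lists]
    per_point = list(zip(*normalized))  # one tuple of detector scores per data point
    if method == "average":
        return [statistics.fmean(p) for p in per_point]
    if method == "max":
        return [max(p) for p in per_point]
    raise ValueError(f"unknown method: {method}")
```

Averaging tends to smooth out individual detectors' mistakes. Taking the maximum lets a point stand out if any one detector finds it unusual, at the cost of being more sensitive to noisy detectors.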

4.8. Outlier Detection in Evolving Data

Salehi, Mahsa & Rashidi, Lida. (2018). A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction]. ACM SIGKDD Explorations Newsletter. 20. 13-23. [PDF]

Emaad Manzoor, Hemank Lamba, Leman Akoglu. Outlier Detection in Feature-Evolving Data Streams. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2018. [PDF] [Github]

4.9. Representation Learning in Outlier Detection

| Paper Title | Year | Ref | Materials |
|---|---|---|---|
| Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection | 2018 | [21] | [PDF] |
| Learning representations for outlier detection on a budget | 2015 | [22] | [PDF] |
| XGBOD: improving supervised outlier detection with unsupervised representation learning | 2018 | [23] | [PDF] |

4.10. Interpretability

| Paper Title | Year | Ref | Materials |
|---|---|---|---|
| Explaining Anomalies in Groups with Characterizing Subspace Rules | 2018 | [24] | [PDF] |
| Beyond Outlier Detection: LookOut for Pictorial Explanation | 2018 | [25] | [PDF] |
| Contextual outlier interpretation | 2018 | [26] | [PDF] |
| Mining multidimensional contextual outliers from categorical relational data | 2015 | [27] | [PDF] |
| Discriminative features for identifying and interpreting outliers | 2014 | [28] | [PDF] |
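
The papers above propose dedicated explanation methods. As a much simpler baseline, one can rank the features of a flagged point by how far each value sits from that feature's median, measured in MAD units. The sketch below is that heuristic only. It is not the method of any paper listed here, and the function name is hypothetical.

```python
import math
import statistics


def explain_point(data, index, feature_names):
    """Rank features of data[index] by robust deviation from the per-feature median."""
    ranking = []
    for f, name in enumerate(feature_names):
        column = [row[f] for row in data]
        med = statistics.median(column)
        mad = statistics.median([abs(v - med) for v in column])
        x = data[index][f]
        if mad > 0:
            deviation = abs(x - med) / mad
        else:
            # Zero spread: any departure from the median is treated as infinitely unusual.
            deviation = 0.0 if x == med else math.inf
        ranking.append((deviation, name))
    return sorted(ranking, reverse=True)  # most deviating feature first
```

Because it examines one feature at a time, this heuristic cannot explain outliers that are unusual only in a combination of features. Subspace-based methods like those listed above target exactly that case.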

4.11. Social Media Anomaly Detection

Yu, R., Qiu, H., Wen, Z., Lin, C. and Liu, Y., 2016. A survey on social media anomaly detection. ACM SIGKDD Explorations Newsletter, 18(1), pp.1-14. [PDF]

Yu, R., He, X. and Liu, Y., 2015. Glad: group anomaly detection in social media analysis. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(2), p.18. [PDF]

4.12. Outlier Detection in Other Fields

Kannan, R., Woo, H., Aggarwal, C.C. and Park, H., 2017, June. Outlier detection for text data. In Proceedings of the 2017 SIAM International Conference on Data Mining (pp. 489-497). Society for Industrial and Applied Mathematics. [PDF]

4.13. Outlier Detection Applications

| Field | Paper Title | Year | Ref | Materials |
|---|---|---|---|---|
| Security | A survey of distance and similarity measures used within network intrusion anomaly detection | 2015 | [29] | [PDF] |
| Security | Anomaly-based network intrusion detection: Techniques, systems and challenges | 2009 | [30] | [PDF] |
| Finance | A survey of anomaly detection techniques in financial domain | 2016 | [31] | [PDF] |

5. Key Conferences/Workshops/Journals

5.1. Conferences & Workshops

ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). Note: SIGKDD usually hosts an Outlier Detection Workshop (ODD); see ODD 2018.

ACM International Conference on Management of Data (SIGMOD)

The Web Conference (WWW)

IEEE International Conference on Data Mining (ICDM)

SIAM International Conference on Data Mining (SDM)

IEEE International Conference on Data Engineering (ICDE)

ACM International Conference on Information and Knowledge Management (CIKM)

ACM International Conference on Web Search and Data Mining (WSDM)

The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)

The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)

5.2. Journals

ACM Transactions on Knowledge Discovery from Data (TKDD)

IEEE Transactions on Knowledge and Data Engineering (TKDE)

ACM SIGKDD Explorations Newsletter

Data Mining and Knowledge Discovery

Knowledge and Information Systems (KAIS)


References


  1. Kriegel, H.P., Kröger, P. and Zimek, A., 2010. Outlier detection techniques. Tutorial at ACM SIGKDD 2010.

  2. Chawla, S. and Chandola, V., 2011. Anomaly Detection: A Tutorial. Tutorial at ICDM 2011.

  3. Lazarevic, A., Banerjee, A., Chandola, V., Kumar, V. and Srivastava, J., 2008, September. Data mining for anomaly detection. Tutorial at ECML PKDD 2008.

  4. Chandola, V., Banerjee, A. and Kumar, V., 2009. Anomaly detection: A survey. ACM Computing Surveys, 41(3), p.15.

  5. Hodge, V. and Austin, J., 2004. A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), pp.85-126.

  6. Campos, G.O., Zimek, A., Sander, J., Campello, R.J., Micenková, B., Schubert, E., Assent, I. and Houle, M.E., 2016. On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Mining and Knowledge Discovery, 30(4), pp.891-927.

  7. Singh, K. and Upadhyaya, S., 2012. Outlier detection: applications and techniques. International Journal of Computer Science Issues (IJCSI), 9(1), p.307.

  8. Goldstein, M. and Uchida, S., 2016. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE, 11(4), p.e0152173.

  9. Suri, N.R. and Athithan, G., 2019. Research Issues in Outlier Detection. In Outlier Detection: Techniques and Applications, pp. 29-51. Springer, Cham.

  10. Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM SIGMOD Record, 29(2), pp.427-438.

  11. Angiulli, F. and Pizzuti, C., 2002, August. Fast outlier detection in high dimensional spaces. In European Conference on Principles of Data Mining and Knowledge Discovery, pp.15-27.

  12. Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. ACM SIGMOD Record, 29(2), pp.93-104.

  13. Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on Data Mining, pp. 413-422. IEEE.

  14. Ma, J. and Perkins, S., 2003, July. Time-series novelty detection using one-class support vector machines. In IJCNN '03, pp. 1741-1745. IEEE.

  15. Chen, J., Sathe, S., Aggarwal, C. and Turaga, D., 2017, June. Outlier detection with autoencoder ensembles. SIAM International Conference on Data Mining, pp. 90-98. Society for Industrial and Applied Mathematics.

  16. Akoglu, L., Tong, H. and Koutra, D., 2015. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, 29(3), pp.626-688.

  17. Ranshous, S., Shen, S., Koutra, D., Harenberg, S., Faloutsos, C. and Samatova, N.F., 2015. Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics, 7(3), pp.223-247.

  18. Aggarwal, C.C., 2013. Outlier ensembles: position paper. ACM SIGKDD Explorations Newsletter, 14(2), pp.49-58.

  19. Zimek, A., Campello, R.J. and Sander, J., 2014. Ensembles for unsupervised outlier detection: challenges and research questions a position paper. ACM SIGKDD Explorations Newsletter, 15(1), pp.11-22.

  20. Campos, G.O., Zimek, A. and Meira, W., 2018, June. An Unsupervised Boosting Strategy for Outlier Detection Ensembles. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 564-576). Springer, Cham.

  21. Pang, G., Cao, L., Chen, L. and Liu, H., 2018. Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).

  22. Micenková, B., McWilliams, B. and Assent, I., 2015. Learning representations for outlier detection on a budget. arXiv preprint arXiv:1507.08104.

  23. Zhao, Y. and Hryniewicki, M.K., 2018, July. XGBOD: improving supervised outlier detection with unsupervised representation learning. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE.

  24. Macha, M. and Akoglu, L., 2018. Explaining anomalies in groups with characterizing subspace rules. Data Mining and Knowledge Discovery, 32(5), pp.1444-1480.

  25. Gupta, N., Eswaran, D., Shah, N., Akoglu, L. and Faloutsos, C., 2018. Beyond Outlier Detection: LookOut for Pictorial Explanation. In ECML PKDD 2018.

  26. Liu, N., Shin, D. and Hu, X., 2018. Contextual outlier interpretation. In International Joint Conference on Artificial Intelligence (IJCAI-18), pp.2461-2467.

  27. Tang, G., Pei, J., Bailey, J. and Dong, G., 2015. Mining multidimensional contextual outliers from categorical relational data. Intelligent Data Analysis, 19(5), pp.1171-1192.

  28. Dang, X.H., Assent, I., Ng, R.T., Zimek, A. and Schubert, E., 2014, March. Discriminative features for identifying and interpreting outliers. In International Conference on Data Engineering (ICDE). IEEE.

  29. Weller-Fahy, D.J., Borghetti, B.J. and Sodemann, A.A., 2015. A survey of distance and similarity measures used within network intrusion anomaly detection. IEEE Communications Surveys & Tutorials, 17(1), pp.70-91.

  30. Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G. and Vázquez, E., 2009. Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security, 28(1-2), pp.18-28.

  31. Ahmed, M., Mahmood, A.N. and Islam, M.R., 2016. A survey of anomaly detection techniques in financial domain. Future Generation Computer Systems, 55, pp.278-288.
