shaxianhe / w2020-data599-capstone-projects-ubc-udl

This project forked from ubco-mds-2020-labs/w2020-data599-capstone-projects-ubc-udl

Provides a proposed approach for a near real-time anomaly detection system with Urban Data Lab's time series database. The approach uses an LSTM for detection with InfluxDB using open-source software.

Jupyter Notebook 99.99% Python 0.01% R 0.01%

Real-time Anomaly Detection for Building Sensors

This project was completed as the Capstone Project for the UBC Okanagan Master of Data Science (MDS) degree. It was carried out for Urban Data Lab by Nathan Smith, Mitch Harris, and Ryan Koenig over a 7-week timeline from proposal to final report.

The project provides a proposed approach for a near real-time anomaly detection system with Urban Data Lab's time series database. The approach includes a framework for anomaly detection model training and prediction with InfluxDB using open-source software.

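This README provides an overview of the project repository and is organized into the following sections: Project Description, Code, Research, Data, Project Documents, and Project Management.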

There are READMEs provided in subfolders for additional information.

The main project code package and a notebook walk-through using a test environment are both in the code directory (see the Code section).

Note that this repository was used for all project documents (not limited to code).

Project Description

Background

The Urban Data Lab (UDL) advances data access, data management, and data analytics capabilities on the University of British Columbia (UBC) campus with the goal of addressing campus-wide sustainability challenges. UDL has access to the UBC Energy and Water Services (EWS) SkySpark analytics platform, which collects data from buildings on the UBC campus, including heating, ventilation and air conditioning (HVAC) equipment data and energy data. UDL stores data from SkySpark in its own InfluxDB database and has noticed potentially erroneous data reported from SkySpark. There is currently no system in place with InfluxDB to flag these data. The project goal was to develop a real-time anomaly detection system using open-source tools that could be used with InfluxDB.

Project Concept

Anomaly Detection Framework

The approach used in this study provides near real-time anomaly detection with InfluxDB. Model training is completed by querying sensor data on an infrequent basis (for example monthly), training, and saving the models. Anomaly detection occurs on a continuous basis by reading recent data, loading and running the previously trained models, and writing predictions to InfluxDB. A subset of Campus Energy Center (CEC) boiler sensors available in SkySpark was selected for the study to test this approach.

[Figure: Anomaly Detection Framework]
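The split between infrequent training and continuous detection described above can be sketched as two independent jobs that share only a saved model file. This is a minimal illustration, not the project's code: the function names `train_job` and `detect_job`, the JSON model file, and the mean/deviation "model" are all hypothetical stand-ins (the project uses an LSTM-ED), and plain Python lists stand in for InfluxDB query results and writes.

```python
import json
import tempfile
from pathlib import Path


def train_job(history, model_path):
    """Infrequent job (e.g. monthly): fit a stand-in model and save it."""
    mean = sum(history) / len(history)
    # Stand-in for model training: remember the largest deviation seen in
    # the (assumed normal) training data and use it as the threshold.
    threshold = max(abs(x - mean) for x in history)
    Path(model_path).write_text(json.dumps({"mean": mean, "threshold": threshold}))


def detect_job(recent, model_path):
    """Frequent job: load the saved model and flag recent readings."""
    model = json.loads(Path(model_path).read_text())
    # In the real framework these records would be written back to InfluxDB.
    return [
        {"value": x, "anomaly": abs(x - model["mean"]) > model["threshold"]}
        for x in recent
    ]


# Hypothetical sensor readings standing in for an InfluxDB query result.
history = [70.0, 71.5, 69.8, 70.4, 70.9]
model_path = Path(tempfile.mkdtemp()) / "sensor_model.json"

train_job(history, model_path)
flags = detect_job([70.2, 95.0], model_path)
print(flags)
```

Because the detection job only reads the saved model, it can run on a short schedule without repeating the cost of training.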

Anomaly Detection Model

A long short-term memory recurrent neural network with an encoder-decoder architecture (LSTM-ED) is used for anomaly detection. This was selected as it provides a general model with good performance in recent studies. The generalizability of the LSTM-ED is important given the wide variety of sensor types available to UDL. The model is trained in an unsupervised approach using sequence reconstruction of input data. Anomaly predictions are then based on identifying data with high sequence reconstruction error using a simple maximum error threshold rule. The LSTM-ED was found to have good initial performance on the selected subset of CEC sensors. A data pattern was identified that the model had trouble detecting but using more sophisticated anomaly error/threshold identification rules should improve model performance and allow detection of this pattern.
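As a rough illustration of the thresholding step only, the sketch below scores sequences by reconstruction error and flags those above a threshold. It makes several assumptions that are not confirmed by the project: the error metric is mean absolute error per sequence, the threshold is the largest error observed on training data, and the "reconstructed" sequences are hard-coded values standing in for LSTM-ED output. All function names are hypothetical.

```python
def reconstruction_errors(original, reconstructed):
    """Mean absolute error between each sequence and its reconstruction."""
    return [
        sum(abs(a - b) for a, b in zip(seq, rec)) / len(seq)
        for seq, rec in zip(original, reconstructed)
    ]


def max_error_threshold(train_errors):
    """Assumed rule: the largest error seen on normal training data."""
    return max(train_errors)


def flag_anomalies(errors, threshold):
    """A sequence is flagged when its error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hard-coded stand-ins for training sequences and their reconstructions.
train_original = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
train_reconstructed = [[1.1, 2.0, 2.9], [2.0, 3.3, 4.0]]
threshold = max_error_threshold(
    reconstruction_errors(train_original, train_reconstructed)
)

# New data: the second sequence contains a spike the model fails to reproduce.
new_original = [[1.0, 2.0, 3.0], [1.0, 9.0, 3.0]]
new_reconstructed = [[1.0, 2.1, 3.0], [1.2, 2.3, 3.1]]
new_flags = flag_anomalies(
    reconstruction_errors(new_original, new_reconstructed), threshold
)
print(new_flags)
```

The idea is that a model trained only to reconstruct normal behaviour reproduces normal sequences closely, so a large reconstruction error suggests the sequence is unlike the training data.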

One issue observed late in the project timeline was instability of the model due to the non-deterministic nature of the LSTM-ED. This will need to be resolved so that a high level of manual effort is not required to reset anomaly thresholds each time models are retrained. A more sophisticated anomaly error/threshold identification rule (such as assessing the distribution of error instead of using a simple error threshold rule) would also improve model stability.
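One possible form of a distribution-based rule is sketched below, purely as an assumption about what such a rule could look like: the threshold is set at the mean of the training reconstruction errors plus `k` standard deviations, so it is recomputed from the error distribution on every retrain rather than reset by hand. The function name, the choice of `k = 3.0`, and the error values are hypothetical.

```python
import statistics


def distribution_threshold(train_errors, k=3.0):
    """Threshold at mean + k population standard deviations of the errors."""
    mean = statistics.fmean(train_errors)
    spread = statistics.pstdev(train_errors)
    return mean + k * spread


# Hypothetical reconstruction errors from one training run.
train_errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08]
dist_threshold = distribution_threshold(train_errors)

# Hypothetical errors for newly arrived sequences.
new_errors = [0.11, 0.25]
dist_flags = [e > dist_threshold for e in new_errors]
print(dist_threshold, dist_flags)
```

Because the threshold follows the error distribution of each retrained model, it adjusts automatically when a retrain shifts the overall error scale. Whether this is enough to address the instability noted above would need to be tested.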

[Figure: LSTM-ED]

Dashboard and Notification System

A dashboard and notification system were also implemented with the anomaly detection model in a test InfluxDB environment. The dashboard can be built directly in InfluxDB and provides a simple display of sensor data overlaid with anomalous flagged data. The notification system also uses built-in InfluxDB functionality and was configured to send notifications for data predicted as anomalous.

[Figure: dashboard]

Conclusion

This study provides an initial open-source anomaly detection approach that can be used by UDL with InfluxDB. The approach is general and should be applicable to a variety of sensors. Possible next steps include implementing the model online for several test sensors and monitoring its performance, improving the anomaly detection threshold method, comparing the LSTM-ED with additional models, testing additional sensors, and building a more complex dashboard and notification system as required.

Ideally, the detection system could ultimately provide campus and building managers with real-time or near real-time notifications of potential issues in system operations, reducing operational costs, downtime, and unexpected maintenance.

Code

The code directory contains the project code used to build the anomaly detection model. The directory also contains tools used to complete the project and a test environment used for the detection framework.

The directory has:

  • docker-files includes several docker setups used to run InfluxDB/Telegraf locally
  • labeller-app is a Shiny App that can be used to visualize sensor data and graphically select and update labelling of the data as normal or anomalous
  • model provides the Python functions and scripts used for anomaly detection
  • misc provides miscellaneous files used in development
  • results includes results from LSTM-ED model testing on the CEC sensors used in the study
  • test-env provides a detailed Jupyter notebook walk-through of the anomaly detection framework in a test InfluxDB environment

A more detailed README is available in the directory and there are READMEs within each of the directory subfolders.

Research

The research directory provides various documents and code associated with researching/exploring various aspects of the project during development. These include:

  • EDA - exploratory data analysis completed at the start of the project to provide an understanding of CEC sensor data
  • dashboard-research - initial research on dashboards with InfluxDB or Grafana to understand the level of effort associated with the task
  • model-methods - initial research and selection of the model to be used for the study (includes various test files)
  • papers - papers referenced in the study
  • SR-testing - testing spectral residual transformations to assess if the transformation could be useful in anomaly identification
  • streaming-methods - various tests on streaming methods that could be used with InfluxDB

A more detailed README is provided in the directory.

Data

Includes data used to test the anomaly detection model and code:

  • unlabelled-skyspark-data - these data were manually downloaded from the SkySpark user interface
  • labelled-skyspark-data - data manually labelled as anomalous or normal
  • testing-data - data that have been manually altered to replicate a set of data from an original sensor to support code testing

A more detailed README is provided in the directory.

Project Documents

Project deliverables for the MDS:

  • proposal - proposal report and presentation
  • final-report - final report, executive summary, and presentation

Project Management

Various project management documents/tools used including:

  • meeting-minutes - client meeting minutes and sprint planning documents
  • personal-logs - each team member's daily time logs, a summary of weekly time spent in weekly-summary.xlsx, and a README with a brief summary of each week
  • weekly-updates - weekly presentations to UBC supervisors

Contributors

wraysmith, rykoe, mqharris, github-classroom[bot]
