
The Numenta Anomaly Benchmark

Welcome. This repository contains the data and scripts necessary to replicate the results in the forthcoming Numenta Anomaly Benchmark (NAB) conference publication. Also provided are the tools to run NAB scoring on your own anomaly detection algorithms; see the NAB entry points info. Competitive results tied to open source code will be posted in the wiki on the Scoreboard. Let us know about your work by submitting a pull request.

This readme is a brief overview and contains details for setting up NAB. Please refer to the NAB Whitepaper in the wiki for more details about NAB scoring, data, motivation, etc.

Corpus

The NAB corpus of time-series data files is designed to provide data for research in streaming anomaly detection. It comprises both artificial and real-world time-series data containing labeled anomalous periods of behavior.

All data are ordered, timestamped, single-valued metrics collected at 5-minute intervals.
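As a minimal sketch of what "ordered, timestamped, single-valued, 5-minute intervals" means in practice, the Python 3 snippet below parses a small inline sample and reports any spacing that is not 5 minutes. The sample rows, the `timestamp` and `value` column names, and the timestamp format are assumptions made for illustration; the authoritative file format is the one given in the NAB Whitepaper.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical sample; column names and timestamp format are assumptions.
SAMPLE_CSV = """timestamp,value
2014-04-01 00:00:00,20.1
2014-04-01 00:05:00,20.4
2014-04-01 00:10:00,19.8
2014-04-01 00:20:00,21.0
"""


def load_series(text):
    """Parse CSV text into a list of (datetime, float) records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"),
         float(row["value"]))
        for row in reader
    ]


def irregular_gaps(records, expected=timedelta(minutes=5)):
    """Return (previous, current) timestamp pairs not spaced by `expected`."""
    return [
        (prev[0], cur[0])
        for prev, cur in zip(records, records[1:])
        if cur[0] - prev[0] != expected
    ]


records = load_series(SAMPLE_CSV)
gaps = irregular_gaps(records)
print(len(records), "records,", len(gaps), "irregular gap(s)")
```

The last sample row is deliberately 10 minutes after the previous one, so the check has something to report.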

Much of the real-world data are values from AWS server metrics as collected by the [Amazon CloudWatch service](https://aws.amazon.com/documentation/cloudwatch/). Example metrics include CPU Utilization, Network Bytes In, and Disk Read Bytes. There are also real-world sensor readings from some large machines.

All data are included in the repository. We are in the process of adding more data and are actively looking for additional data. The NAB version will be updated whenever new data are added to the corpus; NAB is currently at v0.8.

Task

Detect anomalous behavior in streaming data in real-time and provide useful alerts.

Your anomaly detector must be able to handle streaming data. Post-hoc analysis is insufficient for this task. All classifications must be done as if the data is being presented for the first time, in real time. Anomalies must be detected within a reasonable amount of time.
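To illustrate the streaming constraint, here is a minimal Python 3 sketch of a detector that classifies each record using only the records seen before it, and then updates its state. It is a plain running z-score test, not one of the detectors shipped with NAB; the class name, the `handle_record` method, the threshold, and the warm-up length are all hypothetical choices.

```python
import math


class RunningZScoreDetector:
    """Flags a value that lies far from the mean of previously seen values.

    Illustrative only: this is not one of the detectors included in NAB.
    """

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # hypothetical z-score cutoff
        self.warmup = warmup        # records to observe before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def handle_record(self, value):
        """Classify `value` from past data only, then learn from it."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of the mean and squared deviations.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


stream = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0]
detector = RunningZScoreDetector()
flags = [detector.handle_record(v) for v in stream]
print(flags)
```

The key point is the order of operations inside `handle_record`: the decision is made before the new value is folded into the statistics, so no record is ever judged with knowledge of itself or of later data.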

This benchmark is representative of a task in human time-scales. Per-record classification should take place in less than 5 minutes. Anomaly detection should happen as quickly as possible following the onset of an anomaly.

Catching every anomaly is not sufficient. A detector with a high false positive rate is of little use: frequent false alarms will reduce or eliminate an institution's willingness to use your technique, so you must minimize the cost of using it.
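The trade-off can be made concrete with a toy cost tally in Python 3. This is only a sketch: it counts outcomes per record and uses made-up costs (`fp_cost`, `fn_cost`), and it is not the NAB scoring function, which is described in the NAB Whitepaper.

```python
def count_outcomes(flags, labels):
    """Count per-record true positives, false positives, and misses."""
    tp = sum(1 for f, l in zip(flags, labels) if f and l)
    fp = sum(1 for f, l in zip(flags, labels) if f and not l)
    fn = sum(1 for f, l in zip(flags, labels) if not f and l)
    return tp, fp, fn


def total_cost(flags, labels, fp_cost=1.0, fn_cost=5.0):
    """Hypothetical cost: each false alarm and each miss is charged."""
    _, fp, fn = count_outcomes(flags, labels)
    return fp * fp_cost + fn * fn_cost


# Ten records, two of which are labeled anomalous (made-up data).
labels = [False, False, False, True, False,
          False, False, False, True, False]

# Flags every record: catches both anomalies but raises 8 false alarms.
alarmist = [True] * 10

# Flags one record: no false alarms, but misses the second anomaly.
selective = [False, False, False, True, False,
             False, False, False, False, False]

print("alarmist cost:", total_cost(alarmist, labels))
print("selective cost:", total_cost(selective, labels))
```

Even with a miss charged at five times the cost of a false alarm, the detector that flags everything ends up more expensive in this example than the one that misses an anomaly.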

Installing NAB 0.8

Supported Platforms

  • OSX 10.9 and higher
  • Amazon Linux (via AMI)

Other platforms may work but have not been tested.

Initial requirements

You need to manually install the following:

Download this repository

cd ~/
git clone https://github.com/numenta/NAB.git

Install the Python requirements

cd NAB
(sudo) pip install -r requirements.txt

This will install the additional required modules pandas and simplejson.

Install NAB

(sudo) python setup.py develop

Or with manual PYTHONPATH setup, rather than sudo:

python setup.py develop --prefix=/some/other/path/

Usage

Run NAB

cd /path/to/nab
python run.py

This will produce results files for the anomaly detection methods. Included in the repo are the Numenta anomaly detection method, as well as methods from the Etsy Skyline anomaly detection library, and a random detector. This will also pass those results files to the scoring script to generate final NAB scores.

For details on how to run your own detector please see the NAB Entry Points diagram in the wiki.

To view a description of the command line options please enter

python run.py --help 

Once NAB is finalized (not yet!), replicating the results exactly will require a specific version of NuPIC:

cd /path/to/nupic/
git checkout -b nab {TAG NAME}

Then follow build directions in the NuPIC "README.md".

Data

Data and results files

This repo contains a corpus of 32 data files of time-series data. The format of the CSV files is specified in Appendix F of the NAB Whitepaper. The detector under test will read in, and be scored on, all data files in the corpus. The format of results files is also specified in the whitepaper posted in the wiki.

Data and results visualization

There is currently a simple data visualizer available, useful in hand labeling datasets. To use it do the following:

First generate the list of data files and result files:

ls -1 */*/*.csv | grep data > data_file_paths.txt 
ls -1 */*/*/*.csv | grep results | grep -v test_results > results_file_paths.txt
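On systems without `ls` and `grep`, the same two lists could be produced from Python. The sketch below is written for Python 3 and is only an approximation of the shell commands above; `list_csv_paths` and `write_path_list` are hypothetical helper names, not part of NAB.

```python
import glob
import os


def list_csv_paths(root, depth, include, exclude=None):
    """List CSV files exactly `depth` directories below `root`.

    Only paths (relative to `root`) containing `include`, and not
    containing `exclude` when given, are returned. This mirrors the
    `ls ... | grep ... | grep -v ...` pipelines.
    """
    parts = ["*"] * depth + ["*.csv"]
    pattern = os.path.join(root, *parts)
    paths = []
    for path in sorted(glob.glob(pattern)):
        rel = os.path.relpath(path, root)
        if include in rel and (exclude is None or exclude not in rel):
            paths.append(rel)
    return paths


def write_path_list(paths, out_file):
    """Write one path per line, as `ls -1` would."""
    with open(out_file, "w") as handle:
        for path in paths:
            handle.write(path + "\n")


# Example usage, run from the NAB directory:
# write_path_list(list_csv_paths(".", 2, "data"),
#                 "data_file_paths.txt")
# write_path_list(list_csv_paths(".", 3, "results", exclude="test_results"),
#                 "results_file_paths.txt")
```

As with `grep`, the keyword is matched anywhere in the relative path, including the file name itself.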

From the NAB directory, type (this module is Python 2 only; under Python 3 the equivalent is python -m http.server 12345):

python -m SimpleHTTPServer 12345

Then, open Chrome and enter this in the address bar:

localhost:12345/nab_visualizer.html

To view data, click on "look at data", click in the query window, and press the RETURN key. This should show all the data files. You can also filter the files by keyword in the query window; only filenames that contain the entered characters are shown.

To get a string of the timestamp at a data point, simply click on the data point.

To zoom in on a region of data, drag the cursor to highlight the section of interest. To zoom back out, double-click the screen.

To view result files, click on "look at results" first.

A plotting script is also available in the scripts directory. It generates plots via the plotly API, which requires a (free) API key. To generate examples, run from the NAB directory:

python scripts/plot.py

Modify the script to plot specific NAB data and/or results files.
