Moda

Models and evaluation framework for trending topics detection and anomaly detection.

Moda provides an interface for evaluating models on either univariate or multi-category time-series datasets. It also allows the user to add models using a scikit-learn style API. All models provided in Moda were adapted to a multi-category scenario by wrapping a univariate model so that it runs on multiple categories. Models can be evaluated using either a train/test split or time-series cross-validation.

Installation

pip install moda

Usage

Turning a raw dataset into a moda dataset:

moda uses a MultiIndex to hold the datestamp and category. All models have been adapted to accept this structure. The input dataset is assumed to have one entry per row and a datestamp column called 'date'. An additional 'category' column is optional. As a first step, the dataset is aggregated into fixed-size time intervals, and a new dataset with 'date', 'category' (optional), and 'value' columns is created. A MultiIndex of 'date' (pandas DatetimeIndex) and 'category' is the dataset's index.

import pandas as pd
from moda.dataprep import raw_to_ts, ts_to_range

DATAPATH = "example/SF_data/SF311-2008.csv"
# The full dataset can be downloaded from here: https://data.sfgov.org/City-Infrastructure/311-Cases/vw6y-z8j6/data
TIME_RANGE = "24H" # Aggregate all events in the raw data into 24 hour intervals

# Read raw file
raw = pd.read_csv(DATAPATH)

# Turn the raw data into a time series (with date as a pandas DatetimeIndex)
ts = raw_to_ts(raw)

# Aggregate items per time and category, given a time interval
ranged_ts = ts_to_range(ts,time_range=TIME_RANGE)
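
To make the target structure concrete, here is a minimal pandas-only sketch of the kind of aggregation described above. It is not moda's implementation: the `aggregate_events` helper and the sample rows are hypothetical, and it assumes that 'value' is a simple event count per interval and category.

```python
import pandas as pd

# Hypothetical raw data: one row per event, with a 'date' and a 'category' column.
raw_events = pd.DataFrame({
    "date": pd.to_datetime([
        "2008-01-01 03:00", "2008-01-01 17:30", "2008-01-01 09:15",
        "2008-01-02 11:00", "2008-01-02 12:45",
    ]),
    "category": ["noise", "noise", "graffiti", "noise", "graffiti"],
})


def aggregate_events(df, freq="D"):
    """Count events per time interval and category (illustrative helper).

    Returns a DataFrame with a single 'value' column, indexed by a
    MultiIndex of ('date', 'category').
    """
    return (
        df.groupby([pd.Grouper(key="date", freq=freq), "category"])
          .size()            # number of events in each (interval, category) group
          .rename("value")
          .to_frame()
    )


ranged = aggregate_events(raw_events)  # "D" groups events into daily (24 hour) bins
print(ranged)
```

Each row of the result is one interval for one category, which is the shape the models in this document are described as consuming.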

Run a model:

Run one model, and extract metrics using a manually labeled set

from moda.evaluators import get_metrics_for_all_categories, get_final_metrics
from moda.dataprep import read_data
from moda.models import STLTrendinessDetector

# Load a labeled dataset (the same file used in the evaluation example below)
dataset = read_data("datasets/SF24H_labeled.csv")

model = STLTrendinessDetector(freq='24H',
                              min_value=10,
                              anomaly_type='residual',
                              num_of_std=3, lo_delta=0)

# Take the entire time series and evaluate anomalies on all of it or just the last window(s)
prediction = model.predict(dataset)
raw_metrics = get_metrics_for_all_categories(dataset[['value']], prediction[['prediction']], dataset[['label']],
                                             window_size_for_metrics=1)
metrics = get_final_metrics(raw_metrics)

# Plot results for each category
model.plot(labels=dataset['label'])

Model evaluation

Example for a train/test split and evaluation

from moda.evaluators import get_metrics_for_all_categories, get_final_metrics
from moda.dataprep import read_data
from moda.models import STLTrendinessDetector

dataset = read_data("datasets/SF24H_labeled.csv")
print(dataset.head())

model = STLTrendinessDetector(freq='24H', 
                              min_value=10,
                              anomaly_type='residual',
                              num_of_std=3, lo_delta=0)

# Take the entire time series and evaluate anomalies on all of it or just the last window(s)
prediction = model.predict(dataset)
raw_metrics = get_metrics_for_all_categories(dataset[['value']], prediction[['prediction']], dataset[['label']],
                                             window_size_for_metrics=1)
metrics = get_final_metrics(raw_metrics)
print('f1 = {}'.format(metrics['f1']))
print('precision = {}'.format(metrics['precision']))
print('recall = {}'.format(metrics['recall']))

# Plot results for each category (uncomment to display)
# model.plot(labels=dataset['label'])
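
For readers unfamiliar with the reported metrics, here is a minimal sketch of point-wise precision, recall, and F1 computed from binary labels and predictions. It is an illustration only: `precision_recall_f1` is a hypothetical helper, and it does not reproduce moda's windowed matching (`window_size_for_metrics`) or its per-category aggregation.

```python
def precision_recall_f1(labels, predictions):
    """Point-wise metrics for binary (0/1) labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions for a single category
example_metrics = precision_recall_f1([0, 1, 1, 0, 1], [0, 1, 0, 1, 1])
print(example_metrics)
```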

Examples

A jupyter notebook with this example can be found here.

A more detailed example which includes an exploratory data analysis can be found here

Models currently included:

  1. Moving average based seasonality decomposition (MA adapted for trendiness detection)

A wrapper on statsmodels' seasonal_decompose. A naive decomposition which uses a moving average to remove the trend, and a convolution filter to detect seasonality. The result is a time series of residuals. In order to detect anomalies and interesting trends in the time series, we look for outliers on the decomposed trend series and the residuals series. Points are considered outliers if their value is more than a given number of standard deviations above the historical values in a previous window. We evaluated different policies for trendiness prediction: 1. residual anomaly only, 2. trend anomaly only, 3. residual OR trend anomaly, 4. residual AND trend anomaly. This is the baseline model, which gives decent results when seasonality is more or less constant.
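
The outlier rule and the four policies can be sketched in plain Python as follows. This is a simplified illustration, not moda's code: `flag_outliers` and `combine_flags` are hypothetical helpers, the window length is arbitrary, and the threshold is assumed to be the window mean plus `num_of_std` sample standard deviations.

```python
from statistics import mean, stdev


def flag_outliers(values, window=7, num_of_std=3.0):
    """Flag points that exceed mean + num_of_std * std of the preceding window."""
    flags = []
    for i, value in enumerate(values):
        history = values[max(0, i - window):i]  # only points before index i
        if len(history) < 2:                    # not enough history for a std
            flags.append(False)
            continue
        threshold = mean(history) + num_of_std * stdev(history)
        flags.append(value > threshold)
    return flags


def combine_flags(residual_flags, trend_flags, policy="or"):
    """Combine residual and trend anomaly flags under one of four policies."""
    if policy == "residual":
        return list(residual_flags)
    if policy == "trend":
        return list(trend_flags)
    if policy == "or":
        return [r or t for r, t in zip(residual_flags, trend_flags)]
    if policy == "and":
        return [r and t for r, t in zip(residual_flags, trend_flags)]
    raise ValueError("unknown policy: {}".format(policy))


# Hypothetical residual series with one spike at index 7
residuals = [10, 11, 9, 10, 12, 10, 11, 40, 10]
residual_flags = flag_outliers(residuals)
print(residual_flags)
```

In this sketch the spike also inflates the standard deviation of later windows, so points right after an anomaly are less likely to be flagged.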

  2. Seasonality and trend decomposition using Loess (Adapted STL)

STL uses iterative Loess smoothing to obtain an estimate of the trend and then Loess smoothing again to extract a changing additive seasonal component. It can handle any type of seasonality, and the seasonality value can change over time. We used the same anomaly detection mechanism as the moving-average based seasonal decomposition. This model is a wrapper on STLDecompose (https://github.com/jrmontag/STLDecompose). Use this model when trend and seasonality have a more complex pattern. It usually outperforms the moving average model.

Example output plot for STL: the left-hand side shows the original (top) and decomposed time series (seasonal, trend, residual). The right-hand side shows anomalies found on the residuals time series (top), trend, prediction (combination of residuals and trend anomalies), and ground truth (bottom).

  3. Azure anomaly detector

Uses the Azure Anomaly Detector cognitive service as a black box for detecting anomalies. The service provides an upper bound that can be used to estimate the degree of anomaly. This model is useful when the anomalies have a relatively complex structure.

  4. Twitter

A wrapper on Twitter's AnomalyDetection package (https://github.com/Marcnuth/AnomalyDetection). This model is similar to (1) and (2), but has a more sophisticated way of detecting the anomalies once the time series is analyzed.

  5. LSTMs

Trains a forecasting LSTM model, and compares the predicted value at time t with the actual value at time t. The difference is then evaluated against the standard deviation of previous differences. This is useful only when there is enough data to represent the time series pattern.

An example on running LSTMs can be found here
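
The comparison step of the LSTM-based detector can be illustrated independently of the forecasting model. The sketch below takes forecasts as given (from any forecaster) and flags points whose forecast error is large relative to previous errors. `flag_forecast_anomalies` is a hypothetical helper, and the use of the absolute error, the window length, and the threshold are assumptions rather than moda's actual behavior.

```python
from statistics import stdev


def flag_forecast_anomalies(actual, predicted, window=5, num_of_std=3.0):
    """Flag points whose forecast error exceeds num_of_std * std of prior errors."""
    diffs = [a - p for a, p in zip(actual, predicted)]
    flags = []
    for i, diff in enumerate(diffs):
        history = diffs[max(0, i - window):i]  # errors observed before time i
        if len(history) < 2:                   # not enough errors for a std
            flags.append(False)
            continue
        flags.append(abs(diff) > num_of_std * stdev(history))
    return flags


# Hypothetical actual values and forecasts; the last point is badly under-predicted
actual = [10, 12, 11, 13, 12, 30]
predicted = [11, 11, 12, 12, 13, 12]
forecast_flags = flag_forecast_anomalies(actual, predicted)
print(forecast_flags)
```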

Running tests and linting

Moda uses pytest for testing. In order to run tests, just call pytest from moda's main directory. For linting, this module uses PEP8 conventions.

Contributors

omri374
