pyhampel

This package implements a Hampel filter for time series data.

What is a Hampel Filter?

The Hampel filter is an algorithm that uses median absolute deviation (MAD) and a sliding window to detect outliers and replace them with rolling median values.

What Are Hampel Filters Used For?

  • Detecting outliers.
  • Filtering noisy data by replacing outliers with the rolling median.

How Does it Work?

For each point, a median and an estimate of the standard deviation (derived from the MAD) are calculated using all neighboring values within a window of size w. If the point of interest lies more than a specified number of standard deviations from the median, it is flagged as an outlier.
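The decision for a single point can be sketched in a few lines of plain Python. This is a minimal illustration, not pyhampel's code: the names `is_outlier`, `MAD_SCALE`, and `n_sigmas` are hypothetical, and the constant 1.4826 assumes roughly normally distributed data.

```python
from statistics import median

# Assumption: 1.4826 scales the MAD into an estimate of the standard
# deviation for normally distributed data.
MAD_SCALE = 1.4826

def is_outlier(window, point, n_sigmas=3.0):
    """Return True if `point` lies more than `n_sigmas` estimated
    standard deviations from the median of `window`."""
    med = median(window)
    mad = median(abs(v - med) for v in window)
    sigma = MAD_SCALE * mad
    return abs(point - med) > n_sigmas * sigma

# Hypothetical window of neighboring values containing one spike
window = [1.0, 1.1, 0.9, 8.0, 1.0, 1.2, 0.8]
print(is_outlier(window, 8.0))
print(is_outlier(window, 1.1))
```

Because both the center and the spread are medians, the spike at 8.0 barely shifts either one, which is why it stands out so clearly.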


Hampel Filter With Window Centered

[Figure: Hampel filter with a centered window]

  • For each time step, the median of the points in the window is calculated, resulting in a rolling median set of points.
  • Also, for each time step, the median absolute deviation is calculated, resulting in a rolling median absolute deviation set of points.
  • The MAD is multiplied by a scaling factor (about 1.4826 for normally distributed data) to estimate the standard deviation, and that estimate is multiplied by the threshold to give the allowed deviation from the rolling median. If the original point falls outside the range median +/- deviation, it is an outlier and its filtered value is the rolling median calculated earlier.
  • In this particular case, the rolling window is centered on the point being evaluated with points to the left and right of center used in the calculation.
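The steps listed above can be sketched with NumPy as follows. This is an illustrative sketch, not pyhampel's implementation: the function name `hampel_centered` and its parameters are hypothetical, and truncating the window at the ends of the series is an assumption about edge handling.

```python
import numpy as np

def hampel_centered(x, half_window=3, n_sigmas=3.0):
    """Centered-window Hampel filter (sketch).

    Returns the filtered values and a boolean outlier mask.
    """
    x = np.asarray(x, dtype=float)
    filtered = x.copy()
    is_outlier = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        # Assumption: the window is truncated at the series boundaries
        lo = max(0, i - half_window)
        hi = min(x.size, i + half_window + 1)
        window = x[lo:hi]
        med = np.median(window)                 # rolling median
        mad = np.median(np.abs(window - med))   # rolling MAD
        if abs(x[i] - med) > n_sigmas * 1.4826 * mad:
            is_outlier[i] = True
            filtered[i] = med                   # replace outlier with median
    return filtered, is_outlier

# Hypothetical series with a single spike at index 4
data = [1.0, 1.1, 0.9, 1.0, 9.0, 1.1, 1.0, 0.9, 1.0]
filtered, flags = hampel_centered(data)
print(np.flatnonzero(flags))
print(filtered)
```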

Hampel Filter With Point on Leading Edge

This option is more sensitive to changes in value from one period to the next.

[Figure: Hampel filter with the evaluated point on the leading edge of the window]

  • For each time step, the median of the points in the window is calculated, resulting in a rolling median set of points.
  • Also, for each time step, the median absolute deviation is calculated, resulting in a rolling median absolute deviation set of points.
  • The MAD is multiplied by a scaling factor (about 1.4826 for normally distributed data) to estimate the standard deviation, and that estimate is multiplied by the threshold to give the allowed deviation from the rolling median. If the original point falls outside the range median +/- deviation, it is an outlier and its filtered value is the rolling median calculated earlier.
  • In this particular case, the point being evaluated sits on the leading edge of the rolling window, so only the points to its left (earlier in the series) are used in the calculation.
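A leading-edge variant can be sketched with pandas, whose `rolling` windows are trailing by default (`center=False`). The function name `hampel_trailing` and its parameters are hypothetical and are not taken from pyhampel. Using `min_periods=1` is an assumption so that the first few points still get a (shorter) window.

```python
import numpy as np
import pandas as pd

def hampel_trailing(series, window=5, n_sigmas=3.0):
    """Hampel filter whose window ends at the point being evaluated (sketch)."""
    rolling = series.rolling(window=window, min_periods=1)  # trailing window
    med = rolling.median()
    # Rolling MAD: median of absolute deviations from each window's median
    mad = rolling.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    is_outlier = (series - med).abs() > n_sigmas * 1.4826 * mad
    # Keep original values except where an outlier was flagged
    filtered = series.where(~is_outlier, med)
    return filtered, is_outlier

# Hypothetical series with a single spike at position 5
s = pd.Series([1.0, 1.1, 0.9, 1.0, 1.1, 9.0, 1.0, 0.9])
filtered, flags = hampel_trailing(s)
print(flags.tolist())
print(filtered.tolist())
```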

Window Size and Standard Deviation?

There is no universal window size or threshold to use. The window size and threshold will need to be determined based on the characteristics of the data and the application of the Hampel filter.
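One way to explore these settings is to count how many points are flagged as the threshold changes. The sketch below uses a hypothetical helper, `count_outliers`, with a centered window truncated at the edges, and made-up data. It only illustrates the trade-off and is not a tuning recipe.

```python
from statistics import median

def count_outliers(values, half_window, n_sigmas):
    """Count points flagged by a centered-window Hampel test (sketch)."""
    count = 0
    for i, v in enumerate(values):
        window = values[max(0, i - half_window): i + half_window + 1]
        med = median(window)
        mad = median(abs(w - med) for w in window)
        if abs(v - med) > n_sigmas * 1.4826 * mad:
            count += 1
    return count

# Hypothetical data: a mild bump (10.8) and a large spike (15.0)
values = [10.0, 10.2, 9.9, 10.1, 10.8, 10.0, 9.8, 10.1, 15.0, 10.2, 9.9]
counts = {t: count_outliers(values, half_window=3, n_sigmas=t)
          for t in (2.0, 3.0, 6.0)}
print(counts)
```

A lower threshold can only flag the same points or more, so it trades missed outliers for false alarms. The window size has a similar effect on how local the median and MAD are.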

How do I use it?

Pass in a DataFrame containing time series data, and pyhampel returns a new DataFrame with added columns for the filtered data, the outlier values, and a boolean flag indicating whether each data point is an outlier.
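The shape of that output can be sketched with pandas alone. This is not pyhampel's API: the function `add_hampel_columns` and the column names `filtered`, `outlier_value`, and `is_outlier` are hypothetical stand-ins for whatever the package actually uses, and a centered window is assumed.

```python
import numpy as np
import pandas as pd

def add_hampel_columns(df, column, window=5, n_sigmas=3.0):
    """Return a copy of `df` with filtered values, outlier values,
    and an outlier flag added (hypothetical column names)."""
    s = df[column]
    rolling = s.rolling(window=window, center=True, min_periods=1)
    med = rolling.median()
    mad = rolling.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    is_outlier = (s - med).abs() > n_sigmas * 1.4826 * mad
    out = df.copy()
    out["filtered"] = s.where(~is_outlier, med)   # outliers replaced by median
    out["outlier_value"] = s.where(is_outlier)    # NaN where not an outlier
    out["is_outlier"] = is_outlier
    return out

# Hypothetical daily series with one spike
df = pd.DataFrame(
    {"value": [1.0, 1.1, 0.9, 9.0, 1.0, 1.1, 0.9]},
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)
result = add_hampel_columns(df, "value")
print(result)
```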

pyhampel's Issues

suggestion: orders of magnitude speed-up using Numba JIT

On my machine, the following demonstrates a more than 100x speed-up.

import pandas as pd
import numpy as np
import timeit
import hampel # assumes `pip install hampel`

def much_faster_hampel(series, window_length, threshold=3):
    # `hampel_filter_forloop_numba` is not defined in this snippet;
    # see [1] for its implementation.
    # [1] https://gist.github.com/adrianomitre/2ced5544421eca031180782cc30798af
    result, outlier_indices = hampel_filter_forloop_numba(series.values, window_length, threshold)
    # Mark the detected outliers as missing values
    result[outlier_indices] = np.nan
    return result

ts_len = 2**8
win_sz = 7

ts = pd.Series(np.random.rand(ts_len))

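# Warm-up call: triggers JIT compilation so it is excluded from the timings below
much_faster_hampel(ts, win_sz)

# Total time in seconds for 100 runs of each implementation
print(timeit.timeit(lambda: hampel.hampel(ts, win_sz), number=100))
print(timeit.timeit(lambda: much_faster_hampel(ts, win_sz), number=100))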

The hampel_filter_forloop_numba implementation is in the gist linked in the code comment above.
