
tsextract's Introduction


tsExtract: Time Series Preprocessing Library

tsExtract is a time series preprocessing library. Using sliding windows, tsExtract converts time series data into a form that can be fed into standard machine learning regression algorithms such as Linear Regression and Decision Tree Regression, as well as Deep Learning models.


Installation

pip

pip install tsextract

conda

conda install -c cydal tsextract

Main Features

  • Take a sliding window of the data and, with it, create additional columns representing the window.
  • Perform differencing on windowed data to remove non-stationarity.
  • Calculate statistics on windowed and differenced data, including temporal and spectral statistics functions.
  • Plot visualisations. These include -
    • Actual vs predicted line and scatter plots
    • Lag correlation

Usage

print(df.head())
Date                  DAYTON_MW
2004-12-31 01:00:00      1596.0
2004-12-31 02:00:00      1517.0
2004-12-31 03:00:00      1486.0
2004-12-31 04:00:00      1469.0
2004-12-31 05:00:00      1472.0

Using the main build_features function

build_features takes in 4 arguments -

  • Data: Time series data in 1-d.

  • Request dictionary: Dictionary with the function type and parameters.

  • include_tzero (optional): Whether to include the column t+0. Can be quite handy when implementing difference networks.

  • target_lag: Sets the lag value for the target. If predicting 10 hours into the future, a value of 10 should be passed. Default is 3.

from tsextract.feature_extraction.extract import build_features

# Request a sliding window of size 10
features_request = {
    "window":[10]
}

features = build_features(df["DAYTON_MW"], features_request, include_tzero=False)

The example above sends in a request for a sliding window of size 10. What is returned is a dataframe whose feature columns correspond to the window size passed in (10 here). The final column is the target, with values shifted 3 time steps into the future.
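
Since the final column is the target, the returned dataframe can be split into predictors and target and passed to any standard regressor. A minimal sketch using scikit-learn (scikit-learn is not a listed dependency of tsextract; the last-column-is-target layout follows the description above):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# All columns except the last are predictors; the last column is the
# target shifted target_lag steps into the future.
X, y = features.iloc[:, :-1], features.iloc[:, -1]

# Disable shuffling to preserve the temporal order of the observations.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))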


Features

  • window: Takes a sliding window of the data. Parameter(s) are passed in as a list. A single value takes a sliding window of that size, so a parameter of [10] takes windows from 1 to 10 time steps back. If [5, 10] is passed in instead, windows of 5 to 10 time steps are taken.

  • window_statistic: Performs windowing as above, then applies the specified statistic operation to reduce the windowed matrix to a 1-d vector.

  • difference/momentum/force: Performs differencing by subtracting the value at a previous time step from the value at the current time step. The parameter expected is a list of size 2 or 3. As with windowing, the first value is the window size; two window values may also be passed in to take windows in that range. The final value is the differencing lag used for the subtraction. A difference lag of 1 subtracts the immediately preceding value (t3-t2, t2-t1, t1-t0, etc.), while a difference lag of 3 subtracts the value 3 time steps before (t6-t3, t5-t2, t4-t1, etc.); see the NumPy sketch after this list. Momentum and force are the 2nd- and 3rd-order differences.

  • difference_statistic/momentum_statistic/force_statistic: Performs the operations described above, then applies the specified statistic.
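
To make the lag semantics concrete, here is a minimal NumPy sketch of lag-1 and lag-3 differencing on a toy series (an illustration only; it does not use tsextract internals):

import numpy as np

x = np.array([1, 4, 9, 16, 25, 36, 49], dtype=float)

# Lag-1 difference: subtract the immediately preceding value (t1-t0, t2-t1, ...).
diff_lag1 = x[1:] - x[:-1]

# Lag-3 difference: subtract the value 3 time steps before (t3-t0, t4-t1, ...).
diff_lag3 = x[3:] - x[:-3]

print(diff_lag1)  # [ 3.  5.  7.  9. 11. 13.]
print(diff_lag3)  # [15. 21. 27. 33.]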

from tsextract.feature_extraction.extract import build_features
from tsextract.domain.statistics import median, mean, skew, kurtosis
from tsextract.domain.temporal import abs_energy

features_request = {
    "window":[2],                                 # raw windows of 1 to 2 steps back
    "window_statistic":[24, median],              # median of a 24-step window
    "difference":[12, 10],                        # 12-step window differenced with lag 10
    "difference_statistic":[15, 10, abs_energy],  # abs_energy of lag-10 differences over 15 steps
}

features = build_features(df["DAYTON_MW"], features_request, include_tzero=True, target_lag=3)


Summary Statistics

As described above, rather than taking raw windowing or differencing matrix values, it is possible to take a summary statistic of them. The supported statistics are listed below.

Statistics           Temporal                   Spectral
Mean                 Absolute Energy            Spectral Centroid
Median               AUC
Range                Mean Absolute Difference
Standard Deviation   Moment
Minimum              Autocorrelation
Maximum              Zero Crossing Rate
Variance
Kurtosis
Skew
IQR
MAE
RMSE
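
For intuition, each of these statistics collapses a window row to a single number. A minimal NumPy sketch of two such reductions on a toy windowed matrix (the sum-of-squares definition of absolute energy shown here is the common one and is assumed, not taken from the tsextract source):

import numpy as np

# Toy windowed matrix: 3 windows, each of width 4.
windows = np.array([[1., 2., 3., 4.],
                    [2., 3., 4., 5.],
                    [3., 4., 5., 6.]])

medians = np.median(windows, axis=1)         # one median per window
abs_energy = np.sum(windows ** 2, axis=1)    # assumed: sum of squared values per window

print(medians)     # [2.5 3.5 4.5]
print(abs_energy)  # [30. 54. 86.]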

Dependencies

  • pandas >= 1.0.3
  • seaborn >= 0.10.1
  • statsmodels >= 0.11.1
  • scipy >= 1.5.0
  • matplotlib >= 3.2.1
  • numpy >= 1.16.4

License

GNU GPL V3

Contribute

Contributors of all experience levels are welcome. Please see the contributing guide.

Article

https://sijpapi.medium.com/preprocessing-time-series-data-for-supervised-learning-2e27493f44ae

Source Code

You can get the latest source code with:

git clone https://github.com/cydal/tsExtract.git

tsextract's People

Contributors

cydal, rogomes


tsextract's Issues

Momentum & Force Losses

In addition to using standard metrics like RMSE/MAE for optimization/learning, it is also possible to simultaneously use momentum and force losses. These are especially helpful not just for reducing the error but also for reducing lag, and they work especially well with momentum and force features.

They can also help with the noise introduced by the differencing performed on force and momentum features.

This would need to be implemented to work with standard ML libraries like Sklearn, Keras and PyTorch.
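
A minimal NumPy sketch of what such a combined loss could look like, assuming momentum and force are taken as the first- and second-order differences of the series (the weights and exact formulation are illustrative, not a proposed implementation; a Keras or PyTorch version would need the same logic written with tensor ops):

import numpy as np

def momentum_force_loss(y_true, y_pred, w_momentum=0.5, w_force=0.25):
    # Base error term: RMSE on the raw values.
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    # Momentum term: error on first-order differences, which penalises lag.
    momentum = np.mean(np.abs(np.diff(y_true) - np.diff(y_pred)))
    # Force term: error on second-order differences.
    force = np.mean(np.abs(np.diff(y_true, n=2) - np.diff(y_pred, n=2)))
    return rmse + w_momentum * momentum + w_force * force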

Evaluation plots

Add functionality for viewing lag correlation, actual vs predicted, and scatter plots.
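
A minimal matplotlib sketch of the actual vs predicted line plot (matplotlib is already a dependency; the plotting API that ships with tsextract may look different):

import matplotlib.pyplot as plt

def plot_actual_vs_predicted(y_true, y_pred):
    # Overlay the two series so lag and amplitude errors are visible.
    plt.figure(figsize=(10, 4))
    plt.plot(y_true, label="Actual")
    plt.plot(y_pred, label="Predicted", alpha=0.7)
    plt.xlabel("Time step")
    plt.ylabel("Value")
    plt.legend()
    plt.show()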

Requesting Multiple instances of same feature

The input is in a dictionary format, with the key naming the requested operation and the value being a list of parameters for that operation. For example, to perform a differencing operation with a window size of 24 and a lag of 10 -

feature_dict = { "difference": [24, 10] }

As it is, only one instance of each operation may be included at a time, so a second, different differencing request cannot be made. One solution would be to allow multiple parameter lists to be passed in for the same operation. Extending the example above to also include a differencing for windows between 30 and 40, with a lag of 7, would be -

feature_dict = { "difference": [[24, 10], [30, 40, 7]] }

Still to do is to implement a disjointed window request.

As it is, when a window of size 3 is requested, the function returns T-1, T-2, and T-3. The user should be able to request, for example, a window of T-5 to T-10, or T-12 to T-24. In other words, rather than always windowing from the present back to a specified time, the user should be able to window from a time other than the present back to any earlier time, as in the sketch below.
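
A minimal NumPy sketch of such a disjointed window, here T-5 to T-10, built directly from a 1-d series (an illustration of the requested behaviour, not existing tsextract functionality):

import numpy as np

def disjoint_window(x, start_lag, end_lag):
    # Columns are the series shifted by start_lag .. end_lag steps,
    # e.g. start_lag=5, end_lag=10 gives columns T-5, T-6, ..., T-10.
    cols = [x[end_lag - lag : len(x) - lag] for lag in range(start_lag, end_lag + 1)]
    return np.column_stack(cols)

x = np.arange(20, dtype=float)
print(disjoint_window(x, 5, 10).shape)  # (10, 6)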

New function to build predictors

At the moment, build_features generates the predictors as well as the target variable. To make predictions, we also need to be able to generate the predictors for a new timestamp: after building a model, how do we generate the required column values for a new time step to feed to the trained model? build_features builds features for training; a different function is needed for inference.
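
A minimal sketch of what such an inference-time helper could look like for a plain window request, assuming the trained model expects the most recent window observations in the same column order used during training (a hypothetical helper, not part of tsextract):

import numpy as np

def build_latest_predictors(series, window_size):
    # Take the last `window_size` observations as a single feature row.
    # The column order must match whatever build_features produced for training.
    latest = np.asarray(series)[-window_size:]
    return latest.reshape(1, -1)

# Example: feed the most recent 10 values to a fitted model.
# X_new = build_latest_predictors(df["DAYTON_MW"], 10)
# prediction = model.predict(X_new)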

Add tests

Add automated tests with other time series data.
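
A minimal pytest-style sketch of the kind of check such tests could start with, using a synthetic series instead of the DAYTON_MW data (the exact column count of the output is assumed, so only a loose shape assertion is made):

import numpy as np
import pandas as pd

from tsextract.feature_extraction.extract import build_features

def test_window_feature_shape():
    series = pd.Series(np.sin(np.linspace(0, 20, 500)))
    features = build_features(series, {"window": [10]}, include_tzero=False)
    # Expect a dataframe with at least one column per window step (exact layout assumed).
    assert isinstance(features, pd.DataFrame)
    assert features.shape[1] >= 10
    assert len(features) > 0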

differencing computation

Explore: at the moment, when differencing is requested, a sliding window is first extracted and differencing is then performed on that matrix. This may be implemented more efficiently by first differencing the 1-d data and then applying a sliding window to the result.
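
A quick NumPy check of the equivalence behind this refactor: differencing the 1-d series first and then windowing gives the same matrix as windowing first and then differencing inside each window (the sketch uses a lag-1 difference; sliding_window_view requires NumPy >= 1.20, newer than the minimum listed above):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(20, dtype=float) ** 2
window, lag = 6, 1

# Current approach: window first, then difference inside each window.
windowed = sliding_window_view(x, window)
diff_after = windowed[:, lag:] - windowed[:, :-lag]

# Proposed approach: difference the 1-d series once, then window the result.
diff_first = x[lag:] - x[:-lag]
windowed_diff = sliding_window_view(diff_first, window - lag)

print(np.array_equal(diff_after, windowed_diff))  # True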
