compression_patern_detection's Introduction

Financial Data Pattern Recognition Using The Analysis Of Compressed Representation Of Time Series

Introduction

This project recognizes patterns in financial time series by analyzing their compressed representation rather than the raw values. It diverges from traditional methods because working with compressed data has the potential to be more efficient.

The method could also be adapted to run on hardware accelerators.

The current implementation works exclusively on synthetic data generated by the generate_synthetic_data function.

Original White Paper

Please refer to: Zhiying Jiang, Matthew Y.R. Yang, Mikhail Tsirlin, Raphael Tang, Yiqin Dai, and Jimmy Lin. 2023. “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors.

This project is inspired by a text classification method built on compression. The original paper explores classification with lossless compressors, including gzip, and their use in settings such as cross-entropy and kNN methods.
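As a rough illustration of the kNN setting, the sketch below classifies a text by its Normalized Compression Distance (NCD) to labeled examples, using gzip as the compressor. The function names (`clen`, `ncd`, `knn_predict`) and the two toy training sentences are assumptions made for this example. They are not taken from the paper's code or from this repository.

```python
import gzip
from collections import Counter


def clen(text):
    """Size in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x, y):
    """Normalized Compression Distance; smaller means more shared structure."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)  # compress the two texts joined by a space
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_predict(query, train, k=1):
    """Majority label among the k training texts closest to `query` by NCD.

    `train` is a list of (text, label) pairs.
    """
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy, made-up training set (illustrative only).
    train = [
        ("the central bank raised interest rates to curb inflation this quarter", "finance"),
        ("the striker scored a late goal and the home team won the cup final", "sports"),
    ]
    query = "interest rates and inflation worried the central bank this quarter"
    for text, label in train:
        print(label, round(ncd(query, text), 3))
    print("predicted:", knn_predict(query, train, k=1))
```

No model parameters are trained here: the only inputs are the compressor, the distance, and `k`.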

Background

While the inspiration for this project stems from the ACL 2023 paper on text classification using compressors, this approach is distinct. Instead of directly classifying text, we focus on quantifying and identifying patterns in financial data by examining its compressed representation.

Compression techniques are particularly suited for financial data because they can capture intricate patterns and redundancies, providing a condensed yet informative view of the data.

Methodology

  • Synthetic Data Generation: Synthetic financial datasets with embedded "Head and Shoulders" patterns are generated for testing and validation. This is an extremely simple approach.
  • Analysis of Compressed Data: The financial data is examined in its compressed form.
    • Compression algorithms exploit repeated structure in their input, so the compressed output reflects the underlying patterns and associated trends in the data.
  • Compression-Based Metrics: Two metrics are derived from the compressed data: the Normalized Compression Distance (NCD) between the reference pattern and a window, and the compression ratio of the concatenated pattern + window. These metrics are chosen because they provide a standardized way to quantify compression efficiency and pattern similarity.
  • Cosine Similarity: We calculate the cosine similarity between compressed financial data patterns and the reference pattern. This measure helps determine the similarity between the two patterns, with values closer to 1 indicating higher similarity.
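The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the helper names (`head_and_shoulders`, `make_series`, `to_bytes`, `scan`), the uniform-noise background, the 8-bit quantization, the use of zlib (DEFLATE, the algorithm gzip wraps), the window step, and the zero-padding used to compare compressed byte streams with cosine similarity are all choices made for this example. The project's own `generate_synthetic_data` and `main.py` may work differently.

```python
import math
import random
import zlib


def head_and_shoulders(n=60):
    """Hypothetical pattern: three sine bumps (shoulder, head, shoulder) in [0, 1]."""
    third = n // 3
    values = []
    for height in (0.6, 1.0, 0.6):
        values += [height * math.sin(math.pi * i / third) for i in range(third)]
    return values


def make_series(pattern, position, length=300, seed=0):
    """Uniform noise in [0, 1) with the pattern copied in at `position` (assumption)."""
    rng = random.Random(seed)
    series = [rng.random() for _ in range(length)]
    series[position:position + len(pattern)] = pattern
    return series


def to_bytes(values):
    """Quantize values from [0, 1] to one byte each so they can be compressed."""
    return bytes(min(255, max(0, round(v * 255))) for v in values)


def clen(data):
    return len(zlib.compress(data, 9))


def ncd(a, b):
    """Normalized Compression Distance between two byte strings."""
    ca, cb = clen(a), clen(b)
    return (clen(a + b) - min(ca, cb)) / max(ca, cb)


def compression_ratio(a, b):
    """Compressed size of pattern + window divided by its raw size."""
    joined = a + b
    return clen(joined) / len(joined)


def cosine_similarity(a, b):
    """Cosine similarity of two byte strings, zero-padded to equal length."""
    size = max(len(a), len(b))
    va = list(a) + [0] * (size - len(a))
    vb = list(b) + [0] * (size - len(b))
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def scan(series, pattern, step=20):
    """Return (start, NCD, compression ratio, cosine similarity) per window."""
    p = to_bytes(pattern)
    n = len(pattern)
    rows = []
    for start in range(0, len(series) - n + 1, step):
        w = to_bytes(series[start:start + n])
        cos = cosine_similarity(zlib.compress(p, 9), zlib.compress(w, 9))
        rows.append((start, ncd(p, w), compression_ratio(p, w), cos))
    return rows


if __name__ == "__main__":
    pattern = head_and_shoulders(60)
    series = make_series(pattern, position=120)
    for start, distance, ratio, cos in scan(series, pattern):
        print(f"{start:4d}  ncd={distance:.3f}  ratio={ratio:.3f}  cos={cos:.3f}")
```

In this sketch the pattern is embedded verbatim, so the window that lines up with it is byte-identical to the reference and gets the lowest NCD. Any transformation of the embedded pattern, such as an added trend or noise, changes the quantized bytes and weakens that match.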

Usage

Execute main.py to run the pattern recognition on synthetic data and visualize the results.

Known Issues

  • The detection mechanism works only with the simplest form of synthetic data, as generated by the generate_synthetic_data function.
  • At the moment, even a simple trend, such as the one added by generate_synthetic_data_with_trend, makes the detection fail.

Future Work

The project is in its early stages and is open to refinements. Feedback and collaboration are encouraged, especially to enhance the methodology and address current limitations.

Test output

[Figures omitted: test output plots and kNN test output plots.]
