pandas_historical

A beautiful way to store historical metrics compressed in a pandas dataframe.

What is it

Library to store metrics or logs in a format where only the dates of value changes are stored.
This drastically reduces the space required to store historical values.

It also allows you to get a slice of the historical values for a certain date.

[Figure: compression example]

Install it from PyPI

pip install pandas_historical

Usage

Take a historical dataframe

Let's take a table of historical values, for example, currency rates.

Suppose we periodically scrape currency rates from a certain site
and write each resulting value, together with the date of scraping, to the table.

import pandas as pd

currencies_scraping = pd.DataFrame([
    {
        'date': '2022-02-17',
        'key': 'DOLLAR',
        'value': 78,
        'scraping_id': 123
    },
    {
        'date': '2022-02-18',
        'key': 'DOLLAR',
        'value': 78,
        'scraping_id': 123
    },
    {
        'date': '2022-02-19',
        'key': 'DOLLAR',
        'value': 78,
        'scraping_id': 123
    },
    {
        'date': '2022-02-20',
        'key': 'DOLLAR',
        'value': 78,
        'scraping_id': 123
    },
    {
        'date': '2022-02-21',
        'key': 'DOLLAR',
        'value': 78,
        'scraping_id': 123
    },
    {
        'date': '2022-02-21',
        'key': 'EURO',
        'value': 87,
        'scraping_id': 124
    },
    {
        'date': '2022-02-22',
        'key': 'DOLLAR',
        'value': 78,
        'scraping_id': 124
    },
    {
        'date': '2022-02-28',
        'key': 'DOLLAR',
        'value': 105,
        'scraping_id': 124
    },
    {
        'date': '2022-03-06',
        'key': 'DOLLAR',
        'value': 139,
        'scraping_id': 125
    },
    {
        'date': '2022-03-07',
        'key': 'EURO',
        'value': 148,
        'scraping_id': 125
    }
])
currencies_scraping
         date     key  value  scraping_id
0  2022-02-17  DOLLAR     78          123
1  2022-02-18  DOLLAR     78          123
2  2022-02-19  DOLLAR     78          123
3  2022-02-20  DOLLAR     78          123
4  2022-02-21  DOLLAR     78          123
5  2022-02-21    EURO     87          124
6  2022-02-22  DOLLAR     78          124
7  2022-02-28  DOLLAR    105          124
8  2022-03-06  DOLLAR    139          125
9  2022-03-07    EURO    148          125

Now let's apply pandas_historical to it

Now let's turn this table into a table that stores only the dates when the values appeared or changed.

from pandas_historical import make_value_change_events_df

value_change_events_df = make_value_change_events_df(currencies_scraping)
value_change_events_df

Take a look at the result: some of the rows are gone.
The uncompressed dataframe has 10 rows, the compressed one has 5.
The more days pass without a value change, the higher the compression rate.

         date     key  value  scraping_id
0  2022-02-17  DOLLAR     78          123
1  2022-02-28  DOLLAR    105          124
2  2022-03-06  DOLLAR    139          125
3  2022-02-21    EURO     87          124
4  2022-03-07    EURO    148          125
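
The idea of keeping only the rows where a value appeared or changed can be illustrated in plain pandas. The sketch below is not the library's implementation. It assumes the `date`, `key` and `value` column names from the example, and `keep_value_changes` is a hypothetical helper name. It keeps a row only when its value differs from the previous value recorded for the same key.

```python
import pandas as pd


def keep_value_changes(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows where 'value' differs from the previous row of the same 'key'.

    Hypothetical helper for illustration; not part of pandas_historical.
    """
    # ISO-formatted date strings sort chronologically.
    ordered = df.sort_values(["key", "date"], kind="stable")
    previous = ordered.groupby("key")["value"].shift()
    # The first row of each key has no previous value (NaN), so it is kept.
    changed = previous.isna() | (ordered["value"] != previous)
    return ordered[changed].reset_index(drop=True)


raw = pd.DataFrame(
    [
        ("2022-02-17", "DOLLAR", 78),
        ("2022-02-18", "DOLLAR", 78),
        ("2022-02-21", "EURO", 87),
        ("2022-02-28", "DOLLAR", 105),
        ("2022-03-07", "EURO", 148),
    ],
    columns=["date", "key", "value"],
)
compressed = keep_value_changes(raw)
print(compressed)
```

Here the repeated DOLLAR rate of 78 on 2022-02-18 is dropped, while every row that introduces a new value for its key is kept.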

Now let's add the new values we got from the last scraping.

from pandas_historical import update_value_change_events_df

new_values = pd.DataFrame([
    {
        'date': '2022-03-10',
        'key': 'DOLLAR',
        'value': 105,
        'scraping_id': 127
    },
    {
        'date': '2022-03-11',
        'key': 'DOLLAR',
        'value': 113,
        'scraping_id': 127
    },
    {
        'date': '2022-03-11',
        'key': 'EURO',
        'value': 144,
        'scraping_id': 127
    }
])
value_change_events_df = update_value_change_events_df(
    value_change_events_df, new_values
)
value_change_events_df

You can see that of the two records with a dollar rate of 105 (2022-02-28 and 2022-03-10), only 2022-02-28 remains,
because the final dataframe keeps only the dates of changes and the first occurrence of each value.

         date     key  value  scraping_id
0  2022-02-17  DOLLAR     78          123
1  2022-02-28  DOLLAR    105          124
2  2022-03-06  DOLLAR    139          125
3  2022-03-11  DOLLAR    113          127
4  2022-02-21    EURO     87          124
5  2022-03-07    EURO    148          125
6  2022-03-11    EURO    144          127
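
The result above is consistent with a rule that keeps, for each key, the earliest row of every distinct value. A rough plain-pandas approximation of that rule follows. It is an assumption about the behaviour, not the library's code, and `merge_first_occurrences` is a hypothetical helper name.

```python
import pandas as pd


def merge_first_occurrences(events: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Append new rows and keep the earliest row per (key, value) pair.

    Hypothetical helper; an approximation, not pandas_historical itself.
    """
    combined = pd.concat([events, new], ignore_index=True)
    # ISO-formatted date strings sort chronologically.
    combined = combined.sort_values(["key", "date"], kind="stable")
    deduplicated = combined.drop_duplicates(subset=["key", "value"], keep="first")
    return deduplicated.reset_index(drop=True)


events = pd.DataFrame(
    [("2022-02-28", "DOLLAR", 105), ("2022-03-06", "DOLLAR", 139)],
    columns=["date", "key", "value"],
)
new = pd.DataFrame(
    [("2022-03-10", "DOLLAR", 105), ("2022-03-11", "DOLLAR", 113)],
    columns=["date", "key", "value"],
)
merged = merge_first_occurrences(events, new)
print(merged)
```

Under this rule a value that returns to an earlier level (105 on 2022-03-10) is not recorded as a new event, which matches the table shown above.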
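
To get the values that were current on a certain date, use get_historical_state.
Called without state_date, it returns the latest state in the example below.

from pandas_historical import get_historical_state

get_historical_state(value_change_events_df)

                  date     key  value  scraping_id
1  2022-03-11 00:00:00  DOLLAR    113          127
2  2022-03-11 00:00:00    EURO    144          127

With state_date, it returns the state as of that date.

get_historical_state(value_change_events_df, state_date='2022-03-07')

                  date     key  value  scraping_id
1  2022-03-06 00:00:00  DOLLAR    139          125
2  2022-03-07 00:00:00    EURO    148          125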
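
A point-in-time lookup of this kind can be sketched in plain pandas: drop the events dated after the requested date, then take the most recent remaining event for each key. This illustrates the technique under the example's column names and is not the library's code. `state_at` is a hypothetical helper name.

```python
import pandas as pd


def state_at(events: pd.DataFrame, state_date: str) -> pd.DataFrame:
    """Return the latest event per key on or before state_date.

    Hypothetical helper; not part of pandas_historical.
    """
    dated = events.assign(date=pd.to_datetime(events["date"]))
    visible = dated[dated["date"] <= pd.Timestamp(state_date)]
    # After sorting by date, the last row of each key is its current value.
    latest = visible.sort_values("date").groupby("key").tail(1)
    return latest.sort_values("key").reset_index(drop=True)


events = pd.DataFrame(
    [
        ("2022-03-06", "DOLLAR", 139),
        ("2022-03-11", "DOLLAR", 113),
        ("2022-03-07", "EURO", 148),
        ("2022-03-11", "EURO", 144),
    ],
    columns=["date", "key", "value"],
)
snapshot = state_at(events, "2022-03-07")
print(snapshot)
```

For 2022-03-07 the sketch returns the DOLLAR rate set on 2022-03-06 and the EURO rate set on 2022-03-07, ignoring the later events from 2022-03-11.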

Development

Read the CONTRIBUTING.md file.
