distribution-pandas-dtype's Introduction

Distribution `dtype` for Pandas

A small proof of concept that uses a custom Pandas dtype to store probability distributions.

...

Getting started

Install the package from source:

pip install git+https://github.com/jojolebarjos/distribution-pandas-dtype.git

Optionally, install pyarrow to support serialization to Parquet files:

pip install pyarrow

At import, the extension dtypes are registered with Pandas. Behind the scenes, the data is stored as a structured NumPy array, a layout designed to hold C-style structures. The multi-valued nature of these objects requires some care and does not play nicely with some indexing operations; as such, Pandas does not accept structured NumPy dtypes for its built-in series.

To circumvent this limitation, there are several ways to initialize a distribution series. The simplest is to create a zero-initialized series and update the fields separately:

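import pandas as pd

import distribution  # importing registers the extension dtypes with Pandas

# Create a zero-initialized series, then fill each field separately
x = pd.Series(index=range(5), dtype="dist[lognorm]")
x.dist["mu"] = [1, 2, 3, 4, 5]
x.dist["sigma"] = 0.1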
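
For intuition about the structured storage described above, here is a minimal sketch that uses plain NumPy only. The record layout (two `float64` fields named `mu` and `sigma`, mirroring the example) is an assumption made for illustration; the package's actual internal layout may differ.

```python
import numpy as np

# Assumed layout for illustration: one record per distribution, with two
# float64 fields, similar to a C struct { double mu; double sigma; }.
lognorm_dtype = np.dtype([("mu", np.float64), ("sigma", np.float64)])

# Zero-initialized array of five records.
data = np.zeros(5, dtype=lognorm_dtype)

# Fields are assigned separately: a sequence for one, a broadcast scalar
# for the other.
data["mu"] = [1, 2, 3, 4, 5]
data["sigma"] = 0.1

# Each element is a single record that carries both values.
first = data[0]
print(first["mu"], first["sigma"])

# Field names and the size in bytes of one record.
print(data.dtype.names)
print(data.itemsize)
```

Because every element carries several values at once, such arrays do not behave like ordinary scalar columns, which is why the series is wrapped in an extension dtype and its fields are accessed through the `dist` accessor rather than directly.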

Relevant links
