pydens, density estimation in Python

pydens provides a unified interface to several density estimation packages, including an implementation of classifier-adjusted density estimation (CADE). Examples are in the /notebooks directory.

Applications of density estimation include:

Detecting data drift: The reliability of a trained model's prediction at a new data point depends on how similar the new point is to the training data. A density estimate fit to the training data can serve as a warning of data drift if the evaluated density at the new point is exceptionally low. One way to focus such an analysis is to fit and evaluate the density using only a few of the most important features in the model.
Mode detection: Locating regions of high density is a first step toward allocating resources efficiently, for example to address an epidemic or to market a product.
Feature engineering: The density at a point with respect to any subset of the dimensions of a feature space can encode useful information.
Anomaly/novelty/outlier detection: A "point of low density" is a common working definition of "anomaly", although it's not the only one. (In astrostatistics, for example, a density spike may draw attention as a possible galaxy.)
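
The drift and anomaly use cases above can be sketched as follows. This is a minimal illustration, not the pydens API: it uses scikit-learn's KernelDensity as a stand-in estimator, and the simulated data, the bandwidth, the 1st-percentile threshold, and the helper name flag_low_density are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Hypothetical training data: 500 points restricted to 2 "important" features.
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Stand-in density estimator (assumption: not the pydens interface).
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_train)

# Threshold: the 1st percentile of log-density over the training data itself.
train_log_density = kde.score_samples(X_train)
threshold = np.percentile(train_log_density, 1)


def flag_low_density(X_new):
    """Return True for each row whose log-density falls below the threshold."""
    return kde.score_samples(np.atleast_2d(X_new)) < threshold


typical_point = np.array([0.1, -0.2])  # close to the bulk of the training data
drifted_point = np.array([6.0, 6.0])   # far outside the training data
flags = flag_low_density(np.vstack([typical_point, drifted_point]))
```

Here flags marks only the second point as low density. In practice the threshold would need tuning to the tolerable false-alarm rate.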

Evaluating the performance of a density estimator is not straightforward. We rely on a mix of simulation, real-data sanity checks, and cross-validation in special cases, as detailed in our evaluation guide.
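
One simple form of the simulation check mentioned above is to draw data from a known density and compare held-out log-likelihoods. The sketch below is only an illustration under stated assumptions: it uses scikit-learn's KernelDensity rather than pydens, a standard normal as the simulated truth, and three arbitrary candidate bandwidths. It is not the procedure from the evaluation guide.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Simulate from a known density (standard normal) so the truth is available.
X_train = rng.normal(size=(1000, 1))
X_test = rng.normal(size=(1000, 1))


def heldout_log_likelihood(bandwidth):
    """Mean log-density assigned to held-out points by a KDE fit on X_train."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_train)
    return kde.score_samples(X_test).mean()


# Candidate bandwidths: far too narrow, moderate, far too wide (assumed values).
scores = {bw: heldout_log_likelihood(bw) for bw in (0.005, 0.3, 5.0)}
best_bandwidth = max(scores, key=scores.get)

# Reference: mean log-density of the same held-out points under the true density.
true_score = (-0.5 * np.log(2 * np.pi) - 0.5 * X_test[:, 0] ** 2).mean()
```

A higher held-out log-likelihood indicates a better fit, and true_score gives a reference level for how well any estimator could be expected to do on this sample.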

Installation

Not yet on PyPI or conda-forge, but installation is still easy with pip:

pip install git+https://github.com/zkurtz/pydens.git#egg=pydens

License

MIT. See LICENSE.

Related work

A case has been made for extending boosted trees to include density estimation. See also Liu and Wong (2014) and Li, Yang, and Wong (2016).
A review of density estimation packages in R appears not to find any approach that can handle more than 6 features
A 'nearest neighbors' fastkde
Random forests
Outlier detection with sklearn
Intersection of density estimation and generative adversarial networks

Wishlist

Infrastructure:

expand code testing coverage
define new simulations

Tutorials, starting with

how CADE works
density estimation trees

Density estimation:

Implement a dimensionality-reduction pre-processing method. Extreme multicollinearity is a potential failure mode in CADE because the naive density estimate assumes feature independence.
Merge the best of the tree-based methods of LightGBM, detpack, Schmidberger and Frank, and astropy.stats.bayesian_blocks.
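
For context on the multicollinearity item above, here is a minimal sketch of one common formulation of classifier-adjusted density estimation. It is not necessarily how pydens implements CADE. The independent-Gaussian naive density, the random forest classifier, the clipping constant eps, and all function names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical real data: two correlated Gaussian features.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X_real = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=2000)

# Step 1: naive density that treats the features as independent Gaussians.
mu, sigma = X_real.mean(axis=0), X_real.std(axis=0)


def naive_log_density(X):
    z = (X - mu) / sigma
    return (-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)).sum(axis=1)


# Step 2: sample an equally sized synthetic data set from the naive density.
X_fake = rng.normal(loc=mu, scale=sigma, size=X_real.shape)

# Step 3: train a classifier to separate real (1) from synthetic (0) points.
X = np.vstack([X_real, X_fake])
y = np.concatenate([np.ones(len(X_real), dtype=int), np.zeros(len(X_fake), dtype=int)])
clf = RandomForestClassifier(n_estimators=100, min_samples_leaf=20, random_state=0)
clf.fit(X, y)


# Step 4: adjust the naive density by the classifier odds p / (1 - p).
def cade_log_density(X_new, eps=1e-3):
    p = np.clip(clf.predict_proba(X_new)[:, 1], eps, 1 - eps)
    return naive_log_density(X_new) + np.log(p) - np.log1p(-p)


on_ridge = np.array([[1.5, 1.5]])    # consistent with the positive correlation
off_ridge = np.array([[1.5, -1.5]])  # contradicts the positive correlation
```

The naive density scores on_ridge and off_ridge almost identically, while the adjusted estimate ranks on_ridge higher. As the correlation approaches 1, the classifier can separate real from synthetic points almost perfectly, so p sits at the clipping bounds and the odds term becomes uninformative, which is one way to read the failure mode described above.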
