Git Product home page Git Product logo

digestible's Introduction

digestible CircleCI

A modern C++ implementation of a merging t-digest data structure.

t-digests are data structures that provide highly accurate statistics for a large number of inputs in a small memory footprint. Things like minimum, maximum, average, quantile, and continuous distribution are readily calculated. Think improved histogram: compact way of accurately summarizing data where the bucket sizes need not be set in advance.

Notable features include:
  • No runtime memory allocation.
  • Strict, user-defined upper bound on structure size.
  • Written in modern C++ (c++17).
  • Templated mean and weight data types - save memory for smaller input ranges with a smaller data type.
  • Speed (average 61ns insertion time per element with a c. 2019 Intel i7)
Requirements:
  • A C++17 compatible compiler (developed with clang 7 using --std=c++17).
  • CMake (for building utility and running tests).

Sample Code

#include "digestible/digestible.h"

using namespace std;
using namespace digestible;

// Structure will have at most 20 elements.
// Also default to values of type float and weights of type unsigned.
tdigest digest(20);

// Add some data into the structure.
digest.insert(5);
digest.insert(10);
digest.insert(4);

// digestible buffers input data; explicitly merge it before querying statistics.
digest.merge();

printf("50th quantile is: %f\n", digest.quantile(50));

Additional examples can be found in the tests directory.

digestible Utility

To assist in integrating digestible in your implementation, please see utility directory.

Compression Factor

As the lone parameter to the digestible constructor, compression factor requires some discussion. Internally t-digests use the compression factor to distribute values throughout the input range. When combined with an internal distribution function a t-digest will distribute data such that the head and tail have fewer points and the middle has significantly more. Higher compression factors will yield better accuracy at the expense of increased memory usage. The digestible utility can be used to help strike this balance.

Data Structure Size

For a given compression factor f, value type size v, and weight type size w, the approximate t-digest size in bytes is:

f * max(v, w) * 8 + 64

Acknowledgments

This work would not be possible without the original algorithm and reference implementation by tdunning. A local copy of the original t-digest paper can be found here.

digestible's People

Contributors

cliffc-spirent avatar derangedmonkeyninja avatar djoyner avatar dmortondev avatar yutkin avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

digestible's Issues

quantile(50) returns 0 with two nonzero items

#include <cstdint>
#include <stdio.h>

using rtt_container = digestible::tdigest<uint64_t, unsigned>;

int main()
{
        rtt_container rtts = rtt_container(16);
        rtts.insert(100);
        rtts.insert(1000);
        rtts.merge();
        printf("size: %zu\n", rtts.size());
        printf("min: %lu\n", rtts.min());
        printf("quantile(49.99): %g\n", rtts.quantile(49.99));
        printf("quantile(50): %g\n", rtts.quantile(50.0));
        printf("max: %lu\n", rtts.max());
        return 0;
}

outputs:

size: 2
min: 100
quantile(49.99): 100
quantile(50): 0
max: 1000

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.