camdavidsonpilon / tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark.
License: MIT License
Hey there. Great work on this! I just wanted to let you know I've included your implementation in a benchmark I've started here. So far it is the most accurate method, but alas not the fastest.
Do you have a recommended serialization/deserialization strategy for passing these digests?
Pickle would work, but before I add an attack vector I was wondering if you had another solution.
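For what it's worth, one lightweight alternative to pickle is to serialize just the centroid (mean, count) pairs as JSON and rebuild the digest from them on the receiving side. A minimal sketch, assuming you can read the digest's centroids out as pairs; the helper names here are made up, not the library's API:

```python
import json

# Hypothetical sketch, not the library's API: serialize a digest as its
# (mean, count) centroid pairs and rebuild from them on the other side.
def digest_to_json(centroids):
    """Encode a list of (mean, count) pairs as a JSON string."""
    return json.dumps([[m, c] for m, c in centroids])

def digest_from_json(payload):
    """Decode back to (mean, count) pairs, ready to re-insert as
    weighted updates into a fresh digest."""
    return [(m, c) for m, c in json.loads(payload)]

centroids = [(0.5, 3), (1.7, 10), (9.2, 1)]
assert digest_from_json(digest_to_json(centroids)) == centroids
```

This avoids pickle's arbitrary-code-execution risk entirely, at the cost of re-inserting the centroids as weighted updates when deserializing.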
I ran the tests below and found out that on PyPy tdigest is horribly slow.
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys
from time import time

from numpy.random import randint, random
from tdigest import TDigest


def make_tdigest(items):
    result = TDigest()
    for _ in range(items):
        result.update(random())
    return result


def make_tdigest2(items):
    result = TDigest()
    result.batch_update(random(items))
    return result


def tdigests(count, factory):
    i = 0
    for _ in range(count):
        i += 1
        if i % 100 == 0:
            print('generated items:', i)
        yield dict(timestamp=randint(1, 15), tdigest=factory(500))


if __name__ == '__main__':
    print('running test in', sys.version)

    print('generating tdigests in batch')
    start = time()
    result = [t for t in tdigests(100, make_tdigest2)]
    end = time() - start
    print('generating tdigests took:', end)
    print('----------')

    print('generating tdigests one by one')
    start = time()
    tdigests = [t for t in tdigests(100, make_tdigest)]
    end = time() - start
    print('generating tdigests took:', end)
    print('----------')
==========
PyPy
running test in 2.7.13 (0e7ea4fe15e82d5124e805e2e4a37cae1a402d4b, Jan 06 2018, 12:46:49)
[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
generating tdigests in batch
generated items: 100
generating tdigests took: 32.5672068596
----------
generating tdigests one by one
generated items: 100
generating tdigests took: 17.4209430218
----------
==================
Python
running test in 2.7.14 (default, Mar 9 2018, 23:57:12)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
generating tdigests in batch
generated items: 100
generating tdigests took: 4.16117596626
----------
generating tdigests one by one
generated items: 100
generating tdigests took: 2.38711595535
----------
I've repeated the test in the official PyPy Docker container with PyPy 6.0.0 (compatible with Python 3), with the same outcome: https://hub.docker.com/_/pypy/
Any ideas?
I have my_set.zip. When I use the Java code:
import com.tdunning.math.stats.TDigest;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

import java.io.*;
import java.util.stream.StreamSupport;

public class TDigestTry {
    public static void main(String[] args) throws IOException {
        ClassLoader classLoader = TDigestTry.class.getClassLoader();
        File file = new File(classLoader.getResource("my_set.csv").getFile());
        Reader in = new FileReader(file);
        Iterable<CSVRecord> records = CSVFormat.EXCEL.withHeader().parse(in);
        TDigest digest = TDigest.createAvlTreeDigest(20);
        StreamSupport.stream(records.spliterator(), false)
                .map(record -> new Double(record.get("change rate")))
                .forEach(digest::add);
        System.out.println(digest.quantile(0.05));
        System.out.println(digest.quantile(0.95));
    }
}
I'm getting the results:
3.0
5.0
But when I run this Python code:
from pathlib import Path

import pandas

from tdigest import TDigest

if __name__ == '__main__':
    frame = pandas.read_csv(Path(__file__).parents[0].joinpath("resources").joinpath("my_set.csv"))
    digest = TDigest()
    digest.batch_update(frame["change rate"].values)
    print(f"Quantile 0.05 = {digest.percentile(5)};\t\tQuantile 0.95 = {digest.percentile(95)}")
I'm getting the results:
Quantile 0.05 = 2.6495903059149586; Quantile 0.95 = 3689686.790917569
How come there's a large difference between the 0.95 quantiles?
P.S. I get the same results when I use:
for value in frame["change rate"].values:
    digest.update(value)
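One way to narrow the discrepancy down is to compute exact empirical percentiles straight from the raw column and compare both libraries against that ground truth. A small self-contained helper (linear interpolation between order statistics, the convention numpy.percentile uses by default) is enough:

```python
# Self-contained ground truth for the comparison: exact empirical
# percentiles computed from the sorted data, with linear interpolation
# between order statistics (numpy.percentile's default convention).
def exact_percentile(values, p):
    """Exact p-th percentile for p in [0, 100]."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

data = list(range(1, 101))
print(exact_percentile(data, 5), exact_percentile(data, 95))
```

Whichever library's estimate is far from this exact value on the real column is the one misbehaving (or being fed different arguments).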
from tdigest import TDigest
t = TDigest()
t.batch_update(range(10000))
print t.percentile(.50)
# returns something pretty far away from 5000
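If `percentile` here takes a percent in the 0-100 range (as the other examples in this thread suggest), then `t.percentile(.50)` is asking for the 0.5th percentile, near the minimum, rather than the median; `t.percentile(50)` should land near the true value. For 0..9999 the exact median is easy to check with the standard library:

```python
from statistics import median

# Exact median of the same input, for comparison. This assumes percentile()
# takes a percent in [0, 100], so the median is percentile(50), not (.50).
data = range(10000)
print(median(data))  # 4999.5
```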
Hi, I noticed that this code is not keeping up with its Java counterpart.
Are you still maintaining it?
Environment
Operating System: Mac OSX
Python Version: Python 3.5.4
How did you install tdigest: pip
Error is:
DELC02RC08VG8WN:tdigest-0.5.2.2 priyagupta$ pip3 install tdigest
Requirement already satisfied: tdigest in /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tdigest-0.5.2.2-py3.5.egg (0.5.2.2)
Collecting accumulation_tree (from tdigest)
Using cached https://files.pythonhosted.org/packages/e9/18/73c11ed9d379b5efea5cabcce4b53762ee4b0c3aea42bd944e992f8ee307/accumulation_tree-0.6.tar.gz
ERROR: Complete output from command python setup.py egg_info:
ERROR: Download error on https://pypi.org/simple/cython/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719) -- Some packages may not be found!
Couldn't find index page for 'cython' (maybe misspelled?)
Download error on https://pypi.org/simple/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719) -- Some packages may not be found!
No local packages or working download links found for cython
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/private/var/folders/bv/n02l1vdn6mn_rvn5hlq454mjgfm0_c/T/pip-install-jmrecmdu/accumulation-tree/setup.py", line 28, in <module>
    Extension('accumulation_tree.accumulation_tree', ['accumulation_tree/accumulation_tree.pyx'])
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 144, in setup
    _install_setup_requires(attrs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 139, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/dist.py", line 717, in fetch_build_eggs
    replace_conflicting=True,
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 782, in resolve
    replace_conflicting=replace_conflicting
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1065, in best_match
    return self.obtain(req, installer)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1077, in obtain
    return installer(requirement)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/dist.py", line 784, in fetch_build_egg
    return cmd.easy_install(req)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 673, in easy_install
    raise DistutilsError(msg)
distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('cython')
----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/bv/n02l1vdn6mn_rvn5hlq454mjgfm0_c/T/pip-install-jmrecmdu/accumulation-tree/
See https://github.com/CamDavidsonPilon/tdigest/blob/master/pyspark_example.py
I assume "sc" is a SparkContext?
Can you please add a tag to the git project for release 0.5.2.2?
>>> from tdigest import TDigest as TD
>>> td = TD()
>>> td.update(1)
>>> td.quantile(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/tdigest/tdigest.py", line 184, in quantile
delta = (c_i.mean - self.C.prev_item(key)[1].mean) / 2.
File "/usr/local/lib/python2.7/dist-packages/bintrees/abctree.py", line 684, in prev_item
raise KeyError(str(key))
KeyError: '1.0'
A little bit of inspection shows that quantile breaks if there's only one Centroid in self.C.
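A minimal sketch of a possible guard, assuming nothing about the library's internals beyond what the traceback shows: with a single centroid there is no previous item to interpolate against, so returning that centroid's mean directly avoids the KeyError. The names here are illustrative, not the library's own:

```python
def quantile_sketch(centroids, q):
    """Illustrative guard only. centroids: sorted (mean, count) pairs;
    q in [0, 1]. With one centroid there is no neighbour to interpolate
    against, so its mean is the only sensible answer."""
    if len(centroids) == 1:
        return centroids[0][0]
    raise NotImplementedError("full interpolation over neighbours omitted")

assert quantile_sketch([(1.0, 1)], 1.0) == 1.0
```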
It would be really nice if tags were used in this project, so that one could easily see which commit the current version 0.4.1.0 on PyPI refers to, and therefore whether a certain bugfix is included or not.
I found this issue with the median, but then noticed that other percentiles have the same problem. In this case there should not be any negative percentiles, as all seen values are positive... furthermore, a large negative median when the 40th and 60th percentiles are positive is nonsensical. I wonder if this is a bug or a known limitation?
Please see below for an example:
from tdigest.tdigest import TDigest
import numpy as np

vals = [8.11780000e+04, 2.14100000e+03, 8.29710000e+04,
        7.81110000e+04, 2.30000000e+02, 5.27661000e+05,
        2.63252000e+05, 9.16950000e+04, 1.08515000e+05,
        6.26000000e+02, 7.90000000e+02, 1.24600000e+03,
        4.31357000e+05, 4.64951000e+05, 1.30155000e+05,
        3.21239000e+05, 7.13940000e+04, 8.27000000e+02,
        1.18700000e+03, 8.00000000e+02, 5.29984000e+05,
        4.57174000e+05, 8.13000000e+02, 3.67000000e+02,
        5.25310000e+04, 5.62000000e+02, 4.50359000e+05,
        1.94000000e+03, 1.36000000e+02, 5.36088000e+05,
        4.45300000e+03, 8.06000000e+02, 4.64000000e+02,
        1.44000000e+02, 6.54000000e+02, 1.63800000e+03]

td = TDigest()
print("{: >10} {: >10} {: >10} {: >10}".format("value", "median", "td_median", "error"))
for i, val in enumerate(vals):
    td.update(val)
    actual_median = np.median(vals[:i+1])
    td_median = td.percentile(50)
    print("%10.0f %10.0f %10.2f %10.2f%%" % (val, actual_median, td_median, abs(td_median - actual_median)/actual_median * 100), end="")
    print(("{: >10.0f} " * 9).format(*[td.percentile(x) for x in np.linspace(10, 90, 9)]))
print("")
print("")
Results in:
value median td_median error
81178 81178 81178.00 0.00% 81178 81178 81178 81178 81178 81178 81178 81178 81178
2141 41660 81178.00 94.86% 2141 2141 2141 2141 81178 81178 81178 81178 81178
82971 81178 81178.00 0.00% 2141 2141 2141 69054 81178 93302 82971 82971 82971
78111 79644 79963.00 0.40% 2141 2141 66255 82063 79963 80935 81907 82971 82971
230 78111 78111.00 0.00% 230 -17329 2141 58352 78111 79963 81178 82971 82971
527661 79644 79963.00 0.40% 230 -9541 13823 74159 79963 81421 15999 149943 527661
263252 81178 81178.00 0.00% 230 -1753 62304 89967 81178 55660 119386 285487 527661
91695 82074 80341.75 2.11% 230 6035 74159 80449 80342 84549 100709 241454 527661
108515 82971 82971.00 0.00% 230 13823 86015 81421 82971 90418 91359 200380 527661
626 82074 80341.75 2.11% 148 -17230 58352 79963 80342 85309 65626 158466 527661
790 81178 81178.00 0.00% 514 563 -5591 74159 81178 83497 94249 134249 347081
1246 79644 79963.00 0.40% 542 759 1314 13671 79963 81393 90418 117093 326124
431357 81178 81178.00 0.00% 570 821 1516 66255 81178 84549 74204 247110 457798
464951 82074 80341.75 2.11% 598 883 -9389 82063 80342 90418 134249 401102 469766
130155 82971 82971.00 0.00% 626 908 2141 79963 82971 98900 130155 380932 464951
321239 87333 85309.00 2.32% 654 1043 13671 80935 85309 110438 234589 346455 460136
71394 82971 82971.00 0.00% 682 1178 56200 79579 82971 102746 161102 329644 455321
827 82074 80341.75 2.11% 710 850 -1366 76643 80342 95527 137892 312834 450505
1187 81178 81178.00 0.00% 738 887 1341 75193 81178 90418 114681 296023 445690
800 79644 79963.00 0.40% 746 730 1008 52402 79963 85309 91471 279213 440875
529984 81178 81178.00 0.00% 755 769 1151 67596 81178 92972 145629 346455 484212
457174 82074 80341.75 2.11% 764 808 1294 82790 80342 102746 253698 438154 475524
813 81178 81178.00 0.00% 773 814 1271 59999 81178 95527 225035 424560 472000
367 79644 79963.00 0.40% 605 803 1124 5648 79963 90418 153366 410967 468475
52531 78111 78111.00 0.00% 626 806 1187 35218 78111 85309 130155 397373 464951
562 74752 75665.00 1.22% 575 797 883 -423 75665 83497 106944 346455 461427
450359 78111 78111.00 0.00% 588 799 1103 9834 78111 87863 161102 437813 457902
1940 74752 75665.00 1.22% 601 801 1166 -5448 75665 84549 137892 424901 454378
136 71394 71394.00 0.00% 433 816 864 1985 71394 82445 114681 411989 450854
536088 74752 75665.00 1.22% 497 794 1082 -10507 75665 85309 215481 443905 511403
4453 71394 71394.00 0.00% 510 797 1145 2015 71394 83497 145629 450725 479048
806 61962 64999.00 4.90% 523 799 846 2074 64999 81393 122418 437813 475524
464 52531 52531.00 0.00% 444 799 817 1806 52531 81907 99208 424901 472000
144 28492 35795.75 25.63% 355 660 810 1284 35796 80935 114284 411989 468475
654 4453 4453.00 0.00% 367 613 806 1058 4453 79963 108515 399077 464951
1638 3297 -8144.50 347.03% 379 629 808 1223 -8144 78600 102746 346455 461427
Just saying hi. This is very nice work indeed. The application of your work at http://dev.microprediction.org/crawling.html may be more than obvious as this is essentially an online CDF estimation contest (or collection of the same). I've put tdigest top of my list at https://github.com/microprediction/microprediction/projects/4 to ensure it is included. This will, by the way, generate plenty of comparative data that might help your research publications. Happy to explain further. See also http://dev.microprediction.org/july.html
The implementation of the trimmed mean estimate (trimmed_mean method) doesn't look right. The estimate seems way off from the real value. Here is an example:
import numpy as np
from tdigest import TDigest

# Create 10000 samples from a random uniform distribution.
x = np.random.random(size=10000)*100

# Create a T-Digest for this.
d = TDigest()
d.batch_update(x)

# Estimate the trimmed mean of x above the 25th percentile.
tm_estimate = d.trimmed_mean(25, 100)
print(tm_estimate)
# 75.0410094085

# Now find the real 25th percentile and compute the real trimmed mean.
x_25 = np.percentile(x, 25)
x_trimmed = x[x >= x_25]
tm_real = x_trimmed.mean()
print(tm_real)
# 62.3013933259
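A plain-Python reference makes the expected value easy to check without NumPy. This interprets trimmed_mean(lower, upper) as the mean of the values lying between the two percentiles; that interpretation is an assumption here, not taken from the library:

```python
# Plain-Python reference for comparison, no NumPy needed. Interpreting
# trimmed_mean(lower, upper) as the mean of values between the lower and
# upper percentiles is an assumption, not the library's documented contract.
def exact_trimmed_mean(values, lower, upper):
    s = sorted(values)
    n = len(s)
    lo = int(n * lower / 100.0)
    hi = int(n * upper / 100.0)
    kept = s[lo:hi]
    return sum(kept) / len(kept)

# Uniform grid on [0, 100): the mean above the 25th percentile is ~62.5,
# in line with the 62.30 computed from the random sample above.
vals = [i * 0.01 for i in range(10000)]
print(exact_trimmed_mean(vals, 25, 100))
```

That the digest returns ~75 (roughly the midpoint of 25 and 100 rescaled) rather than ~62.5 does suggest the estimator, not the comparison, is off.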
I upgraded to 0.5.2 but tdigest.__version__ still returns 0.5.0.
It should be as simple as updating __init__.py in the tdigest directory.
I have a basic question about serialization and deserialization. How do you suggest that this is done? I ask because my instinct was json.dumps(t.to_dict()) but on the reverse trip, the current implementation of update_from_dict seems to be recalibrating from scratch unless I misunderstand. It is very slow. How should one quickly save and load many tdigests?
Hi, more of a question than an issue but I'm curious what scale function has been used in your implementation. On lines 101/102 you have the threshold function that defines the maximum centroid weight. Comparing this to the paper:
https://arxiv.org/pdf/1903.09921.pdf
your expression is close to that given in section 5.1 (using the k2 scale function), but if the definition of Z(n) is what the paper says, then they're not the same. Can you provide some details on this?
Also thanks so much for the python implementation, great piece of work!
The following program produces a negative trimmed mean:
import random

import tdigest

td = tdigest.TDigest()
for i in range(100):
    td.update(random.random())
for i in range(10):
    td.update(i*100)

mean = td.trimmed_mean(10, 99)
print(mean, td)
Output
-488.7492907267765 <T-Digest: n=110, centroids=110>
It seems like half the time just the one test, test_uniform, fails, and other times it's fine.
This issue is just to track that problem with test_uniform. We should figure out what's wrong and fix it.
Dear Cam,
Have you figured out how to serialise the python version of t-digests?
Thanks,
Alex
Hi,
I believe it would be nice and useful to also add a conda distribution. (I can help with that if needed.)
$ ~/opt/venv/bin/pip download --no-deps tdigest
Collecting tdigest
Downloading https://files.pythonhosted.org/packages/27/41/b714941a6dba3760ddf2c2604daabbb578bcd6063f57ecdbe2c1d8ce4a79/tdigest-0.5.2.1-py2.py3-none-any.whl
Saved ./tdigest-0.5.2.1-py2.py3-none-any.whl
Successfully downloaded tdigest
You are using pip version 19.0.3, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
$ unzip -l tdigest-0.5.2.1-py2.py3-none-any.whl
Archive: tdigest-0.5.2.1-py2.py3-none-any.whl
Length Date Time Name
--------- ---------- ----- ----
155 2016-08-27 01:30 MANIFEST
4034 2018-03-12 13:33 README.md
1089 2015-03-17 05:04 LICENSE.txt
13056 2018-05-05 13:07 tdigest/tdigest.pyc
861 2017-02-20 02:41 tdigest/test_convergence_of_ks_statistic_over_adding.py
53 2018-05-05 13:24 tdigest/__init__.py
10342 2018-05-05 13:24 tdigest/tdigest.py
248 2018-05-05 13:07 tdigest/__init__.pyc
1827 2018-05-05 13:07 tdigest/__pycache__/test_convergence_of_ks_statistic_over_adding.cpython-27-PYTEST.pyc
4035 2018-05-05 13:25 tdigest-0.5.2.1.dist-info/DESCRIPTION.rst
995 2018-05-05 13:25 tdigest-0.5.2.1.dist-info/metadata.json
8 2018-05-05 13:25 tdigest-0.5.2.1.dist-info/top_level.txt
110 2018-05-05 13:25 tdigest-0.5.2.1.dist-info/WHEEL
4907 2018-05-05 13:25 tdigest-0.5.2.1.dist-info/METADATA
1269 2018-05-05 13:25 tdigest-0.5.2.1.dist-info/RECORD
--------- -------
42989 15 files
Wheels should only contain source, but this one contains Python 2.x .pyc files.
(This is easy to fix: a simple re-release with modern versions of wheel / setuptools / pip will not produce wheels in this manner.)
Hi, thanks for the t-digest implementation for python!
I used this for my work and found that, in the end, computing and merging t-digests became the bottleneck. So I read the original paper and implemented another version of it (using the algorithm in the paper). I found the performance is better (around 50-100 times faster). I think the improvement comes from keeping a buffer, so hundreds of values can be merged into the t-digest at once.
I wonder if I could open a PR against this repo to add it as an alternative implementation, so I can use it in my day-to-day work. Thanks.
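The buffering idea described above can be sketched independently of any particular digest implementation: collect raw values and hand them to the digest in large batches instead of one at a time. The class below is a sketch under the assumption that the digest exposes a `batch_update(values)` method (as this library's TDigest does); the buffer size is an arbitrary choice:

```python
# Sketch of the buffering idea: accumulate raw values and merge them into the
# digest in large batches rather than one update at a time. `digest` is any
# object with a batch_update(values) method; buffer_size is arbitrary.
class BufferedDigest:
    def __init__(self, digest, buffer_size=1000):
        self.digest = digest
        self.buffer = []
        self.buffer_size = buffer_size

    def update(self, x):
        self.buffer.append(x)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.digest.batch_update(self.buffer)
            self.buffer = []

# Tiny stand-in digest that just records how many batch calls it received.
class FakeDigest:
    def __init__(self):
        self.calls, self.n = 0, 0
    def batch_update(self, values):
        self.calls += 1
        self.n += len(values)

fd = FakeDigest()
bd = BufferedDigest(fd, buffer_size=100)
for i in range(1000):
    bd.update(i)
bd.flush()
assert fd.n == 1000 and fd.calls == 10
```

Remember to call `flush()` before querying percentiles, since buffered values are not yet reflected in the digest.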
digest = TDigest()
digest.batch_update([62.0, 202.0, 1415.0, 1433.0])
digest.percentile(0.25)
Returns -136.25. This is because in https://github.com/CamDavidsonPilon/tdigest/blob/master/tdigest/tdigest.py#L166-L167, delta is computed as the mean of the means of the neighbouring centroids and is used as the slope to linearly approximate the quantile between the two centroids. In the following line, m_i + ((p - t) / k - 1/2)*delta is negative: because delta is very large and p - t = 0, the expression evaluates to m_i + (-1/2)*delta, which is negative.
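The failure mode is easy to see numerically. With p - t = 0 the interpolation reduces to m_i - delta/2, which goes negative whenever delta exceeds 2*m_i. The delta value below is illustrative, chosen to reproduce the reported result, not read out of the library's internals:

```python
# Numeric illustration of the failure mode described above. With p - t = 0
# the interpolation reduces to m_i - delta / 2, so any delta larger than
# 2 * m_i drives the result negative. delta here is illustrative.
m_i = 62.0          # mean of the first centroid
delta = 396.5       # hypothetical slope between neighbouring centroids
p_minus_t = 0.0
k = 1.0

result = m_i + (p_minus_t / k - 0.5) * delta
assert result == -136.25   # matches the negative percentile reported above
assert delta > 2 * m_i     # the condition that makes the estimate negative
```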
Recently, tdigest started to use accumulation_tree to accelerate lookups. Sadly, accumulation_tree is licensed under GPL3+ (as per setup.py), which means that tdigest may not be able to use the MIT license. Also, projects using tdigest will automatically fall under the GPL3 license as well.
I'm interested in the case where a variable takes on discrete values. I created a tdigest notebook to illustrate what might be an interesting issue.
Suppose I have sampled many rolls of a die. If I add a tiny amount of noise then tdigest works just fine as a nice representation of the data, with quite an accurate cdf and percentiles.
However, if you run the same notebook with HACK=False, then only six centroids are created. This leads to gross inaccuracy in both the cdf and percentiles.
I am wondering if there could be a trick here, in order for tdigest to be able to handle cases like this without my hack.
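The hack described above can be sketched in a few lines: perturb each discrete value by a tiny amount of noise before updating, so that repeated values do not collapse into a handful of centroids. The noise scale is an arbitrary choice, far below the spacing of the die faces:

```python
import random

# Sketch of the jitter workaround: nudge each discrete value by a tiny
# amount of noise so repeated values form distinct centroids. The scale
# (1e-6) is arbitrary, far below the spacing of the die faces.
def jitter(values, scale=1e-6, seed=0):
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in values]

rng = random.Random(1)
rolls = [rng.randint(1, 6) for _ in range(1000)]
noisy = jitter(rolls)

# Each perturbed value stays within the noise band of its original...
assert all(abs(a - b) < 2e-6 for a, b in zip(noisy, rolls))
# ...but the perturbed sample has far more distinct values than six.
assert len(set(noisy)) > len(set(rolls))
```

A built-in alternative might be for the digest to detect exactly repeated values and widen centroids itself, but absent that, jitter keeps the cdf and percentiles accurate.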
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tdigest/tdigest.py", line 112, in update
    self._add_centroid(Centroid(x, w))
  File "/usr/local/lib/python2.7/dist-packages/tdigest/tdigest.py", line 67, in _add_centroid
    self.C.insert(centroid.mean, centroid)
  File "accumulation_tree/accumulation_tree.pyx", line 233, in accumulation_tree.accumulation_tree._AccumulationTree.insert
TypeError: unbound method _cython_3_0_0a9.cython_function_or_method object must be called with RBTree instance as first argument (got AccumulationTree instance instead)
my pip env
accumulation-tree==0.6.2 amqp==1.4.6 ansible==1.8 anyjson==0.3.3 APNSWrapper==0.6.1 astroid==1.4.8 atfork==0.1.2 Babel==1.3 backports.functools-lru-cache==1.2.1 backports.ssl-match-hostname==3.4.0.2 beautifulsoup4==4.1.3 billiard==3.3.0.20 biplist==0.6 boilerpipe==1.2.0 boilerpipy==0.2.1b0 boto==2.42.0 bz2file==0.98 cached-property==1.2.0 cachetools==2.0.1 cassandra-driver==3.0.0a1 celery==3.1.18 certifi==2015.4.28 cffi==1.1.0 chardet==2.3.0 Cheetah==2.4.4 chromium-compact-language-detector==0.31415 chronos-python==0.34.0 click==5.1 colorama==0.3.2 configparser==3.5.0 coverage==4.3.4 cssselect==0.9.1 cssutils==1.0 Cython==0.17.4 DateUtils==0.5.2 decorator==3.4.0 Django==1.3.1 django-ajax-selects==1.3.5 django-cache-machine==0.6 django-debug-toolbar-django13==0.8.4 django-mysqlpool==0.1.post8 dnspython==1.12.0 docker-py==1.7.2 docutils==0.8.1 dpkt==1.6 elasticsearch==1.3.0 elasticsearch-dsl==0.0.4.dev0 feedparser==5.1.3 fixture==1.5 Flask==0.10.1 Flask-Admin==1.1.0 flower==0.8.2 flup==1.0.2 functools32==3.2.3.post2 furl==0.3.6 futures==2.1.6 gensim==0.13.1 geojson==1.0.9 glob2==0.5 google-auth==1.3.0 gunicorn==19.3.0 h2==2.4.1 hpack==2.3.0 html2text==2015.6.21 html5lib==1.0b1 httpagentparser==1.1.2 httplib2==0.9.2 hyper==0.7.0 hyperframe==3.2.0 ImageHash==0.3 impyla==0.9.1 iotop==0.6 isort==4.2.5 itsdangerous==0.24 jenkinsapi==0.3.3 Jinja2==2.7.3 jsl==0.2.4 jsonpath-rw==1.4.0 jsonschema==2.5.1 kazoo==2.0 Keras==1.0.6 kombu==3.0.26 lazy-object-proxy==1.2.2 librabbitmq==1.0.0 lipton==0.2.0 lockfile==0.8 luigi==2.3.0 lxml==2.3.4 Mako==1.0.0 Markdown==2.1.1 MarkupSafe==0.23 matplotlib==1.2.0 mccabe==0.5.2 mmh3==2.3 mock==1.0.1 mockredispy==2.9.0.9 mrjob==0.4.1 msgpack-python==0.2.4 mysql-replication==0.1.0 nose==1.2.1 numexpr==2.4rc2 numpy==1.11.1 objgraph==1.7.2 opentracing==1.3.0 orderedmultidict==0.7.1 pandas==0.14.1 peewee==2.6.4 PGen==0.2.1 phonenumbers==7.7.5 pika==0.9.8 Pillow==2.5.1 ply==3.9 premailer==2.5.1 protobuf==2.6.1 publicsuffix==1.0.5 pudb==2013.5.1 
py-pypcap==1.1.2 pyasn1==0.4.2 pyasn1-modules==0.2.1 pycparser==2.13 pycurl==7.19.3.1 pyinotify==0.9.4 pylibmc==1.2.3 pylint==1.6.4 pylint-django==0.7.2 pylint-flask==0.5 pylint-plugin-utils==0.2.6 pylint-redis==0.1 pymongo==2.2 PyMySQL==0.5 PyNLPIR==0.4.6 pyparsing==1.5.7 PyStemmer==1.3.0 python-cjson==1.0.5 python-consul==0.4.0 python-daemon==1.5.5 python-dateutil==2.4.0 python-memcached==1.48 pytz==2014.4 pyudorandom==1.0.0 PyYAML==3.11 raven==5.2.0 recordtype==1.1 redis==2.10.1 redis-shard==0.1.6 requests==2.10.0 rsa==3.4.2 schedule==0.1.11 schematics==1.0.post0 scikit-learn==0.14.1 scipy==0.18.0 simplejson==2.3.0 six==1.10.0 slackclient==0.15 smart-open==1.3.3 SQLAlchemy==0.7.6 sqlparse==0.1.5 tables==3.1.1 tailer==0.3 tdigest==0.5.2.2 Theano==0.8.2 thrift==0.8.0 tldextract==1.7.1 tornado==4.1 twilio==6.3.dev0 ua-parser==0.3.6 ujson==1.33 urllib3==1.10 urlparse2==1.1.1 urwid==1.1.2 user-agents==0.3.2 virtualenv==13.0.3 voluptuous==0.8.4 web.py==0.34 websocket-client==0.32.0 Werkzeug==0.10.4 wrapt==1.10.8 WTForms==2.0.2 XlsxWriter==0.7.2 yappi==0.94 zkpython==0.4.2 ZooKeeper==0.4
Trying to batch-update a tdigest object with a specific data set results in a runtime error due to a stack overflow in NodeStack.push().
A small example of this can be seen at https://gist.github.com/pgr/9b2c4e745a45142eb88b
tdigest version: a70e3bd
bintrees is v2.0.2
sys.version is ['2.7.8 |Anaconda 2.2.0 (32-bit)| (default, Jul 2 2014, 15:13:35) [MSC v.1500 32 bit (Intel)]']
The 0.5.1.0 sdist is missing on PyPI, as are the Python 2 wheels.