https://gist.github.com/3624548 <p d

You might want to try bn.move_nanstd() instead of <co

Same bug. <div class="snippet-clipboard-content notranslate position-relative over

bottleneck.move_std() produces nans, doesn't match pandas.rolling_std() about bottleneck HOT 21 CLOSED

pydata commented on July 30, 2024

bottleneck.move_std() produces nans, doesn't match pandas.rolling_std()

from bottleneck.

Comments (21)

kwgoodman commented on July 30, 2024

I don't know what you mean by produces NaNs. Could you give a short example showing the input (x) and what you would like bottleneck to return?

from bottleneck.

erg commented on July 30, 2024

Test.pyx has the x vector in the gist link with exact instructions to reproduce the results. The results should look like the factor version or the pandas version without the nans. Closed on accident...

from bottleneck.

kwgoodman commented on July 30, 2024

Too much work to go through your example. Perhaps you can make a brief example as below.

Are these the NaNs you don't like? (I can't change that behavior)

In [8]: a = [1,2,10,12]
In [9]: bn.move_std(a, 4)
Out[9]: array([        nan,         nan,         nan,  4.81534007])

Do you want to normalize by N:

In [10]: bn.move_std(a, 4)[-1]
Out[10]: 4.815340071064556
In [11]: np.std(a)
Out[11]: 4.815340071064556

Or N-1?

In [14]: bn.move_std(a, 4, ddof=1)[-1]
Out[14]: 5.5602757725374259
In [15]: np.std(a, ddof=1)
Out[15]: 5.5602757725374259

from bottleneck.

erg commented on July 30, 2024

To reproduce it, just copy/paste this entire thing into the shell:

git clone git://gist.github.com/3624548.git && cd 3624548 && ipython2
import numpy as np
import bottleneck as bn

x = np.load('testx.npy')
bn.move_std(x,3)

from bottleneck.

erg commented on July 30, 2024

The preceding nans are not what I'm talking about--there are nans in the middle of the data. However, pandas has a kwarg num_periods=1 to start calculating even from the start. It would be awesome if bottleneck supported this feature, too.

import pandas


In [8]: pandas.rolling_mean(np.array([1,2,3,4,5]),3)
Out[8]: array([ nan,  nan,   2.,   3.,   4.])

In [9]: pandas.rolling_mean(np.array([1,2,3,4,5]),3, min_periods=1)
Out[9]: array([ 1. ,  1.5,  2. ,  3. ,  4. ])

from bottleneck.

erg commented on July 30, 2024

And finally, I'm aware of the sample standard deviation vs population standard deviation. I suppose that accounts for the difference in the non-nan values. Maybe the docs should say you're using the sample std?

Edit: I guess you're calculating the biased std, N=0. There's no description of ddof in the parameters list.

from bottleneck.

kwgoodman commented on July 30, 2024

Yes, that's a bug (good find!). The docstring should include the description of ddof, which we can copy from np.std:

ddof  int, optional
    Means Delta Degrees of Freedom.  The divisor used in calculations
    is ``N - ddof``, where ``N`` represents the number of elements.
    By default `ddof` is zero.

from bottleneck.

kwgoodman commented on July 30, 2024

I added ddof to the docstrings of move_std() and move_nanstd():

1abdfa7

from bottleneck.

kwgoodman commented on July 30, 2024

You might want to try bn.move_nanstd() instead of bn.move_std().

from bottleneck.

erg commented on July 30, 2024

Same bug.

In [5]: bn.move_nanstd(x,3)
Out[5]: 
array([             nan,              nan,   1.55715950e-04,
         5.61672553e-04,   6.29558646e-04,   4.04292042e-04,
         4.04292042e-04,              nan,              nan,
         1.34956920e-04,   1.34956920e-04,   2.69682220e-04,
         2.69682220e-04,              nan,              nan,
         3.59747798e-04,   3.59747798e-04,   4.51234346e-05,
         6.10358168e-04,   6.31667620e-04,              nan,
         2.26028251e-04,   2.26028251e-04,   0.00000000e+00,
         2.71807335e-04,   5.10356704e-04,   2.96900396e-04,
         4.52446992e-05,   4.52229970e-05,   4.52229970e-05,
         3.61714576e-04,   3.61714576e-04,   9.06372853e-05,
         9.06372853e-05,              nan,   4.54059450e-04,
         4.54059450e-04,   2.72330746e-04,   2.72330746e-04,
                    nan,   1.81518876e-04,   1.81518876e-04,
                    nan,   1.81641276e-04,   1.81641276e-04,
         4.09126392e-04,   4.09126392e-04,   3.36062087e-11,
         2.27731653e-04,   2.27731653e-04,   4.55947887e-04,
         4.05285108e-04,   2.77122055e-04,   1.82255759e-04,
...

from bottleneck.

kwgoodman commented on July 30, 2024

If there are three nans in a row in x then what should the std be when the window is 3?

from bottleneck.

erg commented on July 30, 2024

There are no nans in my x at all. But you need the full precision, hence loading it from the file as described above.

In [7]: print x
[  0.00000000e+00   1.90748689e-04  -1.90675932e-04   1.14481963e-03
   1.14481963e-03   2.87186699e-04   2.87186699e-04   2.87186699e-04
   2.87186699e-04   8.99839353e-07   8.99839353e-07   5.72982219e-04
   5.72982219e-04   5.72982219e-04   5.72982219e-04  -1.90158103e-04
  -1.90158103e-04  -9.44368435e-05  -1.43440622e-03  -1.43440622e-03
  -1.43440622e-03  -9.54927888e-04  -9.54927888e-04  -9.54927888e-04
  -3.78337459e-04   2.93963731e-04   1.97985231e-04   1.97985231e-04
   1.02052767e-04   1.02052767e-04  -6.65259721e-04  -6.65259721e-04
  -4.72989004e-04  -4.72989004e-04  -4.72989004e-04   4.90216544e-04
   4.90216544e-04   1.06791730e-03   1.06791730e-03   1.06791730e-03
   1.45297698e-03   1.45297698e-03   1.45297698e-03   1.83829631e-03
   1.83829631e-03   2.70618445e-03   2.70618445e-03   2.70618445e-03
   2.22309266e-03   2.22309266e-03   3.19030419e-03   2.90042111e-03
   2.51379826e-03   2.51379826e-03   3.86710904e-03   4.25324670e-03
   4.73573247e-03   4.73573247e-03   4.73573247e-03   4.73573247e-03
   4.73573247e-03   4.35022746e-03   4.35022746e-03   4.35022746e-03
   4.35022746e-03   3.96479674e-03   3.96479674e-03   3.96479674e-03
   3.96479674e-03   3.96479674e-03   3.96479674e-03   3.96479674e-03

from bottleneck.

erg commented on July 30, 2024

If there were nans in the x input, I suppose I don't care what move_std returns at all. :)

from bottleneck.

erg commented on July 30, 2024

It might be that you're taking the sqrt of a negative number.

From move_std.pyx:


            y[i0] = sqrt((a2sum - asum * asum / count) / (count - ddof))

from bottleneck.

erg commented on July 30, 2024

Yes, it's taking the sqrt of negative numbers!

In [2]: import numpy as np
import bottleneck as bn

x = np.load('testx.npy')
bn.move_std(x,3)
0.00000024742859987833254792788557907545765601753374, 0.00086156009635718323012854025222395648597739636898, 3, 0
-0.00000000000000000000012352563814125214061742603683
0.00000024742859987833254792788557907545765601753374, 0.00086156009635718323012854025222395648597739636898, 3, 0
-0.00000000000000000000012352563814125214061742603683
0.00000098492587061058721337463046918703213350454462, 0.00171894665764582782810365735315372148761525750160, 3, 0
-0.00000000000000000000014117215787571671534142474463
0.00000098492587061058721337463046918703213350454462, 0.00171894665764582782810365735315372148761525750160, 3, 0
-0.00000000000000000000014117215787571671534142474463
0.00000617256357447203218380502726114755773778597359, -0.00430321864694510961002471560732374200597405433655, 3, 0
-0.00000000000000000000028234431575143343068284948926
0.00000067115579297346920246844980892375609471400821, -0.00141896701121640295140124976569495629519224166870, 3, 0
-0.00000000000000000000042351647362715016953416125034
0.00000342134204903411354143196246302416341222851770, 0.00320375188600839633235040082581690512597560882568, 3, 0
-0.00000000000000000000183523805238431727592863466376
0.00000633342630963094281159458942553897031757514924, 0.00435893093876157239341395666087919380515813827515, 3, 0
-0.00000000000000000000141172157875716729447356954499
Out[2]: 
array([             nan,              nan,   1.55715950e-04,
         5.61672553e-04,   6.29558646e-04,   4.04292042e-04,
         4.04292042e-04,              nan,              nan,
         1.34956920e-04,   1.34956920e-04,   2.69682220e-04,
         2.69682220e-04,              nan,              nan,
         3.59747798e-04,   3.59747798e-04,   4.51234346e-05,
         6.10358168e-04,   6.31667620e-04,              nan,
         2.26028251e-04,   2.26028251e-04,   0.00000000e+00,
         2.71807335e-04,   5.10356704e-04,   2.96900396e-04,
         4.52446992e-05,   4.52229970e-05,   4.52229970e-05,
         3.61714576e-04,   3.61714576e-04,   9.06372853e-05,
         9.06372853e-05,              nan,   4.54059450e-04,
         4.54059450e-04,   2.72330746e-04,   2.72330746e-04,
                    nan,   1.81518876e-04,   1.81518876e-04,
                    nan,   1.81641276e-04,   1.81641276e-04,
         4.09126392e-04,   4.09126392e-04,   3.36062087e-11,

Here's the print statements I added:

        if count == window:
            foo = (a2sum - asum * asum / count) / (count - ddof)
            if foo < 0.:
                print "%.50f, %.50f, %d, %d" % (a2sum, asum, count, ddof)
                print "%.50f" % foo
            y[i0] = sqrt(foo)
        else:
            y[i0] = NAN

from bottleneck.

erg commented on July 30, 2024

Note that these negative numbers are also less than negative zero, or else sqrt could handle them.

from bottleneck.

kwgoodman commented on July 30, 2024

Yep, we found the same thing. We plan to set the output to 0 when a2sum - asum * asum / count is negative.

from bottleneck.

erg commented on July 30, 2024

Awesome. What about adding a move_var and min_periods=1 functionality?

from bottleneck.

erg commented on July 30, 2024

Pandas also has rolling_kurtosis and rolling_skew.

from bottleneck.

kwgoodman commented on July 30, 2024

The negative sqrt issue is fixed:

6fb6f73

from bottleneck.

kwgoodman commented on July 30, 2024

docstring is fixed and negative sqrt bug is fixed, so I'm closing this issue. You can open a new issue with the min_periods feature request if you like.

from bottleneck.

bottleneck.move_std() produces nans, doesn't match pandas.rolling_std() about bottleneck HOT 21 CLOSED

Comments (21)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent