Comments (21)

kwgoodman commented on July 30, 2024

I don't know what you mean by "produces NaNs". Could you give a short example showing the input (x) and what you would like bottleneck to return?

from bottleneck.

erg commented on July 30, 2024

Test.pyx in the linked gist has the x vector and exact instructions to reproduce the results. The results should look like the factor version or the pandas version, without the NaNs. (Closed by accident...)

kwgoodman commented on July 30, 2024

Going through your example is too much work. Perhaps you can make a brief example like the one below.

Are these the NaNs you don't like? (I can't change that behavior.)

In [8]: a = [1,2,10,12]
In [9]: bn.move_std(a, 4)
Out[9]: array([        nan,         nan,         nan,  4.81534007])

Do you want to normalize by N:

In [10]: bn.move_std(a, 4)[-1]
Out[10]: 4.815340071064556
In [11]: np.std(a)
Out[11]: 4.815340071064556

Or N-1?

In [14]: bn.move_std(a, 4, ddof=1)[-1]
Out[14]: 5.5602757725374259
In [15]: np.std(a, ddof=1)
Out[15]: 5.5602757725374259
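For comparison, here is a minimal NumPy sketch of a trailing moving standard deviation that makes the N versus N-1 choice explicit through `ddof`. It is a slow reference, not bottleneck's implementation; the name `move_std_reference` is hypothetical, and it assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def move_std_reference(a, window, ddof=0):
    """Hypothetical reference: trailing moving std, NaN until the window fills.

    Each window is passed to np.std, so the divisor is window - ddof
    (ddof=0 divides by N, ddof=1 divides by N - 1).
    """
    a = np.asarray(a, dtype=float)
    out = np.full(a.shape, np.nan)
    if window <= a.size:
        # Row k of the view holds a[k : k + window]; its std belongs at
        # the window's last position, index k + window - 1.
        windows = sliding_window_view(a, window)
        out[window - 1:] = np.std(windows, axis=1, ddof=ddof)
    return out


a = [1, 2, 10, 12]
print(move_std_reference(a, 4))          # last value equals np.std(a)
print(move_std_reference(a, 4, ddof=1))  # last value equals np.std(a, ddof=1)
```

Because each window goes through `np.std`, which sums squared deviations from the window mean, the result cannot be negative; the price is O(N * window) work instead of a single pass.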

erg commented on July 30, 2024

To reproduce it, just copy/paste this entire thing into the shell:

git clone git://gist.github.com/3624548.git && cd 3624548 && ipython2
import numpy as np
import bottleneck as bn

x = np.load('testx.npy')
bn.move_std(x,3)

erg commented on July 30, 2024

The preceding NaNs are not what I'm talking about; there are NaNs in the middle of the data. However, pandas has a kwarg, min_periods=1, that starts calculating from the very first element. It would be awesome if bottleneck supported this feature, too.

import pandas

In [8]: pandas.rolling_mean(np.array([1,2,3,4,5]),3)
Out[8]: array([ nan,  nan,   2.,   3.,   4.])

In [9]: pandas.rolling_mean(np.array([1,2,3,4,5]),3, min_periods=1)
Out[9]: array([ 1. ,  1.5,  2. ,  3. ,  4. ])
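A minimal sketch of `min_periods` semantics for a moving mean, for illustration only: `move_mean_min_periods` is a hypothetical helper, not a bottleneck function, and it is intended to mimic the pandas behavior shown above by counting only non-NaN observations in each window.

```python
import numpy as np


def move_mean_min_periods(a, window, min_periods=None):
    """Hypothetical helper: trailing moving mean that emits a value once at
    least `min_periods` non-NaN observations are in the window."""
    a = np.asarray(a, dtype=float)
    if min_periods is None:
        min_periods = window
    valid = ~np.isnan(a)
    # Cumulative sums of the values (NaN treated as 0) and of the valid
    # counts, padded with a leading 0 so each window sum is a difference.
    csum = np.concatenate(([0.0], np.cumsum(np.where(valid, a, 0.0))))
    ccnt = np.concatenate(([0], np.cumsum(valid)))
    idx = np.arange(1, a.size + 1)
    start = np.maximum(idx - window, 0)
    total = csum[idx] - csum[start]
    count = ccnt[idx] - ccnt[start]
    out = np.full(a.size, np.nan)
    ok = count >= max(min_periods, 1)  # never divide by a zero count
    out[ok] = total[ok] / count[ok]
    return out


x = [1, 2, 3, 4, 5]
print(move_mean_min_periods(x, 3))                 # values: nan, nan, 2, 3, 4
print(move_mean_min_periods(x, 3, min_periods=1))  # values: 1, 1.5, 2, 3, 4
```

The cumulative-sum trick keeps it vectorized, but it accumulates rounding error over long inputs. In later pandas releases, `pandas.rolling_mean` was replaced by `Series.rolling(window, min_periods=...).mean()`.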

erg commented on July 30, 2024

And finally, I'm aware of the difference between the sample and population standard deviations. I suppose that accounts for the difference in the non-NaN values. Maybe the docs should say you're using the sample std?

Edit: I guess you're calculating the biased (population) std, i.e. ddof=0, which divides by N. There's no description of ddof in the parameters list.

kwgoodman commented on July 30, 2024

Yes, that's a bug (good find!). The docstring should include the description of ddof, which we can copy from np.std:

ddof : int, optional
    Means Delta Degrees of Freedom.  The divisor used in calculations
    is ``N - ddof``, where ``N`` represents the number of elements.
    By default `ddof` is zero.
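Written out for one window of N values, the docstring's definition is the first formula below; the second is the algebraic identity behind the one-pass form `a2sum - asum * asum / count` that appears in `move_std.pyx` later in this thread, with `a2sum` the sum of squares and `asum` the sum.

```latex
\mathrm{std}(x) = \sqrt{\frac{1}{N - \mathrm{ddof}} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2},
\qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i

\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2
  = \sum_{i=1}^{N} x_i^2 - \frac{1}{N} \left(\sum_{i=1}^{N} x_i\right)^2
```

For `a = [1, 2, 10, 12]`, the mean is 6.25 and the squared deviations sum to 92.75, so ddof=0 gives sqrt(92.75/4) ≈ 4.8153 and ddof=1 gives sqrt(92.75/3) ≈ 5.5603, matching the outputs above. The two sides of the identity are equal in exact arithmetic, but in floating point the right-hand side subtracts two nearly equal numbers when the values in the window are close together.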

kwgoodman commented on July 30, 2024

I added ddof to the docstrings of move_std() and move_nanstd():

1abdfa7

kwgoodman commented on July 30, 2024

You might want to try bn.move_nanstd() instead of bn.move_std().

erg commented on July 30, 2024

Same bug.

In [5]: bn.move_nanstd(x,3)
Out[5]: 
array([             nan,              nan,   1.55715950e-04,
         5.61672553e-04,   6.29558646e-04,   4.04292042e-04,
         4.04292042e-04,              nan,              nan,
         1.34956920e-04,   1.34956920e-04,   2.69682220e-04,
         2.69682220e-04,              nan,              nan,
         3.59747798e-04,   3.59747798e-04,   4.51234346e-05,
         6.10358168e-04,   6.31667620e-04,              nan,
         2.26028251e-04,   2.26028251e-04,   0.00000000e+00,
         2.71807335e-04,   5.10356704e-04,   2.96900396e-04,
         4.52446992e-05,   4.52229970e-05,   4.52229970e-05,
         3.61714576e-04,   3.61714576e-04,   9.06372853e-05,
         9.06372853e-05,              nan,   4.54059450e-04,
         4.54059450e-04,   2.72330746e-04,   2.72330746e-04,
                    nan,   1.81518876e-04,   1.81518876e-04,
                    nan,   1.81641276e-04,   1.81641276e-04,
         4.09126392e-04,   4.09126392e-04,   3.36062087e-11,
         2.27731653e-04,   2.27731653e-04,   4.55947887e-04,
         4.05285108e-04,   2.77122055e-04,   1.82255759e-04,
...

kwgoodman commented on July 30, 2024

If there are three NaNs in a row in x, what should the std be when the window is 3?

erg commented on July 30, 2024

There are no nans in my x at all. But you need the full precision, hence loading it from the file as described above.

In [7]: print x
[  0.00000000e+00   1.90748689e-04  -1.90675932e-04   1.14481963e-03
   1.14481963e-03   2.87186699e-04   2.87186699e-04   2.87186699e-04
   2.87186699e-04   8.99839353e-07   8.99839353e-07   5.72982219e-04
   5.72982219e-04   5.72982219e-04   5.72982219e-04  -1.90158103e-04
  -1.90158103e-04  -9.44368435e-05  -1.43440622e-03  -1.43440622e-03
  -1.43440622e-03  -9.54927888e-04  -9.54927888e-04  -9.54927888e-04
  -3.78337459e-04   2.93963731e-04   1.97985231e-04   1.97985231e-04
   1.02052767e-04   1.02052767e-04  -6.65259721e-04  -6.65259721e-04
  -4.72989004e-04  -4.72989004e-04  -4.72989004e-04   4.90216544e-04
   4.90216544e-04   1.06791730e-03   1.06791730e-03   1.06791730e-03
   1.45297698e-03   1.45297698e-03   1.45297698e-03   1.83829631e-03
   1.83829631e-03   2.70618445e-03   2.70618445e-03   2.70618445e-03
   2.22309266e-03   2.22309266e-03   3.19030419e-03   2.90042111e-03
   2.51379826e-03   2.51379826e-03   3.86710904e-03   4.25324670e-03
   4.73573247e-03   4.73573247e-03   4.73573247e-03   4.73573247e-03
   4.73573247e-03   4.35022746e-03   4.35022746e-03   4.35022746e-03
   4.35022746e-03   3.96479674e-03   3.96479674e-03   3.96479674e-03
   3.96479674e-03   3.96479674e-03   3.96479674e-03   3.96479674e-03
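Before blaming the algorithm, it can help to confirm that the NaNs are not inherited from the input and to list where they appear. A small sketch, assuming NumPy; `interior_nans` is a hypothetical helper, and the commented usage assumes `testx.npy` from the gist and bottleneck are available.

```python
import numpy as np


def interior_nans(y, window):
    """Hypothetical helper: indices of NaNs in a moving-window result y,
    skipping the first window - 1 positions, which are NaN by design."""
    y = np.asarray(y, dtype=float)
    return np.flatnonzero(np.isnan(y[window - 1:])) + (window - 1)


# Usage with the data from the gist (requires testx.npy and bottleneck):
# import bottleneck as bn
# x = np.load('testx.npy')
# assert not np.isnan(x).any()  # the input itself has no NaNs
# print(interior_nans(bn.move_std(x, 3), 3))
```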

erg commented on July 30, 2024

If there were nans in the x input, I suppose I don't care what move_std returns at all. :)

erg commented on July 30, 2024

It might be that you're taking the sqrt of a negative number.

From move_std.pyx:


            y[i0] = sqrt((a2sum - asum * asum / count) / (count - ddof))
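The line above uses the one-pass identity var = (sum of squares - sum**2 / N) / (N - ddof). In exact arithmetic the numerator is never negative, but in floating point the two terms can be nearly equal, so rounding can leave a tiny remainder of either sign, and sqrt of a negative remainder gives NaN. A minimal sketch contrasting it with a two-pass computation on a window of three equal values taken from x; the helper names are hypothetical. If the sums are also updated incrementally as the window slides, as the variable names suggest, rounding error can accumulate further.

```python
import numpy as np


def one_pass_var(w, ddof=0):
    """Variance via sum of squares minus squared sum / n (the formula quoted from move_std.pyx)."""
    w = np.asarray(w, dtype=float)
    n = w.size
    asum = w.sum()
    a2sum = (w * w).sum()
    return (a2sum - asum * asum / n) / (n - ddof)


def two_pass_var(w, ddof=0):
    """Variance via squared deviations from the mean; a sum of squares, so never negative."""
    w = np.asarray(w, dtype=float)
    dev = w - w.mean()
    return (dev * dev).sum() / (w.size - ddof)


window = [2.87186699e-04] * 3  # three equal values, as in x; the true variance is 0
print(one_pass_var(window))     # rounding residue: tiny, and may be of either sign
print(two_pass_var(window))     # zero or a tiny non-negative number
```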

erg commented on July 30, 2024

Yes, it's taking the sqrt of negative numbers!

In [2]: import numpy as np
import bottleneck as bn

x = np.load('testx.npy')
bn.move_std(x,3)
0.00000024742859987833254792788557907545765601753374, 0.00086156009635718323012854025222395648597739636898, 3, 0
-0.00000000000000000000012352563814125214061742603683
0.00000024742859987833254792788557907545765601753374, 0.00086156009635718323012854025222395648597739636898, 3, 0
-0.00000000000000000000012352563814125214061742603683
0.00000098492587061058721337463046918703213350454462, 0.00171894665764582782810365735315372148761525750160, 3, 0
-0.00000000000000000000014117215787571671534142474463
0.00000098492587061058721337463046918703213350454462, 0.00171894665764582782810365735315372148761525750160, 3, 0
-0.00000000000000000000014117215787571671534142474463
0.00000617256357447203218380502726114755773778597359, -0.00430321864694510961002471560732374200597405433655, 3, 0
-0.00000000000000000000028234431575143343068284948926
0.00000067115579297346920246844980892375609471400821, -0.00141896701121640295140124976569495629519224166870, 3, 0
-0.00000000000000000000042351647362715016953416125034
0.00000342134204903411354143196246302416341222851770, 0.00320375188600839633235040082581690512597560882568, 3, 0
-0.00000000000000000000183523805238431727592863466376
0.00000633342630963094281159458942553897031757514924, 0.00435893093876157239341395666087919380515813827515, 3, 0
-0.00000000000000000000141172157875716729447356954499
Out[2]: 
array([             nan,              nan,   1.55715950e-04,
         5.61672553e-04,   6.29558646e-04,   4.04292042e-04,
         4.04292042e-04,              nan,              nan,
         1.34956920e-04,   1.34956920e-04,   2.69682220e-04,
         2.69682220e-04,              nan,              nan,
         3.59747798e-04,   3.59747798e-04,   4.51234346e-05,
         6.10358168e-04,   6.31667620e-04,              nan,
         2.26028251e-04,   2.26028251e-04,   0.00000000e+00,
         2.71807335e-04,   5.10356704e-04,   2.96900396e-04,
         4.52446992e-05,   4.52229970e-05,   4.52229970e-05,
         3.61714576e-04,   3.61714576e-04,   9.06372853e-05,
         9.06372853e-05,              nan,   4.54059450e-04,
         4.54059450e-04,   2.72330746e-04,   2.72330746e-04,
                    nan,   1.81518876e-04,   1.81518876e-04,
                    nan,   1.81641276e-04,   1.81641276e-04,
         4.09126392e-04,   4.09126392e-04,   3.36062087e-11,

Here are the print statements I added:

        if count == window:
            foo = (a2sum - asum * asum / count) / (count - ddof)
            if foo < 0.:
                print "%.50f, %.50f, %d, %d" % (a2sum, asum, count, ddof)
                print "%.50f" % foo
            y[i0] = sqrt(foo)
        else:
            y[i0] = NAN

erg commented on July 30, 2024

Note that these values are genuinely negative, not negative zero; otherwise sqrt could handle them.
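A quick NumPy check of that distinction, assuming IEEE 754 double arithmetic (which NumPy's float64 uses): sqrt of negative zero is negative zero, while any value strictly below zero, however tiny, yields NaN. The negative constant is one of the values printed above.

```python
import numpy as np

neg_zero = np.float64(-0.0)
tiny_negative = np.float64(-1.2352563814125214e-22)  # one of the values printed above

print(np.sqrt(neg_zero))  # -0.0: IEEE 754 defines sqrt(-0) as -0
with np.errstate(invalid="ignore"):  # silence the "invalid value" RuntimeWarning
    print(np.sqrt(tiny_negative))  # nan: a value below zero has no real square root
```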

kwgoodman commented on July 30, 2024

Yep, we found the same thing. We plan to set the output to 0 when a2sum - asum * asum / count is negative.
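A minimal pure-Python sketch of that fix, assuming a running-sum implementation like the one quoted from `move_std.pyx` above. `move_std_clamped` is a hypothetical name, and this is not the code from the actual commit; it only illustrates clamping a negative numerator to 0 before taking the square root.

```python
import math

import numpy as np


def move_std_clamped(a, window, ddof=0):
    """Hypothetical sketch: moving std from running sums, with negative
    variance caused by rounding clamped to 0 before the square root."""
    a = np.asarray(a, dtype=float)
    y = np.full(a.size, np.nan)
    asum = 0.0   # running sum of the values in the window
    a2sum = 0.0  # running sum of their squares
    for i in range(a.size):
        ai = a[i]
        asum += ai
        a2sum += ai * ai
        if i >= window:
            # Drop the value that just slid out of the window.
            old = a[i - window]
            asum -= old
            a2sum -= old * old
        if i >= window - 1:
            count = window
            var = (a2sum - asum * asum / count) / (count - ddof)
            # Rounding can leave var slightly below zero; math.sqrt would
            # raise ValueError there (np.sqrt would return NaN), so clamp.
            y[i] = math.sqrt(max(var, 0.0))
    return y


x_part = np.array([2.87186699e-04] * 4 + [8.99839353e-07] * 2 + [5.72982219e-04] * 4)
print(move_std_clamped(x_part, 3))  # NaN only in the first window - 1 slots
```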

erg commented on July 30, 2024

Awesome. What about adding move_var and min_periods=1 functionality?

erg commented on July 30, 2024

Pandas also has rolling_kurtosis and rolling_skew.

kwgoodman commented on July 30, 2024

The negative sqrt issue is fixed:

6fb6f73

kwgoodman commented on July 30, 2024

The docstring is fixed and the negative sqrt bug is fixed, so I'm closing this issue. You can open a new issue with the min_periods feature request if you like.
