Comments (21)
I don't know what you mean by produces NaNs. Could you give a short example showing the input (x) and what you would like bottleneck to return?
from bottleneck.
Test.pyx has the x vector in the gist link with exact instructions to reproduce the results. The results should look like the factor version or the pandas version without the nans. Closed on accident...
from bottleneck.
Too much work to go through your example. Perhaps you can make a brief example as below.
Are these the NaNs you don't like? (I can't change that behavior)
In [8]: a = [1,2,10,12]
In [9]: bn.move_std(a, 4)
Out[9]: array([ nan, nan, nan, 4.81534007])
Do you want to normalize by N:
In [10]: bn.move_std(a, 4)[-1]
Out[10]: 4.815340071064556
In [11]: np.std(a)
Out[11]: 4.815340071064556
Or N-1?
In [14]: bn.move_std(a, 4, ddof=1)[-1]
Out[14]: 5.5602757725374259
In [15]: np.std(a, ddof=1)
Out[15]: 5.5602757725374259
from bottleneck.
To reproduce it, just copy/paste this entire thing into the shell:
git clone git://gist.github.com/3624548.git && cd 3624548 && ipython2
import numpy as np
import bottleneck as bn
x = np.load('testx.npy')
bn.move_std(x,3)
from bottleneck.
The preceding nans are not what I'm talking about--there are nans in the middle of the data. However, pandas has a kwarg num_periods=1
to start calculating even from the start. It would be awesome if bottleneck supported this feature, too.
import pandas
In [8]: pandas.rolling_mean(np.array([1,2,3,4,5]),3)
Out[8]: array([ nan, nan, 2., 3., 4.])
In [9]: pandas.rolling_mean(np.array([1,2,3,4,5]),3, min_periods=1)
Out[9]: array([ 1. , 1.5, 2. , 3. , 4. ])
from bottleneck.
And finally, I'm aware of the sample standard deviation vs population standard deviation. I suppose that accounts for the difference in the non-nan values. Maybe the docs should say you're using the sample std?
Edit: I guess you're calculating the biased std, N=0. There's no description of ddof in the parameters list.
from bottleneck.
Yes, that's a bug (good find!). The docstring should include the description of ddof, which we can copy from np.std:
ddof int, optional
Means Delta Degrees of Freedom. The divisor used in calculations
is ``N - ddof``, where ``N`` represents the number of elements.
By default `ddof` is zero.
from bottleneck.
I added ddof
to the docstrings of move_std() and move_nanstd():
from bottleneck.
You might want to try bn.move_nanstd()
instead of bn.move_std()
.
from bottleneck.
Same bug.
In [5]: bn.move_nanstd(x,3)
Out[5]:
array([ nan, nan, 1.55715950e-04,
5.61672553e-04, 6.29558646e-04, 4.04292042e-04,
4.04292042e-04, nan, nan,
1.34956920e-04, 1.34956920e-04, 2.69682220e-04,
2.69682220e-04, nan, nan,
3.59747798e-04, 3.59747798e-04, 4.51234346e-05,
6.10358168e-04, 6.31667620e-04, nan,
2.26028251e-04, 2.26028251e-04, 0.00000000e+00,
2.71807335e-04, 5.10356704e-04, 2.96900396e-04,
4.52446992e-05, 4.52229970e-05, 4.52229970e-05,
3.61714576e-04, 3.61714576e-04, 9.06372853e-05,
9.06372853e-05, nan, 4.54059450e-04,
4.54059450e-04, 2.72330746e-04, 2.72330746e-04,
nan, 1.81518876e-04, 1.81518876e-04,
nan, 1.81641276e-04, 1.81641276e-04,
4.09126392e-04, 4.09126392e-04, 3.36062087e-11,
2.27731653e-04, 2.27731653e-04, 4.55947887e-04,
4.05285108e-04, 2.77122055e-04, 1.82255759e-04,
...
from bottleneck.
If there are three nans in a row in x then what should the std be when the window is 3?
from bottleneck.
There are no nans in my x at all. But you need the full precision, hence loading it from the file as described above.
In [7]: print x
[ 0.00000000e+00 1.90748689e-04 -1.90675932e-04 1.14481963e-03
1.14481963e-03 2.87186699e-04 2.87186699e-04 2.87186699e-04
2.87186699e-04 8.99839353e-07 8.99839353e-07 5.72982219e-04
5.72982219e-04 5.72982219e-04 5.72982219e-04 -1.90158103e-04
-1.90158103e-04 -9.44368435e-05 -1.43440622e-03 -1.43440622e-03
-1.43440622e-03 -9.54927888e-04 -9.54927888e-04 -9.54927888e-04
-3.78337459e-04 2.93963731e-04 1.97985231e-04 1.97985231e-04
1.02052767e-04 1.02052767e-04 -6.65259721e-04 -6.65259721e-04
-4.72989004e-04 -4.72989004e-04 -4.72989004e-04 4.90216544e-04
4.90216544e-04 1.06791730e-03 1.06791730e-03 1.06791730e-03
1.45297698e-03 1.45297698e-03 1.45297698e-03 1.83829631e-03
1.83829631e-03 2.70618445e-03 2.70618445e-03 2.70618445e-03
2.22309266e-03 2.22309266e-03 3.19030419e-03 2.90042111e-03
2.51379826e-03 2.51379826e-03 3.86710904e-03 4.25324670e-03
4.73573247e-03 4.73573247e-03 4.73573247e-03 4.73573247e-03
4.73573247e-03 4.35022746e-03 4.35022746e-03 4.35022746e-03
4.35022746e-03 3.96479674e-03 3.96479674e-03 3.96479674e-03
3.96479674e-03 3.96479674e-03 3.96479674e-03 3.96479674e-03
from bottleneck.
If there were nans in the x input, I suppose I don't care what move_std
returns at all. :)
from bottleneck.
It might be that you're taking the sqrt
of a negative number.
From move_std.pyx:
y[i0] = sqrt((a2sum - asum * asum / count) / (count - ddof))
from bottleneck.
Yes, it's taking the sqrt
of negative numbers!
In [2]: import numpy as np
import bottleneck as bn
x = np.load('testx.npy')
bn.move_std(x,3)
0.00000024742859987833254792788557907545765601753374, 0.00086156009635718323012854025222395648597739636898, 3, 0
-0.00000000000000000000012352563814125214061742603683
0.00000024742859987833254792788557907545765601753374, 0.00086156009635718323012854025222395648597739636898, 3, 0
-0.00000000000000000000012352563814125214061742603683
0.00000098492587061058721337463046918703213350454462, 0.00171894665764582782810365735315372148761525750160, 3, 0
-0.00000000000000000000014117215787571671534142474463
0.00000098492587061058721337463046918703213350454462, 0.00171894665764582782810365735315372148761525750160, 3, 0
-0.00000000000000000000014117215787571671534142474463
0.00000617256357447203218380502726114755773778597359, -0.00430321864694510961002471560732374200597405433655, 3, 0
-0.00000000000000000000028234431575143343068284948926
0.00000067115579297346920246844980892375609471400821, -0.00141896701121640295140124976569495629519224166870, 3, 0
-0.00000000000000000000042351647362715016953416125034
0.00000342134204903411354143196246302416341222851770, 0.00320375188600839633235040082581690512597560882568, 3, 0
-0.00000000000000000000183523805238431727592863466376
0.00000633342630963094281159458942553897031757514924, 0.00435893093876157239341395666087919380515813827515, 3, 0
-0.00000000000000000000141172157875716729447356954499
Out[2]:
array([ nan, nan, 1.55715950e-04,
5.61672553e-04, 6.29558646e-04, 4.04292042e-04,
4.04292042e-04, nan, nan,
1.34956920e-04, 1.34956920e-04, 2.69682220e-04,
2.69682220e-04, nan, nan,
3.59747798e-04, 3.59747798e-04, 4.51234346e-05,
6.10358168e-04, 6.31667620e-04, nan,
2.26028251e-04, 2.26028251e-04, 0.00000000e+00,
2.71807335e-04, 5.10356704e-04, 2.96900396e-04,
4.52446992e-05, 4.52229970e-05, 4.52229970e-05,
3.61714576e-04, 3.61714576e-04, 9.06372853e-05,
9.06372853e-05, nan, 4.54059450e-04,
4.54059450e-04, 2.72330746e-04, 2.72330746e-04,
nan, 1.81518876e-04, 1.81518876e-04,
nan, 1.81641276e-04, 1.81641276e-04,
4.09126392e-04, 4.09126392e-04, 3.36062087e-11,
Here's the print statements I added:
if count == window:
foo = (a2sum - asum * asum / count) / (count - ddof)
if foo < 0.:
print "%.50f, %.50f, %d, %d" % (a2sum, asum, count, ddof)
print "%.50f" % foo
y[i0] = sqrt(foo)
else:
y[i0] = NAN
from bottleneck.
Note that these negative numbers are also less than negative zero, or else sqrt
could handle them.
from bottleneck.
Yep, we found the same thing. We plan to set the output to 0 when a2sum - asum * asum / count
is negative.
from bottleneck.
Awesome. What about adding a move_var
and min_periods=1
functionality?
from bottleneck.
Pandas also has rolling_kurtosis
and rolling_skew
.
from bottleneck.
The negative sqrt issue is fixed:
from bottleneck.
docstring is fixed and negative sqrt bug is fixed, so I'm closing this issue. You can open a new issue with the min_periods
feature request if you like.
from bottleneck.
Related Issues (20)
- [BUG] Segmentation fault when working on a transposed numpy array with first dimension 1. HOT 2
- Build failing for arm64 HOT 5
- Virutalenv support for building bottleneck HOT 2
- np.bool is deprecated.
- [BUG] Failed to install bottleneck via `pip install .` HOT 2
- [Enhancement] Weighted Median function
- Python crashed in fuzzing test of 7 APIs HOT 1
- Wrong rolling window calculations
- [Enhancement] Add move_skew and move_kurt HOT 2
- I think no Bug - just a unexperienced elderly user... having Vers. 1.2.1 HOT 3
- Continuous fuzzing by way of OSS-Fuzz
- [BUG] NumPy 1.24: test regression `ValueError: cannot convert float NaN to integer` HOT 6
- 👋 Additional maintainers wanted! HOT 2
- Can you add 'nanquantile' to this project? HOT 2
- Ignore length check HOT 1
- use move_mean and set window&min_count =1, diff result
- [ENHANCEMENT] move_rank is twice as slow as pandas.rolling.rank
- [BUG]would it be possible to have Python-3.12 wheels ? HOT 2
- [BUG] bottleneck gives erroneous standard deviation in Pandas with float32 array. HOT 4
- [BUG]Can't install on MacOSX 14.1.1 HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bottleneck.