pydata / bottleneck
Fast NumPy array functions written in C
License: BSD 2-Clause "Simplified" License
Add optional overwrite_input (default False) input parameter to median() and nanmedian(). Should behave the same as in np.median().
The low-level functions nanstd_3d_int32_axis1 and nanstd_3d_int64_axis1, called by bottleneck.nanstd() for 3d input, wrote beyond the memory owned by the output array if arr.shape[1] == 0 and arr.shape[0] > arr.shape[2], where arr is the input array.
Thanks to Christoph Gohlke for finding an example to demonstrate the bug.
erg@ommegang ~/python/bottleneck $ [master*] make all
rm -rf bottleneck/src/*~ bottleneck/src/*.so bottleneck/src/*.c bottleneck/src/*.o bottleneck/src/*.html bottleneck/src/build bottleneck/src/../*.so
rm -rf bottleneck/src/func/32bit/*.c bottleneck/src/func/64bit/*.c
rm -rf bottleneck/src/move/32bit/*.c bottleneck/src/move/64bit/*.c
rm -rf bottleneck/src/func/32bit/*.pyx bottleneck/src/func/64bit/*.pyx
rm -rf bottleneck/src/move/32bit/*.pyx bottleneck/src/move/64bit/*.pyx
python -c "from bottleneck.src.makepyx import makepyx; makepyx()"
cython bottleneck/src/func/32bit/func.pyx
cython bottleneck/src/func/64bit/func.pyx
cython bottleneck/src/move/32bit/move.pyx
cython bottleneck/src/move/64bit/move.pyx
rm -rf bottleneck/src/../func.so
python bottleneck/src/func/setup.py build_ext --inplace
running build_ext
skipping 'bottleneck/src/func/64bit/func.c' Cython extension (up-to-date)
building 'func' extension
gcc -pthread -fno-strict-aliasing -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -fPIC -I/usr/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -c bottleneck/src/func/64bit/func.c -o build/temp.linux-x86_64-2.7/bottleneck/src/func/64bit/func.o
gcc -pthread -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu build/temp.linux-x86_64-2.7/bottleneck/src/func/64bit/func.o -L/usr/lib -lpython2.7 -o /home/erg/python/bottleneck/func.so
rm -rf bottleneck/src/../move.so
python bottleneck/src/move/setup.py build_ext --inplace
running build_ext
skipping 'bottleneck/src/move/64bit/move.c' Cython extension (up-to-date)
building 'move' extension
gcc -pthread -fno-strict-aliasing -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -fPIC -I/usr/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -c bottleneck/src/move/64bit/move.c -o build/temp.linux-x86_64-2.7/bottleneck/src/move/64bit/move.o
gcc -pthread -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu build/temp.linux-x86_64-2.7/bottleneck/src/move/64bit/move.o -L/usr/lib -lpython2.7 -o /home/erg/python/bottleneck/move.so
python -c "import bottleneck;bottleneck.test(extra_argv=['--processes=4'])"
Running unit tests for bottleneck
NumPy version 1.6.2
NumPy is installed in /usr/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy
Python version 2.7.3 (default, Apr 24 2012, 00:00:54) [GCC 4.7.0 20120414 (prerelease)]
nose version 1.1.2
.......................................nose.plugins.multiprocess: ERROR: Worker 2 error running test or returning results
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 688, in __runner
test(result)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
return self.run(*arg, **kw)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
test(orig)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
return self.run(*arg, **kw)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
test(orig)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
return self.run(*arg, **kw)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 767, in run
self.tasks, test)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 511, in addtask
testQueue.put((test_addr,arg), block=False)
File "<string>", line 2, in put
File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
conn.send((self._id, methodname, args, kwds))
PicklingError: Can't pickle <built-in function nansum>: import of module func failed
....E...................................................
======================================================================
ERROR: Failure: PicklingError (Can't pickle <built-in function nansum>: import of module func failed)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 688, in __runner
test(result)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
return self.run(*arg, **kw)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
test(orig)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
return self.run(*arg, **kw)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
test(orig)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
return self.run(*arg, **kw)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 767, in run
self.tasks, test)
File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 511, in addtask
testQueue.put((test_addr,arg), block=False)
File "<string>", line 2, in put
File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
conn.send((self._id, methodname, args, kwds))
PicklingError: Can't pickle <built-in function nansum>: import of module func failed
----------------------------------------------------------------------
Ran 95 tests in 9.141s
FAILED (errors=1)
erg@ommegang ~/python/bottleneck $ [master*] uname -a
Linux ommegang 3.5.3-1-ARCH #1 SMP PREEMPT Sun Aug 26 09:14:51 CEST 2012 x86_64 GNU/Linux
I just installed bottleneck with easy_install on Windows with MinGW 4.? (32-bit Python on a 64-bit computer).
It works without problems and the test suite passes, but there are a few compiler warnings:
Searching for bottleneck
Reading http://pypi.python.org/simple/bottleneck/
Reading http://berkeleyanalytics.com/bottleneck
Best match: Bottleneck 0.5.0
Downloading http://pypi.python.org/packages/source/B/Bottleneck/Bottleneck-0.5.0.tar.gz#md5=65f2b3be0ca74b859392d554a48a0906
Processing Bottleneck-0.5.0.tar.gz
Running Bottleneck-0.5.0\setup.py -q bdist_egg --dist-dir c:\users\josef\appdata\local\temp\easy_install-hagefy\Bottleneck-0.5.0\egg-dist-tmp-hf63me
package init file 'bottleneck/tests\__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/func\__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/move\__init__.py' not found (or not a regular file)
bottleneck/src/func/32bit/func.c: In function '__Pyx_RaiseArgtupleInvalid':
bottleneck/src/func/32bit/func.c:247114: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247114: warning: format '%s' expects type 'char *', but argument 5 has type 'Py_ssize_t'
bottleneck/src/func/32bit/func.c:247114: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247114: warning: too many arguments for format
bottleneck/src/func/32bit/func.c: In function '__Pyx_RaiseNeedMoreValuesError':
bottleneck/src/func/32bit/func.c:247124: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247124: warning: format '%s' expects type 'char *', but argument 3 has type 'Py_ssize_t'
bottleneck/src/func/32bit/func.c:247124: warning: too many arguments for format
bottleneck/src/func/32bit/func.c: In function '__Pyx_RaiseTooManyValuesError':
bottleneck/src/func/32bit/func.c:247132: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247132: warning: too many arguments for format
bottleneck/src/func/32bit/func.c: At top level:
C:\Python26\lib\site-packages\numpy\core\include/numpy/__ufunc_api.h:196: warning: '_import_umath' defined but not used
bottleneck/src/move/32bit/move.c: In function '__Pyx_RaiseArgtupleInvalid':
bottleneck/src/move/32bit/move.c:203854: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203854: warning: format '%s' expects type 'char *', but argument 5 has type 'Py_ssize_t'
bottleneck/src/move/32bit/move.c:203854: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203854: warning: too many arguments for format
bottleneck/src/move/32bit/move.c: In function '__Pyx_RaiseNeedMoreValuesError':
bottleneck/src/move/32bit/move.c:203956: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203956: warning: format '%s' expects type 'char *', but argument 3 has type 'Py_ssize_t'
bottleneck/src/move/32bit/move.c:203956: warning: too many arguments for format
bottleneck/src/move/32bit/move.c: In function '__Pyx_RaiseTooManyValuesError':
bottleneck/src/move/32bit/move.c:203964: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203964: warning: too many arguments for format
bottleneck/src/move/32bit/move.c: At top level:
C:\Python26\lib\site-packages\numpy\core\include/numpy/__ufunc_api.h:196: warning: '_import_umath' defined but not used
zip_safe flag not set; analyzing archive contents...
Adding bottleneck 0.5.0 to easy-install.pth file
Installed c:\python26\lib\site-packages\bottleneck-0.5.0-py2.6-win32.egg
Processing dependencies for bottleneck
Finished processing dependencies for bottleneck
E:\>python
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import bottleneck as bo
>>> bo.test()
<...>
NumPy version 1.5.1
NumPy is installed in C:\Python26\lib\site-packages\numpy
Python version 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)]
nose version 1.0.0
................................................................................
----------------------------------------------------------------------
Ran 80 tests in 36.863s
OK
<nose.result.TextTestResult run=80 errors=0 failures=0>
Hello,
I'm not pretending that somebody will track down the cause but I thought I'd share the situation.
I'm trying to install bottleneck on a VirtualBox 64bits Ubuntu 12.10 guest.
python setup.py install
ends with the following lines:
bottleneck/src/func/64bit/func.c: At top level:
/home/scrosta/virtualenv/local/lib/python2.7/site-packages/numpy/core/include/numpy/__ufunc_api.h:226:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
error: command 'gcc' failed with exit status 4
where, of course, the gcc: internal compiler error: Killed (program cc1) line is the ugly bit.
Running make all from the repository gives exactly the same error:
/home/scrosta/virtualenv/local/lib/python2.7/site-packages/numpy/core/include/numpy/__ufunc_api.h:226:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
error: command 'gcc' failed with exit status 4
make: *** [funcs] Error 1
After installing on OS X 10.8 Mountain Lion, I get this error:
>>> from bottleneck.move import *
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: dlopen(/Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so, 2): Symbol not found: _get_largest_child
Referenced from: /Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so
Expected in: flat namespace
in /Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so
Any ideas?
>>> import numpy
>>> numpy.__version__
'1.6.2'
Add a moving window exponentially weighted average.
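To make the request concrete, here is a minimal pure-Python sketch of one common definition of an exponentially weighted average. The name move_ewma and the parameter alpha are hypothetical (nothing like them is specified in the request), the recursion is seeded with the first value, and NaN handling is left out.

```python
def move_ewma(values, alpha):
    """Exponentially weighted moving average (hypothetical helper, not a Bottleneck API).

    Uses y[0] = x[0] and y[i] = alpha * x[i] + (1 - alpha) * y[i - 1].
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    for x in values:
        if not out:
            out.append(float(x))  # seed the recursion with the first value
        else:
            out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


print(move_ewma([1.0, 2.0, 3.0], alpha=0.5))
```

A real implementation would also have to decide how a window length maps to alpha and how NaNs are treated; both are open questions here.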
bn.anynan(arr, axis) and bn.allnan(arr, axis) might be useful functions to have in bottleneck.
Here's some proof of concept code for anynan:
@cython.boundscheck(False)
@cython.wraparound(False)
def anynan_axis1(np.ndarray[np.float64_t, ndim=2] a):
    cdef:
        int n0, n1, i, j, f
        np.npy_intp *dim
        np.float64_t aij
        np.ndarray[np.uint8_t, ndim=1, cast=True] y
    dim = PyArray_DIMS(a)
    n0 = dim[0]
    n1 = dim[1]
    cdef np.npy_intp *dims = [n0]
    y = PyArray_EMPTY(1, dims, NPY_BOOL, 0)
    for i in range(n0):
        f = 1
        for j in range(n1):
            aij = a[i, j]
            if aij != aij:
                y[i] = 1
                f = 0
                break
        if f == 1:
            y[i] = 0
    return y
We'd have to teach the templating function loop_cdef() to create empty bool return arrays.
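For unit testing such functions, a slow pure-Python reference could look like the sketch below. It assumes the input is a list of rows (a stand-in for a 2d float array reduced along axis 1); allnan_axis1 is a hypothetical companion that is not part of the proof of concept above.

```python
import math


def anynan_axis1(rows):
    """Return one bool per row: True if the row contains at least one NaN."""
    out = []
    for row in rows:
        found = False
        for x in row:
            if x != x:  # NaN is the only float that is not equal to itself
                found = True
                break  # stop scanning this row at the first NaN
        out.append(found)
    return out


def allnan_axis1(rows):
    """Return one bool per row: True if every element of the row is NaN."""
    return [all(x != x for x in row) for row in rows]


data = [[1.0, math.nan], [2.0, 3.0], [math.nan, math.nan]]
print(anynan_axis1(data), allnan_axis1(data))
```

Note that allnan_axis1 reports True for an empty row, because all() of an empty sequence is True; whether that is the desired convention would need to be decided.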
$ python setup.py sdist
does not add the makefile to the source distribution.
Every time I modify a cython pyx file, the corresponding change to the auto-generated C file is huge. The repo is growing in size quickly.
So, should I only update the C files at each release? In between releases users can use the makefile.
Or maybe I shouldn't include the c files in the repo. Instead generate them when I make the source distribution?
Suggestions?
Having recently moved over to a new computer, I needed to reinstall/rebuild several packages I have been using. I have run into a brick wall with bottleneck. Below are my command and the error messages on an Ubuntu 11.04 machine. It appears that some __init__.py files are expected that no longer exist. I cloned the repo this afternoon.
ben@tigger:~/Programs/bottleneck$ python setup.py build
running build
running build_py
package init file 'bottleneck/tests/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/func/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/move/__init__.py' not found (or not a regular file)
package init file 'bottleneck/tests/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/func/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/move/__init__.py' not found (or not a regular file)
running build_ext
building 'func' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/home/ben/Programs/numpy/numpy/core/include -I/usr/include/python2.7 -c bottleneck/src/func/32bit/func.c -o build/temp.linux-i686-2.7/bottleneck/src/func/32bit/func.o
gcc: bottleneck/src/func/32bit/func.c: No such file or directory
gcc: no input files
error: command 'gcc' failed with exit status 1
Segmentation fault with Cython 0.14.1 but not 0.13.
See this thread: http://groups.google.com/group/cython-users/browse_thread/thread/23f373a65a2a6391
Fixed with commit 89bad25
Good:
>> bn.nanmedian(np.array([np.nan, 1, np.nan]))
1.0
Bad:
>> bn.nanmedian(np.array([np.nan, np.nan, 1]))
-inf
This is only an issue for arrays with an odd number of elements in which every value except the last is NaN.
Bottleneck uses the NumPy function PyArray_FillWithScalar to (typically) fill an array with np.nan. PyArray_FillWithScalar returns -1 if the fill was unsuccessful, but bottleneck does not check the return value. It should check and then raise an exception if the return value is -1.
Docstring does not explain that median and nanmedian modify the input array. Keep docstring; change code to not modify input.
Figure out a clean way to make a single source distribution for both 32 bit and 64 bit OS.
nanvar() has same bug that was fixed in Bottleneck 0.4.1 for nanstd().
Thanks again to Christoph Gohlke for finding the bug.
A couple of requests came from the scipy mailing list to remove the dependency of Bottleneck on scipy:
On Wed, Dec 1, 2010 at 3:49 PM, T J wrote:
On Tue, Nov 30, 2010 at 7:04 PM, Keith Goodman wrote:
I bumped the Bottleneck requirements from "NumPy, SciPy" to "NumPy
1.5.1+, SciPy 0.8.0+". I think that is fair to do for a brand new
project.
If SciPy is only used in the benchmarks/tests, then why not make it an
optional benchmark/test that runs only if SciPy is present?
nose.SkipTest should be useful here. I frequently run software on
machines that only have NumPy installed.
Seems like a strange discussion to have on the scipy list :)
I don't want to have a hole in my unit test coverage. But I could copy
over the nan functions in scipy stats. And I guess the benchmark could
use those too. And then skip moving window benchmarks against
scipy.ndimage for those who don't have scipy installed.
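The "run only if SciPy is present" idea from the thread can be sketched as follows. This uses the standard library's unittest.skipUnless rather than nose.SkipTest (a swapped-in but equivalent mechanism), and the test class, test name, and the scipy.stats.tmean check are placeholders, not Bottleneck's actual tests.

```python
import importlib.util
import unittest

# True only when SciPy can be imported on this machine.
HAVE_SCIPY = importlib.util.find_spec("scipy") is not None


class TestAgainstSciPy(unittest.TestCase):
    @unittest.skipUnless(HAVE_SCIPY, "SciPy is not installed")
    def test_reference_mean(self):
        # Imported lazily so that collecting the test never fails without SciPy.
        from scipy import stats
        self.assertEqual(stats.tmean([1.0, 2.0, 3.0]), 2.0)


def run_suite():
    """Run the optional tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAgainstSciPy)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

On a NumPy-only machine the test is reported as skipped instead of failing, which keeps the suite green without hiding the gap in coverage.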
As discussed recently (January) on the numpy list, be careful when using C ints for indexing into arrays. A C int can overflow for very large but-some-people-use-them arrays.
I think Bottleneck uses:
cdef Py_ssize_t idx
for indexing. So that should be safe for large arrays. BUT it does use ints for the size of the array:
cdef np.npy_intp *dim
dim = PyArray_DIMS(a)
cdef int n = dim[0]
for idx in range(n):
    ...
That could be a problem since n could overflow. So switch to:
cdef Py_ssize_t n = dim[0]
But first try to confirm there is a problem by working on very large arrays. And that the issue is fixed by the above or similar.
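The wraparound itself can be illustrated without allocating a huge array. The sketch below uses ctypes to mimic storing an array length in a 32-bit C int versus a Py_ssize_t; it assumes that a C int is 32 bits wide, which holds on common platforms but is not guaranteed by the C standard.

```python
import ctypes

n = 2**31  # one more than the largest 32-bit signed integer

# ctypes integer types do no overflow checking, so the value wraps around
# just as it would when assigned to a 32-bit C int.
as_int32 = ctypes.c_int32(n).value

# Py_ssize_t is 64 bits wide on a 64-bit build, so the length survives there.
as_ssize = ctypes.c_ssize_t(n).value

print(as_int32, as_ssize)
```

A loop such as range(n) with the wrapped, negative n would simply not execute, so the failure mode is silently wrong output rather than a crash.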
The output of some functions depends on the default integer dtype of NumPy on your OS. Bottleneck determines the default int dtype when it templates the functions. Therefore the C files that I included in the releases on PyPI were for a 64-bit OS, since my computer is 64 bit.
Need to make two source releases: 32 and 64 bit.
Mean --> Sum
I[3] bn.func.nansum_2d_float32_axis0?
nansum_2d_float32_axis0(ndarray a)
Mean of 2d array with dtype=float32 along axis=0 ignoring NaNs.
Some functions choke on size zero input arrays or give the wrong output. Need to fix and to add unit tests.
A replace() function would be handy. Here are some timings of a prototype:
I[31] a = np.random.rand(10000)
I[32] a[a>0.8] = np.nan
I[33] timeit np.nan_to_num(a)
1000 loops, best of 3: 300 us per loop
I[34] a = np.random.rand(10000)
I[35] a[a>0.8] = np.nan
I[36] timeit mask = np.isnan(a); a[mask] = 0
10000 loops, best of 3: 50.9 us per loop
I[37] a = np.random.rand(10000)
I[38] a[a>0.8] = np.nan
I[39] timeit mask = np.isnan(a); np.putmask(a, mask, 0)
10000 loops, best of 3: 32.9 us per loop
I[40] a = np.random.rand(10000)
I[41] a[a>0.8] = np.nan
I[42] timeit replace(a, np.nan, 0)
100000 loops, best of 3: 8.57 us per loop
And here's the prototype:
@cython.boundscheck(False)
@cython.wraparound(False)
def replace(np.ndarray[np.float64_t, ndim=1] a, double r, double w):
    "replace elements of 1d numpy array of dtype=np.float64."
    cdef Py_ssize_t i
    cdef int a0 = a.shape[0]
    cdef np.float64_t ai
    if r == r:
        for i in range(a0):
            ai = a[i]
            if ai == r:
                a[i] = w
    else:
        for i in range(a0):
            ai = a[i]
            if ai != ai:
                a[i] = w
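The reason for the two branches in the prototype is that NaN never compares equal to anything, including itself, so a plain equality test cannot find NaNs. A pure-Python sketch of the same logic, operating in place on a list of floats (a stand-in for the 1d float64 array), might look like this:

```python
import math


def replace(values, old, new):
    """Replace, in place, every element equal to `old` with `new`."""
    if old != old:
        # `old` is NaN: equality never matches, so test each element for NaN-ness.
        for i, x in enumerate(values):
            if x != x:
                values[i] = new
    else:
        for i, x in enumerate(values):
            if x == old:
                values[i] = new


data = [1.0, math.nan, 3.0, math.nan]
replace(data, math.nan, 0.0)
print(data)
```

Like the prototype, this modifies its input and returns nothing; it is meant as a reference for unit tests, not as a fast implementation.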
Add a moving window scoreatpercentile function. It could be based on the move_median code with some slight modifications.
Compile and run bottleneck with python 3. Then run:
>>> import bottleneck as bn
>>> bn.test()
>>> bn.bench()
A typo in bn.argpartsort() makes its error message confusing:
In [20]: a = np.arange(15).reshape(5,3)
In [21]: bn.argpartsort(a, 10, axis=1)
ValueError: `n` (=10) must be between 1 and 5, inclusive.
The error message should be:
ValueError: `n` (=10) must be between 1 and 3, inclusive.
To fix, change (n0 --> n1):
if (n < 1) or (n > n1):
    raise ValueError(PARTSORT_ERR_MSG % (n, n0))
to
if (n < 1) or (n > n1):
    raise ValueError(PARTSORT_ERR_MSG % (n, n1))
in argpartsort_2d_float64_axis1 (or actually in the template that creates the function). Similar bugs may exist for the 3d version of the function.
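One way to avoid this class of typo is to derive both the bound and the reported limit from the same value. The helper below is a hypothetical sketch: check_n is not a Bottleneck function, and the message template is reconstructed from the error text shown above rather than copied from the source.

```python
# Assumed to match the message shown in the report; not copied from Bottleneck.
PARTSORT_ERR_MSG = "`n` (=%d) must be between 1 and %d, inclusive."


def check_n(n, shape, axis):
    """Raise ValueError unless 1 <= n <= shape[axis]."""
    length = shape[axis]  # single source of truth for the check and the message
    if n < 1 or n > length:
        raise ValueError(PARTSORT_ERR_MSG % (n, length))


try:
    check_n(10, (5, 3), axis=1)
except ValueError as err:
    print(err)
```

Because the limit in the message is the same variable used in the comparison, the two cannot drift apart for the 2d or 3d variants.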
Add a faster version of scipy.stats.ss (sum of squares).
Timings of prototype function with (1000, 1000) array:
>> from bottleneck import ss_2d_float64_axis0
>> from bottleneck import ss_2d_float64_axis1
>> from scipy.stats import ss
>>
>> a = np.random.rand(1000,1000)
>>
>> timeit ss(a, 0)
100 loops, best of 3: 10.7 ms per loop
>> timeit ss(a, 1)
100 loops, best of 3: 3.03 ms per loop
>>
>> timeit ss_2d_float64_axis0(a)
100 loops, best of 3: 8.79 ms per loop
>> timeit ss_2d_float64_axis1(a)
1000 loops, best of 3: 1.17 ms per loop
Timing for (100,100) array:
>> a = np.random.rand(100,100)
>> timeit ss(a, 1)
10000 loops, best of 3: 44.8 us per loop
>> timeit ss_2d_float64_axis1(a)
100000 loops, best of 3: 12.2 us per loop
Timing for (1000,10) array:
>> a = np.random.rand(1000,10)
>> timeit ss(a, 1)
10000 loops, best of 3: 148 us per loop
>> timeit ss_2d_float64_axis1(a)
100000 loops, best of 3: 12.8 us per loop
For large 1d arrays np.dot is faster:
>> a = np.random.rand(1000)
>> timeit ss_1d_float64_axisNone(a)
1000000 loops, best of 3: 1.77 us per loop
>> timeit np.dot(a, a)
1000000 loops, best of 3: 1.6 us per loop
But not for small 1d arrays:
>> a = np.random.rand(10)
>> timeit ss_1d_float64_axisNone(a)
1000000 loops, best of 3: 696 ns per loop
>> timeit np.dot(a, a)
1000000 loops, best of 3: 801 ns per loop
Prototype code based on bn.nansum:
@cython.boundscheck(False)
@cython.wraparound(False)
def ss_1d_float64_axisNone(np.ndarray[np.float64_t, ndim=1] a):
    "Sum of squares of 1d array with dtype=float64 along axis=None."
    cdef np.float64_t asum = 0, ai
    cdef Py_ssize_t i0
    cdef np.npy_intp *dim
    dim = PyArray_DIMS(a)
    cdef int n0 = dim[0]
    for i0 in range(n0):
        ai = a[i0]
        asum += ai * ai
    return np.float64(asum)

@cython.boundscheck(False)
@cython.wraparound(False)
def ss_2d_float64_axis0(np.ndarray[np.float64_t, ndim=2] a):
    "Sum of squares of 2d array with dtype=float64 along axis=0."
    cdef np.float64_t asum = 0, ai
    cdef Py_ssize_t i0, i1
    cdef np.npy_intp *dim
    dim = PyArray_DIMS(a)
    cdef int n0 = dim[0]
    cdef int n1 = dim[1]
    cdef np.npy_intp *dims = [n1]
    cdef np.ndarray[np.float64_t, ndim=1] y = PyArray_EMPTY(1, dims,
                                                            NPY_float64, 0)
    for i1 in range(n1):
        asum = 0
        for i0 in range(n0):
            ai = a[i0, i1]
            asum += ai * ai
        y[i1] = asum
    return y
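A slow pure-Python reference for checking the prototypes could be as small as the sketch below. It assumes a 2d input given as a list of equal-length rows and, like the prototypes, does not treat NaNs specially.

```python
def ss(rows, axis):
    """Sum of squares of a 2d list of rows along `axis` (0 or 1)."""
    if axis == 1:
        return [sum(x * x for x in row) for row in rows]
    if axis == 0:
        # zip(*rows) iterates over the columns of the 2d input
        return [sum(x * x for x in col) for col in zip(*rows)]
    raise ValueError("axis must be 0 or 1")


a = [[1.0, 2.0], [3.0, 4.0]]
print(ss(a, 0), ss(a, 1))
```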
I'm getting tons of compiler warnings (all unit tests pass). For example:
bottleneck/src/func/64bit/func.c: In function ‘__pyx_pf_4func_897argpartsort_3d_float64_axis2’:
bottleneck/src/func/64bit/func.c:213289:14: warning: variable ‘__pyx_bshape_2_y’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213288:14: warning: variable ‘__pyx_bshape_1_y’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213287:14: warning: variable ‘__pyx_bshape_0_y’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213282:14: warning: variable ‘__pyx_bshape_2_b’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213281:14: warning: variable ‘__pyx_bshape_1_b’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213280:14: warning: variable ‘__pyx_bshape_0_b’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213275:14: warning: variable ‘__pyx_bshape_2_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213274:14: warning: variable ‘__pyx_bshape_1_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213273:14: warning: variable ‘__pyx_bshape_0_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213272:14: warning: variable ‘__pyx_bstride_2_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213271:14: warning: variable ‘__pyx_bstride_1_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213270:14: warning: variable ‘__pyx_bstride_0_a’ set but not used [-Wunused-but-set-variable]
Due to typo in template, int array input results in call to slow, non-cython version of move_nanmean:
>>> a = np.arange(10)
>>> bn.move.move_nanmean_selector(a, window=2, axis=0)
(<built-in function move_nanmean_slow_axis0>,
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
Currently, if you run
$ make sdist
when the *.c files don't exist, you get a bad source distribution that contains no C files. So replace:
sdist:
	rm -f ../../MANIFEST
	git status
	find ../.. -name *.c
	cd ../..; python setup.py sdist
with
sdist: pyx cfiles
	rm -f ../../MANIFEST
	git status
	find ../.. -name *.c
	cd ../..; python setup.py sdist
Add a moving window median function perhaps by (cython) wrapping the following C code:
http://home.tiac.net/~cri_a/source_code/index.html#winmedian_pkg
As reported by Jeff on the bottleneck mailing list [1], bottleneck 0.4.3 doesn't support Python 2.5, since bottleneck uses a with statement without first enabling it via from __future__ import with_statement.
[1] http://groups.google.com/group/bottle-neck/browse_thread/thread/e80c888b22532c74
Add a partial sort function (based on bn.median code).
partsort() would not fully sort. Instead, the kth smallest element would be in its correct place. Everything to the left of it would be smaller than or equal to the kth element (but not sorted); everything to the right would be greater than or equal to it (but not sorted). That is much faster than a full sort.
Example usage (median of ten smallest elements):
>>> a = np.random.rand(500,500)
>>> b = bn.partsort(a, k=10, axis=0)
>>> bn.median(b[:10], axis=0)
An argpartsort() function would be useful too.
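The partial ordering described above can be sketched in pure Python with a quickselect-style partition (Wirth's k-th smallest algorithm). This is an illustration of the idea only, not the code bn.median uses; it works on a copy of a 1d list, takes a 1-based n, and assumes the data contains no NaNs.

```python
def partsort(values, n):
    """Return a copy of `values` with the n-th smallest element (1-based) in place.

    Elements before index n - 1 are <= it and elements after are >= it,
    but neither side is sorted.
    """
    a = list(values)
    if not 1 <= n <= len(a):
        raise ValueError("n must be between 1 and len(values), inclusive")
    k = n - 1
    lo, hi = 0, len(a) - 1
    while lo < hi:
        pivot = a[k]
        i, j = lo, hi
        while True:
            while a[i] < pivot:
                i += 1
            while pivot < a[j]:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
            if i > j:
                break
        # Narrow the search to the side that still contains index k.
        if j < k:
            lo = i
        if k < i:
            hi = j
    return a


out = partsort([5, 1, 4, 2, 3, 9, 0], 3)
print(out[:3])  # the three smallest values, in no particular order
```

An argpartsort() could follow the same scheme by permuting an index list alongside the values.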
bottleneck/slow/func.py contains copies of a few scipy functions (users asked that bottleneck not depend on scipy). Update the functions if scipy has made any improvements.
The cython functions in bottleneck that return a scalar make that scalar a numpy object with dtype attribute etc by using:
>>> np.float64(x)
And similarly for other dtypes.
Is there a faster (but simple) way to do this using the Numpy C API?
As reported by Christoph Gohlke, the bottleneck functions that return indices return the wrong dtype on win64.
See the following threads:
http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056679.html
http://groups.google.com/group/cython-users/browse_thread/thread/f8022ee7ccbf7c5b
Should we add a bn.searchsorted(arr1, arr2) function for the special case where both arr1 and arr2 are already sorted?
Here's a prototype: https://github.com/sot/Ska.Numpy/blob/master/Ska/Numpy/fastss.pyx
Some issues to consider:
order ('left', 'right') input parameter?
For the origin of the idea, see https://groups.google.com/group/bottle-neck/browse_thread/thread/ec37c0e93d6d58cc
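When both inputs are sorted, the insertion points can be found in a single merge-like pass instead of one binary search per element. The sketch below is a pure-Python illustration under that assumption; the function name and the side parameter are hypothetical and are modelled on the 'left'/'right' question raised above, not taken from the linked prototype.

```python
import bisect


def searchsorted_sorted(haystack, needles, side="left"):
    """Insertion indices of sorted `needles` into sorted `haystack`.

    Runs in O(len(haystack) + len(needles)) because the scan position
    never moves backwards.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    out = []
    i = 0
    n = len(haystack)
    for v in needles:
        if side == "left":
            while i < n and haystack[i] < v:
                i += 1
        else:
            while i < n and haystack[i] <= v:
                i += 1
        out.append(i)
    return out


hay = [1, 2, 2, 4, 7]
print(searchsorted_sorted(hay, [0, 2, 3, 7, 9]))
print([bisect.bisect_left(hay, v) for v in [0, 2, 3, 7, 9]])  # same result
```

Whether the linear pass beats repeated binary search depends on the relative sizes of the two arrays: for a handful of needles in a very large haystack, bisection is likely to win.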
Brute force is faster than scipy.spatial.cKDTree for knn when k=1 and distance is Euclidean:
I[5] targets = np.random.uniform(0, 8, (5000, 7))
I[6] element = np.arange(1, 8, dtype=np.float64)
I[7] T = scipy.spatial.cKDTree(targets)
I[8] timeit T.query(element)
10000 loops, best of 3: 36.1 us per loop
I[9] timeit nn(targets, element)
10000 loops, best of 3: 28.5 us per loop
What about for lower dimensions (2 instead of 7) where cKDTree gets faster?
I[18] element = np.arange(1,3, dtype=np.float64)
I[19] targets = np.random.uniform(0,8,(5000,2))
I[20] T = scipy.spatial.cKDTree(targets)
I[21] timeit T.query(element)
10000 loops, best of 3: 27.5 us per loop
I[22] timeit nn(targets, element)
100000 loops, best of 3: 11.6 us per loop
Prototype code (bottleneck license):
@cython.boundscheck(False)
@cython.wraparound(False)
def nn(np.ndarray[np.float64_t, ndim=2] targets,
       np.ndarray[np.float64_t, ndim=1] element):
    "Euclidean distance to, and row index of, the target nearest to element."
    cdef:
        np.float64_t ssum = 0, d, ssummin=np.inf, dist
        Py_ssize_t i0, i1, imin = 0
        np.npy_intp *dim
        Py_ssize_t n0
        Py_ssize_t n1
    dim = PyArray_DIMS(targets)
    n0 = dim[0]
    n1 = dim[1]
    for i0 in range(n0):
        ssum = 0
        for i1 in range(n1):
            d = targets[i0, i1] - element[i1]
            ssum += d * d
        if ssum < ssummin:
            ssummin = ssum
            imin = i0
    dist = sqrt(ssummin)
    return dist, imin
For some context, see this thread: http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062103.html
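A pure-Python rendering of the same brute-force search is sketched below, mainly as a reference for unit tests. It adds one tweak that is not in the prototype: a row is abandoned as soon as its partial sum of squares can no longer beat the best distance found so far. Whether that early exit pays off in compiled code is an open question.

```python
import math


def nn(targets, element):
    """Return (distance, index) of the row of `targets` nearest to `element`."""
    best_sq = math.inf
    best_i = 0
    for i, row in enumerate(targets):
        sq = 0.0
        for t, e in zip(row, element):
            d = t - e
            sq += d * d
            if sq >= best_sq:
                break  # this row can no longer be the nearest
        if sq < best_sq:
            best_sq = sq
            best_i = i
    # Only one square root is needed, after the search has finished.
    return math.sqrt(best_sq), best_i


print(nn([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]], [3.0, 3.0]))
```

As in the prototype, ties go to the first row that achieves the minimum distance.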
$ git describe
v0.4.3-9-g523d22d
'bottleneck/src/func/func.c' and 'bottleneck/src/move/move.c'
as specified in setup.py are missing.
Because bottleneck/src/Makefile calls python to import "bottleneck.src.makepyx" (and that module also imports others), a completely clean, standard setup system cannot build bottleneck from source. This is a chicken-and-egg problem that can probably only be resolved by restructuring the build process.
Sometimes it's useful to get a rolling mean over the entire input.
import pandas
In [8]: pandas.rolling_mean(np.array([1,2,3,4,5]),3)
Out[8]: array([ nan, nan, 2., 3., 4.])
In [9]: pandas.rolling_mean(np.array([1,2,3,4,5]),3, min_periods=1)
Out[9]: array([ 1. , 1.5, 2. , 3. , 4. ])
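The min_periods behaviour shown in the pandas session can be sketched in pure Python as follows. The function name and signature are hypothetical (this is not an existing Bottleneck API), the input is a plain list without NaNs, and the O(n * window) loop favours clarity over speed.

```python
import math


def move_mean(values, window, min_periods=None):
    """Moving mean that emits a value once `min_periods` observations exist."""
    if min_periods is None:
        min_periods = window  # default: require a full window
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) >= min_periods:
            out.append(sum(chunk) / len(chunk))
        else:
            out.append(math.nan)
    return out


print(move_mean([1, 2, 3, 4, 5], 3))
print(move_mean([1, 2, 3, 4, 5], 3, min_periods=1))
```

With min_periods=1 the leading positions are averaged over however many values are available, which reproduces the second pandas output above.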
MANIFEST.in points to old location of MakeFile:
include bottleneck/src/Makefile
Tests are failing on windows 64 when run with unreleased numpy 1.7 (check to see if they fail on linux 64):
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:13: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
from .func import (nansum, nanmax, nanmin, nanmean, nanstd, nanvar, median,
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:13: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility
from .func import (nansum, nanmax, nanmin, nanmean, nanstd, nanvar, median,
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:19: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
from .move import (move_sum, move_nansum,
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:19: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility
from .move import (move_sum, move_nansum,
.............................FFFF.......................................................................................
======================================================================
FAIL: Test nanmax.
----------------------------------------------------------------------
Traceback (most recent call last):
File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
assert_array_equal(actual, desired, err_msg)
File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
verbose=verbose, header='Arrays are not equal')
File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not equal
func nanmax | input a4 (int32) | shape (2L, 0L) | axis -2
Input array:
[]
(mismatch 100.0%)
x: array([], dtype=int32)
y: array('Crashed',
dtype='|S7')
======================================================================
FAIL: Test nanargmin.
----------------------------------------------------------------------
Traceback (most recent call last):
File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
assert_array_equal(actual, desired, err_msg)
File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
verbose=verbose, header='Arrays are not equal')
File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not equal
func nanargmin | input a88 (float32) | shape (2L, 3L) | axis -1
Input array:
[[ nan nan nan]
[ 3. 4. 5.]]
(mismatch 100.0%)
x: array('Crashed',
dtype='|S7')
y: array([-9223372036854775808, 0], dtype=int64)
======================================================================
FAIL: Test nanargmax.
----------------------------------------------------------------------
Traceback (most recent call last):
File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
assert_array_equal(actual, desired, err_msg)
File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
verbose=verbose, header='Arrays are not equal')
File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not equal
func nanargmax | input a88 (float32) | shape (2L, 3L) | axis -1
Input array:
[[ nan nan nan]
[ 3. 4. 5.]]
(mismatch 100.0%)
x: array('Crashed',
dtype='|S7')
y: array([-9223372036854775808, 2], dtype=int64)
======================================================================
FAIL: Test nanmin.
----------------------------------------------------------------------
Traceback (most recent call last):
File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
assert_array_equal(actual, desired, err_msg)
File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
verbose=verbose, header='Arrays are not equal')
File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not equal
For particular orderings of elements in an array that contains both numbers and NaNs, nanmedian makes the wrong count of the number of non-NaN elements and therefore returns the wrong value for the nanmedian.
The first two examples are correct, the last one is wrong:
>> bn.nanmedian([1, 2])
1.5
>> bn.nanmedian([1, np.nan, 2])
1.5
>> bn.nanmedian([1, np.nan, np.nan, 2])
1.0
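A slow but order-independent reference makes cases like these easy to check: drop the NaNs first, then take an ordinary median of what is left. The sketch below is a pure-Python reference for unit tests under that assumption, not the algorithm Bottleneck uses.

```python
import math


def nanmedian(values):
    """Median of the non-NaN elements; NaN if there are none."""
    finite = sorted(x for x in values if x == x)  # x == x is False only for NaN
    n = len(finite)
    if n == 0:
        return math.nan
    mid = n // 2
    if n % 2:
        return float(finite[mid])
    return (finite[mid - 1] + finite[mid]) / 2.0


nan = math.nan
print(nanmedian([1, nan, nan, 2]))
```

Because the count of non-NaN elements is taken after filtering, the result cannot depend on where the NaNs sit in the input.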