
bottleneck's People

Contributors

ales-erjavec, alexamici, astrofrog, bnavigator, bryant1410, cburroughs, cgohlke, danhakimi, djsutherland, ghisvail, gliptak, jaimefrio, jennolsen84, jenshnielsen, kwgoodman, lebedov, marscher, mathause, mgorny, midnighter, mrjbq7, nileshpatra, odidev, qwhelan, rdbisme, richardscottoz, shoyer, toobaz, vstinner, weathergod


bottleneck's Issues

Writing beyond the range of an array

The low-level functions nanstd_3d_int32_axis1 and nanstd_3d_int64_axis1, called by bottleneck.nanstd() for 3d input, wrote beyond the memory owned by the output array if arr.shape[1] == 0 and arr.shape[0] > arr.shape[2], where arr is the input array.

Thanks to Christoph Gohlke for finding an example to demonstrate the bug.
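For reference, the smallest shape satisfying the triggering condition looks like this. The sketch below uses plain NumPy for the expected result rather than calling the buggy Cython function itself:

```python
import warnings
import numpy as np

# A minimal shape that satisfies the triggering condition above:
# arr.shape[1] == 0 and arr.shape[0] > arr.shape[2].
arr = np.zeros((3, 0, 2))
assert arr.shape[1] == 0 and arr.shape[0] > arr.shape[2]

# NumPy's reference result for the same reduction: a (3, 2) output
# filled with NaN (the empty-slice warnings are silenced here).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    expected = np.nanstd(arr, axis=1)
assert expected.shape == (3, 2)
assert np.isnan(expected).all()
```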

unit test fails when building from git source

erg@ommegang ~/python/bottleneck $ [master*] make all
rm -rf bottleneck/src/*~ bottleneck/src/*.so bottleneck/src/*.c bottleneck/src/*.o bottleneck/src/*.html bottleneck/src/build bottleneck/src/../*.so
rm -rf bottleneck/src/func/32bit/*.c bottleneck/src/func/64bit/*.c
rm -rf bottleneck/src/move/32bit/*.c bottleneck/src/move/64bit/*.c
rm -rf bottleneck/src/func/32bit/*.pyx  bottleneck/src/func/64bit/*.pyx
rm -rf bottleneck/src/move/32bit/*.pyx  bottleneck/src/move/64bit/*.pyx
python -c "from bottleneck.src.makepyx import makepyx; makepyx()"
cython bottleneck/src/func/32bit/func.pyx
cython bottleneck/src/func/64bit/func.pyx
cython bottleneck/src/move/32bit/move.pyx
cython bottleneck/src/move/64bit/move.pyx
rm -rf bottleneck/src/../func.so
python bottleneck/src/func/setup.py build_ext --inplace
running build_ext
skipping 'bottleneck/src/func/64bit/func.c' Cython extension (up-to-date)
building 'func' extension
gcc -pthread -fno-strict-aliasing -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -fPIC -I/usr/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -c bottleneck/src/func/64bit/func.c -o build/temp.linux-x86_64-2.7/bottleneck/src/func/64bit/func.o
gcc -pthread -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu build/temp.linux-x86_64-2.7/bottleneck/src/func/64bit/func.o -L/usr/lib -lpython2.7 -o /home/erg/python/bottleneck/func.so
rm -rf bottleneck/src/../move.so
python bottleneck/src/move/setup.py build_ext --inplace
running build_ext
skipping 'bottleneck/src/move/64bit/move.c' Cython extension (up-to-date)
building 'move' extension
gcc -pthread -fno-strict-aliasing -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -fPIC -I/usr/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -c bottleneck/src/move/64bit/move.c -o build/temp.linux-x86_64-2.7/bottleneck/src/move/64bit/move.o
gcc -pthread -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu build/temp.linux-x86_64-2.7/bottleneck/src/move/64bit/move.o -L/usr/lib -lpython2.7 -o /home/erg/python/bottleneck/move.so
python -c "import bottleneck;bottleneck.test(extra_argv=['--processes=4'])"
Running unit tests for bottleneck
NumPy version 1.6.2
NumPy is installed in /usr/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy
Python version 2.7.3 (default, Apr 24 2012, 00:00:54) [GCC 4.7.0 20120414 (prerelease)]
nose version 1.1.2
.......................................nose.plugins.multiprocess: ERROR: Worker 2 error running test or returning results
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 688, in __runner
    test(result)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
    test(orig)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
    test(orig)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 767, in run
    self.tasks, test)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 511, in addtask
    testQueue.put((test_addr,arg), block=False)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
PicklingError: Can't pickle <built-in function nansum>: import of module func failed
....E...................................................
======================================================================
ERROR: Failure: PicklingError (Can't pickle <built-in function nansum>: import of module func failed)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 688, in __runner
    test(result)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
    test(orig)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
    test(orig)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 767, in run
    self.tasks, test)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 511, in addtask
    testQueue.put((test_addr,arg), block=False)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
PicklingError: Can't pickle <built-in function nansum>: import of module func failed

----------------------------------------------------------------------
Ran 95 tests in 9.141s

FAILED (errors=1)

erg@ommegang ~/python/bottleneck $ [master*] uname -a
Linux ommegang 3.5.3-1-ARCH #1 SMP PREEMPT Sun Aug 26 09:14:51 CEST 2012 x86_64 GNU/Linux

compiler warnings

I just installed bottleneck with easy_install on Windows with MinGW 4.? (32-bit Python on a 64-bit computer). It works without problems and the test suite passes, but there are a few compiler warnings:

Searching for bottleneck
Reading http://pypi.python.org/simple/bottleneck/
Reading http://berkeleyanalytics.com/bottleneck
Best match: Bottleneck 0.5.0
Downloading http://pypi.python.org/packages/source/B/Bottleneck/Bottleneck-0.5.0.tar.gz#md5=65f2b3be0ca74b859392d554a48a0906
Processing Bottleneck-0.5.0.tar.gz
Running Bottleneck-0.5.0\setup.py -q bdist_egg --dist-dir c:\users\josef\appdata\local\temp\easy_install-hagefy\Bottleneck-0.5.0\egg-dist-tmp-hf63me
package init file 'bottleneck/tests\__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/func\__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/move\__init__.py' not found (or not a regular file)
bottleneck/src/func/32bit/func.c: In function '__Pyx_RaiseArgtupleInvalid':
bottleneck/src/func/32bit/func.c:247114: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247114: warning: format '%s' expects type 'char *', but argument 5 has type 'Py_ssize_t'
bottleneck/src/func/32bit/func.c:247114: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247114: warning: too many arguments for format
bottleneck/src/func/32bit/func.c: In function '__Pyx_RaiseNeedMoreValuesError':
bottleneck/src/func/32bit/func.c:247124: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247124: warning: format '%s' expects type 'char *', but argument 3 has type 'Py_ssize_t'
bottleneck/src/func/32bit/func.c:247124: warning: too many arguments for format
bottleneck/src/func/32bit/func.c: In function '__Pyx_RaiseTooManyValuesError':
bottleneck/src/func/32bit/func.c:247132: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247132: warning: too many arguments for format
bottleneck/src/func/32bit/func.c: At top level:
C:\Python26\lib\site-packages\numpy\core\include/numpy/__ufunc_api.h:196: warning: '_import_umath' defined but not used
bottleneck/src/move/32bit/move.c: In function '__Pyx_RaiseArgtupleInvalid':
bottleneck/src/move/32bit/move.c:203854: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203854: warning: format '%s' expects type 'char *', but argument 5 has type 'Py_ssize_t'
bottleneck/src/move/32bit/move.c:203854: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203854: warning: too many arguments for format
bottleneck/src/move/32bit/move.c: In function '__Pyx_RaiseNeedMoreValuesError':
bottleneck/src/move/32bit/move.c:203956: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203956: warning: format '%s' expects type 'char *', but argument 3 has type 'Py_ssize_t'
bottleneck/src/move/32bit/move.c:203956: warning: too many arguments for format
bottleneck/src/move/32bit/move.c: In function '__Pyx_RaiseTooManyValuesError':
bottleneck/src/move/32bit/move.c:203964: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203964: warning: too many arguments for format
bottleneck/src/move/32bit/move.c: At top level:
C:\Python26\lib\site-packages\numpy\core\include/numpy/__ufunc_api.h:196: warning: '_import_umath' defined but not used
zip_safe flag not set; analyzing archive contents...
Adding bottleneck 0.5.0 to easy-install.pth file

Installed c:\python26\lib\site-packages\bottleneck-0.5.0-py2.6-win32.egg
Processing dependencies for bottleneck
Finished processing dependencies for bottleneck

E:\>python
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import bottleneck as bo
>>> bo.test()
<...>
NumPy version 1.5.1
NumPy is installed in C:\Python26\lib\site-packages\numpy
Python version 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)]
nose version 1.0.0
................................................................................
----------------------------------------------------------------------
Ran 80 tests in 36.863s

OK
<nose.result.TextTestResult run=80 errors=0 failures=0>

Cannot build/install on Ubuntu 12.10 64Bit in Virtual Machine

Hello,

I'm not expecting anybody to track down the cause, but I thought I'd share the situation.

I'm trying to install bottleneck on a 64-bit Ubuntu 12.10 guest in VirtualBox.

python setup.py install

ends with the following lines:

bottleneck/src/func/64bit/func.c: At top level:
/home/scrosta/virtualenv/local/lib/python2.7/site-packages/numpy/core/include/numpy/__ufunc_api.h:226:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
error: command 'gcc' failed with exit status 4

where of course the gcc: internal compiler error: Killed (program cc1) is the ugly bit.

Running make all from the repository also gives exactly the same error:

/home/scrosta/virtualenv/local/lib/python2.7/site-packages/numpy/core/include/numpy/__ufunc_api.h:226:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
error: command 'gcc' failed with exit status 4
make: *** [funcs] Error 1

Symbol not found: _get_largest_child

After installing on OS X 10.8 Mountain Lion, I get this error:

>>> from bottleneck.move import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so, 2): Symbol not found: _get_largest_child
  Referenced from: /Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so
  Expected in: flat namespace
 in /Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so

Any ideas?

>>> import numpy
>>> numpy.__version__
'1.6.2'

bn.anynan() and bn.allnan()

bn.anynan(arr, axis) and bn.allnan(arr, axis) might be useful functions to have in bottleneck.

Here's some proof of concept code for anynan:

@cython.boundscheck(False)
@cython.wraparound(False)
def anynan_axis1(np.ndarray[np.float64_t, ndim=2] a):
    cdef:
        int n0, n1, i, j, f
        np.npy_intp *dim
        np.float64_t aij
        np.ndarray[np.uint8_t, ndim=1, cast=True] y
    dim = PyArray_DIMS(a)
    n0 = dim[0]
    n1 = dim[1]
    cdef np.npy_intp *dims = [n0]
    y = PyArray_EMPTY(1, dims, NPY_BOOL, 0)
    for i in range(n0):
        f = 1
        for j in range(n1):
            aij = a[i, j]
            if aij != aij:
                y[i] = 1
                f = 0
                break
        if f == 1:
            y[i] = 0
    return y

We'd have to teach the templating function loop_cdef() to create empty bool return arrays.
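For comparison, the same semantics in plain NumPy (a reference sketch, not the proposed Cython implementation):

```python
import numpy as np

def anynan(arr, axis=None):
    # Reference semantics for the proposed bn.anynan().
    return np.isnan(arr).any(axis=axis)

def allnan(arr, axis=None):
    # Reference semantics for the proposed bn.allnan().
    return np.isnan(arr).all(axis=axis)

a = np.array([[1.0, np.nan], [3.0, 4.0]])
assert anynan(a, axis=1).tolist() == [True, False]
assert allnan(a, axis=1).tolist() == [False, False]
```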

Only update C files at each release?

Every time I modify a cython pyx file, the corresponding change to the auto-generated C file is huge. The repo is growing in size quickly.

So, should I only update the C files at each release? In between releases users can use the makefile.

Or maybe I shouldn't include the c files in the repo. Instead generate them when I make the source distribution?

Suggestions?

Can not build bottleneck from fresh clone

Having recently moved over to a new computer, I needed to reinstall/rebuild several packages I have been using. I have run into a brick wall with bottleneck. Below are my command and the error messages on an Ubuntu 11.04 machine. It appears that some __init__.py files are expected that no longer exist. I cloned the repo this afternoon.

ben@tigger:~/Programs/bottleneck$ python setup.py build
running build
running build_py
package init file 'bottleneck/tests/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/func/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/move/__init__.py' not found (or not a regular file)
package init file 'bottleneck/tests/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/func/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/move/__init__.py' not found (or not a regular file)
running build_ext
building 'func' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/home/ben/Programs/numpy/numpy/core/include -I/usr/include/python2.7 -c bottleneck/src/func/32bit/func.c -o build/temp.linux-i686-2.7/bottleneck/src/func/32bit/func.o
gcc: bottleneck/src/func/32bit/func.c: No such file or directory
gcc: no input files
error: command 'gcc' failed with exit status 1

Bug in nanmedian

Good:

>> bn.nanmedian(np.array([np.nan, 1, np.nan]))
   1.0

Bad:

>> bn.nanmedian(np.array([np.nan, np.nan, 1]))
   -inf

This is only an issue for an odd number of elements, and only when all values except the last are NaN.
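The expected semantics, sketched in plain NumPy (drop the NaNs first, then take an ordinary median):

```python
import numpy as np

def nanmedian_ref(a):
    # Reference behavior for the cases above: drop NaNs, then take the
    # median of the remaining values (NaN if nothing remains).
    a = np.asarray(a, dtype=float)
    kept = a[~np.isnan(a)]
    return float(np.median(kept)) if kept.size else float("nan")

assert nanmedian_ref([np.nan, 1, np.nan]) == 1.0
assert nanmedian_ref([np.nan, np.nan, 1]) == 1.0  # not -inf
```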

Check return value of PyArray_FillWithScalar

Bottleneck uses the NumPy function PyArray_FillWithScalar to (typically) fill an array with np.nan. PyArray_FillWithScalar returns -1 if the fill was unsuccessful, but bottleneck does not check the return value. It should check, and raise an exception if the return value is -1.

median, nanmedian modify input

The docstring does not explain that median and nanmedian modify the input array. Keep the docstring; change the code so it does not modify the input.
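The fix pattern, sketched with NumPy's in-place partition (hypothetical helper names; odd-length input for simplicity):

```python
import numpy as np

def median_inplace(a):
    # What the current code effectively does: rearrange the caller's
    # buffer in place while selecting the middle element.
    a.partition(a.size // 2)
    return a[a.size // 2]

def median_copy(a):
    # The proposed fix: work on a copy so the input is left untouched.
    b = a.copy()
    b.partition(b.size // 2)
    return b[b.size // 2]

a = np.array([3.0, 1.0, 2.0])
assert median_copy(a) == 2.0
assert (a == [3.0, 1.0, 2.0]).all()  # caller's array unchanged
```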

Remove SciPy dependency

A couple of requests came from the scipy mailing list to remove the dependency of Bottleneck on scipy:

On Wed, Dec 1, 2010 at 3:49 PM, T J wrote:

On Tue, Nov 30, 2010 at 7:04 PM, Keith Goodman wrote:

I bumped the Bottleneck requirements from "NumPy, SciPy" to "NumPy
1.5.1+, SciPy 0.8.0+". I think that is fair to do for a brand new
project.

If SciPy is only used in the benchmarks/tests, then why not make it an
optional benchmark/test that runs only if SciPy is present?
nose.SkipTest should be useful here. I frequently run software on
machines that only have NumPy installed.

Seems like a strange discussion to have on the scipy list :)

I don't want to have a hole in my unit test coverage. But I could copy
over the nan functions in scipy stats. And I guess the benchmark could
use those too. And then skip moving window benchmarks against
scipy.ndimage for those who don't have scipy installed.
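The suggestion amounts to the usual optional-dependency pattern (a sketch; unittest.SkipTest is used here so the snippet is self-contained, where the thread mentions nose.SkipTest):

```python
import unittest

# Probe for SciPy once at import time; the comparison tests below
# become skips rather than failures when it is missing.
try:
    import scipy.stats  # noqa: F401
    HAVE_SCIPY = True
except ImportError:
    HAVE_SCIPY = False

def test_against_scipy_stats():
    if not HAVE_SCIPY:
        raise unittest.SkipTest("SciPy not installed")
    # ... compare bottleneck results against scipy.stats here ...
```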

Bug for very large input arrays

As discussed recently (January) on the numpy list, be careful when using C ints for indexing into arrays. A C int can overflow for very large but-some-people-use-them arrays.

I think Bottleneck uses:

cdef Py_ssize_t idx

for indexing. So that should be safe for large arrays. BUT it does use ints for the size of the array:

cdef np.npy_intp *dim
dim = PyArray_DIMS(a)
cdef int n = dim[0]
for idx in range(n):
    ...

That could be a problem since n could overflow. So switch to:

cdef Py_ssize_t n = dim[0]

But first try to confirm there is a problem by working on very large arrays. And that the issue is fixed by the above or similar.
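The failure mode can be illustrated with NumPy casts, without allocating a multi-gigabyte array:

```python
import numpy as np

# A C int is 32 bits: a length just past 2**31 - 1 wraps negative when
# stored in one, so `for idx in range(n)` would never execute.
true_length = np.array(2**31, dtype=np.int64)  # plays the role of dim[0]
as_c_int = true_length.astype(np.int32)        # what `cdef int n = dim[0]` does
assert int(as_c_int) == -2**31                 # wrapped around
```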

Problems on 32 bit OS

Output of some functions depends on the default integer dtype of NumPy on your OS. Bottleneck determines the default int dtype when it templates the functions. Therefore the C files that I include in the releases on PyPI were for a 64-bit OS, since my computer is 64-bit.

Need to make two source releases: 32-bit and 64-bit.
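The platform dependence is visible directly in NumPy (a quick check, not bottleneck-specific):

```python
import numpy as np

# Width of NumPy's default integer on this interpreter: 32 bits on
# 32-bit builds, 64 bits on typical 64-bit builds.
bits = np.dtype(np.int_).itemsize * 8
assert bits in (32, 64)
```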

Docstring typo

Mean --> Sum

I[3] bn.func.nansum_2d_float32_axis0?
nansum_2d_float32_axis0(ndarray a)
Mean of 2d array with dtype=float32 along axis=0 ignoring NaNs.

Add replace()

A replace() function would be handy. Here are some timings of a prototype:

I[31] a = np.random.rand(10000)
I[32] a[a>0.8] = np.nan
I[33] timeit np.nan_to_num(a)
1000 loops, best of 3: 300 us per loop

I[34] a = np.random.rand(10000)
I[35] a[a>0.8] = np.nan
I[36] timeit mask = np.isnan(a); a[mask] = 0
10000 loops, best of 3: 50.9 us per loop

I[37] a = np.random.rand(10000)
I[38] a[a>0.8] = np.nan
I[39] timeit mask = np.isnan(a); np.putmask(a, mask, 0)
10000 loops, best of 3: 32.9 us per loop

I[40] a = np.random.rand(10000)
I[41] a[a>0.8] = np.nan
I[42] timeit replace(a, np.nan, 0)
100000 loops, best of 3: 8.57 us per loop

And here's the prototype:

@cython.boundscheck(False)
@cython.wraparound(False)
def replace(np.ndarray[np.float64_t, ndim=1] a, double r, double w):
    "replace elements of 1d numpy array of dtype=np.float64."
    cdef Py_ssize_t i
    cdef int a0 = a.shape[0]
    cdef np.float64_t ai
    if r == r:
        for i in range(a0):
            ai = a[i]
            if ai == r:
                a[i] = w
    else:
        for i in range(a0):
            ai = a[i]
            if ai != ai:
                a[i] = w
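Equivalent semantics in plain NumPy, useful for checking the prototype (note the NaN special case, since NaN != NaN):

```python
import numpy as np

def replace_ref(a, old, new):
    # In-place replacement with the same semantics as the prototype.
    if old == old:               # `old` is an ordinary value
        a[a == old] = new
    else:                        # `old` is NaN
        a[np.isnan(a)] = new
    return a

a = np.array([1.0, np.nan, 3.0])
replace_ref(a, np.nan, 0.0)
assert (a == [1.0, 0.0, 3.0]).all()
```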

Add move_percentile()

Add a moving window scoreatpercentile function. It could be based on the move_median code with some slight modifications.
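A slow reference implementation of the requested behavior (a sketch: NaN for positions before the first full window, matching the other moving-window functions):

```python
import numpy as np

def move_percentile_ref(a, window, q):
    # Percentile of each trailing window of length `window`.
    out = np.full(len(a), np.nan)
    for i in range(window - 1, len(a)):
        out[i] = np.percentile(a[i - window + 1 : i + 1], q)
    return out

out = move_percentile_ref(np.array([1.0, 2, 3, 4, 5]), 3, 50)
assert np.isnan(out[:2]).all()
assert out[2:].tolist() == [2.0, 3.0, 4.0]
```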

Test bottleneck on python 3

Compile and run bottleneck with python 3. Then run:

>>> import bottleneck as bn
>>> bn.test()
>>> bn.bench()

Typo in argpartsort error message

A typo in the error message of bn.argpartsort() makes the error message confusing:

In [20]: a = np.arange(15).reshape(5,3)
In [21]: bn.argpartsort(a, 10, axis=1)
ValueError: `n` (=10) must be between 1 and 5, inclusive.

The error message should be:

ValueError: `n` (=10) must be between 1 and 3, inclusive.

To fix, change (n0 --> n1):

if (n < 1) or (n > n1):
    raise ValueError(PARTSORT_ERR_MSG % (n, n0))

to

if (n < 1) or (n > n1):
    raise ValueError(PARTSORT_ERR_MSG % (n, n1))

in argpartsort_2d_float64_axis1 (or actually in the template that creates the function). Similar bugs may exist for the 3d version of the function.

Add sum of squares function ss()

Add a faster version of scipy.stats.ss (sum of squares).

Timings of prototype function with (1000, 1000) array:

>> from bottleneck import ss_2d_float64_axis0
>> from bottleneck import ss_2d_float64_axis1
>> from scipy.stats import ss
>> 
>> a = np.random.rand(1000,1000)
>> 
>> timeit ss(a, 0)
100 loops, best of 3: 10.7 ms per loop
>> timeit ss(a, 1)
100 loops, best of 3: 3.03 ms per loop
>> 
>> timeit ss_2d_float64_axis0(a)
100 loops, best of 3: 8.79 ms per loop
>> timeit ss_2d_float64_axis1(a)
1000 loops, best of 3: 1.17 ms per loop

Timing for (100,100) array:

>> a = np.random.rand(100,100)
>> timeit ss(a, 1)
10000 loops, best of 3: 44.8 us per loop
>> timeit ss_2d_float64_axis1(a)
100000 loops, best of 3: 12.2 us per loop

Timing for (1000,10) array:

>> a = np.random.rand(1000,10)
>> timeit ss(a, 1)
10000 loops, best of 3: 148 us per loop
>> timeit ss_2d_float64_axis1(a)
100000 loops, best of 3: 12.8 us per loop

For large 1d arrays np.dot is faster:

>> a = np.random.rand(1000)
>> timeit ss_1d_float64_axisNone(a)
1000000 loops, best of 3: 1.77 us per loop
>> timeit np.dot(a, a)
1000000 loops, best of 3: 1.6 us per loop

But not for small 1d arrays:

>> a = np.random.rand(10)
>> timeit ss_1d_float64_axisNone(a)
1000000 loops, best of 3: 696 ns per loop
>> timeit np.dot(a, a)
1000000 loops, best of 3: 801 ns per loop

Prototype code based on bn.nansum:

@cython.boundscheck(False)
@cython.wraparound(False)
def ss_1d_float64_axisNone(np.ndarray[np.float64_t, ndim=1] a):
    "Sum of squares of 1d array with dtype=float64 along axis=None."
    cdef np.float64_t asum = 0, ai
    cdef Py_ssize_t i0
    cdef np.npy_intp *dim
    dim = PyArray_DIMS(a)
    cdef int n0 = dim[0]
    for i0 in range(n0):
        ai = a[i0]
        asum += ai * ai
    return np.float64(asum)

@cython.boundscheck(False)
@cython.wraparound(False)
def ss_2d_float64_axis0(np.ndarray[np.float64_t, ndim=2] a):
    "Sum of squares of 2d array with dtype=float64 along axis=0."
    cdef np.float64_t asum = 0, ai
    cdef Py_ssize_t i0, i1
    cdef np.npy_intp *dim
    dim = PyArray_DIMS(a)
    cdef int n0 = dim[0]
    cdef int n1 = dim[1]
    cdef np.npy_intp *dims = [n1]
    cdef np.ndarray[np.float64_t, ndim=1] y = PyArray_EMPTY(1, dims,
                                              NPY_float64, 0)
    for i1 in range(n1):
        asum = 0
        for i0 in range(n0):
            ai = a[i0, i1]
            asum += ai * ai
        y[i1] = asum
    return y
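For checking the prototypes, the plain-NumPy equivalent of what scipy.stats.ss computes:

```python
import numpy as np

def ss_ref(a, axis=None):
    # Sum of squares along an axis, the same quantity as scipy.stats.ss.
    a = np.asarray(a)
    return (a * a).sum(axis=axis)

a = np.array([[1.0, 2.0], [3.0, 4.0]])
assert ss_ref(a) == 30.0
assert ss_ref(a, axis=1).tolist() == [5.0, 25.0]
```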

Tons of compiler warnings

I'm getting tons of compiler warnings (all unit tests pass). For example:

bottleneck/src/func/64bit/func.c: In function ‘__pyx_pf_4func_897argpartsort_3d_float64_axis2’:
bottleneck/src/func/64bit/func.c:213289:14: warning: variable ‘__pyx_bshape_2_y’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213288:14: warning: variable ‘__pyx_bshape_1_y’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213287:14: warning: variable ‘__pyx_bshape_0_y’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213282:14: warning: variable ‘__pyx_bshape_2_b’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213281:14: warning: variable ‘__pyx_bshape_1_b’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213280:14: warning: variable ‘__pyx_bshape_0_b’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213275:14: warning: variable ‘__pyx_bshape_2_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213274:14: warning: variable ‘__pyx_bshape_1_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213273:14: warning: variable ‘__pyx_bshape_0_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213272:14: warning: variable ‘__pyx_bstride_2_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213271:14: warning: variable ‘__pyx_bstride_1_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213270:14: warning: variable ‘__pyx_bstride_0_a’ set but not used [-Wunused-but-set-variable]

int array input calls slow, non-cython version of move_nanmean

Due to a typo in the template, int array input results in a call to the slow, non-cython version of move_nanmean:

>>> a = np.arange(10)
>>> bn.move.move_nanmean_selector(a, window=2, axis=0) 
(<built-in function move_nanmean_slow_axis0>,
 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

'make sdist' should 'make pyx' and 'make cfiles'

Currently if you

$ make sdist

when the *.c files don't exist, you will get a bad source distribution that contains no C files. So replace:

sdist:
    rm -f ../../MANIFEST
    git status
    find ../.. -name *.c
    cd ../..; python setup.py sdist

with

sdist: pyx cfiles
    rm -f ../../MANIFEST
    git status
    find ../.. -name *.c
    cd ../..; python setup.py sdist 

Add partsort() and argpartsort()

Add a partial sort function (based on bn.median code).

partsort() would not fully sort. Instead, the kth smallest element would be in its correct place. Everything to the left of it would be smaller than or equal to the kth element (but not sorted); everything to the right would be greater than or equal to it (but not sorted). That is much faster than a full sort.

Example usage (median of ten smallest elements):

>>> a = np.random.rand(500,500)
>>> b = bn.partsort(a, k=10, axis=0)
>>> bn.median(b[:10], axis=0)

An argpartsort() function would be useful too.
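For reference, NumPy later gained this exact contract as np.partition / np.argpartition, which the proposed functions can be checked against (a sketch):

```python
import numpy as np

# np.partition has the same contract as the proposed partsort(): the
# element at index k-1 lands in its sorted position, smaller-or-equal
# values to its left, larger-or-equal values to its right.
a = np.random.rand(500, 500)
k = 10
b = np.partition(a, k - 1, axis=0)   # like bn.partsort(a, k=10, axis=0)
med10 = np.median(b[:k], axis=0)     # median of the 10 smallest per column
assert med10.shape == (500,)
assert (b[: k - 1] <= b[k - 1]).all()
assert (b[k:] >= b[k - 1]).all()
```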

Use Numpy C-API version of np.float64(x)

The cython functions in bottleneck that return a scalar make that scalar a numpy object (with a dtype attribute, etc.) by using:

>>> np.float64(x)

And similarly for other dtypes.

Is there a faster (but simple) way to do this using the Numpy C API?

Should we add a nearest neighbor function, bn.nn()?

Brute force is faster than scipy.spatial.cKDTree for knn when k=1 and distance is Euclidean:

I[5] targets = np.random.uniform(0, 8, (5000, 7))
I[6] element = np.arange(1, 8, dtype=np.float64)

I[7] T = scipy.spatial.cKDTree(targets)
I[8] timeit T.query(element)
10000 loops, best of 3: 36.1 us per loop

I[9] timeit nn(targets, element)
10000 loops, best of 3: 28.5 us per loop

What about for lower dimensions (2 instead of 7) where cKDTree gets faster?

I[18] element = np.arange(1,3, dtype=np.float64)
I[19] targets = np.random.uniform(0,8,(5000,2))

I[20] T = scipy.spatial.cKDTree(targets)
I[21] timeit T.query(element)
10000 loops, best of 3: 27.5 us per loop

I[22] timeit nn(targets, element)
100000 loops, best of 3: 11.6 us per loop

Prototype code (bottleneck license):

@cython.boundscheck(False)
@cython.wraparound(False)
def nn(np.ndarray[np.float64_t, ndim=2] targets,
       np.ndarray[np.float64_t, ndim=1] element):
    "Euclidean distance to, and row index of, the nearest neighbor of element among the rows of targets."
    cdef:
        np.float64_t ssum = 0, d, ssummin=np.inf, dist
        Py_ssize_t i0, i1, imin = 0
        np.npy_intp *dim
        Py_ssize_t n0
        Py_ssize_t n1
    dim = PyArray_DIMS(targets)
    n0 = dim[0]
    n1 = dim[1]
    for i0 in range(n0):
        ssum = 0
        for i1 in range(n1):
            d = targets[i0, i1] - element[i1]
            ssum += d * d
        if ssum < ssummin:
            ssummin = ssum
            imin = i0
    dist = sqrt(ssummin)        
    return dist, imin

For some context, see this thread: http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062103.html
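A vectorized NumPy equivalent of the prototype, useful for checking its output (a sketch):

```python
import numpy as np

def nn_ref(targets, element):
    # Squared Euclidean distance from `element` to every row of
    # `targets`; return (distance, index) of the nearest row.
    d2 = ((targets - element) ** 2).sum(axis=1)
    imin = int(np.argmin(d2))
    return float(np.sqrt(d2[imin])), imin

targets = np.array([[0.0, 0.0], [3.0, 4.0]])
dist, idx = nn_ref(targets, np.array([3.0, 3.0]))
assert idx == 1 and abs(dist - 1.0) < 1e-12
```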

Unable to build from source on a clean system

Because the bottleneck/src/Makefile makes a call to python that imports "bottleneck.src.makepyx" (and that module also imports others), a completely clean, standard setup system cannot build bottleneck from source. This is a chicken-and-egg problem that can probably only be resolved by restructuring the build process.

feature request: min_periods parameter for moving functions

Sometimes it's useful to get a rolling mean over the entire input.

import pandas

In [8]: pandas.rolling_mean(np.array([1,2,3,4,5]),3)
Out[8]: array([ nan,  nan,   2.,   3.,   4.])

In [9]: pandas.rolling_mean(np.array([1,2,3,4,5]),3, min_periods=1)
Out[9]: array([ 1. ,  1.5,  2. ,  3. ,  4. ])
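Reference semantics for the requested parameter, matching the pandas output above (a slow sketch):

```python
import numpy as np

def move_mean_min_periods(a, window, min_periods):
    # Emit a value as soon as `min_periods` observations are available,
    # instead of waiting for a full window of `window` observations.
    out = np.full(len(a), np.nan)
    for i in range(len(a)):
        lo = max(0, i - window + 1)
        if i - lo + 1 >= min_periods:
            out[i] = a[lo : i + 1].mean()
    return out

out = move_mean_min_periods(np.array([1.0, 2, 3, 4, 5]), 3, 1)
assert out.tolist() == [1.0, 1.5, 2.0, 3.0, 4.0]
```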

Test against numpy 1.7

From http://www.lfd.uci.edu/~gohlke/pythonlibs/tests/numpy-MKL-1.7.0.dev-66bd39f-win-amd64-py2.7/bottleneck_test.txt

Tests are failing on 64-bit Windows when run with the unreleased numpy 1.7 (check to see if they also fail on 64-bit linux):

X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:13: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .func import (nansum, nanmax, nanmin, nanmean, nanstd, nanvar, median,
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:13: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility
  from .func import (nansum, nanmax, nanmin, nanmean, nanstd, nanvar, median,
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:19: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .move import (move_sum, move_nansum,
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:19: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility
  from .move import (move_sum, move_nansum,
.............................FFFF.......................................................................................
======================================================================
FAIL: Test nanmax.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
    assert_array_equal(actual, desired, err_msg)
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

func nanmax | input a4 (int32) | shape (2L, 0L) | axis -2

Input array:
[]

(mismatch 100.0%)
 x: array([], dtype=int32)
 y: array('Crashed', 
      dtype='|S7')

======================================================================
FAIL: Test nanargmin.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
    assert_array_equal(actual, desired, err_msg)
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

func nanargmin | input a88 (float32) | shape (2L, 3L) | axis -1

Input array:
[[ nan  nan  nan]
 [  3.   4.   5.]]

(mismatch 100.0%)
 x: array('Crashed', 
      dtype='|S7')
 y: array([-9223372036854775808,                    0], dtype=int64)

======================================================================
FAIL: Test nanargmax.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
    assert_array_equal(actual, desired, err_msg)
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

func nanargmax | input a88 (float32) | shape (2L, 3L) | axis -1

Input array:
[[ nan  nan  nan]
 [  3.   4.   5.]]

(mismatch 100.0%)
 x: array('Crashed', 
      dtype='|S7')
 y: array([-9223372036854775808,                    2], dtype=int64)

======================================================================
FAIL: Test nanmin.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
    assert_array_equal(actual, desired, err_msg)
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

Bug in nanmedian

For particular orderings of elements in an array that contains both numbers and NaNs, nanmedian miscounts the number of non-NaN elements and therefore returns the wrong value.

The first two examples are correct, the last one is wrong:

>>> bn.nanmedian([1, 2])
1.5
>>> bn.nanmedian([1, np.nan, 2])
1.5
>>> bn.nanmedian([1, np.nan, np.nan, 2])
1.0
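Until the counting bug is fixed, an explicit-filtering fallback gives the correct answer. This is a hedged workaround sketch (the name `nanmedian_safe` is illustrative), not bottleneck's implementation: it removes NaNs first, so no separate non-NaN count can go wrong:

```python
import numpy as np

def nanmedian_safe(a):
    # Drop NaNs explicitly, then take the plain median of what remains;
    # this sidesteps any miscount of non-NaN elements.
    a = np.asarray(a, dtype=np.float64)
    valid = a[~np.isnan(a)]
    return np.median(valid) if valid.size else np.nan

nanmedian_safe([1, np.nan, np.nan, 2])
# 1.5
```

This is slower than bottleneck because it allocates a filtered copy, but it returns 1.5 for the failing example above regardless of element ordering.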
