
bottleneck's People

Contributors

ales-erjavec, alexamici, astrofrog, bnavigator, bryant1410, cburroughs, cgohlke, danhakimi, djsutherland, ghisvail, gliptak, jaimefrio, jennolsen84, jenshnielsen, kwgoodman, lebedov, marscher, mathause, mgorny, midnighter, mrjbq7, nileshpatra, odidev, qwhelan, rdbisme, richardscottoz, shoyer, toobaz, vstinner, weathergod


bottleneck's Issues

Writing beyond the range of an array

The low-level functions nanstd_3d_int32_axis1 and nanstd_3d_int64_axis1, called by bottleneck.nanstd() for 3d input, wrote beyond the memory owned by the output array if arr.shape[1] == 0 and arr.shape[0] > arr.shape[2], where arr is the input array.

Thanks to Christoph Gohlke for finding an example to demonstrate the bug.
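For reference, the smallest shape satisfying the triggering condition looks like this. The sketch below uses plain NumPy for the expected result rather than calling the buggy Cython function itself:

```python
import warnings
import numpy as np

# A minimal shape that satisfies the triggering condition above:
# arr.shape[1] == 0 and arr.shape[0] > arr.shape[2].
arr = np.zeros((3, 0, 2))
assert arr.shape[1] == 0 and arr.shape[0] > arr.shape[2]

# NumPy's reference result for the same reduction: a (3, 2) output
# filled with NaN (the empty-slice warnings are silenced here).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    expected = np.nanstd(arr, axis=1)
assert expected.shape == (3, 2)
assert np.isnan(expected).all()
```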

unit test fails when building from git source

erg@ommegang ~/python/bottleneck $ [master*] make all
rm -rf bottleneck/src/*~ bottleneck/src/*.so bottleneck/src/*.c bottleneck/src/*.o bottleneck/src/*.html bottleneck/src/build bottleneck/src/../*.so
rm -rf bottleneck/src/func/32bit/*.c bottleneck/src/func/64bit/*.c
rm -rf bottleneck/src/move/32bit/*.c bottleneck/src/move/64bit/*.c
rm -rf bottleneck/src/func/32bit/*.pyx  bottleneck/src/func/64bit/*.pyx
rm -rf bottleneck/src/move/32bit/*.pyx  bottleneck/src/move/64bit/*.pyx
python -c "from bottleneck.src.makepyx import makepyx; makepyx()"
cython bottleneck/src/func/32bit/func.pyx
cython bottleneck/src/func/64bit/func.pyx
cython bottleneck/src/move/32bit/move.pyx
cython bottleneck/src/move/64bit/move.pyx
rm -rf bottleneck/src/../func.so
python bottleneck/src/func/setup.py build_ext --inplace
running build_ext
skipping 'bottleneck/src/func/64bit/func.c' Cython extension (up-to-date)
building 'func' extension
gcc -pthread -fno-strict-aliasing -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -fPIC -I/usr/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -c bottleneck/src/func/64bit/func.c -o build/temp.linux-x86_64-2.7/bottleneck/src/func/64bit/func.o
gcc -pthread -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu build/temp.linux-x86_64-2.7/bottleneck/src/func/64bit/func.o -L/usr/lib -lpython2.7 -o /home/erg/python/bottleneck/func.so
rm -rf bottleneck/src/../move.so
python bottleneck/src/move/setup.py build_ext --inplace
running build_ext
skipping 'bottleneck/src/move/64bit/move.c' Cython extension (up-to-date)
building 'move' extension
gcc -pthread -fno-strict-aliasing -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -fPIC -I/usr/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -c bottleneck/src/move/64bit/move.c -o build/temp.linux-x86_64-2.7/bottleneck/src/move/64bit/move.o
gcc -pthread -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu build/temp.linux-x86_64-2.7/bottleneck/src/move/64bit/move.o -L/usr/lib -lpython2.7 -o /home/erg/python/bottleneck/move.so
python -c "import bottleneck;bottleneck.test(extra_argv=['--processes=4'])"
Running unit tests for bottleneck
NumPy version 1.6.2
NumPy is installed in /usr/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy
Python version 2.7.3 (default, Apr 24 2012, 00:00:54) [GCC 4.7.0 20120414 (prerelease)]
nose version 1.1.2
.......................................nose.plugins.multiprocess: ERROR: Worker 2 error running test or returning results
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 688, in __runner
    test(result)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
    test(orig)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
    test(orig)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 767, in run
    self.tasks, test)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 511, in addtask
    testQueue.put((test_addr,arg), block=False)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
PicklingError: Can't pickle <built-in function nansum>: import of module func failed
....E...................................................
======================================================================
ERROR: Failure: PicklingError (Can't pickle <built-in function nansum>: import of module func failed)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 688, in __runner
    test(result)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
    test(orig)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 784, in run
    test(orig)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/suite.py", line 176, in __call__
    return self.run(*arg, **kw)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 767, in run
    self.tasks, test)
  File "/usr/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/plugins/multiprocess.py", line 511, in addtask
    testQueue.put((test_addr,arg), block=False)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
PicklingError: Can't pickle <built-in function nansum>: import of module func failed

----------------------------------------------------------------------
Ran 95 tests in 9.141s

FAILED (errors=1)

erg@ommegang ~/python/bottleneck $ [master*] uname -a
Linux ommegang 3.5.3-1-ARCH #1 SMP PREEMPT Sun Aug 26 09:14:51 CEST 2012 x86_64 GNU/Linux

compiler warnings

I just installed bottleneck with easy_install on Windows with MinGW 4.? (32-bit Python on a 64-bit computer). It works without problems and the test suite passes, but there are a few compiler warnings:

Searching for bottleneck
Reading http://pypi.python.org/simple/bottleneck/
Reading http://berkeleyanalytics.com/bottleneck
Best match: Bottleneck 0.5.0
Downloading http://pypi.python.org/packages/source/B/Bottleneck/Bottleneck-0.5.0.tar.gz#md5=65f2b3be0ca74b859392d554a48a0906
Processing Bottleneck-0.5.0.tar.gz
Running Bottleneck-0.5.0\setup.py -q bdist_egg --dist-dir c:\users\josef\appdata\local\temp\easy_install-hagefy\Bottleneck-0.5.0\egg-dist-tmp-hf63me
package init file 'bottleneck/tests\__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/func\__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/move\__init__.py' not found (or not a regular file)
bottleneck/src/func/32bit/func.c: In function '__Pyx_RaiseArgtupleInvalid':
bottleneck/src/func/32bit/func.c:247114: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247114: warning: format '%s' expects type 'char *', but argument 5 has type 'Py_ssize_t'
bottleneck/src/func/32bit/func.c:247114: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247114: warning: too many arguments for format
bottleneck/src/func/32bit/func.c: In function '__Pyx_RaiseNeedMoreValuesError':
bottleneck/src/func/32bit/func.c:247124: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247124: warning: format '%s' expects type 'char *', but argument 3 has type 'Py_ssize_t'
bottleneck/src/func/32bit/func.c:247124: warning: too many arguments for format
bottleneck/src/func/32bit/func.c: In function '__Pyx_RaiseTooManyValuesError':
bottleneck/src/func/32bit/func.c:247132: warning: unknown conversion type character 'z' in format
bottleneck/src/func/32bit/func.c:247132: warning: too many arguments for format
bottleneck/src/func/32bit/func.c: At top level:
C:\Python26\lib\site-packages\numpy\core\include/numpy/__ufunc_api.h:196: warning: '_import_umath' defined but not used
bottleneck/src/move/32bit/move.c: In function '__Pyx_RaiseArgtupleInvalid':
bottleneck/src/move/32bit/move.c:203854: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203854: warning: format '%s' expects type 'char *', but argument 5 has type 'Py_ssize_t'
bottleneck/src/move/32bit/move.c:203854: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203854: warning: too many arguments for format
bottleneck/src/move/32bit/move.c: In function '__Pyx_RaiseNeedMoreValuesError':
bottleneck/src/move/32bit/move.c:203956: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203956: warning: format '%s' expects type 'char *', but argument 3 has type 'Py_ssize_t'
bottleneck/src/move/32bit/move.c:203956: warning: too many arguments for format
bottleneck/src/move/32bit/move.c: In function '__Pyx_RaiseTooManyValuesError':
bottleneck/src/move/32bit/move.c:203964: warning: unknown conversion type character 'z' in format
bottleneck/src/move/32bit/move.c:203964: warning: too many arguments for format
bottleneck/src/move/32bit/move.c: At top level:
C:\Python26\lib\site-packages\numpy\core\include/numpy/__ufunc_api.h:196: warning: '_import_umath' defined but not used
zip_safe flag not set; analyzing archive contents...
Adding bottleneck 0.5.0 to easy-install.pth file

Installed c:\python26\lib\site-packages\bottleneck-0.5.0-py2.6-win32.egg
Processing dependencies for bottleneck
Finished processing dependencies for bottleneck

E:\>python
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import bottleneck as bo
>>> bo.test()
<...>
NumPy version 1.5.1
NumPy is installed in C:\Python26\lib\site-packages\numpy
Python version 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)]
nose version 1.0.0
................................................................................
----------------------------------------------------------------------
Ran 80 tests in 36.863s

OK
<nose.result.TextTestResult run=80 errors=0 failures=0>

Cannot build/install on Ubuntu 12.10 64Bit in Virtual Machine

Hello,

I'm not expecting anybody to track down the cause, but I thought I'd share the situation.

I'm trying to install bottleneck on a 64-bit Ubuntu 12.10 guest in VirtualBox.

python setup.py install

ends with the following lines:

bottleneck/src/func/64bit/func.c: At top level:
/home/scrosta/virtualenv/local/lib/python2.7/site-packages/numpy/core/include/numpy/__ufunc_api.h:226:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
error: command 'gcc' failed with exit status 4

where of course the gcc: internal compiler error: Killed (program cc1) is the ugly bit.

Running make all from the repository also gives exactly the same error:

/home/scrosta/virtualenv/local/lib/python2.7/site-packages/numpy/core/include/numpy/__ufunc_api.h:226:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
error: command 'gcc' failed with exit status 4
make: *** [funcs] Error 1

Symbol not found: _get_largest_child

After installing on OS X 10.8 Mountain Lion, I get this error:

>>> from bottleneck.move import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so, 2): Symbol not found: _get_largest_child
  Referenced from: /Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so
  Expected in: flat namespace
 in /Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so

Any ideas?

>>> import numpy
>>> numpy.__version__
'1.6.2'

bn.anynan() and bn.allnan()

bn.anynan(arr, axis) and bn.allnan(arr, axis) might be useful functions to have in bottleneck.

Here's some proof of concept code for anynan:

@cython.boundscheck(False)
@cython.wraparound(False)
def anynan_axis1(np.ndarray[np.float64_t, ndim=2] a):
    cdef:
        int n0, n1, i, j, f
        np.npy_intp *dim
        np.float64_t aij
        np.ndarray[np.uint8_t, ndim=1, cast=True] y
    dim = PyArray_DIMS(a)
    n0 = dim[0]
    n1 = dim[1]
    cdef np.npy_intp *dims = [n0]
    y = PyArray_EMPTY(1, dims, NPY_BOOL, 0)
    for i in range(n0):
        f = 1
        for j in range(n1):
            aij = a[i, j]
            if aij != aij:
                y[i] = 1
                f = 0
                break
        if f == 1:
            y[i] = 0
    return y

We'd have to teach the templating function loop_cdef() to create empty bool return arrays.
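For comparison, the same semantics in plain NumPy (a reference sketch, not the proposed Cython implementation):

```python
import numpy as np

def anynan(arr, axis=None):
    # Reference semantics for the proposed bn.anynan().
    return np.isnan(arr).any(axis=axis)

def allnan(arr, axis=None):
    # Reference semantics for the proposed bn.allnan().
    return np.isnan(arr).all(axis=axis)

a = np.array([[1.0, np.nan], [3.0, 4.0]])
assert anynan(a, axis=1).tolist() == [True, False]
assert allnan(a, axis=1).tolist() == [False, False]
```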

Only update C files at each release?

Every time I modify a cython pyx file, the corresponding change to the auto-generated C file is huge. The repo is growing in size quickly.

So, should I only update the C files at each release? In between releases users can use the makefile.

Or maybe I shouldn't include the c files in the repo. Instead generate them when I make the source distribution?

Suggestions?

Can not build bottleneck from fresh clone

Having recently moved over to a new computer, I needed to reinstall/rebuild several packages I have been using. I have run into a brick wall with bottleneck. Below are my command and the error messages on an Ubuntu 11.04 machine. It appears that some __init__.py files are expected that no longer exist. I cloned the repo this afternoon.

ben@tigger:~/Programs/bottleneck$ python setup.py build
running build
running build_py
package init file 'bottleneck/tests/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/func/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/move/__init__.py' not found (or not a regular file)
package init file 'bottleneck/tests/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/func/__init__.py' not found (or not a regular file)
package init file 'bottleneck/src/move/__init__.py' not found (or not a regular file)
running build_ext
building 'func' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/home/ben/Programs/numpy/numpy/core/include -I/usr/include/python2.7 -c bottleneck/src/func/32bit/func.c -o build/temp.linux-i686-2.7/bottleneck/src/func/32bit/func.o
gcc: bottleneck/src/func/32bit/func.c: No such file or directory
gcc: no input files
error: command 'gcc' failed with exit status 1

Bug in nanmedian

Good:

>> bn.nanmedian(np.array([np.nan, 1, np.nan]))
   1.0

Bad:

>> bn.nanmedian(np.array([np.nan, np.nan, 1]))
   -inf

This is only an issue for an odd number of elements, and only when all values except the last are NaN.
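The expected semantics, sketched in plain NumPy (drop the NaNs first, then take an ordinary median):

```python
import numpy as np

def nanmedian_ref(a):
    # Reference behavior for the cases above: drop NaNs, then take the
    # median of the remaining values (NaN if nothing remains).
    a = np.asarray(a, dtype=float)
    kept = a[~np.isnan(a)]
    return float(np.median(kept)) if kept.size else float("nan")

assert nanmedian_ref([np.nan, 1, np.nan]) == 1.0
assert nanmedian_ref([np.nan, np.nan, 1]) == 1.0  # not -inf
```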

Check return value of PyArray_FillWithScalar

Bottleneck uses the NumPy function PyArray_FillWithScalar to (typically) fill an array with np.nan. PyArray_FillWithScalar returns -1 if the fill was unsuccessful, but bottleneck does not check the return value. It should check, and raise an exception if the return value is -1.

median, nanmedian modify input

The docstring does not explain that median and nanmedian modify the input array. Keep the docstring; change the code so it does not modify the input.
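The fix pattern, sketched with NumPy's in-place partition (hypothetical helper names; odd-length input for simplicity):

```python
import numpy as np

def median_inplace(a):
    # What the current code effectively does: rearrange the caller's
    # buffer in place while selecting the middle element.
    a.partition(a.size // 2)
    return a[a.size // 2]

def median_copy(a):
    # The proposed fix: work on a copy so the input is left untouched.
    b = a.copy()
    b.partition(b.size // 2)
    return b[b.size // 2]

a = np.array([3.0, 1.0, 2.0])
assert median_copy(a) == 2.0
assert (a == [3.0, 1.0, 2.0]).all()  # caller's array unchanged
```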

Remove SciPy dependency

A couple of requests came from the scipy mailing list to remove the dependency of Bottleneck on scipy:

On Wed, Dec 1, 2010 at 3:49 PM, T J wrote:

On Tue, Nov 30, 2010 at 7:04 PM, Keith Goodman wrote:

I bumped the Bottleneck requirements from "NumPy, SciPy" to "NumPy
1.5.1+, SciPy 0.8.0+". I think that is fair to do for a brand new
project.

If SciPy is only used in the benchmarks/tests, then why not make it an
optional benchmark/test that runs only if SciPy is present?
nose.SkipTest should be useful here. I frequently run software on
machines that only have NumPy installed.

Seems like a strange discussion to have on the scipy list :)

I don't want to have a hole in my unit test coverage. But I could copy
over the nan functions in scipy stats. And I guess the benchmark could
use those too. And then skip moving window benchmarks against
scipy.ndimage for those who don't have scipy installed.
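The suggestion amounts to the usual optional-dependency pattern (a sketch; unittest.SkipTest is used here so the snippet is self-contained, where the thread mentions nose.SkipTest):

```python
import unittest

# Probe for SciPy once at import time; the comparison tests below
# become skips rather than failures when it is missing.
try:
    import scipy.stats  # noqa: F401
    HAVE_SCIPY = True
except ImportError:
    HAVE_SCIPY = False

def test_against_scipy_stats():
    if not HAVE_SCIPY:
        raise unittest.SkipTest("SciPy not installed")
    # ... compare bottleneck results against scipy.stats here ...
```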

Bug for very large input arrays

As discussed recently (January) on the numpy list, be careful when using C ints for indexing into arrays. A C int can overflow for very large but-some-people-use-them arrays.

I think Bottleneck uses:

cdef Py_ssize_t idx

for indexing. So that should be safe for large arrays. BUT it does use ints for the size of the array:

cdef np.npy_intp *dim
dim = PyArray_DIMS(a)
cdef int n = dim[0]
for idx in range(n):
    ...

That could be a problem since n could overflow. So switch to:

cdef Py_ssize_t n = dim[0]

But first try to confirm there is a problem by working on very large arrays. And that the issue is fixed by the above or similar.
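The failure mode can be illustrated with NumPy casts, without allocating a multi-gigabyte array:

```python
import numpy as np

# A C int is 32 bits: a length just past 2**31 - 1 wraps negative when
# stored in one, so `for idx in range(n)` would never execute.
true_length = np.array(2**31, dtype=np.int64)  # plays the role of dim[0]
as_c_int = true_length.astype(np.int32)        # what `cdef int n = dim[0]` does
assert int(as_c_int) == -2**31                 # wrapped around
```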

Problems on 32 bit OS

Output of some functions depends on the default integer dtype of NumPy on your OS. Bottleneck determines the default int dtype when it templates the functions. Therefore the C files that I include in the releases on PyPI were for a 64-bit OS, since my computer is 64-bit.

Need to make two source releases: 32-bit and 64-bit.
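The platform dependence is visible directly in NumPy (a quick check, not bottleneck-specific):

```python
import numpy as np

# Width of NumPy's default integer on this interpreter: 32 bits on
# 32-bit builds, 64 bits on typical 64-bit builds.
bits = np.dtype(np.int_).itemsize * 8
assert bits in (32, 64)
```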

Docstring typo

Mean --> Sum

I[3] bn.func.nansum_2d_float32_axis0?
nansum_2d_float32_axis0(ndarray a)
Mean of 2d array with dtype=float32 along axis=0 ignoring NaNs.

Add replace()

A replace() function would be handy. Here are some timings of a prototype:

I[31] a = np.random.rand(10000)
I[32] a[a>0.8] = np.nan
I[33] timeit np.nan_to_num(a)
1000 loops, best of 3: 300 us per loop

I[34] a = np.random.rand(10000)
I[35] a[a>0.8] = np.nan
I[36] timeit mask = np.isnan(a); a[mask] = 0
10000 loops, best of 3: 50.9 us per loop

I[37] a = np.random.rand(10000)
I[38] a[a>0.8] = np.nan
I[39] timeit mask = np.isnan(a); np.putmask(a, mask, 0)
10000 loops, best of 3: 32.9 us per loop

I[40] a = np.random.rand(10000)
I[41] a[a>0.8] = np.nan
I[42] timeit replace(a, np.nan, 0)
100000 loops, best of 3: 8.57 us per loop

And here's the prototype:

@cython.boundscheck(False)
@cython.wraparound(False)
def replace(np.ndarray[np.float64_t, ndim=1] a, double r, double w):
    "replace elements of 1d numpy array of dtype=np.float64."
    cdef Py_ssize_t i
    cdef int a0 = a.shape[0]
    cdef np.float64_t ai
    if r == r:
        for i in range(a0):
            ai = a[i]
            if ai == r:
                a[i] = w
    else:
        for i in range(a0):
            ai = a[i]
            if ai != ai:
                a[i] = w
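Equivalent semantics in plain NumPy, useful for checking the prototype (note the NaN special case, since NaN != NaN):

```python
import numpy as np

def replace_ref(a, old, new):
    # In-place replacement with the same semantics as the prototype.
    if old == old:               # `old` is an ordinary value
        a[a == old] = new
    else:                        # `old` is NaN
        a[np.isnan(a)] = new
    return a

a = np.array([1.0, np.nan, 3.0])
replace_ref(a, np.nan, 0.0)
assert (a == [1.0, 0.0, 3.0]).all()
```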

Add move_percentile()

Add a moving window scoreatpercentile function. It could be based on the move_median code with some slight modifications.
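A slow reference implementation of the requested behavior (a sketch: NaN for positions before the first full window, matching the other moving-window functions):

```python
import numpy as np

def move_percentile_ref(a, window, q):
    # Percentile of each trailing window of length `window`.
    out = np.full(len(a), np.nan)
    for i in range(window - 1, len(a)):
        out[i] = np.percentile(a[i - window + 1 : i + 1], q)
    return out

out = move_percentile_ref(np.array([1.0, 2, 3, 4, 5]), 3, 50)
assert np.isnan(out[:2]).all()
assert out[2:].tolist() == [2.0, 3.0, 4.0]
```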

Test bottleneck on python 3

Compile and run bottleneck with python 3. Then run:

>>> import bottleneck as bn
>>> bn.test()
>>> bn.bench()

Typo in argpartsort error message

A typo in the error message of bn.argpartsort() makes the error message confusing:

In [20]: a = np.arange(15).reshape(5,3)
In [21]: bn.argpartsort(a, 10, axis=1)
ValueError: `n` (=10) must be between 1 and 5, inclusive.

The error message should be:

ValueError: `n` (=10) must be between 1 and 3, inclusive.

To fix, change (n0 --> n1):

if (n < 1) or (n > n1):
    raise ValueError(PARTSORT_ERR_MSG % (n, n0))

to

if (n < 1) or (n > n1):
    raise ValueError(PARTSORT_ERR_MSG % (n, n1))

in argpartsort_2d_float64_axis1 (or actually in the template that creates the function). Similar bugs may exist for the 3d version of the function.

Add sum of squares function ss()

Add a faster version of scipy.stats.ss (sum of squares).

Timings of prototype function with (1000, 1000) array:

>> from bottleneck import ss_2d_float64_axis0
>> from bottleneck import ss_2d_float64_axis1
>> from scipy.stats import ss
>> 
>> a = np.random.rand(1000,1000)
>> 
>> timeit ss(a, 0)
100 loops, best of 3: 10.7 ms per loop
>> timeit ss(a, 1)
100 loops, best of 3: 3.03 ms per loop
>> 
>> timeit ss_2d_float64_axis0(a)
100 loops, best of 3: 8.79 ms per loop
>> timeit ss_2d_float64_axis1(a)
1000 loops, best of 3: 1.17 ms per loop

Timing for (100,100) array:

>> a = np.random.rand(100,100)
>> timeit ss(a, 1)
10000 loops, best of 3: 44.8 us per loop
>> timeit ss_2d_float64_axis1(a)
100000 loops, best of 3: 12.2 us per loop

Timing for (1000,10) array:

>> a = np.random.rand(1000,10)
>> timeit ss(a, 1)
10000 loops, best of 3: 148 us per loop
>> timeit ss_2d_float64_axis1(a)
100000 loops, best of 3: 12.8 us per loop

For large 1d arrays np.dot is faster:

>> a = np.random.rand(1000)
>> timeit ss_1d_float64_axisNone(a)
1000000 loops, best of 3: 1.77 us per loop
>> timeit np.dot(a, a)
1000000 loops, best of 3: 1.6 us per loop

But not for small 1d arrays:

>> a = np.random.rand(10)
>> timeit ss_1d_float64_axisNone(a)
1000000 loops, best of 3: 696 ns per loop
>> timeit np.dot(a, a)
1000000 loops, best of 3: 801 ns per loop

Prototype code based on bn.nansum:

@cython.boundscheck(False)
@cython.wraparound(False)
def ss_1d_float64_axisNone(np.ndarray[np.float64_t, ndim=1] a):
    "Sum of squares of 1d array with dtype=float64 along axis=None."
    cdef np.float64_t asum = 0, ai
    cdef Py_ssize_t i0
    cdef np.npy_intp *dim
    dim = PyArray_DIMS(a)
    cdef int n0 = dim[0]
    for i0 in range(n0):
        ai = a[i0]
        asum += ai * ai
    return np.float64(asum)

@cython.boundscheck(False)
@cython.wraparound(False)
def ss_2d_float64_axis0(np.ndarray[np.float64_t, ndim=2] a):
    "Sum of squares of 2d array with dtype=float64 along axis=0."
    cdef np.float64_t asum = 0, ai
    cdef Py_ssize_t i0, i1
    cdef np.npy_intp *dim
    dim = PyArray_DIMS(a)
    cdef int n0 = dim[0]
    cdef int n1 = dim[1]
    cdef np.npy_intp *dims = [n1]
    cdef np.ndarray[np.float64_t, ndim=1] y = PyArray_EMPTY(1, dims,
                                              NPY_float64, 0)
    for i1 in range(n1):
        asum = 0
        for i0 in range(n0):
            ai = a[i0, i1]
            asum += ai * ai
        y[i1] = asum
    return y
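For checking the prototypes, the plain-NumPy equivalent of what scipy.stats.ss computes:

```python
import numpy as np

def ss_ref(a, axis=None):
    # Sum of squares along an axis, the same quantity as scipy.stats.ss.
    a = np.asarray(a)
    return (a * a).sum(axis=axis)

a = np.array([[1.0, 2.0], [3.0, 4.0]])
assert ss_ref(a) == 30.0
assert ss_ref(a, axis=1).tolist() == [5.0, 25.0]
```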

Tons of compiler warnings

I'm getting tons of compiler warnings (all unit tests pass). For example:

bottleneck/src/func/64bit/func.c: In function ‘__pyx_pf_4func_897argpartsort_3d_float64_axis2’:
bottleneck/src/func/64bit/func.c:213289:14: warning: variable ‘__pyx_bshape_2_y’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213288:14: warning: variable ‘__pyx_bshape_1_y’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213287:14: warning: variable ‘__pyx_bshape_0_y’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213282:14: warning: variable ‘__pyx_bshape_2_b’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213281:14: warning: variable ‘__pyx_bshape_1_b’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213280:14: warning: variable ‘__pyx_bshape_0_b’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213275:14: warning: variable ‘__pyx_bshape_2_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213274:14: warning: variable ‘__pyx_bshape_1_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213273:14: warning: variable ‘__pyx_bshape_0_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213272:14: warning: variable ‘__pyx_bstride_2_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213271:14: warning: variable ‘__pyx_bstride_1_a’ set but not used [-Wunused-but-set-variable]
bottleneck/src/func/64bit/func.c:213270:14: warning: variable ‘__pyx_bstride_0_a’ set but not used [-Wunused-but-set-variable]

int array input calls slow, non-cython version of move_nanmean

Due to a typo in the template, int array input results in a call to the slow, non-cython version of move_nanmean:

>>> a = np.arange(10)
>>> bn.move.move_nanmean_selector(a, window=2, axis=0) 
(<built-in function move_nanmean_slow_axis0>,
 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

'make sdist' should 'make pyx' and 'make cfiles'

Currently if you

$ make sdist

when the *.c files don't exist, you will get a bad source distribution that contains no C files. So replace:

sdist:
    rm -f ../../MANIFEST
    git status
    find ../.. -name *.c
    cd ../..; python setup.py sdist

with

sdist: pyx cfiles
    rm -f ../../MANIFEST
    git status
    find ../.. -name *.c
    cd ../..; python setup.py sdist 

Add partsort() and argpartsort()

Add a partial sort function (based on bn.median code).

partsort() would not fully sort. Instead, the kth smallest element would be in its correct place. Everything to the left of it would be smaller than or equal to the kth element (but not sorted); everything to the right would be greater than or equal to it (but not sorted). That is much faster than a full sort.

Example usage (median of ten smallest elements):

>>> a = np.random.rand(500,500)
>>> b = bn.partsort(a, k=10, axis=0)
>>> bn.median(b[:10], axis=0)

An argpartsort() function would be useful too.
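For reference, NumPy later gained this exact contract as np.partition / np.argpartition, which the proposed functions can be checked against (a sketch):

```python
import numpy as np

# np.partition has the same contract as the proposed partsort(): the
# element at index k-1 lands in its sorted position, smaller-or-equal
# values to its left, larger-or-equal values to its right.
a = np.random.rand(500, 500)
k = 10
b = np.partition(a, k - 1, axis=0)   # like bn.partsort(a, k=10, axis=0)
med10 = np.median(b[:k], axis=0)     # median of the 10 smallest per column
assert med10.shape == (500,)
assert (b[: k - 1] <= b[k - 1]).all()
assert (b[k:] >= b[k - 1]).all()
```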

Use Numpy C-API version of np.float64(x)

The cython functions in bottleneck that return a scalar make that scalar a numpy object (with a dtype attribute, etc.) by using:

>>> np.float64(x)

And similarly for other dtypes.

Is there a faster (but simple) way to do this using the Numpy C API?

Should we add a nearest neighbor function, bn.nn()?

Brute force is faster than scipy.spatial.cKDTree for knn when k=1 and distance is Euclidean:

I[5] targets = np.random.uniform(0, 8, (5000, 7))
I[6] element = np.arange(1, 8, dtype=np.float64)

I[7] T = scipy.spatial.cKDTree(targets)
I[8] timeit T.query(element)
10000 loops, best of 3: 36.1 us per loop

I[9] timeit nn(targets, element)
10000 loops, best of 3: 28.5 us per loop

What about for lower dimensions (2 instead of 7) where cKDTree gets faster?

I[18] element = np.arange(1,3, dtype=np.float64)
I[19] targets = np.random.uniform(0,8,(5000,2))

I[20] T = scipy.spatial.cKDTree(targets)
I[21] timeit T.query(element)
10000 loops, best of 3: 27.5 us per loop

I[22] timeit nn(targets, element)
100000 loops, best of 3: 11.6 us per loop

Prototype code (bottleneck license):

@cython.boundscheck(False)
@cython.wraparound(False)
def nn(np.ndarray[np.float64_t, ndim=2] targets,
       np.ndarray[np.float64_t, ndim=1] element):
    "Euclidean distance to, and row index of, the nearest neighbor of element among the rows of targets."
    cdef:
        np.float64_t ssum = 0, d, ssummin=np.inf, dist
        Py_ssize_t i0, i1, imin = 0
        np.npy_intp *dim
        Py_ssize_t n0
        Py_ssize_t n1
    dim = PyArray_DIMS(targets)
    n0 = dim[0]
    n1 = dim[1]
    for i0 in range(n0):
        ssum = 0
        for i1 in range(n1):
            d = targets[i0, i1] - element[i1]
            ssum += d * d
        if ssum < ssummin:
            ssummin = ssum
            imin = i0
    dist = sqrt(ssummin)        
    return dist, imin

For some context, see this thread: http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062103.html
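A vectorized NumPy equivalent of the prototype, useful for checking its output (a sketch):

```python
import numpy as np

def nn_ref(targets, element):
    # Squared Euclidean distance from `element` to every row of
    # `targets`; return (distance, index) of the nearest row.
    d2 = ((targets - element) ** 2).sum(axis=1)
    imin = int(np.argmin(d2))
    return float(np.sqrt(d2[imin])), imin

targets = np.array([[0.0, 0.0], [3.0, 4.0]])
dist, idx = nn_ref(targets, np.array([3.0, 3.0]))
assert idx == 1 and abs(dist - 1.0) < 1e-12
```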

Unable to build from source on a clean system

Because the bottleneck/src/Makefile makes a call to python that imports "bottleneck.src.makepyx" (and that module also imports others), a completely clean, standard setup system cannot build bottleneck from source. This is a chicken-and-egg problem that can probably only be resolved by restructuring the build process.

feature request: min_periods parameter for moving functions

Sometimes it's useful to get a rolling mean over the entire input.

import pandas

In [8]: pandas.rolling_mean(np.array([1,2,3,4,5]),3)
Out[8]: array([ nan,  nan,   2.,   3.,   4.])

In [9]: pandas.rolling_mean(np.array([1,2,3,4,5]),3, min_periods=1)
Out[9]: array([ 1. ,  1.5,  2. ,  3. ,  4. ])
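Reference semantics for the requested parameter, matching the pandas output above (a slow sketch):

```python
import numpy as np

def move_mean_min_periods(a, window, min_periods):
    # Emit a value as soon as `min_periods` observations are available,
    # instead of waiting for a full window of `window` observations.
    out = np.full(len(a), np.nan)
    for i in range(len(a)):
        lo = max(0, i - window + 1)
        if i - lo + 1 >= min_periods:
            out[i] = a[lo : i + 1].mean()
    return out

out = move_mean_min_periods(np.array([1.0, 2, 3, 4, 5]), 3, 1)
assert out.tolist() == [1.0, 1.5, 2.0, 3.0, 4.0]
```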

Test against numpy 1.7

From http://www.lfd.uci.edu/~gohlke/pythonlibs/tests/numpy-MKL-1.7.0.dev-66bd39f-win-amd64-py2.7/bottleneck_test.txt

Tests are failing on 64-bit Windows when run with the unreleased numpy 1.7 (check to see if they also fail on 64-bit linux):

X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:13: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .func import (nansum, nanmax, nanmin, nanmean, nanstd, nanvar, median,
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:13: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility
  from .func import (nansum, nanmax, nanmin, nanmean, nanstd, nanvar, median,
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:19: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .move import (move_sum, move_nansum,
X:\Python27-x64\lib\site-packages\bottleneck\__init__.py:19: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility
  from .move import (move_sum, move_nansum,
.............................FFFF.......................................................................................
======================================================================
FAIL: Test nanmax.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
    assert_array_equal(actual, desired, err_msg)
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

func nanmax | input a4 (int32) | shape (2L, 0L) | axis -2

Input array:
[]

(mismatch 100.0%)
 x: array([], dtype=int32)
 y: array('Crashed', 
      dtype='|S7')

======================================================================
FAIL: Test nanargmin.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
    assert_array_equal(actual, desired, err_msg)
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

func nanargmin | input a88 (float32) | shape (2L, 3L) | axis -1

Input array:
[[ nan  nan  nan]
 [  3.   4.   5.]]

(mismatch 100.0%)
 x: array('Crashed', 
      dtype='|S7')
 y: array([-9223372036854775808,                    0], dtype=int64)

======================================================================
FAIL: Test nanargmax.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
    assert_array_equal(actual, desired, err_msg)
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

func nanargmax | input a88 (float32) | shape (2L, 3L) | axis -1

Input array:
[[ nan  nan  nan]
 [  3.   4.   5.]]

(mismatch 100.0%)
 x: array('Crashed', 
      dtype='|S7')
 y: array([-9223372036854775808,                    2], dtype=int64)

======================================================================
FAIL: Test nanmin.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "X:\Python27-x64\lib\site-packages\bottleneck\tests\func_test.py", line 76, in unit_maker
    assert_array_equal(actual, desired, err_msg)
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\Gohlke\Desktop\TestNumpy\numpy-build\numpy\testing\utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

Bug in nanmedian

For particular orderings of elements in an array that contains both numbers and NaNs, nanmedian miscounts the number of non-NaN elements and therefore returns the wrong value.

The first two examples are correct, the last one is wrong:

>>> bn.nanmedian([1, 2])
1.5
>>> bn.nanmedian([1, np.nan, 2])
1.5
>>> bn.nanmedian([1, np.nan, np.nan, 2])
1.0
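Until the counting bug is fixed, an explicit-filtering fallback gives the correct answer. This is a hedged workaround sketch (the name `nanmedian_safe` is illustrative), not bottleneck's implementation: it removes NaNs first, so no separate non-NaN count can go wrong:

```python
import numpy as np

def nanmedian_safe(a):
    # Drop NaNs explicitly, then take the plain median of what remains;
    # this sidesteps any miscount of non-NaN elements.
    a = np.asarray(a, dtype=np.float64)
    valid = a[~np.isnan(a)]
    return np.median(valid) if valid.size else np.nan

nanmedian_safe([1, np.nan, np.nan, 2])
# 1.5
```

This is slower than bottleneck because it allocates a filtered copy, but it returns 1.5 for the failing example above regardless of element ordering.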
