scitools / biggus Goto Github PK

View Code? Open in Web Editor NEW

54.0 14.0 27.0 666 KB

:no_entry: [DEPRECATED] Virtual large arrays and lazy evaluation.

Home Page: http://biggus.readthedocs.io/

License: GNU Lesser General Public License v3.0

Python 100.00%

deprecated

biggus's People

Contributors

Stargazers

Watchers

biggus's Issues

Floor division when using floats.

I recently submitted a pull request (#1762) to iris for preserving lazy data when performing basic mathematical operations with cubes.

This lead me to discovering what is perhaps an inconsistency with the way biggus performs division.

In Python 2 the command 1.0 / 2.0 = 0.5, and the command 1 / 2 = 0. When using integers, python performs integer or 'floor' division. (Note: in Python 3, 1 / 2 = 0.5)

From what I can tell, it seems that biggus does not mimic this behaviour. For example this code:

import biggus
import numpy as np

ones = np.ones(5, dtype=np.float32)
fives = 5 * ones

bigones = biggus.NumpyArrayAdapter(ones)
bigfives = biggus.NumpyArrayAdapter(fives)

div = bigones / bigfives

print div
print div.ndarray()

Produces:

<Array shape=(5) dtype=dtype('float32') size=20 B>

[ 0.  0.  0.  0.  0.]

Here, it looks like biggus is performing floor division even when using floats.

Adding from __future__ import division to this code fixes this problem as it means that the / is interpreted as a 'true' division, however this does not fix the problem as I think biggus interprets the underlying operator.div as a floor division regardless of the dtype.

Any thoughts @pelson @rhattersley? Am I missing something?

IndexError when computing with 0-dim NumpyArrayAdapter

I have the following scenario:

import biggus
import numpy as np

depth_c = np.array(5.)

c = np.empty((1, 36, 1, 1))
s = np.empty((1, 36, 1, 1))
eta = np.empty((855, 1, 82, 130))
depth = np.empty((1, 1, 82, 130))

I convert all those arrays to NumpyArrayAdapter and then perform some computations with them:

c, s, eta, depth, depth_c = map(biggus.NumpyArrayAdapter, (c, s, eta, depth, depth_c))
S = (depth_c * s) + ((depth - depth_c) * c)
z = eta * (S / depth + 1) + S

z.shape
(855, 36, 82, 130)

z[0, :, 30, 40]
IndexError

That should be a valid slice. If I do not make the 0-dim array a NumpyArrayAdapter things work as expected:

c, s, eta, depth = map(biggus.NumpyArrayAdapter, (c, s, eta, depth))
S = (depth_c * s) + ((depth - depth_c) * c)
z = eta * (S / depth + 1) + S

z.shape
(855, 36, 82, 130)

z[0, :, 30, 40]
<_Elementwise shape=(36) dtype=dtype('float64')>

PS: Sorry I could not make this simpler. I tried but simple versions of this actually work just fine!

mean of an arraystack of masked arrays

Hi:

I was trying to get a time mean of a masked array using biggus as follows

   for sname, folders in season_name_to_folder_list.iteritems():
        arr_stack = biggus.ArrayStack(
            np.array([biggus.NumpyArrayAdapter(DataForDay(var_name=varname, day=fname)) for fname in folders])
        )
        print arr_stack.shape
        result[sname] = np.flipud(biggus.mean(arr_stack, axis=0).masked_array())

And it seems that the result of this operation gives masked-value + not-masked value -> not-masked-value, since when I use longer time intervals the masked region decreases?

__getitem__ of DataForDay returns a masked array.... But maybe I spoil everything by wrapping the list of numpyadapters with np.array()... Is there a way to do it properly?

Here is the definition of the DataForDay class: https://github.com/guziy/ShortPythonScripts/blob/master/modis_download/mcd43c3_seasonal_mean.py#L29

Re-consider the make-concrete interface

What should the interface be for converting an Array to a concrete ndarray/MaskedArray?

Can/should the existing ndarray() and masked_array() methods be converted to a single as_concrete() method? Can/should it return an ndarray instance where possible and MaskedArray where necessary?

Simple shared evaluation

Depends on #18.

Find a way to compute the following with only a single pass of the source data:

mean = biggus.mean(source)
std_dev = biggus.std_dev(source)

For example:

with biggus.context([mean, std_dev]):
    mean = mean.ndarray()
    std_dev = std_dev.ndarray()

Or:

mean, std_dev = biggus.ndarray(mean, std_dev)

Debian packaging

Out of range keys slice.

>>> import biggus
>>> import numpy as np
>>> b = biggus.ArrayAdapter(np.arange(12).reshape(3, 4), slice(30, None))
>>> b
<ArrayAdapter shape=(0, 4) dtype=dtype('int64')>
>>> r = b.ndarray()
>>> print r
[]
>>> r.ndim
2
>>> r.shape
(0, 4)

Standard deviation operator

Automatic chunk size selection

Currently the chunk size is fixed to 10, which gave near optimal results on one test file (tas_..._20051130.nc) with data dimensions (52560, 145, 192). This equates to approx 1MB.

Determine the optimal value for data files with differing resolutions and numbers of dimensions. And attempt to approximate this with an automatic calculation which uses the shape, dtype, etc. to select a near optimal value.

NB. The optimal value will probably vary depending on the architecture (e.g. desktop vs. HPC). How much does this matter? Is there a convenient 80/20 trade-off? If not and we really do need architecture-dependent tuning, how simple can this be? A single number?

Support leading ellipsis indexing

NumpyArrayAdapter(np.arange(24).reshape(3, 4, 2))[..., 0]

Currently:

Traceback (most recent call last):
  File "biggus/biggus/__init__.py", line 1463, in <module>
    b = NumpyArrayAdapter(np.arange(24).reshape(3, 4, 2))[..., 0]
  File "biggus/biggus/__init__.py", line 503, in __getitem__
    result_key = self._cleanup_new_key(new_key, size, axis)
  File "biggus/biggus/__init__.py", line 423, in _cleanup_new_key
    raise TypeError('invalid key {!r}'.format(key))
TypeError: invalid key Ellipsis

Overlapping shared evaluation

Compute the following with only one pass each of model1, model2, analysis.

mean_error1 = biggus.mean(model1 - analysis)
mean_error2 = biggus.mean(model2 - analysis)
e1, e2 = biggus.ndarrays([mean_error1, mean_error2])

biggus should implement the ndarray interface

Implementation would be a one-liner. Just add the following method to biggus.Array:

    def __array__(self, dtype=None):
        """NumPy array interface"""
        return self.ndarray()

Unfortunately, it's not as easy to support casting to masked arrays, although it looks like that is possible by adding a _mask property.

The reason I have not submitted this as a pull request (yet) is that there is a trade-off here:

Positives: This would make it dead easy to use biggus arrays in external packages (e.g., Iris or xray) without needing to do any un-Pythonic isinstance or hasattr type inspection to figure out how to make a concrete array. NumPy functions like np.asarray or np.sin would just work, making biggus arrays immediately compatible with tons of other code. By default, all arrays would be evaluated, but numpy also has hooks for ufuncs that would let you handle them in a lazy fashion.
Negatives: Writing np.array([[small_array] * 1000] * 5) will evaluate the array, unless you use the keyword argument dtype=object, which also would need to be sprinkled throughout biggus' code in a few places. Creating such arrays of arrays could be more awkward.

In my opinion, this is not much of a downside. Most users won't be making ArrayStacks from scratch, and people writing library code can use native Python data structures (which could be automatically cast to object ndarrays) or learn to write dtype=object. Thoughts?

biggus.dot

It would be useful to implement the numpy.dot function in biggus.

One interesting addition would be to add an optimised execution order for chained dot products (see http://www.cs.stonybrook.edu/~jgao/CSE548-fall07/David-mount-DP.pdf or for a numpy implementation https://github.com/sotte/numpy_mdot/blob/master/mdot.py) - it would exercise an interesting use case for optimisation of the biggus engine.

Define interface for mixed bounded/unbounded evaluation

What should the public user API be to enable users to mix and match bounded (ndarray and/or MaskedArray) with unbounded (e.g. netCDF variable) evaluation?

See https://github.com/SciTools/biggus/wiki/Processing

Also, consider:

How to make this consistent with any special-case methods/functions which deal with simple cases? For example my_array.ndarray() or biggus.save() (which currently saves a single array to a single variable). NB. It is perfectly legitimate (and possibly even desirable) to change the existing methods/functions, just as much as it is to define new ones.
What is the interface between biggus and a (possibly unbounded) save target? For example, if saving to a netCDF variable does the user need to ensure the variable has the correct shape before requesting the save? We don't want to have biggus doing any "magic".)

Provide Engine.ndarrays()

results = engine.ndarrays(my_derived_array, my_other_derived_array, ...)

For background, see #58.

AssertionError when loading the vertical coordinate of an iris cube.

I am not sure if this is a bug in Biggus or in iris. If I try to load a cube vertical coordinate directly, like this:

>>> import iris
>>> url = "http://colossus.dl.stevens-tech.edu:8080/thredds/dodsC/latest/Bight_gcmplt.nc"
>>> cube = iris.load_cube(url, 'sea_water_salinity')
>>> cube.coord('sea_surface_height_above_reference_ellipsoid')
<repr(<iris.coords.AuxCoord at 0x7f902daf6750>) failed: AssertionError: >

and accessing the points trigger the Exception.

Biggus-0.10.0-py2.7.egg/biggus/__init__.pyc in __init__(self, array1, array2, numpy_op, ma_op)
   2679 
   2680         # TODO: Type-promotion
-> 2681         assert array1.dtype == array2.dtype
   2682         self._array1 = array1
   2683         self._array2 = array2

But if I load the cube.coords(), then everything works. Here is a notebook showing this issue.

Simple element-wise arithmetic

Add support for deferred element-wise add, subtract, multiply, etc. of Arrays with identical shapes. (i.e. There is no need to support broadcasting at this stage.)

>>> assert model.shape == analysis.shape
>>> error = model - analysis
>>> isinstance(error, biggus.Array)
True
>>> error = error * error
>>> isinstance(error, biggus.Array)
True

Chained shared evaluation

Compute the mean & standard deviation of the error with a single pass of the data.

>>> error = model - analysis
>>> isinstance(error, biggus.Array)
True
>>> mean_error = biggus.mean(error)
>>> std_error = biggus.std(error)
>>> mean_error, std_error = biggus.ndarrays([mean_error, std_error])

Throttled producer

Limit the amount of pre-fetch that can be performed by the producer in the multi-threaded producer-consumer evaluation.

Bug with lazy variance and masked arrays

A very specialised bug with the variance calculation when producing masked arrays (smaller arrays, and accessing ndarray work fine):

>>> import biggus
>>> import numpy as np
>>> 
>>> arr = np.empty((10, 400, 720), dtype=np.float32)
>>> r = biggus.var(arr, axis=1)
>>> 
>>> r 
<_Aggregation shape=(10, 720) dtype=dtype('float32')>
>>> r.masked_array()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "biggus/__init__.py", line 1555, in masked_array
    result, = engine.masked_arrays(self)
  File "biggus/__init__.py", line 448, in masked_arrays
    return self._evaluate(arrays, True)
  File "biggus/__init__.py", line 442, in _evaluate
    ndarrays = group.evaluate(masked)
  File "biggus/__init__.py", line 428, in evaluate
    raise Exception('error during evaluation')
Exception: error during evaluation
>>> Exception in thread <biggus.StreamsHandlerNode object at 0x1949850>:
Traceback (most recent call last):
  File "lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "biggus/__init__.py", line 277, in run
    self.output(self.process_chunks(input_chunks))
  File "biggus/__init__.py", line 305, in process_chunks
    return self.streams_handler.process_chunks(chunks)
  File "biggus/__init__.py", line 1341, in process_chunks
    result = self.finalise()
  File "biggus/__init__.py", line 1501, in finalise
    chunk = super(_VarMaskedStreamsHandler, self).finalise()
  File "biggus/__init__.py", line 1459, in finalise
    array.shape = self.current_shape
ValueError: total size of new array must be unchanged

New 32-bit failures

Some small issues with longs:

======================================================================
FAIL: test_999999999 (biggus.tests.unit.test_Array.Test___str__)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/biggus-0.10.0/biggus/tests/unit/test_Array.py", line 129, in test_999999999
    "<Array shape=(9999999999,) dtype=dtype('int8') size=9.31 GiB>")
  File "/builddir/build/BUILD/biggus-0.10.0/biggus/tests/unit/test_Array.py", line 95, in _test
    self.assertEqual(str(array), expected)
AssertionError: "<Array shape=(9999999999L,) dtype=dtype('int8') size=9.31 GiB>" != "<Array shape=(9999999999,) dtype=dtype('int8') size=9.31 GiB>"

======================================================================
FAIL: test_999999999999 (biggus.tests.unit.test_Array.Test___str__)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/biggus-0.10.0/biggus/tests/unit/test_Array.py", line 134, in test_999999999999
    "<Array shape=(9999999999999,) dtype=dtype('int8') size=9.09 TiB>")
  File "/builddir/build/BUILD/biggus-0.10.0/biggus/tests/unit/test_Array.py", line 95, in _test
    self.assertEqual(str(array), expected)
AssertionError: "<Array shape=(9999999999999L,) dtype=dtype('int8') size=9.09 TiB>" != "<Array shape=(9999999999999,) dtype=dtype('int8') size=9.09 TiB>"

======================================================================
FAIL: test_999999999999999 (biggus.tests.unit.test_Array.Test___str__)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/biggus-0.10.0/biggus/tests/unit/test_Array.py", line 139, in test_999999999999999
    "<Array shape=(9999999999999999,) dtype=dtype('int8') "
  File "/builddir/build/BUILD/biggus-0.10.0/biggus/tests/unit/test_Array.py", line 95, in _test
    self.assertEqual(str(array), expected)
AssertionError: "<Array shape=(9999999999999999L,) dtype=dtype('int8') size=9094.95 TiB>" != "<Array shape=(9999999999999999,) dtype=dtype('int8') size=9094.95 TiB>"

New aggregation operators

~~count~~, gmean, hmean, ~~max~~, median, ~~min~~, percentile, proportion, ~~std_dev~~, ~~sum~~, and ~~variance~~

newaxis

@rhattersley Is it the hope that #116 will resolve:

>>> c = biggus.ConstantArray((2, 3, 4, 5))
>>> t = biggus.TransposedArray(c, (3, 1, 0, 2))
>>> print t
TransposedArray(<ConstantArray shape=(2, 3, 4, 5) dtype=dtype('float64')>, [3, 1, 0, 2])
>>> t[:2, 0, :, np.newaxis, :]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/h05/itwl/projects/git/biggus/biggus/__init__.py", line 1029, in __getitem__
    new_arr = self.pre_transposed[tuple(remapped_keys)]
  File "/home/h05/itwl/projects/git/biggus/biggus/__init__.py", line 587, in __getitem__
    keys = self._normalise_keys(keys)
  File "/home/h05/itwl/projects/git/biggus/biggus/__init__.py", line 537, in _normalise_keys
    raise IndexError('too many keys')
IndexError: too many keys

Add GridMosaic class

Similar to LinearMosaic, but extend along two dimensions.

... or ... should we just go for a generic n-dimensional mosaic?

Documentation

Get some docs built and hosted somewhere. e.g. www.readthedocs.org, scitools.org.uk, ... ?

Pathological difference in aggregation results from numpy

I just wanted to report a minor difference in the result dtype when aggregating with masked arrays (with a mask value). Numpy appears to upcast the result if the mask is defined (but not otherwise):

>>> a = np.ones((20, 4), dtype=np.float32)
>>> a = np.ma.masked_array(a, mask=True)

>>> print biggus.mean(a, axis=0).ndarray().dtype
float32
>>> print np.mean(a, axis=0).dtype
float64

I personally am not convinced that what numpy has is desirable behaviour, so please feel free to close this issue - I just wanted somewhere to report it for future me.

Indexing a TransposedArray with multiple ellipsis

Commit 5c01bc0

>>> a = np.empty((2, 3, 4, 5))
>>> t = biggus.TransposedArray(a, (3, 1, 0, 2))
>>> t.shape
(5, 3, 2, 4)
>>> b = np.empty(t.shape)
>>> b[:, ..., 0, ..., :].shape
(5, 2, 4)
>>> t[:, ..., 0, ..., :].shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/h05/itwl/projects/git/biggus/biggus/__init__.py", line 1013, in __getitem__
    keys = _full_keys(keys, self.ndim)
  File "/home/h05/itwl/projects/git/biggus/biggus/__init__.py", line 2315, in _full_keys
    raise IndexError('Dimensions are over specified for indexing.')
IndexError: Dimensions are over specified for indexing.

Support for axis in aggregations

For example:

>>> isinstance(model, biggus.Array)
True
>>> model.shape
(52560, 145, 192)
>>> zonal_mean = biggus.mean(model, axis=1)
>>> time_mean = biggus.mean(model, axis=0)
>>>
>>> # This should only use a single pass of the data.
>>> zonal_mean, time_mean = biggus.ndarrays([zonal_mean, time_mean])

Better ArrayStack/LinearMosaic input validation

Throw an error if the things to be stacked/concatenated aren't biggus arrays.

See #77 for a case where a stack was made from non-biggus arrays.

Rolling window

At least "mean"...

... but consider: count, gmean, hmean, max, median, min, percentile, proportion, std_dev, sum, and variance.

PYPI classifiers meta-data

Consider whether more PYPI meta-data classifiers are relevant to Biggus in setup.py, see https://pypi.python.org/pypi?:action=list_classifiers.

Setting fill_value interfere with iris._merge

Assigning a fill_value property in Array.fill_value will create a masked array default fill_value even when the original data is not masked.

That prevents cubes merging in iris.

The failure happens when slicing cubes and trying to merge them again. If one of the slices happens to be masked and the other is not the merging will fail. That occurs because the merging will be between the original fill_value, from the original cube metadata, and the default fill_value assigned by biggus lazy_data().

Don't know if the best approach would be for iris to preserve the original fill_value when slicing data that is masked (even when there is nothing to fill -- that would be an iris fix), or if biggus could avoid attaching this property to non-masked objects.

Deep copy of Biggus Array

There is a problem with taking a deep copy of a Biggus Array.

It appears that doing so can change the order of the elements in the array.

We can define our own copy implementation by adding a __deepcopy__ method. If we do so, the question is what should our implementation be?

Using biggus operators on 'large' varying-dtype arrays

With thanks to @matthew-mizielinski for originally pointing this out...

Using biggus operators to combine 'large' arrays with differing dtypes raises an error when biggus runs the specified operator on the chunks to be processed.

This is demonstrated by the following code snippet and the error(s) produced when I ran it:

a_type = np.float32
b_type = np.float64
shape = (41, 192, 144)

a = np.array(np.random.random(shape), dtype=a_type)
b = np.array(np.random.random(shape), dtype=b_type)

a_bg = biggus.NumpyArrayAdapter(a)
b_bg = biggus.NumpyArrayAdapter(b)
prod = a_bg * b_bg

result = biggus.sum(prod, axis=0).ndarray()

Exception in thread <biggus.StreamsHandlerNode object at 0x7f64d325ac50>:
Traceback (most recent call last):
  File "/.../lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/.../lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/.../biggus/biggus/__init__.py", line 293, in run
    self.output(self.process_chunks(input_chunks))
  File "/.../biggus/biggus/__init__.py", line 323, in process_chunks
    return self.streams_handler.process_chunks(chunks)
  File "/.../biggus/biggus/__init__.py", line 2808, in process_chunks
    array = self.operator(*[chunk.data for chunk in chunks])
ValueError: operands could not be broadcast together with shapes (41,192,144) (37,192,144) 

Traceback (most recent call last):
  File "<input>", line 11, in <module>
  File "/.../biggus/biggus/__init__.py", line 2547, in ndarray
    result, = engine.ndarrays(self)
  File "/.../biggus/biggus/__init__.py", line 469, in ndarrays
    return self._evaluate(arrays, False)
  File "/.../biggus/biggus/__init__.py", line 460, in _evaluate
    ndarrays = group.evaluate(masked)
  File "/.../biggus/biggus/__init__.py", line 446, in evaluate
    raise Exception('error during evaluation')
Exception: error during evaluation

If you make the shape of the arrays smaller or unify the dtypes of the two arrays then this error does not occur.

I've taken a little look into what might be causing this but have tied my brain in knots trying to follow the flow of execution that biggus follows to get to this point. So, instead of sitting on this for ages and trying to find a solution I figured it would be beneficial to raise this as an issue and also keep working on it.

Defer aggregations

Change the behaviour of aggregation operations to return Array results.

So to get a NumPy array for the mean, instead of:

mean_ndarray = biggus.mean(array, axis=0)

one would do:

mean_array = biggus.mean(array, axis=0)
mean_ndarray = mean_array.ndarray()

Biggus 0.6 is available on github, but only 0.5 is available on pypi

I don't know if this is the right place to report this, but the IRIS requirements want biggus >= 0.6

http://scitools.org.uk/iris/docs/latest/installing.html

and only 0.5 is available on pypi...

https://pypi.python.org/pypi/Biggus/

zeros() and ones()

Add the analogues of numpy.zeros() and numpy.ones().

Error when slicing aggregations?

Looks like Aggregation.__getitem__() might be making an invalid assumption about which axis is involved in the aggregation.

Proportion aggregator

Add a proportion aggregation function that supports the corresponding capability in Iris.

biggus.tests.test_save.TestWritePattern.test_large takes forever with NumPy 1.9

In the process of testing #102, I set up conda with Python 3.4 and whatever version of NumPy was installed by default (since the requirements file didn't specify a version), which turned out to be 1.9.

The problem arose with the above mentioned test, which never completed. I left it for almost an hour with no success. It uses a lot of CPU, but barely any memory.

Masked arrays not preserved in biggus math operations

The biggus mathematical operations (biggus.power, biggus.log10 etc) do not preserve masked arrays through the operation.

For example:

import biggus
import numpy as np
a = np.arange(12).reshape(3, 4)
m = a % 3 == 0
am = np.ma.masked_array(a, m)

print am
[[-- 1 2 --]
 [4 5 -- 7]
 [8 -- 10 11]]
bm = biggus.NumpyArrayAdapter(am)
cm = biggus.power(bm, 2)
# We'd expect this to not return the masked array. print cm.ndarray()
[[999998000001            1            4 999998000001]
 [          16           25 999998000001           49]
 [          64 999998000001          100          121]]

# But we would expect this to...
print cm.masked_array()
[[999998000001            1            4 999998000001]
 [          16           25 999998000001           49]
 [          64 999998000001          100          121]]

Fix incoming.

User-defined per-slice operations

Provide a relatively simple way for end-users to define their own transformations which need to operate on one or more complete dimensions.

For example, a filtering or regridding algorithm which operates on a 2D slice.

Document somewhere that the order of the operations matter

This took almost 2 hours of my afternoon yesterday:

import numpy as np
import biggus
arr = biggus.NumpyArrayAdapter(np.array([1, 2, 3.]))

1 + arr

TypeError: unsupported operand type(s) for +: 'int' and 'NumpyArrayAdapter'

But biggus is OK with this:

arr + 1
<_Elementwise shape=(3) dtype=dtype('float64')>

(arr + 1).ndarray()
array([ 2.,  3.,  4.])

Numpy array input validation

Relating to #77 and #81 it would be good to handle numpy arrays as inputs in a better way than currently exists.

The key example:

>>> meaned = biggus.mean(np.arange(12), axis=0) 
>>> meaned.ndarray()
Exception in thread <biggus.ProducerNode object at 0x35db190>:
Traceback (most recent call last):
  File "lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "biggus/__init__.py", line 204, in run
    data = self.array[key].ndarray()
AttributeError: 'numpy.ndarray' object has no attribute 'ndarray'

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "biggus/__init__.py", line 1536, in ndarray
    result, = engine.ndarrays(self)
  File "biggus/__init__.py", line 451, in ndarrays
    return self._evaluate(arrays, False)
  File "biggus/__init__.py", line 442, in _evaluate
    ndarrays = group.evaluate(masked)
  File "biggus/__init__.py", line 428, in evaluate
    raise Exception('error during evaluation')
Exception: error during evaluation

Of course, this is fixed by wrapping the array in a NumpyArrayAdapter, a numpy array should be supported directly or better input validation should take place.

Linear regression

As in SciPy this would need to return multiple values. For example:

slope, intercept, r_value, p_value, std_err = biggus.linregress(x, y)
slope, intercept = biggus.ndarrays(slope, intercept)

The potentially tricky bit is ensuring the lazy evaluation is shared.

BLD: Proposed new tag release

I propose that a new biggus tag release be made, allowing those packages which depend on recent API changes of biggus to be reference by its tag release.

In particular, I'm pushing for this as a result of the following commit (89ced47).

Released v0.9.0 breaks Iris tests

Iris testing on Travis CI appears to be broken after v0.9.0 was packaged:

======================================================================
ERROR: test_deferred_bytes (iris.tests.unit.fileformats.pp.test__create_field_data.Test__create_field_data)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/Iris-1.8.0.dev0-py2.7-linux-x86_64.egg/iris/tests/unit/fileformats/pp/test__create_field_data.py", line 74, in test_deferred_bytes
    self.assertEqual(field._data.shape, data_shape)
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/Biggus-0.9.0-py2.7.egg/biggus/__init__.py", line 1018, in shape
    return _sliced_shape(self.concrete.shape, self._keys)
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/Biggus-0.9.0-py2.7.egg/biggus/__init__.py", line 2583, in _sliced_shape
    size = len(range(*key.indices(shape[shape_dim])))
TypeError: '_SentinelObject' object cannot be interpreted as an index
======================================================================
ERROR: test_real_data_cube_indexing (iris.tests.test_cdm.TestDataManagerIndexing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/Iris-1.8.0.dev0-py2.7-linux-x86_64.egg/iris/tests/test_cdm.py", line 972, in test_real_data_cube_indexing
    cube = self.cube[(0, 4, 5, 2), 0, 0]
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/Iris-1.8.0.dev0-py2.7-linux-x86_64.egg/iris/cube.py", line 1929, in __getitem__
    data = data[other_slice]
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/Biggus-0.9.0-py2.7.egg/biggus/__init__.py", line 525, in __getitem__
    keys = _full_keys(keys, self.ndim)
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/Biggus-0.9.0-py2.7.egg/biggus/__init__.py", line 2622, in _full_keys
    if n_keys_non_newaxis - 1 >= ndim and Ellipsis in keys:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Fancy-indexing vs column-indexing wrapper classes

Provide separate wrapper classes for data sources which use fancy indexing (e.g. NumPy) vs. column indexing (e.g. netCDF4).

Error when trying to save the result of biggus.mean that does not fit into memory.

Hi I was trying to calculate the mean seasonal value of modis data using biggus.

I've assembled tiles using linearmosaic for each day and then created stackarray from the list of linear mosaics

arr_stack = biggus.ArrayStack(np.asarray([get_data_for_date(d, path=path) for d in dates]))
the_mean = biggus.NumpyArrayAdapter(biggus.mean(arr_stack, axis=0))


folder_path = "/home/huziy/DATA/seasonal_modis_snow_albedo"
if not os.path.isdir(folder_path):
    os.makedirs(folder_path)

ds = Dataset(os.path.join(folder_path, "{}.nc".format(season_name)), mode="w")
ds.createDimension("y", the_mean.shape[0])
ds.createDimension("x", the_mean.shape[1])
var_nc = ds.createVariable("I6", the_mean.dtype, ("y", "x"))
biggus.save([the_mean, ], [var_nc, ])
ds.close()

But I get the exception at the line before the last.

    biggus.save([the_mean, ], [var_nc, ])
  File "/home/huziy/b2_fs2/virtual_envs/py2.7-default/lib/python2.7/site-packages/biggus/__init__.py", line 1063, in save
    target[keys] = array[keys].ndarray()
  File "netCDF4.pyx", line 3227, in netCDF4.Variable.__setitem__ (netCDF4.c:40235)
AttributeError: '_Aggregation' object has no attribute 'size'

Do you know how to fix this?

Thanks

ArrayAdapter equality

Shouldn't the following two ArrayAdapter instances be equal? My expectation was that they should be.

>>> import biggus
>>> import numpy as np
>>> array = np.arange(12).reshape(3, 4)
>>> print array
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
>>> a = biggus.ArrayAdapter(array, ((1, 2), (1, 2)))
>>> a
<ArrayAdapter shape=(2, 2) dtype=dtype('int64')>
>>> a.ndarray()
array([[ 5,  6],
       [ 9, 10]])
>>> b = biggus.ArrayAdapter(np.array([5, 6, 9, 10]).reshape(2, 2))
>>> b
<ArrayAdapter shape=(2, 2) dtype=dtype('int64')>
>>> b.ndarray()
array([[ 5,  6],
       [ 9, 10]])
>>> a == b
False

scitools / biggus Goto Github PK

biggus's People

Contributors

Stargazers

Watchers

Forkers

biggus's Issues

Recommend Projects

Recommend Topics

Recommend Org