Original author: [email protected] (October 07, 2008 22:21:04)
Requirements for C-style ordering are enforced inconsistently because the
validation routines misuse bitwise operators. This can lead to data
corruption, or to spurious failures, when storing an array from (or
reading one into) memory that is non-contiguous or in Fortran order. What
needs to happen:
- Fix low-level array flags validation and add separate unit tests just for
these validation routines
- Remove requirement that an array owns its data, as long as obj->data
pointer is valid and data is C-contiguous
- Standardize exception for incorrect array flags; should be TypeError
everywhere
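
The pattern in the reported behavior below (an array is accepted if it is
C-contiguous or owns its data) is consistent with a mask test that passes
when any required bit is set rather than all of them. A minimal sketch of
that pitfall and of a strict check; the flag constants and function names
are illustrative assumptions, not h5py's actual validation code:

```python
# Illustrative flag bits; names and values are assumptions for this sketch.
C_CONTIGUOUS = 0x1
OWNDATA = 0x4
REQUIRED = C_CONTIGUOUS | OWNDATA


def loose_check(flags):
    """Buggy form: passes when ANY required bit is set."""
    return bool(flags & REQUIRED)


def strict_check(flags):
    """Correct form: passes only when ALL required bits are set."""
    return (flags & REQUIRED) == REQUIRED


def check_c_contiguous(flags):
    """Proposed rule: require C-contiguity only; TypeError otherwise."""
    if not flags & C_CONTIGUOUS:
        raise TypeError("Array must be C-contiguous")
```

With these definitions, an array that owns its data but is not
C-contiguous passes loose_check and fails strict_check.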
Additional required behavior:
- The low-level modules should continue to raise an exception for an
illegal input array
- The h5py.highlevel routines should be modified to transparently coerce an
illegal array into flat C-order
- Data read back from HDF5 using h5py.highlevel routines will continue to
always be provided in plain C-contiguous format
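
One way the high-level coercion could look, sketched with
numpy.ascontiguousarray. The helper name as_c_order is hypothetical, and
this is only an assumption about how h5py.highlevel might do it:

```python
import numpy


def as_c_order(arr):
    """Return a C-contiguous array with the same logical contents.

    Hypothetical helper: numpy.ascontiguousarray copies only when the
    input is not already C-contiguous.
    """
    return numpy.ascontiguousarray(arr)


# A Fortran-ordered input keeps its logical values after coercion.
a = numpy.array([[1, 2, 3], [4, 5, 6]], order='F')
c = as_c_order(a)
```

Here c compares equal to a element by element, but its strides are those
of a plain C-ordered array.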
Reported behavior (thanks Z. Pincus):
- Fortran-contiguous arrays can be stored as a dataset; however they
come out a bit garbled:
In : f = h5py.File('test.dat', 'w')
In : a = numpy.array([[1,2,3],[4,5,6]], order='F')
In : a
Out:
array([[1, 2, 3],
       [4, 5, 6]])
In : a.strides
Out: (4, 8)
In : f['a'] = a
In : f['a'].value
Out:
array([[1, 4, 2],
       [5, 3, 6]])
In : f['a'].value.strides
Out: (12, 4)
In : fixed = numpy.asarray(f['a'].value)
In : fixed.strides = (4,8)
In : fixed
Out:
array([[1, 2, 3],
       [4, 5, 6]])
- Non-contiguous arrays that do own their own data can be "stored",
but the strides are ignored and the underlying buffer is written as-is:
In : b = numpy.arange(2*3*4, dtype=numpy.uint8)
In : b
Out:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23], dtype=uint8)
In : b.shape=(2,3,4)
In : b.strides=(0,1,1)
In : b
Out:
array([[[0, 1, 2, 3],
        [1, 2, 3, 4],
        [2, 3, 4, 5]],

       [[0, 1, 2, 3],
        [1, 2, 3, 4],
        [2, 3, 4, 5]]], dtype=uint8)
In : b.flags
Out:
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In : f['b'] = b
In : f['b'].value
Out:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]], dtype=uint8)
- C-contiguous arrays that don't own their own data can be stored
just fine:
In : b = numpy.arange(2*3*4, dtype=numpy.uint8)
In : c = numpy.ndarray(buffer=b, offset=2, shape=(2,4), dtype=numpy.uint8)
In : c
Out:
array([[2, 3, 4, 5],
       [6, 7, 8, 9]], dtype=uint8)
In : c.flags
Out:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In : f['c'] = c
In : f['c'].value
Out:
array([[2, 3, 4, 5],
       [6, 7, 8, 9]], dtype=uint8)
- Fortran-contiguous arrays that do not own their own data cannot be
stored (as expected):
In : b = numpy.arange(2*3*4, dtype=numpy.uint8)
In : d = numpy.ndarray(buffer=b, offset=2, shape=(2,4), strides=(1,2), dtype=numpy.uint8)
In : d
Out:
array([[2, 4, 6, 8],
       [3, 5, 7, 9]], dtype=uint8)
In : d.flags
Out:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In : f['d'] = d
ValueError: Array must be C-contiguous and own its data.
- Non-contiguous arrays that do not own their data cannot be stored
(as expected):
In : b = numpy.arange(2*3*4, dtype=numpy.uint8)
In : e = numpy.ndarray(buffer=b, offset=2, shape=(2,4), strides=(-1,1), dtype=numpy.uint8)
In : e
Out:
array([[2, 3, 4, 5],
       [1, 2, 3, 4]], dtype=uint8)
In : e.flags
Out:
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In : f['e'] = e
ValueError: Array must be C-contiguous and own its data.
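
The garbled result in the first reported case (the Fortran-contiguous
array) is what one would expect if the array's memory were written as-is
and then interpreted in C order on read. A short sketch that reproduces
the same pattern with NumPy alone, with no HDF5 involved; the variable
names are illustrative:

```python
import numpy

a = numpy.array([[1, 2, 3], [4, 5, 6]], order='F')

# Elements in the order they sit in memory for a Fortran-ordered array.
memory_order = a.ravel(order='F')

# Interpreting that flat buffer with C-order strides scrambles the values,
# matching the f['a'].value output in the report.
garbled = memory_order.reshape(a.shape)

# Reading the same buffer back in Fortran order recovers the original,
# which is what the strides assignment in the report does by hand.
recovered = garbled.ravel().reshape(a.shape, order='F')
```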
Original issue: http://code.google.com/p/h5py/issues/detail?id=1