h5py_wrapper's People

Contributors

jakobj

h5py_wrapper's Issues

Enable storage of Python's type long

Since Python's long integer type is not a native numpy type, it is converted to an object and thus has no native HDF5 equivalent.

I discussed this with @jakobj: one workaround would be to store the value as a string together with its type information, and to recreate the long type when loading from h5. @mschmidt87, what do you think?
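
The workaround could look roughly like the following sketch. The helper names `encode_long` and `decode_long` and the tag `"long"` are made up for illustration and are not part of the wrapper; on Python 3, `int` plays the role of Python 2's `long`.

```python
# Hypothetical helpers, not part of h5py_wrapper.
LONG_TAG = "long"  # assumed name for the stored type information


def encode_long(value):
    """Return (payload, type_name) for an arbitrarily large integer."""
    return str(value), LONG_TAG


def decode_long(payload, type_name):
    """Recreate the integer from its string payload and type tag."""
    if type_name != LONG_TAG:
        raise ValueError("unexpected type tag: {!r}".format(type_name))
    return int(payload)


big = 2 ** 80  # too large for a 64-bit integer
payload, type_name = encode_long(big)
restored = decode_long(payload, type_name)
print(restored == big)
```

The payload is a plain string, so it can be written like any other string dataset, with the tag kept next to it as metadata.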

unicode strings cannot be stored

Storing a list of unicode strings leads to the following h5py error:
TypeError: No conversion path for dtype: dtype('<U1').

I haven't understood the background completely. If there is no workaround in h5py, we should at least build in some error catching / conversion code. This error can be reproduced with the additional test added in my personal branch unicode_strings.
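
One possible shape for such conversion code is sketched below. It assumes that encoding to UTF-8 byte strings before saving is acceptable (h5py can store fixed-length byte strings); the function names are hypothetical and not part of the wrapper.

```python
# Hypothetical conversion helpers, not part of h5py_wrapper.
def encode_strings(strings, encoding="utf-8"):
    """Turn a list of text strings into a list of byte strings."""
    return [s.encode(encoding) for s in strings]


def decode_strings(raw, encoding="utf-8"):
    """Inverse of encode_strings: byte strings back to text strings."""
    return [b.decode(encoding) for b in raw]


original = [u"a", u"\u00e4", u"xyz"]
stored = encode_strings(original)   # what would be handed to the storage layer
loaded = decode_strings(stored)     # what the user would get back
print(loaded == original)
```

For the round trip to be transparent, the loader would also need to know that the dataset originally held unicode strings, for example through a stored type tag.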

Specification of path within hdf5 file inconsistent between load and save function

Both when loading from and when saving to a file, the user can specify a path within the hdf5 file. The path is specified with a keyword argument that is named differently in the load and save functions:

save(filename, d, write_mode='a', overwrite_dataset=False, resize=False, dict_label='', compression=None)

load(filename, path='', lazy=False)

So we have dict_label in the save function and path in the load function. In my opinion, we should name the arguments consistently across the two functions and opt for calling it path.

This will of course break user code so we should only include this in the next major release.
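
Until that release, a transition period could accept both names. The following is only a sketch: `_write` is a stand-in for the real saving logic, and the simplified signature is an assumption, not the wrapper's actual API.

```python
import warnings


def _write(filename, d, write_mode, path):
    """Stand-in for the real saving logic (hypothetical)."""
    return filename, write_mode, path


def save(filename, d, write_mode='a', path=None, dict_label=None):
    """Accept the new keyword 'path' and, for now, the old 'dict_label'."""
    if dict_label is not None:
        warnings.warn("'dict_label' is deprecated; use 'path' instead",
                      DeprecationWarning, stacklevel=2)
        if path is None:
            path = dict_label
    return _write(filename, d, write_mode, path or '')
```

Existing calls such as `save('foo.h5', d, dict_label='a/b')` would keep working but emit a DeprecationWarning, while new code can already use `path='a/b'`.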

Files created with Python2 cannot be read using Python3

I discovered a problem with compatibility of files between Python 2 and 3:

If a file has been created with Python 2, it cannot be opened with Python 3.
The reason is that key_type and value_type are read in as byte-strings by Python 3 so that the wrapper complains that e.g. it does not support data type b'int' or cannot correctly read a string key.

Here is a minimal reproducer:
Execute with Python 2:

import h5py_wrapper as h5w
d = {'a': 1}
h5w.save('h5w_bug.h5', d, 'w')

Execute with Python 3:

import h5py_wrapper as h5w
d = h5w.load('h5w_bug.h5')
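
A possible fix is to normalize the type attributes right after reading them. The sketch below assumes the attributes arrive as either bytes or str; `to_text` is a hypothetical helper, not an existing function of the wrapper.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding it if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# On Python 3, an attribute written by Python 2 comes back as bytes.
key_type = to_text(b'str')
value_type = to_text(b'int')
print(key_type, value_type)
```

Applying such a helper to key_type and value_type before comparing them against the list of supported types would make the comparison independent of the Python version that wrote the file.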

Explicitly store type of value

Currently, we do not explicitly store the type of the value of the dataset.
This restricts functionality: for instance, it is not possible to distinguish between a NoneType and the string 'None', because we store a None as the string 'None'.

Furthermore, implementing this enables us to retrieve lists as lists again instead of Numpy arrays.
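
As an illustration of the idea, the sketch below keeps a type name next to each stored payload. The functions `pack` and `unpack` are hypothetical, and a tuple stands in for the array-like object that the storage layer would return for a list.

```python
def pack(value):
    """Return (payload, value_type) as it would be written to the file."""
    if value is None:
        return 'None', 'NoneType'
    return value, type(value).__name__


def unpack(payload, value_type):
    """Use the stored value_type to restore the original Python object."""
    if value_type == 'NoneType':
        return None
    if value_type == 'list':
        return list(payload)
    return payload


print(unpack(*pack(None)))      # restored as None, not as a string
print(unpack(*pack('None')))    # stays the string 'None'
print(unpack((1, 2, 3), 'list'))  # array-like payload becomes a list again
```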

Improve warning in case of deprecated file

If the wrapper fails to load a file because of a KeyError, we display a message informing the user that this probably occurred because the file has been created by an old release version.

This only works if the user loads the entire file. If the user specifies a path to a deeper substructure of the stored file, by using load('foo.h5', path='path/to/substructure') or load('foo.h5', 'path/to/substructure'), the KeyError is caught by the load function and the message is not displayed.

We should improve the error handling so that the message is also displayed in this case.

replace assert with unittest.assert...

The tests use the regular assert. This should be replaced with the assert methods from the unittest module to provide more information in the case of failed tests.

Implement testsuite with unittest

Currently, the tests are just a collection of functions which are executed.
It would be better to design it as a proper testsuite using the unittest library.
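
A minimal sketch of what such a testsuite could look like is given below. The function `roundtrip` is a placeholder for saving a dictionary and loading it again; the test names are made up.

```python
import unittest


def roundtrip(d):
    """Placeholder for save followed by load (hypothetical)."""
    return dict(d)


class RoundTripTest(unittest.TestCase):

    def test_flat_dict(self):
        d = {'a': 1, 'b': 'text'}
        # assertDictEqual reports the differing keys on failure
        self.assertDictEqual(roundtrip(d), d)

    def test_list_value(self):
        d = {'values': [1, 2, 3]}
        self.assertEqual(roundtrip(d)['values'], [1, 2, 3])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RoundTripTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Compared with a bare assert, a failing assertDictEqual or assertEqual prints both values and their difference, which is the extra information asked for above.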

supported data types not consistent?

The following code runs (I'm able to save and load):

import h5py_wrapper as h5w
import numpy as np

d = { 'np_int64': [ np.int64(10) ] }

h5w.save('file.name', d, write_mode='w')
h5w.load('file.name')

However, if I don't put the np.int64(10) value in a list, so d = { 'np_int64': np.int64(10) }, I get an error when trying to load the file:

NotImplementedError: Unsupported data type: int64.

I think it's because the list of supported data types (valuetype_dict) is not complete, although this shouldn't be a problem since it is possible to save the values.

include more complex tests

As far as I can see, our test dictionary is quite simple. It might be a good idea to include more complex examples (e.g., lists of dicts, sets, tuples).

Allow storage of complex numbers

Currently, complex numbers are not supported by the wrapper. Since this is certainly a common use case, we should remedy this.

Fix Python 2.7 tests

The testsuite currently fails for the Python 2.7 tests. This seems to be a problem with the conversion script.

Benchmark wrapper in comparison with release 0.0.1 and work on speed

I did a quick speed test comparing the current master (25834fb) with release version 0.0.1 and discovered a significant slowdown of the loading routine.

We should do some proper benchmarking, investigate bottlenecks that we apparently introduced and think about solutions. Either we can improve the speed of the implementation in general or we could provide options to the user to e.g. circumvent the time-consuming type-checking in the loading routine.
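
A starting point for the benchmarking could be the standard timeit module, as in the sketch below. `load_stub` only simulates some work; in a real benchmark it would be replaced by a call that loads the same test file with each version under comparison.

```python
import timeit


def load_stub():
    """Stand-in for a call to the loading routine (hypothetical)."""
    return sum(range(1000))


def benchmark(func, repeat=5, number=100):
    """Return the best observed time per call in seconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    # the minimum is the measurement least disturbed by other processes
    return min(timings) / number


per_call = benchmark(load_stub)
print("best time per call: {:.3e} s".format(per_call))
```

Running the same script once per installed version gives directly comparable numbers; a profiler such as cProfile could then be used to locate the bottlenecks.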

Better API function names

In my opinion the current function names of the API are not very well chosen. add_to_h5 and load_h5 are not particularly intuitive. How about save and load like in numpy?
This changes the user interface, so it should be well discussed, and we would need to deprecate the current functions for a while before removing them completely.
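
The deprecation period could be handled with thin wrappers around the new functions, roughly as sketched here. The bodies of `save` and `load` are stubs and `deprecated_alias` is a hypothetical helper.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that warns and then forwards to new_func."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "{}() is deprecated; use {}() instead".format(
                old_name, new_func.__name__),
            DeprecationWarning, stacklevel=2)
        return new_func(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper


def save(filename, d):
    """Stub for the new saving function."""
    return ('saved', filename)


def load(filename):
    """Stub for the new loading function."""
    return ('loaded', filename)


# The old names stay importable but warn on every call.
add_to_h5 = deprecated_alias(save, 'add_to_h5')
load_h5 = deprecated_alias(load, 'load_h5')
```

Once the deprecation period is over, removing the two alias lines is enough to drop the old names.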

Add information about supported python version to setup.py

The setup.py script is currently lacking information about the supported Python versions.
I copy our discussion from a pull request of python-dicthash:

When you check numpy on pypi (https://pypi.python.org/pypi/numpy/1.13.0rc1) you see that they explicitly list the Python versions that this package is compatible with:

Requires Python: >=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*

Maybe we need to add something similar? Alternatively we could just create a new release with Python3 support?
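
In setuptools this information is given through the python_requires keyword and, optionally, trove classifiers. The sketch below only builds the keyword arguments; the version range is copied from the numpy example as a placeholder and is an assumption, since the range actually supported by the wrapper still has to be decided.

```python
# Placeholder range taken from the numpy example, not a confirmed
# statement about which versions h5py_wrapper supports.
PYTHON_REQUIRES = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"

SETUP_KWARGS = {
    "name": "h5py_wrapper",
    "python_requires": PYTHON_REQUIRES,
    "classifiers": [
        "Programming Language :: Python :: 2.7",
        "Programming Language :: Python :: 3",
    ],
}

# In setup.py these would be passed on as:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
print(SETUP_KWARGS["python_requires"])
```

With python_requires set, pip refuses to install the release on interpreters outside the given range, and PyPI shows the "Requires Python" line.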

Enable storage of nested tuples with complex shape

Currently, it is not possible to store tuples with complex shape, such as
((1, 2), (3, 4, 5)), because h5py converts this into a numpy array with dtype=object.

For numpy arrays, we handle this case (for 2D arrays), but to apply this to tuples (and sets) as well, we first need to fix #11.

Update testing on travis from Python 3.4 to 3.5

With #59, the wrapper supports Python 3. Although it actually supports Python 3.5, we only test for Python 3.4 for now because conda reports a package conflict for Python 3.5 and h5py.

We should keep an eye on this and remedy this once the problem is solved on the conda level.

Make 'quantities' optional

Currently, the wrapper requires every user to have the quantities package installed.
We should check for the availability of the package and adapt functionality if it's not found since the wrapper itself does not require it.
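
The usual pattern for this is a guarded import, sketched below. The flag `HAS_QUANTITIES` and the helper `is_quantity` are hypothetical names.

```python
try:
    import quantities as pq
    HAS_QUANTITIES = True
except ImportError:
    # the package is optional: fall back to plain numbers and arrays
    pq = None
    HAS_QUANTITIES = False


def is_quantity(value):
    """True only if quantities is installed and value is a Quantity."""
    return HAS_QUANTITIES and isinstance(value, pq.Quantity)


print(is_quantity(1.0))
```

Code paths that handle units would then be guarded by `is_quantity`, so users without the package only lose the unit support.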

Add function to load file without actual data

A function that only loads the skeleton of a data file would be very useful: it would make it possible to investigate the structure of a dataset/file without the effort of loading the actual data.

Numpy strings cannot be loaded

Hi, I just came across the following issue:

If you have numpy strings as keys of a dictionary and you save this dictionary, the h5py_wrapper raises the error

ValueError: malformed node or string: 
    <ast.Name object at 0x7fcfa1870970>

when trying to load the file. This can be reproduced using the following test (using pytest):

import numpy as np
import h5py_wrapper as h5

from unittest import TestCase


def test_saving_and_consecutive_loading_of_numpy_string_keys(tmpdir):
    filename = 'test.h5'
    keys = ['a', 'b', 'c']

    # np.atleast_1d turns the keys into numpy strings
    keys = np.atleast_1d(keys)
    output = {key: i for i, key in enumerate(keys)}

    # create temporary directory to save test file into
    tmp_test = tmpdir.mkdir('tmp_test')
    with tmp_test.as_cwd():
        h5.save(filename, output)
        loaded = h5.load(filename)
        TestCase().assertDictEqual(output, loaded)

The problem seems to be line 296 in wrapper.py where key_type is compared to ['str', 'unicode', 'string_'], but a numpy string leads to key_type = 'str_'.
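
If that diagnosis is right, extending the list of accepted type names might be enough. The sketch below is an assumption about how the check could look, not the actual code in wrapper.py; `STRING_KEY_TYPES` and `is_string_key_type` are made-up names.

```python
# Type names that should all be treated as string keys when loading.
# 'str_' is the name reported above for numpy strings.
STRING_KEY_TYPES = ('str', 'unicode', 'string_', 'str_')


def is_string_key_type(key_type):
    """Return True if the stored key_type denotes a string key."""
    return key_type in STRING_KEY_TYPES


print(is_string_key_type('str_'))
```

An alternative would be to convert numpy string keys to plain str when saving, so that only 'str' is ever written to new files.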
