
hdf5_manipulator's Introduction

HDF5 MANIPULATOR

Simple manipulation on hdf5 files.

Note: for files too big to fit in memory, use the _big.py variant of a script where one exists (e.g. diff_big.py).

Split

Split hdf5 file (requires the same number of entries in every dataset):

usage: ./split.py <options>

HDF5 MANIPULATOR (split)

optional arguments:
  -h, --help            show this help message and exit
  --prefix [path/to/filename_base]
                        prefix for splitted files (base on input file if not
                        defined)
  --filelist [path/to/filelist]
                        save output files list in txt file

required arguments:
  --input [path/to/input_file]
                        path to input hdf5 file
  --size [int]          number of entries per file
  • Example:

    ./split.py --input /path/to/my/data/data.hdf5 --size 100

    will create /path/to/my/data/data_XXX.hdf5 files, each with 100 entries (the last one may have fewer).
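
The slicing behind such a split can be sketched in plain Python. This is only an illustration of the idea, not the code in split.py: the names `chunk_bounds` and `split_datasets` are hypothetical, and plain lists stand in for the HDF5 datasets.

```python
def chunk_bounds(n_entries, size):
    """Return (start, stop) index pairs covering n_entries in steps of size."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    return [(start, min(start + size, n_entries))
            for start in range(0, n_entries, size)]


def split_datasets(data, size):
    """Yield one dict per output file, slicing every dataset the same way."""
    # The split only makes sense if every dataset has the same length.
    lengths = {len(values) for values in data.values()}
    if len(lengths) != 1:
        raise ValueError("all datasets must have the same number of entries")
    for start, stop in chunk_bounds(lengths.pop(), size):
        yield {key: values[start:stop] for key, values in data.items()}
```

For 250 entries and a size of 100, `chunk_bounds` gives three ranges, the last one holding only 50 entries.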

Merge

Merge hdf5 files (requires the same datasets, with the same shapes, in all input files):

usage: ./merge.py <options>

HDF5 MANIPULATOR (merge)

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  --input [list of input files]
                        path to input hdf5 files to merge ('file1, file2,...'
                        will look for all files starts with file1 and file2
                        and ends with .hdf5)
  --output [path/to/filename]
                        path to output hdf5 file
  • Example:

    ./merge.py --input '/path1/basename1, /path2/basename2' --output merged.hdf5

    will merge all files matching /path1/basename1*.hdf5 and /path2/basename2*.hdf5 into merged.hdf5
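
As a rough sketch of how such an --input string could be expanded and the files concatenated (an assumption based on the help text above, not the actual merge.py code), the hypothetical helpers below use the standard glob module, with plain lists standing in for datasets:

```python
import glob


def expand_inputs(input_arg):
    """Expand 'base1, base2' into the .hdf5 paths that start with each base."""
    files = []
    for base in input_arg.split(","):
        base = base.strip()
        if base:
            # glob.escape keeps special characters in the base name literal.
            files.extend(sorted(glob.glob(glob.escape(base) + "*.hdf5")))
    return files


def merge_datasets(parts):
    """Concatenate datasets key by key; every part must have the same keys."""
    keys = set(parts[0])
    merged = {key: [] for key in parts[0]}
    for part in parts:
        if set(part) != keys:
            raise ValueError("all input files must contain the same datasets")
        for key in merged:
            merged[key].extend(part[key])
    return merged
```

The sketch checks only that the dataset names agree. The shape requirement mentioned above is not checked here, because plain lists carry no shape.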

Extract

Extract chosen datasets from hdf5 file (requires the same number of entries in every dataset):

usage: ./extract.py <options>

HDF5 MANIPULATOR (extract)

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  --input [path/to/filename]
                        path to input hdf5 file
  --output [path/to/filename]
                        path to output hdf5 file
  --keys ['key1, key2, ...']
                        list of datasets to be saved in the output file
  • Example:

    ./extract.py --input /path/to/input.hdf5 --output /path/to/output.hdf5 --keys 'dataset1, dataset2'

    will extract dataset1 and dataset2 from input.hdf5 and save them in output.hdf5
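
A minimal sketch of how the --keys string might be parsed and applied, assuming it is a comma-separated list as in the example. `parse_keys` and `extract` are hypothetical names, and a dict of lists stands in for the input file:

```python
def parse_keys(keys_arg):
    """Turn 'key1, key2, ...' into ['key1', 'key2', ...]."""
    return [key.strip() for key in keys_arg.split(",") if key.strip()]


def extract(data, keys):
    """Return only the requested datasets, failing loudly on unknown names."""
    missing = [key for key in keys if key not in data]
    if missing:
        raise KeyError("datasets not found in input: " + ", ".join(missing))
    return {key: data[key] for key in keys}
```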

Combine

Save different datasets from different files into one output hdf5 file (requires the same number of entries per dataset within each file, and one common key used for ordering):

usage: ./combine.py <options>

HDF5 MANIPULATOR (combine)

optional arguments:
  -h, --help            show this help message and exit
  --keys1 ['key1, key2, ...']
                        list of datasets to be extracted from the first input
                        file (use all if not defined)
  --keys2 ['key1, key2, ...']
                        list of datasets to be extracted from the second input
                        file (use all if not defined)

required arguments:
  --input1 [path/to/filename1]
                        path to first input hdf5 file
  --input2 [path/to/filename2]
                        path to second input hdf5 file
  --output [path/to/filename]
                        path to output hdf5 file
  --match [key]         the common key use to order data
  • Example 1:

    ./combine.py --input1 /path/to/file1 --input2 /path/to/file2 --output /path/to/output --match id

    requires that both input files have an id key and no other keys in common; will create a file containing all datasets from both input files, for all entries with matching ids

  • Example 2:

    ./combine.py --input1 /path/to/file1 --input2 /path/to/file2 --output /path/to/output --match id --keys1 'data1' --keys2 'data2, data3'

    will create a file containing data1 from file1, and data2 and data3 from file2 (for all entries with matching ids)
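
The matching step can be illustrated with a small sketch. It is not the combine.py implementation: the function name `combine` is hypothetical, dicts of lists stand in for the two files, and it assumes that ids are unique within each file and that the output is sorted by id.

```python
def combine(data1, data2, match, keys1=None, keys2=None):
    """Join two dataset dicts on the values stored under the `match` key."""
    # Use all datasets when no keys are given; the match key is handled separately.
    keys1 = [k for k in (keys1 or data1) if k != match]
    keys2 = [k for k in (keys2 or data2) if k != match]
    clash = set(keys1) & set(keys2)
    if clash:
        raise ValueError("keys present in both inputs: " + ", ".join(sorted(clash)))
    # Map each id to its row index in each file (assumes ids are unique).
    rows1 = {value: i for i, value in enumerate(data1[match])}
    rows2 = {value: i for i, value in enumerate(data2[match])}
    # Keep only the ids present in both files, in sorted order.
    common = sorted(set(rows1) & set(rows2))
    out = {match: common}
    for key in keys1:
        out[key] = [data1[key][rows1[v]] for v in common]
    for key in keys2:
        out[key] = [data2[key][rows2[v]] for v in common]
    return out
```

Entries whose id appears in only one of the inputs are dropped, which corresponds to "all entries with matching ids" above.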

Test: create_hdf5.py

Create several hdf5 files filled with random numbers, matrices etc.

Test: diff.py

Check if two hdf5 files are exactly the same.

Test: diff_big.py

Check if two hdf5 files are exactly the same. If a single dataset is too big to fit into memory, it can perform either a partial check (default) or a full check.

  • Example 1:

    ./diff_big.py file1 file2

    if a dataset is too big, it will check the first 100 entries, the last 100 entries, and 100 random entries.

  • Example 2:

    ./diff_big.py file1 file2 fullcheck

    if a dataset is too big, it will check it entry by entry (this takes a lot of time).
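
The partial check amounts to comparing a sample of entries rather than the whole dataset. The sketch below shows one way to choose such a sample. The function names are hypothetical, and drawing the random entries from the region between head and tail is an assumption, since the description above does not say where they come from.

```python
import random


def partial_check_indices(n_entries, n_check=100, seed=None):
    """Indices of the first, last, and a random n_check entries."""
    # Small datasets are simply checked in full.
    if n_entries <= 3 * n_check:
        return list(range(n_entries))
    rng = random.Random(seed)
    head = range(n_check)
    tail = range(n_entries - n_check, n_entries)
    # Assumption: sample from the middle so the three groups never overlap.
    middle = rng.sample(range(n_check, n_entries - n_check), n_check)
    return sorted(set(head) | set(tail) | set(middle))


def entries_match(dataset_a, dataset_b, indices):
    """Compare two equally long sequences at the given indices only."""
    return (len(dataset_a) == len(dataset_b)
            and all(dataset_a[i] == dataset_b[i] for i in indices))
```

A partial check like this can miss a difference that lies outside the sampled entries, which is why the full check exists despite its cost.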

hdf5_manipulator's People

Contributors

gnperdue, tomaszgolan


hdf5_manipulator's Issues

AttributeError: 'ellipsis' object has no attribute 'encode'

I keep running into this error, both on Python 2 & 3...
Is there anything wrong with how I execute the function(s)?

Traceback (most recent call last):
  File "/hdf5_manipulator-master/merge.py", line 77, in <module>
    data[f] = hdf5.load(f)
  File "/hdf5_manipulator-master/hdf5.py", line 22, in load
    data[key] = f[key][...]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/hdf5_manipulator-master/venv/lib/python2.7/site-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "/hdf5_manipulator-master/venv/lib/python2.7/site-packages/h5py/_hl/base.py", line 137, in _e
    name = name.encode('ascii')
AttributeError: 'ellipsis' object has no attribute 'encode'
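
Reading the traceback: the failing expression is `f[key][...]`, and the exception is raised inside h5py's group lookup (group.py), so `f[key]` was evidently a group rather than a dataset, and `...` was then treated as a member name. That suggests the input file contains nested groups. One possible workaround, offered as a sketch and not as the project's code, is to walk groups recursively and read only dataset-like objects. The function name `load_datasets` is hypothetical. It is duck-typed (anything with a `shape` attribute is treated as a dataset, anything else as a group with `.items()`), so it should accept an open `h5py.File` as well as plain nested dicts.

```python
def load_datasets(node, prefix=""):
    """Recursively collect datasets from an h5py-like group into a flat dict."""
    data = {}
    for key, child in node.items():
        path = prefix + "/" + key if prefix else key
        if hasattr(child, "shape"):
            # Dataset-like object: `[...]` reads its full contents.
            data[path] = child[...]
        else:
            # Group-like object: descend instead of indexing it with `...`.
            data.update(load_datasets(child, path))
    return data
```

Datasets inside groups end up under slash-separated names such as group/inner. Other object types that an HDF5 file may hold (for example named datatypes) are not handled by this sketch.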
