
hdf5.jl's People

Contributors

abhijithch, adambrewster, bjarthur, carlobaldassi, david-macmahon, eschnett, ggggggggg, github-actions[bot], jeffbezanson, jipolanco, jmert, keno, kleinhenz, mbauman, mkitti, musm, nhz2, nicoleepp, petercolberg, ranocha, rened, simonbyrne, simonster, staticfloat, stevengj, t-bltg, timholy, tkelman, tknopp, yuyichao


hdf5.jl's Issues

Empty strings

Currently, attempting to write an empty string fails because it's illegal to call H5Tset_size with size == 0. AFAICT, we can store an empty string as a null-terminated string consisting only of a null character or using the null dataspace. If we use the null dataspace, then we serialize "" the same way we presently serialize ASCIIString[], although we could use the julia type attribute to distinguish the two in JLD. Thoughts?

Segfaults and Failed Loads With Octave HDF5 files

This might be another 32 bit issue that I have run into, but I simply cannot get this library to read in some of the serialized files I have. I would have stuck with the MAT file support in the affiliated library, but I appear to have one of the unsupported file versions. (assuming I'm using the lib right).

At this point I can manage to get some segfaults which usually appear in the GC stage.
Looking at some gdb asm output, it looked like it was trying to traverse a NULL pointer, so perhaps some important data structures are getting smashed.
The closest I have gotten to identifying the dangerous call (all generated via read(file["var"]["value"])) is the following valgrind trace.
I have still not been able to decipher the responsible julia level read() as I'm still quite a newbie in the julia environment.


==10261== Syscall param read(buf) points to unaddressable byte(s)
==10261== at 0x539030E: __read_nocancel (in /lib/libpthread-2.15.so)
==10261== by 0x7DAC00A: ??? (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7DA3D16: H5FD_read (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7D90395: H5F_accum_read (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7D94680: H5F_block_read (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7D65ECF: ??? (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7F2E94F: H5V_opvv (in /usr/lib/libhdf5.so.7.0.4)
==10261== Address 0x7646be0 is 0 bytes after a block of size 8,208 alloc'd
==10261== at 0x402928A: memalign (vg_replace_malloc.c:694)
==10261== by 0x40292D8: posix_memalign (vg_replace_malloc.c:835)
==10261== by 0x415C73A: allocobj (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x41515CE: jl_alloc_array_1d (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x5BEB9AB: ???
==10261== by 0x4115948: jl_apply_generic (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x5BE2D2C: ???
==10261== by 0x5BE0844: ???
==10261== by 0x4115948: jl_apply_generic (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x5BF2FA4: ???
==10261== by 0x4115948: jl_apply_generic (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x5BEA2B3: ???


Other noted issue:
h5g_get_objname_by_idx uses C_int as the type of its second argument, which works on 64-bit but fails on 32-bit, as its real type is defined to be unsigned long long in the headers. I have not spotted any other functions with the same problem, but it is possible that they are the source of these issues.

Example file:
http://fundamental-code.com/tmp/octave.hdf5
(contains [1; 2; 3; 4] in what appears to be "/foobar/value")
reading this data results in a 0x1 array

Relevant versions:
julia - git updated about a day ago
HDF5 - git current
libhdf5 - 1.8.10-patch1
octave - 3.6.3

Store Julia objects as compound types

At the moment, we can write, but not read immutables from JLD. While it would be pretty trivial to copy the code for creating new immutables from serialize.jl, I wonder if we can use compound types instead. It seems like there would be massive performance and disk space advantages to storing arrays of immutables contiguously on disk as opposed to using HDF5 references for each field, even if the on-disk representation isn't necessarily the same as the in-memory representation because of padding.

JLD @save macro fails with local functions

As long as there is even a single local function defined, @save fails

julia> using HDF5, JLD

julia> x = 10
10

julia> @save "test.jld" 

julia> y(m) = m+1
y (generic function with 1 method)

julia> @save "test.jld" 
ERROR: This is the write function for CompositeKind, but the input is of type Ptr{None}
 in write_composite at /home/marcusps/.julia/HDF5/src/jld.jl:613
 in write at /home/marcusps/.julia/HDF5/src/jld.jl:608
 in write at /home/marcusps/.julia/HDF5/src/jld.jl:546
 in write_composite at /home/marcusps/.julia/HDF5/src/jld.jl:651
 in anonymous at no file

Enormous file sizes

Hi,

I have text files of generally 100k that have data such as:

NA NA NA
NA NA NA
-4.11554869953487 NA NA
-4.49517142619306 NA NA
-4.62434879575859 NA NA
-4.85365577849306 NA NA
-4.83319566688069 NA NA
-4.62021998272287 NA NA
-4.38650861894108 NA NA
-4.33796653562191 NA NA
...

using the code

using HDF5
using JLD

x = readdlm("1.dat", ' ')

file = jldopen("teste.jld", "w")
@write file x
close(file)

to write a 117K file of that form to teste.jld generates a 13Mb file... Even if compression is not being used, I don't understand the size difference. I have to process 270 files of this kind, which ended up generating a 3.5Gb file.

Am I doing something wrong? If it helps, I can e-mail a sample file for testing.

I'm on OS X using julia master/9c392b7*, HDF5 f27612 and hdf5 installed from homebrew:

Cassios-iMac:~ cassio$ brew info hdf5
hdf5: stable 1.8.11
http://www.hdfgroup.org/HDF5
/usr/local/Cellar/hdf5/1.8.11 (119 files, 9.8M) *
  Built from source
From: https://github.com/mxcl/homebrew/commits/master/Library/Formula/hdf5.rb
==> Dependencies
Required: szip
==> Options
--enable-cxx
    Compile C++ bindings
--enable-fortran
    Compile Fortran bindings
--enable-fortran2003
    Compile Fortran 2003 bindings. Requires enable-fortran.
--enable-parallel
    Compile parallel bindings
--enable-threadsafe
    Trade performance and C++ or Fortran support for thread safety
--universal
    Build a universal binary

Thanks,
Cássio

Writing "nothing" to new path does not work

Trying to write nothing to a HDF5 dataset that does not exist yet fails with

dset not defined
while loading In[5], in expression starting on line 2
 in write at /Users/rene/.julia/HDF5/src/jld.jl:478
 in write at /Users/rene/.julia/HDF5/src/jld.jl:481

Minimal test case:

using HDF5, JLD
jldopen("/tmp/test","w") do a
    write(a, "/a/b/c", nothing)
end

The problem is that write(parent::Union(JldFile, JldGroup), name::ASCIIString, n::Nothing) in jld.jl tries to call HDF5Dataset directly, and not through d_create as is done for other data types.
d_create would ensure that the entire path /a/b/c gets created first.

I tried to come up with a fix for this but got lost...

Just to illustrate, the following code works:

using HDF5, JLD
jldopen("/tmp/test","w") do a
    write(a, "/a/b/dummy", 1)
    write(a, "/a/b/c", nothing)
end

b[:] gives error instead of returning value of b

using HDF5

fid = h5open("test.h5", "w")
b = d_create(fid, "b", Int, ((1000,),(-1,)), "chunk", (100,)) #-1 is equivalent to typemax(Hsize) as far as I can tell
b[:] # ERROR: no method endof(HDF5Dataset{PlainHDF5File},)

Problems with DataFrames

I'm trying to use the HDF5 package to write an array of DataFrames, but I'm having some problems.

Trying to run the jld_dataframe.jl test gives me the following error on OS X 10.8.4 with julia master/9c392b7*, HDF5 f27612c88 and DataFrames 859f3272.

Cassios-iMac:test cassio$ julia jld_dataframe.jl
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5A.c line 557 in H5Aopen(): unable to load attribute info from object header for attribute: 'TypeParameters'
    major: Attribute
    minor: Unable to initialize object
  #001: H5Oattribute.c line 537 in H5O_attr_open_by_name(): can't locate attribute: 'TypeParameters'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5A.c line 1400 in H5Aget_name(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
ERROR: Error getting attribute name
 in h5a_get_name at /Users/cassio/.julia/HDF5/src/plain.jl:1743
 in h5a_get_name at /Users/cassio/.julia/HDF5/src/plain.jl:1793
 in h5a_open at /Users/cassio/.julia/HDF5/src/plain.jl:1743
 in a_read at /Users/cassio/.julia/HDF5/src/plain.jl:948
 in read at /Users/cassio/.julia/HDF5/src/jld.jl:320
 in read at /Users/cassio/.julia/HDF5/src/jld.jl:173
 in read at /Users/cassio/.julia/HDF5/src/plain.jl:960
 in include_from_node1 at loading.jl:92
 in process_options at client.jl:274
 in _start at client.jl:352
at /Users/cassio/.julia/HDF5/test/jld_dataframe.jl:16

Excessive warnings about error in static compilation

Recently (maybe because of the merge of the static compile branch in julia), many functions are outputting this warning:
"warning: literal address used in ccall for (null); code cannot be statically compiled"
Calling @save outputs this warning ~30 times. Any way we can fix and/or suppress this warning to avoid the warning spam?

ENH: Improve interface for working with groups in jld files

Just a quick idea...

We can currently organize things into groups within a .jld file, but only using the lower-level HDF-like interface.

I think it would be great to extend the @load and @save macros to allow for .jld files to be organized into groups. I would think that the interface could look something like this:

@load "file_name" "group_name"  # loads all items in that group
@load "file_name" "group_name" a b c  # loads a b and c from the group

@save "file_name" "group_name"  # save all items from current module in  group
@save "file_name" "group_name"  a b c  # saves a, b and c to the group

Sorry, but the comments below are a bit stream-of-consciousness

I just thought maybe it would make more sense to have @load_group and @save_group macros. However, multiple dispatch could make the above ideas work because we could dispatch on two strings followed by optional symbols.

Another thought that maybe when we do this automatic creation of groups we would have /group_name/_refs and /group_name/_types also.


The reason I want this feature is that I really like how easy it is to @load and @save many variables into/out of modules with a quick one liner. I have projects where I have different values of underlying parameters that are used to generate different datasets. I am currently getting this easy functionality by creating different jld files for each parameterization, but I would prefer to do this in a single file with groups. Maybe I'm crazy... feedback on the idea as well as implementation would be great.
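Short of new macros, the idea can be illustrated with a plain function layered on the existing API. This is a hypothetical helper (load_group is not part of JLD); it assumes names(g) lists a group's members as it does for plain HDF5 groups:

```julia
using HDF5, JLD

# Read every dataset in a group of a .jld file into a Dict keyed by name.
function load_group(filename, groupname)
    jldopen(filename, "r") do f
        g = f[groupname]
        [name => read(g, name) for name in names(g)]
    end
end
```

A macro version could then assign each entry of the Dict to a variable in the current module, mirroring what @load already does at the file level.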

JLD + parallelization

I noticed the following bug (?). Essentially, variables loaded with the @load macro cannot be used by worker processes. By chance, I noticed that this can be fixed by copying to a variable of a different name.

$ cat bug.jl
using HDF5,JLD

function setup()
    x = 10
    @save "bug.jld" x
end

function doesnt_work()
    @load "bug.jld" x
    @spawn println(x)
end

function works()
    @load "bug.jld" x
    y = copy(x)
    @spawn println(y)
end
julia> include("bug.jl");

julia> setup();

julia> doesnt_work();
exception on 2: ERROR: x not defined
 in anonymous at multi.jl:1278
 in anonymous at multi.jl:827
 in run_work_thunk at multi.jl:575
 in run_work_thunk at multi.jl:584
 in anonymous at task.jl:88

julia> works();
From worker 2:  10

Windows Binary Downloads

Building HDF5.jl on Windows machines currently fails since the binaries have disappeared from archive.org. Is there an alternative location for the binaries? Is there a backup? I believe @ihnorton has been re-hosting the downloads on S3 if we can find a backup.

Implicit delete on write?

Hi,

would it be ok to make write implicitly delete/overwrite a dataset when it is already present? The following snippet yields a "name already exists" error when the o_delete line is commented out:

h5open("test.jld", "w") do file
  write(file, "a", 1)
  o_delete(file, "a")
  write(file, "a", 2)
  dump(file)
end

This would make the behavior more closely mimic the file-system like character of HDF5 as well as make file["a"] = 2 behave more like a Dict, which would be great!

I can prepare a PR for this, just wanted to check first whether that would be ok.
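For reference, here is the explicit delete-before-write that the proposal would automate, as a sketch built only on calls already used in this thread (it assumes exists(parent, name) behaves as elsewhere in the package):

```julia
using HDF5

h5open("test.h5", "w") do file
    write(file, "a", 1)
    # What write would do implicitly under this proposal:
    exists(file, "a") && o_delete(file, "a")
    write(file, "a", 2)
    dump(file)
end
```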

Matlab-like h5read/h5write?

It would be nice to have Matlab-like h5read and h5write commands (in module HDF5) to read/write a single numeric-array dataset in an HDF5 file, analogous to dlmread/dlmwrite, without having to go to the trouble of h5open etcetera.
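Such wrappers could be thin layers over the existing h5open/read/write API. A minimal sketch (not the package's implementation; error handling and append modes omitted):

```julia
using HDF5

# Write a single dataset to a file, Matlab-style.
function h5write(filename, name, data)
    h5open(filename, "w") do file
        write(file, name, data)
    end
end

# Read a single dataset back by name.
function h5read(filename, name)
    local data
    h5open(filename, "r") do file
        data = read(file, name)
    end
    data
end
```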

Cannot save multiple Julia objects with the same name in different groups

julia> using HDF5, JLD

julia> f = jldopen("test.jld", "w");

julia> g = g_create(f, "group1");

julia> write(g, "x", {1})

julia> close(g)

julia> g = g_create(f, "group2");

julia> write(g, "x", {2})
HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 139760863500096:
  #000: ../../../src/H5G.c line 303 in H5Gcreate2(): unable to create group
    major: Symbol table
    minor: Unable to initialize object
  #001: ../../../src/H5Gint.c line 194 in H5G__create_named(): unable to create and link to group
    major: Symbol table
    minor: Unable to initialize object
  #002: ../../../src/H5L.c line 1638 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: ../../../src/H5L.c line 1882 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: ../../../src/H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: ../../../src/H5Gtraverse.c line 755 in H5G_traverse_real(): component not found
    major: Symbol table
    minor: Object not found
ERROR: Error creating group ///_refs/group2/x
 in h5g_create at /home/simon/.julia/HDF5/src/plain.jl:1715
 in write at /home/simon/.julia/HDF5/src/jld.jl:466
 in write at /home/simon/.julia/HDF5/src/jld.jl:507

Problem building HDF5 on Mac OS

On Mac OS 10.7, when I run Pkg.build("HDF5") I get:

julia> Pkg.build("HDF5")
INFO: Building HDF5
==> Installing hdf5 dependency: szip
==> Downloading http://archive.org/download/julialang/bottles/szip-2.1.lion.bottle.tar.gz
==> Pouring szip-2.1.lion.bottle.tar.gz
?  /Users/bjohnson/.julia/Homebrew/deps/usr/Cellar/szip/2.1: 9 files, 136K
==> Installing hdf5
==> Downloading http://archive.org/download/julialang/bottles/hdf5-1.8.11.lion.bottle.tar.gz
==> Pouring hdf5-1.8.11.lion.bottle.tar.gz
?  /Users/bjohnson/.julia/Homebrew/deps/usr/Cellar/hdf5/1.8.11: 119 files, 9.9M
================[ ERROR: HDF5 ]===================

Provider PackageManager failed to satisfy dependency hdf5
at /Users/bjohnson/.julia/HDF5/deps/build.jl:33

==================================================

repeatedly writing simple hdf5 file works a few times, then fails

using HDF5

function write_macro(max_dim)
    x = rand(max_dim)
    fid = h5open("test.h5","w")
    @write fid x
    close(fid)
end
function write_simple(max_dim)
    fid = h5open("test.h5","w")
    d = d_create(fid, "b", datatype(Float64), dataspace((max_dim,)))
    d[1:max_dim]=rand(max_dim)
    close(fid)
end


for j = 1:10
    print("write macro $j: ")
    @time(write_macro(int(10^6)))
end
for j = 1:10
    print("write simple $j: ")
    @time(write_simple(int(10^6)))
end

gives output

write macro 1: elapsed time: 0.32677965 seconds (15224664 bytes allocated)
write macro 2: elapsed time: 0.013648108 seconds (8044228 bytes allocated)
write macro 3: elapsed time: 0.034962227 seconds (8006536 bytes allocated)
write macro 4: elapsed time: 0.03506796 seconds (8006536 bytes allocated)
write macro 5: elapsed time: 0.048584303 seconds (8006536 bytes allocated)
write macro 6: elapsed time: 0.044131843 seconds (8006536 bytes allocated)
write macro 7: elapsed time: 0.03688556 seconds (8006536 bytes allocated)
write macro 8: elapsed time: 0.035140289 seconds (8006536 bytes allocated)
write macro 9: elapsed time: 0.06822681 seconds (8006536 bytes allocated)
write macro 10: elapsed time: 0.0788429 seconds (8006536 bytes allocated)
write simple 1: elapsed time: 0.092585827 seconds (9083076 bytes allocated)
write simple 2: elapsed time: 0.041490742 seconds (8012336 bytes allocated)
write simple 3: elapsed time: 0.038722462 seconds (8012336 bytes allocated)
write simple 4: elapsed time: 0.037058108 seconds (8012336 bytes allocated)
write simple 5: HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5D.c line 437 in H5Dget_space(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
ERROR: Error getting dataspace
 in h5d_get_space at /Users/oneilg/.julia/HDF5/src/plain.jl:1758
 in hyperslab at /Users/oneilg/.julia/HDF5/src/plain.jl:1388
 in setindex! at /Users/oneilg/.julia/HDF5/src/plain.jl:1373
 in write_simple at /Users/oneilg/github/mass_prep/hdf5_crash.jl:12
 in anonymous at no file:44
 in include_from_node1 at loading.jl:92
at /Users/oneilg/github/mass_prep/hdf5_crash.jl:24

Why does write_simple fail after working 4 times? Also, if I remove write_macro and its loop from the file, it often fails after working only two times.

Compound type with an array inside

I have a dataset with a compound type and inside this compound type an array. Is there a way to read this compound type into Julia? It seems that Array types are not supported yet, but perhaps it's possible to patch this scenario with the low-level routines?

File handles and garbage collection

I'm not sure if I've stumbled on an HDF5 bug or a julia one. This is similar to JuliaLang/julia#3884

I'm on OS X 10.8.4.

Consider this write code:

using HDF5
using JLD

x = Dict{Int64,Array{Float64}}()

for i in 1:10
    x[i] = rand(1000000)
end

file = jldopen("x.jld", "w")
@write file x
close(file)

and the read code:

module Test

using HDF5
using JLD

for i in 1:10
    file = jldopen("x.jld", "r") ; x = read(file, "x") ; close(file)
end

end

Evaluating the read code on the REPL with include("read.jl"), I can consistently get:

HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5Dio.c line 140 in H5Dread(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5I.c line 2271 in H5Iget_name(): can't retrieve object location
    major: Object atom
    minor: Can't get value
  #001: H5Gloc.c line 224 in H5G_loc(): invalid data ID
    major: Invalid arguments to routine
    minor: Bad value
ERROR: Error getting object name
 in h5i_get_name at /Users/cassio/.julia/HDF5/src/plain.jl:1743
 in h5i_get_name at /Users/cassio/.julia/HDF5/src/plain.jl:1805
 in h5d_read at /Users/cassio/.julia/HDF5/src/plain.jl:1743
 in h5d_read at /Users/cassio/.julia/HDF5/src/plain.jl:1526
 in read at /Users/cassio/.julia/HDF5/src/plain.jl:994
 in read at /Users/cassio/.julia/HDF5/src/jld.jl:251
 in read at /Users/cassio/.julia/HDF5/src/jld.jl:254
 in read at /Users/cassio/.julia/HDF5/src/jld.jl:173
 in getrefs at /Users/cassio/.julia/HDF5/src/jld.jl:355
 in read at /Users/cassio/.julia/HDF5/src/jld.jl:292
 in read at /Users/cassio/.julia/HDF5/src/jld.jl:173
 in getrefs at /Users/cassio/.julia/HDF5/src/jld.jl:355
 in read at /Users/cassio/.julia/HDF5/src/jld.jl:306
 in read at /Users/cassio/.julia/HDF5/src/jld.jl:173
 in read at /Users/cassio/.julia/HDF5/src/plain.jl:960
 in anonymous at no file:7
 in include_from_node1 at loading.jl:92
at /Users/cassio/Desktop/test/read.jl:6

If I add a call to gc() right before close() the problem disappears...
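That is, the read loop only behaves reliably when written with the extra collection, as described above:

```julia
using HDF5, JLD

for i in 1:10
    file = jldopen("x.jld", "r")
    x = read(file, "x")
    gc()      # forcing a collection before close makes the error disappear
    close(file)
end
```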

DataFrames test gives an error when writing to .jld file

I get the following output when running test/jld_dataframe.jl.

I am on the latest HDF5.jl (0d1d50d) and 02e71d9-Linux-x86_64 (2013-06-21 12:47:45)

julia> write(file, "df2", df2)
ERROR: access to undefined reference
 in write_composite at /home/chris/.julia/HDF5/src/jld.jl:584
 in write at /home/chris/.julia/HDF5/src/jld.jl:545
 in write at /home/chris/.julia/HDF5/src/jld.jl:487
 in write_composite at /home/chris/.julia/HDF5/src/jld.jl:586
 in write at /home/chris/.julia/HDF5/src/jld.jl:545
 in write at /home/chris/.julia/HDF5/src/jld.jl:487
 in write at /home/chris/.julia/HDF5/src/jld.jl:506
 in write at /home/chris/.julia/HDF5/src/jld.jl:487
 in write_composite at /home/chris/.julia/HDF5/src/jld.jl:586
 in write at /home/chris/.julia/HDF5/src/jld.jl:514

JLD Reading Error

Not really sure what's going on here, but using a recent Julia (built today) and a recent HDF5 (after Pkg.update()):

julia> include("/home/bana/.julia/HDF5/test/jld.jl")
WARNING: randi(n,...) is deprecated, use rand(1:n,...) instead.
WARNING: strcat is deprecated, use string instead.
ERROR: Error reading x
 in include_from_node1 at loading.jl:76
at /home/bana/.julia/HDF5/test/jld.jl:38

And trying a minimal testcase:

using HDF5
using JLD

fid = jldopen("/tmp/test.jld","w")
A = rand(0:1, 50000)
@write fid A
close(fid)

fidr = jldopen("/tmp/test.jld","r")
dump(fidr)
dump(fidr["A"])
@read fidr A
close(fidr)

results in:

julia> include("/home/bana/GSP/code/julia/h5.jl")
JldFile 
  id: Int32 16777216
  filename: ASCIIString "/tmp/test.jld"
  version: ASCIIString "0.0.0"
  toclose: Bool true
  writeheader: Bool true
HDF5Dataset{JldFile} 
  id: Int32 83886080
  file: JldFile 
    id: Int32 16777216
    filename: ASCIIString "/tmp/test.jld"
    version: ASCIIString "0.0.0"
    toclose: Bool true
    writeheader: Bool true
  toclose: Bool true
ERROR: no method ref(Expr,Int64)
 in julia_type at /home/bana/.julia/HDF5/src/jld.jl:673
 in read at /home/bana/.julia/HDF5/src/jld.jl:165
 in read at /home/bana/.julia/HDF5/src/plain.jl:858
 in include_from_node1 at loading.jl:76
at /home/bana/GSP/code/julia/h5.jl:12

Can't open HDF5 on Windows...

It seems like the search routine is not permissive enough...

In [1]:

using HDF5

WARNING: 

backtraces on your platform are often misleading or partially incorrect
Library not found. See the README for installation instructions.
at C:\Users\inorton\AppData\Roaming\Julia\packages\HDF5\src\plain.jl:39
at C:\Users\inorton\AppData\Roaming\Julia\packages\HDF5\src\HDF5.jl:1
at In[1]:1
 in findlibhdf5 at C:\Users\inorton\AppData\Roaming\Julia\packages\HDF5\src\plain.jl:37


In [2]:

dlopen("hdf5.dll")
Out[2]:
Ptr{Void} @0x000000001e04b010
In [ ]:

Rename hdf5.jl

On case-insensitive file systems, it's a problem that hdf5.jl and HDF5.jl differ only by case. On OS X, I only got the former when I cloned the repository.

HDF5 build failed on Mac OS X 10.8

Here is the message:

julia> Pkg.build("HDF5")
INFO: Building Homebrew
From https://github.com/staticfloat/homebrew
 * branch            kegpkg     -> FETCH_HEAD
HEAD is now at d92b125 Quash warning about Fink/Macports
From https://github.com/staticfloat/homebrew-juliadeps
 * branch            master     -> FETCH_HEAD
HEAD is now at 5b22ea4 Update bottles for glpk 4.52
INFO: Building HDF5
================================[ ERROR: HDF5 ]=================================

None of the selected providers can install dependency hdf5
at /Users/dhlin/.julia/HDF5/deps/build.jl:33

================================================================================

================================[ BUILD ERRORS ]================================

WARNING: HDF5 had build errors.

 - packages with build errors remain installed in /Users/dhlin/.julia
 - build a package and all its dependencies with `Pkg.build(pkg)`
 - build a single package by running its `deps/build.jl` script

================================================================================

For some reason, Homebrew doesn't install hdf5.

Name changing to HDF5.jl

I believe this will reach all who have "starred" this repository...

In preparation for packaging, the name of this repository is changing to HDF5.jl. I believe you can edit your .git/config file (the url line), and you will start tracking it by its new name.

Seg fault on assignment of wrong type

using HDF5

fid = h5open("test.h5", "w")
d = d_create(fid, "foo", datatype(Float64), ((10,20),(100,200)), "chunk", (1,1))
d[1,1]=4 # Segmentation fault: 11

This should fail more gracefully.

UTF8 strings broken

x=[utf8("Jon"), utf8("Tim")]
Pkg.update()
using HDF5
using JLD
@save "test" x
@load "test"
Error reading dataset /x
at In[4]:1
 in h5d_read at /Users/malmaud/.julia/HDF5/src/plain.jl:1751
 in read at /Users/malmaud/.julia/HDF5/src/plain.jl:1031
 in read at /Users/malmaud/.julia/HDF5/src/jld.jl:290
 in read at /Users/malmaud/.julia/HDF5/src/jld.jl:291
 in read at /Users/malmaud/.julia/HDF5/src/jld.jl:205
 in read at /Users/malmaud/.julia/HDF5/src/jld.jl:194
 in anonymous at no file

HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5Dio.c line 182 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 438 in H5D__read(): unable to set up type info
    major: Dataset
    minor: Unable to initialize object
  #002: H5Dio.c line 939 in H5D__typeinfo_init(): unable to convert between src and dest datatype
    major: Dataset
    minor: Feature is unsupported
  #003: H5T.c line 4525 in H5T_path_find(): no appropriate function for conversion path
    major: Datatype
    minor: Unable to initialize object

versioninfo()

Julia Version 0.2.0+22
Commit 30fb816* (2013-11-18 10:18 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.0.0)
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

malmaud@malbook ~/tmp> brew install hdf5
Warning: hdf5-1.8.11 already installed

[bug] potential segfault after changing composite type definition

Context at julia-user, including files to reproduce.

To reproduce, run:

include("segf.jl")
tt=DDCM.segf();
tt.tpi[1][1,1] 

With include("old_types.jl") in segf.jl, this returns 0.0.

With include("new_types.jl") in segf.jl, it throws sometimes a segfault, sometimes:

ERROR: no method getindex(SYSTEM: show(lasterr) caused an error
ERROR: no method Enumerate{I}(
 in showerror at repl.jl:111
 in showerror at repl.jl:66
 in anonymous at client.jl:93
 in with_output_color at util.jl:444
 in display_error at client.jl:91SYSTEM: show(lasterr) caused an error
WARNING: it is likely that something important is broken, and Julia will not be able to continue normally

"Certificate verification error"

If I download via browser and place the file manually, then it works the second time.

Joaquim

julia> Pkg.add("HDF5")
INFO: Cloning cache of HDF5 from git://github.com/timholy/HDF5.jl.git
INFO: Installing HDF5 v0.2.14
...
Connecting to ia601003.us.archive.org|207.241.227.33|:443... connected.
ERROR: Certificate verification error for ia601003.us.archive.org: self signed certificate in certificate chain
To connect to ia601003.us.archive.org insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

unicode strings are broken

julia> A = "uniçº∂e"
"uniçº∂e"

julia> file = jldopen("mydata.jld", "w")
Julia data file version 0.0.1: mydata.jld

julia> write(file, "A", A)

julia> close(file)



julia> file = jldopen("mydata.jld", "r")
Julia data file version 0.0.1: mydata.jld

julia> c = read(file, "A")
ERROR: invalid UTF-8 sequence
 in convert at utf8.jl:110
 in read at /Users/westley/.julia/HDF5/src/plain.jl:1007
 in read at /Users/westley/.julia/HDF5/src/jld.jl:255
 in read at /Users/westley/.julia/HDF5/src/jld.jl:176
 in read at /Users/westley/.julia/HDF5/src/plain.jl:952

Reading a DataFrame from JLD is not always successful

I'm seeing something very similar to issue #29 when trying to read a DataFrame from a JLD file. The error I get upon read(jldfile, objname) is the following

[thebe:skim@jp/skimming]$ julia -F load.jl tw.jld
names=df
HDF5-DIAG: Error detected in HDF5 (1.8.12) thread 0:
  #000: H5A.c line 557 in H5Aopen(): unable to load attribute info from object header for attribute: 'TypeParameters'
    major: Attribute
    minor: Unable to initialize object
  #001: H5Oattribute.c line 537 in H5O_attr_open_by_name(): can't locate attribute: 'TypeParameters'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.8.12) thread 0:
  #000: H5A.c line 1400 in H5Aget_name(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
ERROR: Error getting attribute name
 in h5a_get_name at /home/joosep/.julia/HDF5/src/plain.jl:1825
while loading /home/joosep/singletop/stpol2/src/skim/load.jl, in expression starting on line 8

Mysteriously, this appears only when the file is written on SL6 with a specific environment which does not affect which libhdf5 gets loaded, and reading only fails on SL6. When transferred to an OSX machine, reading the file works.

I have attached a file exhibiting this behaviour here: http://hep.kbfi.ee/~joosep/hdf5_read_fail.jld.bz2

Any suggestions on where to start looking? It seems like the jld file is missing some expected structure in this case.

Unicode variable names?

I suppose there are good reasons for restricting the variable names to ASCIIString (possibly the HDF5 format specs), but I figured I'd ask anyway:

julia> jldopen("tst.jld", "w") do f
         write(f, "a", randn(10,10)); # works
       end

julia> jldopen("tst.jld", "w") do f
         write(f, "ä", randn(10,10)); # oops
       end
ERROR: no method write(JldFile, UTF8String, Array{Float64,2})

Any chance this could work by simply relaxing the signature to ByteString?

Fix reading of empty arrays

For an empty array, Matlab writes the array dimensions as the dataset, and then adds an attribute MATLAB_empty. Check for this and do the right thing.

segfault on string array slicing

The following code leads to a segfault when trying to slice a string array

using HDF5


# create file
f = h5open("testjl.hdf5", "w")
f["MyStrings"] = ["1231","asdasdad","ffsfd"]
f["MyNumbers"] = [1,2,3]
close(f)

# try to read back
f = h5open("testjl.hdf5", "r")

# seems we can read whole data set 

println(read(f["MyStrings"]))
println(read(f["MyNumbers"]))

# but slicing
# works for numbers
println(f["MyNumbers"][1:2])

# but not for strings - segfault
println(f["MyStrings"][1:2])

close(f)

writing to ranges (hyperslabs) of on disk hdf5 arrays?

This is mostly just an enquiry. Is it possible to write to ranges of on disk hdf5 arrays? I need to create a very large array inside an hdf5 file, then write specific portions to it, something like:

file["big_dataset"][:,i,j] = Array

I can do this in python/c using hyperslabs. I'm currently trying to read your code to work out how to do it in Julia, but I was wondering if you had thought about this?

The basic steps needed are to select the correct dataspace, and then use this to write from the correct memory space. It looks like your wrapper exposes the necessary interfaces, but I'm still getting my head around reading julia code.

Thanks for any help,
John
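
For reference, here is a hedged sketch of how this might look with the interfaces the wrapper already exposes (the dataset name, element type, and dimensions below are made up for illustration; `d_create` and hyperslab `setindex!` are the relevant entry points, assuming they behave as in the C API):

```julia
using HDF5

h5open("big.h5", "w") do file
    # Allocate the full on-disk dataset up front; nothing is read into memory.
    dset = d_create(file, "big_dataset", datatype(Float64), dataspace((100, 10, 10)))
    # Write a single hyperslab (one 100-element column) without touching the rest.
    dset[:, 1, 1] = rand(100)
end
```

Under the hood this corresponds to selecting a hyperslab in the file dataspace and writing from a matching memory space, as in the C/python workflow described above.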

Test suite fails without DataFrames

Perhaps the DataFrames test can be conditionally run if DataFrames is actually installed?

You can use Pkg.installed("DataFrames") to check.
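
A minimal sketch of the guard in `test/runtests.jl`, assuming `Pkg.installed` returns `nothing` when the package is absent (the test filename is the existing `jld_dataframe.jl`):

```julia
# Only run the DataFrames-dependent tests when DataFrames is actually installed.
if Pkg.installed("DataFrames") !== nothing
    include("jld_dataframe.jl")
else
    println("DataFrames not installed; skipping DataFrames tests")
end
```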

setindex for JldFile

Currently, the following yields a no setindex!(JldFile, Int64, ASCIIString) error for jldopen, but it works for h5open:

using HDF5, JLD
jldopen("test.jld", "w") do file
  file["a"] = 1
  dump(file)
end

Is there a design consideration I am overlooking or can I work on a PR to make this work for JldFile as well? Thanks!
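
If there is no design obstacle, the method might be as simple as delegating to `write` — a sketch, assuming `JldFile` is the type exported by JLD and untested against its internals:

```julia
import Base: setindex!

# Make file["name"] = value equivalent to write(file, "name", value),
# mirroring the behavior that h5open files already have.
setindex!(file::JldFile, value, name::ASCIIString) = write(file, name, value)
```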

rows and columns transposed for DataFrame

With the latest Julia HEAD, doing show(x) and show(y) in test/jld_dataframe.jl shows that the rows and columns of the DataFrame read from the file are transposed with respect to the one written:

2x2 DataFrame:
          x1                                                                                           x2
[1,]    "x1"                                                                                  [2,3,4,5,6]
[2,]    "x2" [3.141592653589793,6.283185307179586,9.42477796076938,12.566370614359172,15.707963267948966]
2x2 DataFrame:
         x1                                                                                           x2
[1,]    "a"                                                                                  [1,2,3,4,5]
[2,]    "b" [3.141592653589793,6.283185307179586,9.42477796076938,12.566370614359172,15.707963267948966]

Thus, one does not read back with read what is written with write.

  • julia version 0.2.0-prerelease+3919
  • HDF5.jl 11821ba

Can't interpolate variables when using @save macro

Trying to save data using the excellent new @save macro, I can't use syntax like @save $i/file.jld x within a for loop, as I get i is not defined.

Creating a variable with the content beforehand doesn't help either. path = "$i/file.jld" yields the same problem when trying to @save path x.

To workaround this I have to create path before and then use @save :($path) x.

It would be nice if this could be done automagically.
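
One way the macro could support this is to `esc` the filename argument so it is evaluated in the caller's scope instead of being treated as a literal. A hypothetical sketch of the idea (this is not the actual JLD implementation, just an illustration of the escaping):

```julia
macro save(filename, vars...)
    # Build one write(...) expression per variable, escaping each so it
    # refers to the caller's binding.
    exprs = [:(write(f, $(string(v)), $(esc(v)))) for v in vars]
    quote
        # esc(filename) lets expressions like "$i/file.jld" or a variable
        # `path` interpolate/evaluate in the caller's scope.
        f = jldopen($(esc(filename)), "w")
        try
            $(exprs...)
        finally
            close(f)
        end
    end
end
```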

no method start(Index) when reading DataFrame

Just putting this out here while I attempt a fix.

The following code

using DataFrames, HDF5, JLD
fi = jldopen("bad_hdf5.jld")
df = read(fi, "df")
println(size(df))

fails with

ERROR: no method start(Index)
 in read at /Users/joosep/.julia/HDF5/src/jld.jl:347
 in read at /Users/joosep/.julia/HDF5/src/jld.jl:207
 in read at /Users/joosep/.julia/HDF5/src/jld.jl:196
 in include_from_node1 at loading.jl:120
while loading /Users/joosep/Dropbox/kbfi/top/stpol/src/analysis/dftest.jl, in expression starting on line 3

on DataFrames.jl 3b269760093542d972436f008cd07e742f9556f2, HDF5.jl b83cea4.
It does not seem consistent, i.e. I can open some older files. Most likely related to the efforts at pruning DataFrames.

Test files (sorry, ~150 MB each)
http://hep.kbfi.ee/~joosep/good_hdf5.jld => succeeds
http://hep.kbfi.ee/~joosep/bad_hdf5.jld => fails

Mac installation changed

hdf5 is now in homebrew/science,

so the installation instructions should be

brew tap homebrew/science
brew install hdf5

Problem buildling HDF5

(Repost from julia-users)
Hi all. I'm running Julia 0.2.0 on Mac OS X 10.9, and I'm having this issue building HDF5:

julia> Pkg.build("HDF5")
INFO: Building Homebrew
HEAD is now at c588ffb Remove git rebasing code that slipped through
HEAD is now at b6300f3 Update nettle bottle
INFO: Building HDF5
============================================================[ ERROR: HDF5 ]=============================================================

Provider PackageManager failed to satisfy dependency libhdf5
at /Users/john/.julia/HDF5/deps/build.jl:30

============================================================[ BUILD ERRORS ]============================================================

WARNING: HDF5 had build errors.

  • packages with build errors remain installed in /Users/john/.julia
  • build a package and all its dependencies with Pkg.build(pkg)
  • build a single package by running its deps/build.jl script

The offending dependency is nowhere to be found:

julia> dlopen("libhdf5")
ERROR: could not load module libhdf5: dlopen(libhdf5.dylib, 1): image not found
in dlopen at c.jl:29

Any help appreciated!

Error when dumping simple .jld file

Just starting to play with HDF5.jl, looks fantastic!

I wanted to get a feel for how the .jld files look internally, and quickly ran into this:

a = {1=>"a", 2=>"b"}
@save "/tmp/a.jld" a
h5open("/tmp/a.jld", "r") do fid 
    dump(fid)
end

resulting in

HDF5File len 3
  _refs: HDF5Group len 1
    a: HDF5Group len 4
      1: HDF5Dataset (2,) : 
Dataset indexing (hyperslab) is available only for bits types
while loading In[38], in expression starting on line 3
 in getindex at /Users/rene/.julia/HDF5/src/plain.jl:1387
 in dump at /Users/rene/.julia/HDF5/src/plain.jl:880
 in dump at /Users/rene/.julia/HDF5/src/plain.jl:893 (repeats 3 times)
 in dump at show.jl:536
 in anonymous at show.jl:542
 in dump at show.jl:542
 in anonymous at no file:4
 in h5open at /Users/rene/.julia/HDF5/src/plain.jl:504

Non-Named Variables

When I (dumbly) tried to save an unbound array:

using HDF5
using JLD

fid = jldopen("/tmp/test.jld","w")
@write fid rand(0:1,50000)
close(fid)

It worked fine (no errors reported, and the file size indicates that the numbers are in there). But I could not manage to retrieve it again, and couldn't even see its existence:

fidr = jldopen("/tmp/test.jld","r+")
dump(fidr)
@read fidr rand(0:1,50000)
close(fidr)

results in:

julia> include("/home/bana/GSP/code/julia/h5.jl")
JldFile 
  id: Int32 16777216
  filename: ASCIIString "/tmp/test.jld"
  version: ASCIIString "0.0.0"
  toclose: Bool true
  writeheader: Bool true
ERROR: syntax: malformed function argument (: 0 1)
 in include_from_node1 at loading.jl:76
at /home/bana/GSP/code/julia/h5.jl:10

I realize this is more user error, but perhaps the @write macro should refuse to write unless its argument is a valid variable name? Or maybe print a warning?

Thanks again!
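
A hedged sketch of the kind of check the macro could make — rejecting anything that is not a plain symbol before generating the write (hypothetical, not the current implementation):

```julia
macro write(file, var)
    # rand(0:1,50000) arrives here as an Expr, not a Symbol, so we can
    # fail early with a readable message instead of writing an unnamed value.
    isa(var, Symbol) || error("@write requires a variable name, got: ", var)
    :(write($(esc(file)), $(string(var)), $(esc(var))))
end
```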

Memory mapping along with compression

Hi,

Why is memory mapping not usable along with compression? Is it a limitation of HDF5?

It seems to me that that is the best use case for very large files. You want both compression and lazy loading with memory mapping.

Thank you,
Cássio
