
Comments (18)

timholy commented on May 16, 2024

Works for me. I tried the original version, then as suggested I commented out the loop with the macro, bumped the number of iterations up to a hundred, and ran that several times. Never had a problem.

What platform are you on? What libhdf5 library version?

ggggggggg commented on May 16, 2024

I'm on a Mac, OS X 10.8.4, using MacPorts. I installed the port hdf5-18 @1.8.11_0+cxx+gcc47.
From the command line, h5dump --version gives h5dump: Version 1.8.11.

I'm not sure how to directly check the version number that Julia is using, but I should only have one version installed.

Maybe I should try an equivalent test with python and h5py? If it's a problem with my hdf5 it should show up in both.

timholy commented on May 16, 2024

Sounds good. Otherwise, I don't have any good ideas about how to debug it, since I don't have a Mac.

simonster commented on May 16, 2024

I can reproduce this on both OS X and Linux. Adding close(d) before close(fid) fixes it, as does gc_disable(). Adding a gc() before d[1:max_dim]=rand(max_dim) makes it happen reliably.
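
For reference, a minimal sketch of the failing pattern (the exact script from the original report isn't shown in this thread, so the loop count, file name, and dataset name here are assumptions):

using HDF5

max_dim = 10^6
for i = 1:100
    fid = h5open("test.h5", "w")
    d = d_create(fid, "b", datatype(Float64), dataspace((max_dim,)))
    # gc()                       # inserting a gc() here makes the failure reliable
    d[1:max_dim] = rand(max_dim)
    # close(d)                   # adding this line avoids the error
    close(fid)
end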

timholy commented on May 16, 2024

Puzzling that H5F_CLOSE_STRONG doesn't fix this. Does inserting sleep(0.1) after close(fid) fix it, too?

timholy commented on May 16, 2024

Even your gc() trick doesn't trigger this on my machine. Linux (Kubuntu 12.04) with libhdf5 1.8.4-patch1-3ubuntu2.

ggggggggg commented on May 16, 2024

Should close recursively close everything underneath it before actually closing the argument?

timholy commented on May 16, 2024

That's much safer, but we tried to set it up so it's not necessary. If the sleep trick works it might be better to (1) figure out the minimum time needed, and (2) insert that into close() for HDF5 files (as long as it's not too long).

ggggggggg commented on May 16, 2024

That sounds like something that would be system-dependent, so the problem could come back later.

timholy commented on May 16, 2024

This is really just #31 in alternate form. H5F_CLOSE_STRONG is supposed to fix this. There we fixed the problem by making sure that our code cleaned up properly, but we can't do that for user code.

So, either we declare that users are responsible for cleaning each item up, or we figure out why H5F_CLOSE_STRONG isn't working. The problem with the former is we'll get this same bug filed again and again, so I'm a bit motivated to try to squash it.

I'm guessing that the OS is doing some operations slightly asynchronously, and we're getting a race condition. It's about 30 seconds of work to test this hypothesis, but I can't do it myself since I can't trigger the bug on my system. So, I'm just hoping that someone will run that example again with a sleep(0.1) after the close(fid).

If that fixes it, then probably we should look into issuing an fsync at the end of close, but that involves setting up a ccall (which I didn't want to ask you to do right away).
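
For concreteness, a minimal sketch of such a ccall, assuming we had the raw POSIX file descriptor in hand (the fd argument below is hypothetical; nothing in the package currently exposes it):

function fsync_fd(fd::Integer)
    # fsync(2) from libc: block until the file's contents reach the disk
    status = ccall(:fsync, Cint, (Cint,), fd)
    status == 0 || error("fsync failed")
    nothing
end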

ggggggggg commented on May 16, 2024

I tried it with sleep(0.1) before and after close(fid). Both cases still crashed after 2 successful writes.

Another interesting behavior is that once it has failed, I have to exit() julia and restart before h5open will work with the same filename.

simonster commented on May 16, 2024

sleep(0.1) doesn't help for me either. It seems like the problem isn't just that things aren't getting closed, since gc_disable() actually improves the situation.

Assuming H5F_CLOSE_STRONG works, what's supposed to be happening when the finalizer for d tries to close it after the file is already closed? Maybe it's closing a newly created object instead?

timholy commented on May 16, 2024

> Another interesting behavior is that once it has failed, I have to exit() julia and restart before h5open will work with the same filename.

That's symptomatic of a stale file handle (a file that isn't actually closed). Perhaps calling gc() would correct that problem (fid has a finalizer that closes the file).

> Assuming H5F_CLOSE_STRONG works, what's supposed to be happening when the finalizer for d tries to close it after the file is already closed? Maybe it's closing a newly created object instead?

That's an interesting theory, and might be the explanation. But it doesn't seem that this should be possible. See the code starting at plain.jl line 540. At least on my system, all this works (which of course might be why I can't reproduce the bug):

julia> using HDF5, JLD

julia> f = jldopen("/tmp/mydata.jld")
Julia data file version 0.0.1: /tmp/mydata.jld

julia> isvalid(f)
true

julia> names(f)
2-element ASCIIString Array:
 "df" 
 "df2"

julia> d = f["df"]
HDF5 dataset: /df (file: /tmp/mydata.jld)

julia> close(f)

julia> f
Julia data file (closed): /tmp/mydata.jld

julia> d
HDF5 dataset (invalid)

julia> isvalid(d)
false

julia> f = jldopen("/tmp/mydata.jld")
Julia data file version 0.0.1: /tmp/mydata.jld

julia> d
HDF5 dataset (invalid)

julia> isvalid(f)
true

simonster commented on May 16, 2024

On my system (with libhdf5 1.8.9), it looks like the dataset's ID can get used for another dataset after the file is closed, so d can go from valid -> invalid -> valid:

julia> using HDF5

julia> max_dim = int(10^6);

julia> fid = h5open("test.h5","w")
HDF5 data file: test.h5

julia> d = d_create(fid, "b", datatype(Float64), dataspace((max_dim,)))
HDF5 dataset: /b (file: test.h5)

julia> close(fid)

julia> isvalid(d)
false

julia> fid = h5open("test.h5","w")
HDF5 data file: test.h5

julia> isvalid(d)
false

julia> d2 = d_create(fid, "b", datatype(Float64), dataspace((max_dim,)))
HDF5 dataset: /b (file: test.h5)

julia> isvalid(d)
true

julia> d.id
83886080

julia> d2.id
83886080

Unsurprisingly, if I set d = nothing and trigger a gc(), d2 becomes invalid. I think this is what's happening in the example above. On your system, I guess d and d2 get different IDs in this case?
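
Continuing the transcript to make that concrete (the output here is assumed, but it matches the behavior just described):

julia> d = nothing

julia> gc()          # runs the finalizer for the old dataset object

julia> isvalid(d2)
false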

It seems like three possible fixes here are:

  1. Get HDF5 to return unique IDs, even after some IDs have been closed
  2. Check if the associated file is already closed before closing objects in finalizers (assuming the ID issue isn't also a problem for files themselves)
  3. Keep an array of WeakRefs to objects associated with each file, and set o.toclose = false on each when the file is closed (see the sketch after this list)
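
A rough sketch of option 3, assuming the file type gained a refs::Vector{WeakRef} field where every object opened under it registers itself (both the field and the function name below are hypothetical, not part of the package today):

function close_with_children(fid)
    for wr in fid.refs
        obj = wr.value
        obj === nothing && continue   # object was already garbage-collected
        obj.toclose = false           # its id dies with the file; the finalizer won't try to close a possibly recycled id
    end
    close(fid)
end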

timholy commented on May 16, 2024

Good detective work! Certainly 2 and 3 are within our control (I'm less sure about 1). 2 seems like the easiest fix, since each object already maintains a link to the file; even if the IDs get repeated by HDF5, we can set the id of a closed PlainHDF5File to -1 or something and just check that.

Does that fix this bug?

simonster commented on May 16, 2024

I tried to fix this in cbbf24d, but it turns out the file IDs do repeat, and that's hard to deal with. While the fix in cbbf24d worked for plain HDF5 files, it broke closing JLD files on my system.

It then occurred to me that there is a serious issue with plain(), which may or may not have been the cause of the breakage, but seems problematic in any case. If I create a dataset by calling

plainfile = plain(jldfile)
dset = d_create(plainfile)

(EDIT corrected code)

then dset.file is plainfile, not jldfile. This means that the finalizer for dset doesn't know whether jldfile has been closed. Moreover, the finalizer for jldfile could potentially get called while dset is still in scope, which probably never happens in jld.jl, but seems unsafe. @timholy, is it reasonable to get rid of plain and instead create differently named variants of the functions that currently require a PlainHDF5File, so they can operate on any HDF5File?

timholy commented on May 16, 2024

Rather than hoping that HDF5 will give us a unique ID, why not count on the fact that the Julia objects will be different? As I suggested above, you could set fid.id = -1 in the file-close function, and then have the dataset close function check dset.file.id != -1 and skip the close when the id is -1, because -1 means the dataset has already been closed via CLOSE_STRONG.
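
In sketch form (h5f_close/h5d_close stand in for the low-level wrappers, the field names follow the types discussed in this thread, and this assumes we're inside the HDF5 module where close is already imported from Base; it's the idea, not a tested implementation):

function close(f::PlainHDF5File)
    if f.toclose
        h5f_close(f.id)        # CLOSE_STRONG also closes objects still open in the file
        f.toclose = false
    end
    f.id = -1                  # mark the Julia object as closed
    nothing
end

function close(dset::HDF5Dataset)
    # skip the libhdf5 call when the owning file has already been closed,
    # since the dataset id may have been recycled by then
    if dset.toclose && dset.file.id != -1
        h5d_close(dset.id)
    end
    dset.toclose = false
    nothing
end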

In your concern about plain above, I assume pf = plainfile? In a world in which file IDs were unique, this wouldn't be an issue because isvalid would handle it all for us. However, it seems we don't live in that world.

Rather than having differently-named versions of functions in JLD, I think the better option might be to have the filetype be a wrapper around a "raw" single-instance object (so there's a 1-1 relationship between files and these raw objects). Types that signal "formatting" (JLDFile and PlainHDF5File) maintain a reference, not a copy, to the same raw object so they'll get "status updates" if the file gets closed (assuming we use the id=-1 trick). In other words,

type RawHDF5File
    id::Hid
    filename::String
    toclose::Bool
end

type PlainHDF5File
    raw::RawHDF5File
end

and similarly for JLDFile (which can also have additional fields). Then plain() would be implemented as

plain(f::JLDFile) = PlainHDF5File(f.raw)

which just makes a reference to the same single-instance RawHDF5File object. I'm not sure these are great names; I'm just throwing something out there. Maybe we could even make PlainHDF5File be the raw object, and have JLD contain it rather than copy its fields.

In principle I'd be happy to help implement this, but I'm traveling all week and pretty short on time. If you don't get to it before next week, I can give it a look.

simonster commented on May 16, 2024

This doesn't seem urgent, so if you have other, more important things to do, further discussion can wait until you get back.

I think that setting fid.id = -1 is fine in principle, but we have to make sure we don't alias the ID. Because libhdf5 repeats IDs, it might also be a good idea to explicitly check that the user isn't accessing a closed object, since otherwise the user could accidentally access the wrong object instead. To perform this check, we also need to ensure that we don't alias objects' IDs.

I suggested having differently named versions of the functions because, with typical OOP, the behavior that JLD etc. achieve using plain would be achieved by calling the method from the superclass (i.e., super in Java or Python), but in Julia this isn't possible unless the "superclass" method is available under a different name.

Your suggestion also sounds like it would work, but I think we'd also want RawHDF5Group and RawHDF5Dataset. Otherwise plain remains prone to GC-related issues, i.e.:

dset = plain(d_create(jldfile))
gc() # not obvious that this closes dset
