Comments (18)
Works for me. I tried the original version, then as suggested I commented out the loop with the macro, bumped the number of iterations up to a hundred, and ran that several times. Never had a problem.
What platform are you on? What libhdf5 library version?
from hdf5.jl.
I'm on a Mac, OS X 10.8.4, using MacPorts. I installed port `hdf5-18 @1.8.11_0+cxx+gcc47`.
From the command line, `h5dump --version` gives `h5dump: Version 1.8.11`.
I'm not sure how to directly check the version number that Julia is using, but I should only have one version installed.
Maybe I should try an equivalent test with Python and h5py? If it's a problem with my HDF5 installation, it should show up in both.
---
Sounds good. Otherwise, I don't have any good ideas about how to debug it, since I don't have a Mac.
---
I can reproduce this on both OS X and Linux. Adding `close(d)` before `close(fid)` fixes it, as does `gc_disable()`. Adding a `gc()` before `d[1:max_dim] = rand(max_dim)` makes it happen reliably.
---
Puzzling that `H5F_CLOSE_STRONG` doesn't fix this. Does inserting `sleep(0.1)` after `close(fid)` fix it, too?
---
Even your `gc()` trick doesn't trigger this on my machine: Linux (Kubuntu 12.04) with libhdf5 1.8.4-patch1-3ubuntu2.
---
Should `close` recursively close everything underneath it before actually closing its argument?
---
That's much safer, but we tried to set it up so that it's not necessary. If the sleep trick works, it might be better to (1) figure out the minimum time needed, and (2) insert that delay into `close()` for HDF5 files (as long as it's not too long).
---
That sounds like something that will be system dependent, so the problem could come back later.
---
This is really just #31 in alternate form. `H5F_CLOSE_STRONG` is supposed to fix this. There we fixed the problem by making sure that our code cleaned up properly, but we can't do that for user code.
So, either we declare that users are responsible for cleaning each item up, or we figure out why `H5F_CLOSE_STRONG` isn't working. The problem with the former is that we'll get this same bug filed again and again, so I'm a bit motivated to try to squash it.
I'm guessing that the OS is doing some operations slightly asynchronously, and we're getting a race condition. It's about 30 seconds of work to test this hypothesis, but I can't do it myself since I can't trigger the bug on my system. So, I'm just hoping that someone will run that example again with a `sleep(0.1)` after the `close(fid)`.
If that fixes it, then probably we should look into issuing an `fsync` at the end of `close`, but that involves setting up a `ccall` (which I didn't want to ask you to do right away).
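For anyone wanting to try it, the `fsync` `ccall` might look roughly like this. This is only a sketch: `flush_to_disk` is a hypothetical name, and getting a POSIX file descriptor out of libhdf5 (e.g. via `H5Fget_vfd_handle`) is not shown.

```julia
# Hypothetical sketch: force a file descriptor's contents to disk via POSIX fsync(2).
# In HDF5.jl this would run at the end of close(); `fd` would first have to be
# obtained from libhdf5 (e.g. via H5Fget_vfd_handle), which is not shown here.
function flush_to_disk(fd::Integer)
    status = ccall(:fsync, Cint, (Cint,), fd)
    status == 0 || error("fsync failed (status $status)")
    nothing
end
```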
---
I tried it with `sleep(0.1)` before and after `close(fid)`. Both cases still crashed after 2 successful writes.
Another interesting behavior is that once it has failed, I have to `exit()` Julia and restart before `h5open` will work with the same filename.
---
`sleep(0.1)` doesn't help for me either. It seems like the problem isn't just that things aren't getting closed, since `gc_disable()` actually improves the situation.
Assuming `H5F_CLOSE_STRONG` works, what's supposed to be happening when the finalizer for `d` tries to close it after the file is already closed? Maybe it's closing a newly created object instead?
---
> Another interesting behavior is that once it has failed, I have to `exit()` julia and restart before `h5open` will work with the same filename.

That's symptomatic of a stale file handle (a file that isn't actually closed). Perhaps calling `gc()` would correct that problem (`fid` has a finalizer that closes the file).

> Assuming `H5F_CLOSE_STRONG` works, what's supposed to be happening when the finalizer for `d` tries to close it after the file is already closed? Maybe it's closing a newly created object instead?

That's an interesting theory, and might be the explanation. But it doesn't seem that this should be possible; see the code starting at `plain.jl` line 540. At least on my system, all of this works (which of course might be why I can't reproduce the bug):

```julia
julia> using HDF5, JLD

julia> f = jldopen("/tmp/mydata.jld")
Julia data file version 0.0.1: /tmp/mydata.jld

julia> isvalid(f)
true

julia> names(f)
2-element ASCIIString Array:
 "df"
 "df2"

julia> d = f["df"]
HDF5 dataset: /df (file: /tmp/mydata.jld)

julia> close(f)

julia> f
Julia data file (closed): /tmp/mydata.jld

julia> d
HDF5 dataset (invalid)

julia> isvalid(d)
false

julia> f = jldopen("/tmp/mydata.jld")
Julia data file version 0.0.1: /tmp/mydata.jld

julia> d
HDF5 dataset (invalid)

julia> isvalid(f)
true
```
---
On my system (with libhdf5 1.8.9), it looks like the dataset's ID can get used for another dataset after the file is closed, so `d` can go from valid -> invalid -> valid:

```julia
julia> using HDF5

julia> max_dim = int(10^6);

julia> fid = h5open("test.h5","w")
HDF5 data file: test.h5

julia> d = d_create(fid, "b", datatype(Float64), dataspace((max_dim,)))
HDF5 dataset: /b (file: test.h5)

julia> close(fid)

julia> isvalid(d)
false

julia> fid = h5open("test.h5","w")
HDF5 data file: test.h5

julia> isvalid(d)
false

julia> d2 = d_create(fid, "b", datatype(Float64), dataspace((max_dim,)))
HDF5 dataset: /b (file: test.h5)

julia> isvalid(d)
true

julia> d.id
83886080

julia> d2.id
83886080
```

Unsurprisingly, if I set `d = nothing` and trigger a `gc()`, `d2` becomes invalid. I think this is what's happening in the example above. On your system, I guess `d` and `d2` get different IDs in this case?
It seems like three possible fixes here are:
- Get HDF5 to return unique IDs, even after some IDs have been closed
- Check whether the associated file is already closed before closing objects in finalizers (assuming the ID issue isn't also a problem for files themselves)
- Keep an array of `WeakRef`s to objects associated with each file, and set `o.toclose = false` on each when the file is closed
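The third option could be sketched like this (written in current Julia syntax; `toclose` matches the discussion above, but the registry and type names are hypothetical stand-ins, not HDF5.jl's actual internals):

```julia
# Hypothetical sketch of fix 3: the file keeps WeakRefs to every object opened
# under it, and flips `toclose` off on the survivors when the file is closed,
# so their finalizers become no-ops.
mutable struct TrackedObject       # stand-in for HDF5Dataset, HDF5Group, ...
    id::Int
    toclose::Bool
end

const registry = Dict{Int,Vector{WeakRef}}()   # file id => objects opened under it

track(fileid::Int, obj) = push!(get!(registry, fileid, WeakRef[]), WeakRef(obj))

function close_file(fileid::Int)
    for r in get(registry, fileid, WeakRef[])
        r.value === nothing && continue    # object already garbage-collected
        r.value.toclose = false            # its finalizer must not close a reused id
    end
    delete!(registry, fileid)
end
```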
---
Good detective work! Certainly 2 and 3 are within our control (I'm less sure about 1). 2 seems like the easiest fix: since each object already maintains a link to the file, even if the IDs get repeated by HDF5 we can set the id of a closed `PlainHDF5File` to -1 or something and just check that.
Does that fix this bug?
---
I tried to fix this in cbbf24d, but it turns out the file IDs do repeat, and that's hard to deal with. While the fix in cbbf24d worked for plain HDF5 files, it broke closing JLD files on my system.
It then occurred to me that there is a serious issue with `plain()`, which may or may not have been the cause of the breakage, but seems problematic in any case. If I create a dataset by calling

```julia
plainfile = plain(jldfile)
dset = d_create(plainfile)
```

(EDIT: corrected code) then `dset.file` is `plainfile`, not `jldfile`. This means that the finalizer for `dset` doesn't know whether `jldfile` has been closed. Moreover, the finalizer for `jldfile` could potentially get called while `dset` is still in scope, which probably never happens in `jld.jl`, but seems unsafe. @timholy, is it reasonable to get rid of `plain` and instead create differently named variants of the functions that currently require a `PlainHDF5File` so they can operate on any `HDF5File`?
---
Rather than hoping that HDF5 will give us a unique ID, why not count on the fact that the Julia objects will be different? As I suggested above, you could set `fid.id = -1` in the file-close function and then have the dataset close function check `dset.file.id != -1`, skipping the close when the id is -1, since that means the file has already been closed via `CLOSE_STRONG`.
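In current Julia syntax, that sentinel check could be as small as the following sketch. It assumes datasets keep a reference to their file (as they do in HDF5.jl), but the type and function bodies here are simplified stand-ins, not the package's actual code:

```julia
# Simplified sketch of the fid.id = -1 trick.
mutable struct File
    id::Int
end
mutable struct Dataset
    id::Int
    file::File
end

function close(f::File)
    # ... the h5f_close ccall would go here ...
    f.id = -1                      # sentinel: this Julia object is now closed
end

function close(d::Dataset)
    d.file.id == -1 && return      # file already closed (via H5F_CLOSE_STRONG)
    # ... the h5o_close ccall would go here ...
    d.id = -1
end
```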
In your concern about `plain` above, I assume `pf = plainfile`? In a world in which file IDs were unique, this wouldn't be an issue, because `isvalid` would handle it all for us. However, it seems we don't live in that world.
Rather than having differently-named versions of functions in JLD, I think the better option might be to make the file type a wrapper around a "raw" single-instance object (so there's a 1-1 relationship between files and these raw objects). Types that signal "formatting" (`JLDFile` and `PlainHDF5File`) maintain a reference, not a copy, to the same raw object, so they'll get "status updates" if the file gets closed (assuming we use the `id = -1` trick). In other words,
```julia
type RawHDF5File
    id::Hid
    filename::String
    toclose::Bool
end

type PlainHDF5File
    raw::RawHDF5File
end
```

and similarly for `JLDFile` (which can also have additional fields). Then `plain()` would be implemented as

```julia
plain(f::JLDFile) = PlainHDF5File(f.raw)
```

which just makes a reference to the same single-instance `RawHDF5File` object. I'm not sure these are great names, but just to throw something out there. Maybe we could just make the `PlainHDF5File` be the raw object, and have JLD contain it rather than copy its fields.
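To make the reference semantics concrete, here is a self-contained toy version of the idea in current Julia syntax (the `id` type is plain `Int` and `JLDFile` has no extra fields, both simplifications for illustration):

```julia
mutable struct RawHDF5File       # single instance per file on disk
    id::Int
    filename::String
    toclose::Bool
end
struct PlainHDF5File
    raw::RawHDF5File
end
struct JLDFile
    raw::RawHDF5File
end
plain(f::JLDFile) = PlainHDF5File(f.raw)   # shares, never copies, the raw object

raw = RawHDF5File(42, "/tmp/mydata.jld", true)
jld = JLDFile(raw)
pf  = plain(jld)
jld.raw.id = -1            # "closing" the file through one view...
@assert pf.raw.id == -1    # ...is visible through the other, since raw is shared
```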
In principle I'd be happy to help implement this, but I'm traveling all week and pretty short on time. If you don't get to it before next week, I can give it a look.
---
This doesn't seem urgent, so if you have other, more important things to do, further discussion can wait until you get back.
I think that setting `fid.id = -1` is fine in principle, but we have to make sure we don't alias the ID. Because libhdf5 repeats IDs, it might also be a good idea to explicitly check that the user isn't accessing a closed object, since otherwise the user could accidentally access the wrong object instead. To perform this check, we also need to ensure that we don't alias objects' IDs.
I suggested having differently named versions of functions because, with typical OOP, the behavior that JLD etc. achieve using `plain` would be achieved by calling the method from the superclass (i.e. `super` in Java or Python), but in Julia this isn't possible unless the "superclass" method is available under a different name.
Your suggestion also sounds like it would work, but I think we'd also want `RawHDF5Group` and `RawHDF5Dataset`. Otherwise `plain` remains prone to GC-related issues, e.g.:

```julia
dset = plain(d_create(jldfile))
gc() # not obvious that this closes dset
```