severinson / h5sparse.jl
Support for out-of-core sparse arrays in Julia, backed by an HDF5 file stored on disk
License: MIT License
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue. I'll open a PR within a few hours, please be patient!
julia> using H5Sparse, SparseArrays
julia> h5_path = tempname()*".h5"
julia> data = zeros(100,1)
julia> H5SparseMatrixCSC(h5_path, "test", sparse(data))
HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
#000: H5Pdcpl.c line 2004 in H5Pset_chunk(): chunk dimensionality must be positive
major: Invalid arguments to routine
minor: Out of range
ERROR: LoadError: Error setting chunk size
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] h5p_set_chunk
@ ~/.julia/packages/HDF5/0iEnL/src/api.jl:1211 [inlined]
[3] set_chunk(::HDF5.Properties)
@ HDF5 ~/.julia/packages/HDF5/0iEnL/src/HDF5.jl:1915
[4] _prop_set!(p::HDF5.Properties, name::Symbol, val::Vector{Int64}, check::Bool)
@ HDF5 ~/.julia/packages/HDF5/0iEnL/src/HDF5.jl:837
[5] create_property(class::Int64; pv::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:chunk, :blosc), Tuple{Vector{Int64}, Int64}}})
@ HDF5 ~/.julia/packages/HDF5/0iEnL/src/HDF5.jl:865
[6] create_dataset(parent::HDF5.Group, path::String, dtype::HDF5.Datatype, dspace::HDF5.Dataspace; pv::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:chunk, :blosc), Tuple{Vector{Int64}, Int64}}})
@ HDF5 ~/.julia/packages/HDF5/0iEnL/src/HDF5.jl:729
[7] #create_dataset#32
@ ~/.julia/packages/HDF5/0iEnL/src/HDF5.jl:737 [inlined]
[8] h5writecsc(fid::HDF5.File, name::String, m::Int64, n::Int64, colptr::Vector{Int64}, rowval::Vector{Int64}, nzval::Vector{Float64}; overwrite::Bool, chunk::Nothing, blosc::Int64, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ H5Sparse ~/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:250
[9] h5writecsc
@ ~/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:231 [inlined]
[10] #h5writecsc#3
@ ~/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:228 [inlined]
[11] h5writecsc
@ ~/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:227 [inlined]
[12] H5SparseMatrixCSC(fid::HDF5.File, name::String, B::SparseMatrixCSC{Float64, Int64}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ H5Sparse ~/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:86
[13] H5SparseMatrixCSC(fid::HDF5.File, name::String, B::SparseMatrixCSC{Float64, Int64})
@ H5Sparse ~/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:86
[14] H5SparseMatrixCSC(::String, ::String, ::Vararg{Any, N} where N; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ H5Sparse ~/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:84
[15] H5SparseMatrixCSC(::String, ::String, ::Vararg{Any, N} where N)
@ H5Sparse ~/.julia/packages/H5Sparse/oVw2O/src/H5Sparse.jl:84
[16] top-level scope
@ ~/code/lensman/notebooks/debug2.jl:9
in expression starting at /home/tyler/code/lensman/notebooks/debug2.jl:9
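The HDF5 message above ("chunk dimensionality must be positive") is consistent with a chunk length of 0 being derived from a matrix that has no stored entries. A minimal sketch of a guard, assuming the chunk length is computed from the number of stored entries; `safe_chunk_length` and `max_chunk` are hypothetical names and not part of H5Sparse:

```julia
using SparseArrays

# Hypothetical helper (not part of H5Sparse): pick a chunk length for a 1-D
# dataset holding `n` elements. HDF5 rejects chunk dimensions smaller than 1,
# so the result is clamped to the range 1:max_chunk.
safe_chunk_length(n::Integer; max_chunk::Integer=10_000) = clamp(n, 1, max_chunk)

B = sparse(zeros(100, 1))            # no stored entries, as in the report
chunk = safe_chunk_length(nnz(B))    # 1 instead of 0
```

Whether an all-zero matrix should be written with a chunked layout at all is a separate design decision; the sketch only shows how to avoid passing 0 as a chunk dimension.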
The following code fails.
using H5Sparse, SparseArrays, Test
B = spzeros(0, 0)
filename = tempname()
name = "A" # placeholder dataset name (not given in the report)
A = H5SparseMatrixCSC(filename, name, B)
@test sparse(A) ≈ B
I think the correct solution would be to allow last(rows) < first(rows) (and the same for cols), similar to how regular matrices work (e.g., sprand(0, 0)[0:-1, :] is valid, and returns a 0 x 0 matrix).
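The behaviour of regular matrices described above can be sketched as follows. `check_range` is a hypothetical helper, not an H5Sparse function; it shows one way a bounds check could treat any empty range as valid, under the assumption that rows and cols are unit ranges.

```julia
using SparseArrays

# In-memory sparse matrices accept empty ranges regardless of their endpoints.
S = spzeros(0, 0)
size(S[0:-1, :])   # (0, 0)

# Hypothetical bounds check mirroring that behaviour: an empty range
# (last(r) < first(r)) is always in bounds; otherwise both endpoints
# must lie in 1:n.
function check_range(r::AbstractUnitRange, n::Integer)
    isempty(r) && return true
    return 1 <= first(r) && last(r) <= n
end
```

With this check, `check_range(0:-1, 0)` passes while `check_range(0:2, 3)` is still rejected.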
Hello, thanks for this package!
I've encountered a very subtle bug that seems to be triggered specifically by H5Sparse.jl and not HDF5.jl. I've created a minimal working example below. I have not been able to trigger the bug without the use of a Channel. It seems there is a race condition related to group creation...
using HDF5, H5Sparse, SparseArrays
import Base.Threads: @threads

function write_n_sparse_datasets(h5path, n, channel, blocker, type=:H5)
    h5 = h5open(h5path, "w")
    while n > 0
        @assert Threads.threadid() == 1 # we only access from the master thread
        dset_name, data = take!(channel)
        take!(blocker)
        println("take $n")
        if type == :H5
            h5[dset_name] = data
        elseif type == :H5Sparse
            H5SparseMatrixCSC(h5, dset_name, data)
        end
        n -= 1
    end
    h5
end

function test(type=:H5, n=10, N=512*512*300)
    to_write = Channel(Inf)
    blocker = Channel(5) # restrict to N active threads
    h5_path = tempname()*".h5"
    isfile(h5_path) ? rm(h5_path) : ()
    @threads for i in 1:n
        @async begin
            put!(blocker, i)
            println("start $i")
            data = collect(rand(N,1) .> 0.9)
            if type == :H5
                put!(to_write, "$i" => data)
            elseif type == :H5Sparse
                data = sparse(data)
                put!(to_write, "$i" => data)
            end
            println("put $i")
        end
    end
    println("call write")
    h5 = write_n_sparse_datasets(h5_path, n, to_write, blocker, type)
    @show keys(h5)
    close(h5)
    rm(h5_path)
end

test(:H5, 10) # always works
test(:H5, 50) # always works
test(:H5Sparse, 5) # usually works
test(:H5Sparse, 10) # sometimes works
test(:H5Sparse, 50) # always errors
start 2
start 3
start 5
start 7
start 4
call write
put 7
take 10
start 6
put 5
put 4
put 2
put 3
take 9
start 9HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
#000: H5G.c line 388 in H5Gcreate2(): unable to create group
major: Symbol table
minor: Unable to initialize object
#001: H5VLcallback.c line 4081 in H5VL_group_create(): group create failed
major: Virtual Object Layer
minor: Unable to create file
#002: H5VLcallback.c line 4047 in H5VL__group_create(): group create failed
major: Virtual Object Layer
minor: Unable to create file
#003: H5VLnative_group.c line 74 in H5VL__native_group_create(): unable to create group
major: Symbol table
minor: Unable to initialize object
#004: H5Gint.c line 158 in H5G__create_named(): unable to create and link to group
major: Symbol table
minor: Unable to initialize object
#005: H5L.c line 1804 in H5L_link_object(): unable to create new link to object
major: Links
minor: Unable to initialize object
#006: H5L.c line 2045 in H5L__create_real(): can't insert link
majHDF5-DIor: Links
minor: Unable to insert object
AG: Error detected in #012: H5Gtraverse.c line 855 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#013: H5Gtraverse.c line 585 in H5G__traverse_real(): can't look up component
major: Symbol table
minor:HDF5 ( Object not found
#014: H5Gobj.c line 1125 in H5G__obj_lookup(): can't check for link info message
major: Symbol table
minor: Can't get value
#015: H5Gobj.c line 326 in H5G__obj_get_linfo(): unable to read object header
major: Symbol table
minor: Can't get value
#016: H5Omessage.c line 883 in H5O_msg_exists(): unable to protect object header
ma1.12.0) threajor: dObject header
0:
minor: #Unable to protect metadata
000: #017: H5Oint.c line H5O.c line 1082 i1239 in H5Oclose()n : unable to close object
H5O_protect() major: Object header
: unable to load object header
major: minor: Unable to release object
Object header
minor: Unable to protect metadata
# #018: H5AC.c line 1312 in 001: H5AC_protect(): H5C_protect() failed
major: Object cache
H5I.c line minor: Unable to protect metadata
#019: H5C.c line 2242 in H5C_protect(): 1422 in H5I_dec_app_ref(): can't decrement ID ref count
major: Object atom
ring type mismatch occurred for cache entry
major: Object cache
minor: Internal error detected
minor: Unable to decrement reference count
#002: (null) line 353 in (null)()
major: Dataset
minor: Close failed
#4294967279: (null) line 2639 in (null)()
major: Virtual Object Layer
minor: Can't reset object
#4294967280: (null) line 2320 in (null)()
major: Virtual Object Layer
minor: Bad value
#4294967281: (null) line 388 in (null)()
major: Symbol table
minor: Unable to initialize object
#4294967282: (null) line 4081 in (null)()
major: Virtual Object Layer
minor: Unable to create file
#4294967283: (null) line 4047 in (null)()
major: Virtual Object Layer
minor: Unable to create file
#4294967284: (null) line 74 in (null)()
major: Symbol table
minor: Unable to initialize object
#4294967285: (null) line 158 in (null)()
major: Symbol table
minor: Unable to initialize object
#4294967286: (null) line 1804 in (null)()
major: Links
minor: Unable to initialize object
#4294967287: (null) line 2045 in (null)()
major: Links
minor: Unable to insert object
#4294967288: (null) line 855 in (null)()
major: Symbol table
minor: Object not found
#4294967289: (null) line 585 in (null)()
major: Symbol table
minor: Object not found
#4294967290: (null) line 1125 in (null)()
major: Symbol table
minor: Can't get value
#4294967291: (null) line 326 in (null)()
major: Symbol table
minor: Can't get value
#4294967292: (null) line 883 in (null)()
major: Object header
minor: Unable to protect metadata
#4294967293: (null) line 1082 in (null)()
major: Object header
minor: Unable to protect metadata
#4294967294: (null) line 1312 in (null)()
major: Object cache
minor: Unable to protect metadata
#4294967295: (null) line 2242 in (null)()
major: Object cache
minor: Internal error detected
error in running finalizer: ErrorException("Error closing object")
ERROR:
error at ./error.jl:33
h5o_close at /home/tyler/.julia/packages/HDF5/0iEnL/src/api.jl:893 [inlined]
close at /home/tyler/.julia/packages/HDF5/0iEnL/src/HDF5.jl:586
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
run_finalizer at /buildworker/worker/package_linux64/build/src/gc.c:278
jl_gc_run_finalizers_in_list at /buildworker/worker/package_linux64/build/src/gc.c:365
LoadError: run_finalizers at /buildworker/worker/package_linux64/build/src/gc.c:394
jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:3260
maybe_collect at /buildworker/worker/package_linux64/build/src/gc.c:880 [inlined]
jl_gc_pool_alloc at /buildworker/worker/package_linux64/build/src/gc.c:1204
jl_gc_alloc_ at /buildworker/worker/package_linux64/build/src/julia_internal.h:285 [inlined]
_new_array_ at /buildworker/worker/package_linux64/build/src/array.c:132 [inlined]
_new_array at /buildworker/worker/package_linux64/build/src/array.c:188 [inlined]
jl_alloc_array_2d at /buildworker/worker/package_linux64/build/src/array.c:466
Array at ./boot.jl:450 [inlined]
Array at ./boot.jl:458 [inlined]
Array at ./boot.jl:465 [inlined]
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:288 [inlined]
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:289
rand at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:277 [inlined]
macro expansion at /home/tyler/code/lensman/notebooks/debug.jl:30 [inlined]
#5 at ./task.jl:411
unknown function (ip: 0x7fa87413fd9c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Error creating group //5
Stacktrace:
[1] error(::String, ::String, ::String, ::String)
@ Base ./error.jl:42
[2] h5g_create
@ ~/.julia/packages/HDF5/0iEnL/src/api.jl:647 [inlined]
[3] create_group(parent::HDF5.File, path::String, lcpl::HDF5.Properties, gcpl::HDF5.Properties)
@ HDF5 ~/.julia/packages/HDF5/0iEnL/src/HDF5.jl:724
[4] create_group
@ ~/.julia/packages/HDF5/0iEnL/src/HDF5.jl:723 [inlined]
[5] h5writecsc(fid::HDF5.File, name::String, m::Int64, n::Int64, colptr::Vector{Int64}, rowval::Vector{Int64}, nzval::Vector{Bool}; overwrite::Bool, chunk::String, blosc::Int64, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ H5Sparse ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:238
[6] h5writecsc
@ ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:231 [inlined]
[7] #h5writecsc#3
@ ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:227 [inlined]
[8] h5writecsc
@ ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:226 [inlined]
[9] H5SparseMatrixCSC(fid::HDF5.File, name::String, B::SparseMatrixCSC{Bool, Int64}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ H5Sparse ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:85
[10] H5SparseMatrixCSC(fid::HDF5.File, name::String, B::SparseMatrixCSC{Bool, Int64})
@ H5Sparse ~/.julia/packages/H5Sparse/Kj4hm/src/H5Sparse.jl:85
[11] write_n_sparse_datasets(h5path::String, n::Int64, dset_size::Vector{Int64}, channel::Channel{Any}, blocker::Channel{Any}, type::Symbol)
@ Main ~/code/lensman/notebooks/debug.jl:13
[12] test(type::Symbol, n::Int64)
@ Main ~/code/lensman/notebooks/debug.jl:42
[13] top-level scope
@ ~/code/lensman/notebooks/debug.jl:52
in expression starting at /home/tyler/code/lensman/notebooks/debug.jl:52
put 6
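The output above includes "error in running finalizer: ErrorException("Error closing object")", raised while a producer task was allocating inside rand. One possible reading, offered as an assumption and not a confirmed diagnosis, is that a finalizer closed an HDF5 handle on a worker thread while the main thread was inside H5Gcreate2. A sketch for testing that hypothesis keeps the producers as plain @async tasks (no @threads), so producers, consumer, and any finalizers they trigger share one thread. `run_pipeline`, `write_fn`, and `make_data` are hypothetical names; `write_fn` stands in for the H5SparseMatrixCSC(h5, dset_name, data) call.

```julia
# Sketch: same Channel-based pipeline as the example above, but producers are
# @async tasks only, so they stay on the thread that calls run_pipeline.
# `write_fn(name, data)` is a hypothetical stand-in for the HDF5 write.
function run_pipeline(write_fn, make_data, n; max_active=5)
    to_write = Channel{Pair{String,Any}}(Inf)
    blocker = Channel{Int}(max_active)     # at most max_active items in flight
    for i in 1:n
        @async begin
            put!(blocker, i)               # wait for a free slot
            put!(to_write, "$i" => make_data(i))
        end
    end
    written = String[]
    for _ in 1:n
        name, data = take!(to_write)
        take!(blocker)                     # release one slot
        write_fn(name, data)               # every write happens in this task
        push!(written, name)
    end
    return written
end

# Usage with an in-memory sink in place of an HDF5 file.
sink = Dict{String,Any}()
names = run_pipeline((k, v) -> (sink[k] = v), i -> fill(i, 3), 10)
```

This gives up parallel data generation, so it is a diagnostic step and not a fix: if the errors disappear in this form, that would point toward handles being closed from another thread.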
Perhaps we can just change the docs to recommend Matrix(sparse(A)) instead of Matrix(A)?
I have a file hemisphere_masks.h5.
julia> @time lm = Matrix(sparse(H5SparseMatrixCSC("/scratch/atlas/ZBrain/hemisphere_masks.h5", "left")));
0.314911
julia> @time lm = Matrix(H5SparseMatrixCSC("/scratch/atlas/ZBrain/hemisphere_masks.h5", "left"))
[5-10+ minutes, I forget]
Each of these functions returns an H5View, for which each indexing operation requires reading from disk. To support efficient iteration over a H5SparseMatrixCSC, the library should automatically load chunks of the colptr/rowvals/nonzeros vectors into memory and cache them.
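The chunk caching proposed above could look roughly like the following. This is a sketch under stated assumptions, not the library's design: `ChunkCachedVector` is a hypothetical wrapper, and `source` is assumed to be any object that supports `length(source)` and range indexing `source[a:b]`, such as an on-disk vector.

```julia
# Hypothetical wrapper: serves element reads from an in-memory chunk and only
# goes back to `source` when the requested index falls outside that chunk.
mutable struct ChunkCachedVector{T,S} <: AbstractVector{T}
    source::S
    chunk_size::Int
    first_cached::Int    # index in `source` of the first cached element
    cache::Vector{T}     # empty until the first read
end

ChunkCachedVector{T}(source::S; chunk_size::Int=1024) where {T,S} =
    ChunkCachedVector{T,S}(source, chunk_size, 1, T[])

Base.size(v::ChunkCachedVector) = (length(v.source),)

function Base.getindex(v::ChunkCachedVector, i::Int)
    @boundscheck checkbounds(v, i)
    offset = i - v.first_cached
    if !(0 <= offset < length(v.cache))
        # Cache miss: read the aligned chunk containing index i in one request.
        start = v.chunk_size * div(i - 1, v.chunk_size) + 1
        stop = min(start + v.chunk_size - 1, length(v.source))
        v.cache = collect(v.source[start:stop])
        v.first_cached = start
        offset = i - start
    end
    return v.cache[offset + 1]
end

# Usage: an ordinary Vector stands in for the on-disk data.
v = ChunkCachedVector{Float64}(collect(1.0:10.0); chunk_size=4)
total = 0.0
for i in eachindex(v)
    global total += v[i]
end
```

Sequential iteration then issues one read per chunk instead of one per element. Random access patterns would benefit less, and a real implementation would also have to decide how to invalidate the cache after writes.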