andyferris / acceleratedarrays.jl Goto Github PK
View Code? Open in Web Editor NEWArrays with acceleration indices
License: Other
Arrays with acceleration indices
License: Other
AcceleratedArrays.jl/src/predicates.jl
Lines 40 to 42 in a77ffde
Should be isgreater(a, b) = isless(b, a)
.
It's correct in the README though!
AcceleratedArrays.jl/README.md
Line 120 in a77ffde
In practice I think the tests don't catch this because they're only looking at the implementations of findall
and findlast
for these on UniqueSortIndex
, which doesn't actually depend on the implementation.
using AcceleratedArrays
using BenchmarkTools
n = 1_000_000
xs = rand(1:big(n)^4, n)
a = accelerate(xs, UniqueHashIndex)
b = accelerate(xs, UniqueSortIndex)
ab = accelerate(a, UniqueSortIndex)
ba = accelerate(b, UniqueHashIndex)
@btime 1 in $a;
@btime 1 in $b;
@btime 1 in $ab;
@btime 1 in $ba;
julia> @btime 1 in $a;
129.609 ns (2 allocations: 40 bytes)
julia> @btime 1 in $b;
79.384 ns (0 allocations: 0 bytes)
julia> @btime 1 in $ab;
203.664 ns (0 allocations: 0 bytes)
julia> @btime 1 in $ba;
119.543 ns (2 allocations: 40 bytes)
For an array with multiple accelerators, how does it decide which is the best method for a function?
julia> a = accelerate([1, 2, 3], UniqueSortIndex)
3-element Array{Int64,1} + UniqueSortIndex:
1
2
3
julia> copy(a)
3-element Array{Int64,1}:
1
2
3
Unless I'm missing some subtlety where this could create an invalid state, this seems to work?
Base.copy(a::AcceleratedArray{T, N, A, I}) where {T,N,A,I} = AcceleratedArray{T, N, A, I}(copy(a.parent), a.index)
Hi @andyferris - this project looks great, and is something I am considering extending / building on. I already have some hacky implementations of something similar, but not as nicely integrate and as general.
What are you thoughts on adding a family of getindex(A::AcceleratedVector, idx::AbstractVector{Int})
methods? I.e. indexing into AcceleratedVector
s with integer StepRange
s, integer Vector
s, etc, and returning AcceleratedVector
s of the same type.
For SortIndex
, this would be quite easy. When indexing with StepRange
s, we would have to check the step is positive. For indexing with arrays, we would have to check the array is sorted first. Since we would be constructing SortIndex
es from known-sorted arrays, it would be good to have a SortIndex
constructor that doesn't do any checks. This might also be useful generally.
For HashIndex
, the hash table would have to be modified, but there are likely lots of optimisations / shortcuts to minimise the work. For example, when indexing with UnitRange{Int}
s, we could add to a global integer offset that is subtracted from the values in the Dict
when they are accessed. Similar to the above, we might want a more direct HashIndex
constructor that accepts a pre-built dictionary & offset, etc.
I'd be happy to do some initial work on this after hearing your thoughts.
I saw there is a spatialindex
branch with one commit and was curious what the background/status of that branch is?
I asked about it on Slack a while back but nobody seemed to know about it.
Performance was much worse than expected for the UniqueHashIndex that I was using, so I started digging into it. I could be doing something wrong, but from the below it doesn't look like findfirst is dispatching to the AcceleratedArrays methods, and instead falling back to the base one?
julia> using AcceleratedArrays
julia> a1 = accelerate(["string$i" for i in 1:1000], UniqueHashIndex);
julia> a2 = accelerate(["string$i" for i in 1:1000], HashIndex);
julia> @which findfirst(==("string940"), a2)
findfirst(testf::Function, A::Union{AbstractString, AbstractArray}) in Base at array.jl:1832
julia> @which findfirst(==("string940"), a1)
findfirst(testf::Function, A::Union{AbstractString, AbstractArray}) in Base at array.jl:1832
julia> @btime findfirst(==("string940"), a1)
4.276 μs (3 allocations: 64 bytes)
940
julia> @btime findfirst(==("string940"), a2)
4.195 μs (3 allocations: 64 bytes)
940
julia> @btime a2.index.dict["string940"]
128.458 ns (3 allocations: 64 bytes)
1-element Array{Int64,1}:
940
julia> @btime a1.index.dict["string940"]
131.360 ns (4 allocations: 80 bytes)
1-element SingleVector{Int64}:
940
julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a* (2020-05-28 17:42 UTC)
Platform Info:
OS: Linux (x86_64-redhat-linux)
CPU: Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 10
The timings are consistent with a naive for-loop:
julia> function findfurst(k, xs)
@inbounds for (i, x) in enumerate(xs)
if x == k return i end
end
0
end
julia> @btime findfurst("string940", a1)
3.908 μs (1 allocation: 16 bytes)
940
For HashIndex I can find a method here that I think should match:
https://github.com/andyferris/AcceleratedArrays.jl/blob/master/src/HashIndex.jl#L46
although can't find the equivalent for UniqueHashIndex.
Even if this package's interval isn't the same as that in IntervalSets.jl, could it at least be an appropriate subtype?
Ref mcabbott/AxisKeys.jl#13, where that package could use this information to route AcceleratedArrays.Interval
back to this package's findall(in(Interval), ...)
.
It seems we haven't created a release for ages, and the registry version still doesn't work well on Julia 1.2. Let's fix that.
@JuliaRegistrator register
For indexing distributed arrays, there are two invariants to think of:
Maybe the indexes are to be kept on the processes in the interest of reducing communication.
You can assume there is a way to do distributed sort
and sortperm
.
What should be stored on the master process such that it knows enough to dispatch groupby
and join
operations efficiently?
Unrelated question: Can we use these operations to implement distributed arrays itself? (e.g. matrix multiply is a groupjoin on the remote references to the subarrays)
Thoughts?
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.