andyferris / acceleratedarrays.jl Goto Github PK

View Code? Open in Web Editor NEW

48.0 48.0 10.0 430 KB

Arrays with acceleration indices

License: Other

Julia 100.00%

acceleratedarrays.jl's People

Contributors

Stargazers

Watchers

Forkers

visr nickrobinson251 tokazama fredrikekre juliatagbot kcajf zeta1999 zgornel standardgalactic bencurran98

acceleratedarrays.jl's Issues

Typo in `isgreater`

AcceleratedArrays.jl/src/predicates.jl

Lines 40 to 42 in a77ffde

 function isgreater(a, b) 

 isless(b, a) || isequal(a, b) 

 end

Should be isgreater(a, b) = isless(b, a).

It's correct in the README though!

AcceleratedArrays.jl/README.md

Line 120 in a77ffde

* `isgreater(a, b) = isless(b, a)`

In practice I think the tests don't catch this because they're only looking at the implementations of findall and findlast for these on UniqueSortIndex, which doesn't actually depend on the implementation.

Method choice given multiple accelerators

using AcceleratedArrays
using BenchmarkTools

n = 1_000_000
xs = rand(1:big(n)^4, n)

a = accelerate(xs, UniqueHashIndex)
b = accelerate(xs, UniqueSortIndex)
ab = accelerate(a, UniqueSortIndex)
ba = accelerate(b, UniqueHashIndex)


@btime 1 in $a;
@btime 1 in $b;
@btime 1 in $ab;
@btime 1 in $ba;


julia> @btime 1 in $a;
  129.609 ns (2 allocations: 40 bytes)

julia> @btime 1 in $b;
  79.384 ns (0 allocations: 0 bytes)

julia> @btime 1 in $ab;
  203.664 ns (0 allocations: 0 bytes)

julia> @btime 1 in $ba;
  119.543 ns (2 allocations: 40 bytes)

For an array with multiple accelerators, how does it decide which is the best method for a function?

Implement copy

julia> a = accelerate([1, 2, 3], UniqueSortIndex)
3-element Array{Int64,1} + UniqueSortIndex:
 1
 2
 3

julia> copy(a)
3-element Array{Int64,1}:
 1
 2
 3

Unless I'm missing some subtlety where this could create an invalid state, this seems to work?

Base.copy(a::AcceleratedArray{T, N, A, I}) where {T,N,A,I} = AcceleratedArray{T, N, A, I}(copy(a.parent), a.index)

Efficient getindex implementations for subsetting AcceleratedArrays

Hi @andyferris - this project looks great, and is something I am considering extending / building on. I already have some hacky implementations of something similar, but not as nicely integrate and as general.

What are you thoughts on adding a family of getindex(A::AcceleratedVector, idx::AbstractVector{Int}) methods? I.e. indexing into AcceleratedVectors with integer StepRanges, integer Vectors, etc, and returning AcceleratedVectors of the same type.

For SortIndex, this would be quite easy. When indexing with StepRanges, we would have to check the step is positive. For indexing with arrays, we would have to check the array is sorted first. Since we would be constructing SortIndexes from known-sorted arrays, it would be good to have a SortIndex constructor that doesn't do any checks. This might also be useful generally.

For HashIndex, the hash table would have to be modified, but there are likely lots of optimisations / shortcuts to minimise the work. For example, when indexing with UnitRange{Int}s, we could add to a global integer offset that is subtracted from the values in the Dict when they are accessed. Similar to the above, we might want a more direct HashIndex constructor that accepts a pre-built dictionary & offset, etc.

I'd be happy to do some initial work on this after hearing your thoughts.

background/status of spatialindex branch?

I saw there is a spatialindex branch with one commit and was curious what the background/status of that branch is?
I asked about it on Slack a while back but nobody seemed to know about it.

findfirst not dispatching correctly for HashIndex & UniqueHashIndex?

Performance was much worse than expected for the UniqueHashIndex that I was using, so I started digging into it. I could be doing something wrong, but from the below it doesn't look like findfirst is dispatching to the AcceleratedArrays methods, and instead falling back to the base one?

julia> using AcceleratedArrays

julia> a1 = accelerate(["string$i" for i in 1:1000], UniqueHashIndex);

julia> a2 = accelerate(["string$i" for i in 1:1000], HashIndex);

julia> @which findfirst(==("string940"), a2)
findfirst(testf::Function, A::Union{AbstractString, AbstractArray}) in Base at array.jl:1832

julia> @which findfirst(==("string940"), a1)
findfirst(testf::Function, A::Union{AbstractString, AbstractArray}) in Base at array.jl:1832

julia> @btime findfirst(==("string940"), a1)
  4.276 μs (3 allocations: 64 bytes)
940

julia> @btime findfirst(==("string940"), a2)
  4.195 μs (3 allocations: 64 bytes)
940

julia> @btime a2.index.dict["string940"]
  128.458 ns (3 allocations: 64 bytes)
1-element Array{Int64,1}:
 940

julia> @btime a1.index.dict["string940"]
  131.360 ns (4 allocations: 80 bytes)
1-element SingleVector{Int64}:
 940

julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a* (2020-05-28 17:42 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 10

The timings are consistent with a naive for-loop:

julia> function findfurst(k, xs)
           @inbounds for (i, x) in enumerate(xs)
               if x == k return i end
           end
           0
       end

julia> @btime findfurst("string940", a1)
  3.908 μs (1 allocation: 16 bytes)
940

For HashIndex I can find a method here that I think should match:
https://github.com/andyferris/AcceleratedArrays.jl/blob/master/src/HashIndex.jl#L46
although can't find the equivalent for UniqueHashIndex.

Interval <: IntervalSets.Domain?

Even if this package's interval isn't the same as that in IntervalSets.jl, could it at least be an appropriate subtype?

Ref mcabbott/AxisKeys.jl#13, where that package could use this information to route AcceleratedArrays.Interval back to this package's findall(in(Interval), ...).

Register 0.2.2 0.3.0

It seems we haven't created a release for ages, and the registry version still doesn't work well on Julia 1.2. Let's fix that.

@JuliaRegistrator register

Distributed Indexes

For indexing distributed arrays, there are two invariants to think of:

A DArray is a chain of subarrays
The subarrays are on other processes

Maybe the indexes are to be kept on the processes in the interest of reducing communication.

You can assume there is a way to do distributed sort and sortperm.

What should be stored on the master process such that it knows enough to dispatch groupby and join operations efficiently?

Unrelated question: Can we use these operations to implement distributed arrays itself? (e.g. matrix multiply is a groupjoin on the remote references to the subarrays)

Thoughts?

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

andyferris / acceleratedarrays.jl Goto Github PK

acceleratedarrays.jl's People

Contributors

Stargazers

Watchers

Forkers

acceleratedarrays.jl's Issues

Typo in `isgreater`

Method choice given multiple accelerators

Implement copy

Efficient getindex implementations for subsetting AcceleratedArrays

background/status of spatialindex branch?

findfirst not dispatching correctly for HashIndex & UniqueHashIndex?

Interval <: IntervalSets.Domain?

Register 0.2.2 0.3.0

Distributed Indexes

TagBot trigger issue

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent