Git Product home page Git Product logo

acceleratedarrays.jl's People

Contributors

andyferris avatar fredrikekre avatar juliatagbot avatar kcajf avatar nickrobinson251 avatar visr avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

acceleratedarrays.jl's Issues

Typo in `isgreater`

function isgreater(a, b)
isless(b, a) || isequal(a, b)
end

Should be isgreater(a, b) = isless(b, a).

It's correct in the README though!

* `isgreater(a, b) = isless(b, a)`

In practice I think the tests don't catch this because they're only looking at the implementations of findall and findlast for these on UniqueSortIndex, which doesn't actually depend on the implementation.

Method choice given multiple accelerators

using AcceleratedArrays
using BenchmarkTools

n = 1_000_000
xs = rand(1:big(n)^4, n)

a = accelerate(xs, UniqueHashIndex)
b = accelerate(xs, UniqueSortIndex)
ab = accelerate(a, UniqueSortIndex)
ba = accelerate(b, UniqueHashIndex)


@btime 1 in $a;
@btime 1 in $b;
@btime 1 in $ab;
@btime 1 in $ba;


julia> @btime 1 in $a;
  129.609 ns (2 allocations: 40 bytes)

julia> @btime 1 in $b;
  79.384 ns (0 allocations: 0 bytes)

julia> @btime 1 in $ab;
  203.664 ns (0 allocations: 0 bytes)

julia> @btime 1 in $ba;
  119.543 ns (2 allocations: 40 bytes)

For an array with multiple accelerators, how does it decide which is the best method for a function?

Implement copy

julia> a = accelerate([1, 2, 3], UniqueSortIndex)
3-element Array{Int64,1} + UniqueSortIndex:
 1
 2
 3

julia> copy(a)
3-element Array{Int64,1}:
 1
 2
 3

Unless I'm missing some subtlety where this could create an invalid state, this seems to work?

Base.copy(a::AcceleratedArray{T, N, A, I}) where {T,N,A,I} = AcceleratedArray{T, N, A, I}(copy(a.parent), a.index)

Efficient getindex implementations for subsetting AcceleratedArrays

Hi @andyferris - this project looks great, and is something I am considering extending / building on. I already have some hacky implementations of something similar, but not as nicely integrate and as general.

What are you thoughts on adding a family of getindex(A::AcceleratedVector, idx::AbstractVector{Int}) methods? I.e. indexing into AcceleratedVectors with integer StepRanges, integer Vectors, etc, and returning AcceleratedVectors of the same type.

For SortIndex, this would be quite easy. When indexing with StepRanges, we would have to check the step is positive. For indexing with arrays, we would have to check the array is sorted first. Since we would be constructing SortIndexes from known-sorted arrays, it would be good to have a SortIndex constructor that doesn't do any checks. This might also be useful generally.

For HashIndex, the hash table would have to be modified, but there are likely lots of optimisations / shortcuts to minimise the work. For example, when indexing with UnitRange{Int}s, we could add to a global integer offset that is subtracted from the values in the Dict when they are accessed. Similar to the above, we might want a more direct HashIndex constructor that accepts a pre-built dictionary & offset, etc.

I'd be happy to do some initial work on this after hearing your thoughts.

background/status of spatialindex branch?

I saw there is a spatialindex branch with one commit and was curious what the background/status of that branch is?
I asked about it on Slack a while back but nobody seemed to know about it.

findfirst not dispatching correctly for HashIndex & UniqueHashIndex?

Performance was much worse than expected for the UniqueHashIndex that I was using, so I started digging into it. I could be doing something wrong, but from the below it doesn't look like findfirst is dispatching to the AcceleratedArrays methods, and instead falling back to the base one?

julia> using AcceleratedArrays

julia> a1 = accelerate(["string$i" for i in 1:1000], UniqueHashIndex);

julia> a2 = accelerate(["string$i" for i in 1:1000], HashIndex);

julia> @which findfirst(==("string940"), a2)
findfirst(testf::Function, A::Union{AbstractString, AbstractArray}) in Base at array.jl:1832

julia> @which findfirst(==("string940"), a1)
findfirst(testf::Function, A::Union{AbstractString, AbstractArray}) in Base at array.jl:1832

julia> @btime findfirst(==("string940"), a1)
  4.276 μs (3 allocations: 64 bytes)
940

julia> @btime findfirst(==("string940"), a2)
  4.195 μs (3 allocations: 64 bytes)
940

julia> @btime a2.index.dict["string940"]
  128.458 ns (3 allocations: 64 bytes)
1-element Array{Int64,1}:
 940

julia> @btime a1.index.dict["string940"]
  131.360 ns (4 allocations: 80 bytes)
1-element SingleVector{Int64}:
 940

julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a* (2020-05-28 17:42 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 10

The timings are consistent with a naive for-loop:

julia> function findfurst(k, xs)
           @inbounds for (i, x) in enumerate(xs)
               if x == k return i end
           end
           0
       end

julia> @btime findfurst("string940", a1)
  3.908 μs (1 allocation: 16 bytes)
940

For HashIndex I can find a method here that I think should match:
https://github.com/andyferris/AcceleratedArrays.jl/blob/master/src/HashIndex.jl#L46
although can't find the equivalent for UniqueHashIndex.

Interval <: IntervalSets.Domain?

Even if this package's interval isn't the same as that in IntervalSets.jl, could it at least be an appropriate subtype?

Ref mcabbott/AxisKeys.jl#13, where that package could use this information to route AcceleratedArrays.Interval back to this package's findall(in(Interval), ...).

Distributed Indexes

For indexing distributed arrays, there are two invariants to think of:

  1. A DArray is a chain of subarrays
  2. The subarrays are on other processes

Maybe the indexes are to be kept on the processes in the interest of reducing communication.

You can assume there is a way to do distributed sort and sortperm.

What should be stored on the master process such that it knows enough to dispatch groupby and join operations efficiently?

Unrelated question: Can we use these operations to implement distributed arrays itself? (e.g. matrix multiply is a groupjoin on the remote references to the subarrays)

Thoughts?

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.