juliagpu / arrayfire.jl Goto Github PK

Julia wrapper for the ArrayFire library

License: Other

Julia 99.95% C 0.05%

julia arrayfire cuda-backend opencl-backend arrayfire-library machine-learning

arrayfire.jl's Introduction

ArrayFire.jl

ArrayFire is a library for GPU and accelerated computing. ArrayFire.jl wraps the ArrayFire library for Julia, and provides a Julia interface.

Installation

Install ArrayFire library: either download a binary from the official site, or you can build from source.

In Julia 1.0 and up:

] add ArrayFire

Simple Usage

Congratulations, you've now installed ArrayFire.jl! Now what can you do?

Let's say you have a simple Julia array on the CPU:

a = rand(10, 10)

You can transfer this array to the device by calling the AFArray constructor on it.

using ArrayFire  # Don't forget to load the library
ad = AFArray(a)

Now let us perform some simple arithmetic on it:

bd = (ad + 1) / 5

Of course, you can do much more than just add and divide numbers. Check the supported functions section for more information.

Now that you're done with all your device computation, you can bring your array back to the CPU (or host):

b = Array(bd)

Here are other examples of simple usage:

using ArrayFire, LinearAlgebra

# Random number generation
a = rand(AFArray{Float64}, 100, 100)
b = randn(AFArray{Float64}, 100, 100)

# Transfer to device from the CPU
host_to_device = AFArray(rand(100,100))

# Transfer back to CPU
device_to_host = Array(host_to_device)

# Basic arithmetic operations
c = sin(a) + 0.5
d = a * 5

# Logical operations
c = a .> b
any_trues = any(c)

# Reduction operations
total_max = maximum(a)
colwise_min = min(a,2)

# Matrix operations
determinant = det(a)
b_positive = abs(b)
product = a * b
dot_product = a .* b
transposer = a'

# Linear Algebra
lu_fact = lu(a)
cholesky_fact = cholesky(a*a')  # Multiplied to create a positive definite matrix
qr_fact = qr(a)
svd_fact = svd(a)

# FFT
fast_fourier = fft(a)

The Execution Model

ArrayFire.jl introduces an AFArray type that is a subtype of AbstractArray. Operations on AFArrays create other AFArrays, so data always remains on the device unless it is specifically transferred back. This wrapper provides a simple Julian interface that aims to mimic Base Julia's versatility and ease of use.

REPL Behaviour: On the REPL, whenever you create an AFArray, the REPL displays the values, just like in Base Julia. This happens because the showarray method is overloaded to ensure that every time it is needed to display on the REPL, values are transferred from device to host. This means that every single operation on the REPL involves an implicit memory transfer. This may lead to some slowdown while working interactively depending on the size of the data and memory bandwidth available. You can use a semicolon (;) at the end of each statement to disable displaying and avoid that memory transfer. Also, note that in a script, there would be no memory transfer unless a display function is explicitly called (or if you use the Array constructor like in the above example).

Async Behaviour: arrayfire is an asynchronous library. This essentially means that whenever you call a particular function in ArrayFire.jl, it would return control to the host almost immediately (which in this case in Julia) and continue executing on the device. This is pretty useful because it would mean that host code that's independent of the device can simply execute while the device computes, resulting in better real world performance.

The library also performs some kernel fusions on elementary arithmetic operations (see the arithmetic section of the Supported Functions). arrayfire has an intelligent runtime JIT compliation engine which converts array expressions into the smallest number of OpenCL/CUDA kernels. Kernel fusion not only decreases the number of kernel calls, but also avoids extraneous global memory operations. This asynchronous behaviour ends only when a non-JIT operation is called or an explicit synchronization barrier sync(array) is called.

A note on benchmarking: In Julia, one would use the @time macro to time execution times of functions. However, in this particular case, @time would simply time the function call, and the library would execute asynchronously in the background. This would often lead to misleading timings. Therefore, the right way to time individual operations is to run them multiple times, place an explicit synchronization barrier at the end, and take the average of multiple runs.

Also, note that this does not affect how the user writes code. Users can simply write normal Julia code using ArrayFire.jl and this asynchronous behaviour is abstracted out. Whenever the data is needed back onto the CPU, an implicit barrier ensures that the computatation is complete, and the values are transferred back.

Operations between CPU and device arrays: Consider the following code. It will return an error:

a = rand(Float32, 10, 10)
b = AFArray(a)
a - b # Throws Error

This is because the two arrays reside in different regions of memory (host and device), and for any coherent operation to be performed, one array would have to be transferred to other region in memory. ArrayFire.jl does not do this automatically for performance considerations. Therefore, to make this work, you would have to manually transfer one of the arrays to the other memory. The following operations would work:

a - Array(b) # Works!
AFArray(a) - b # This works too!

A note on correctness: Sometimes, ArrayFire.jl and Base Julia might return marginally different values from their computation. This is because Julia and ArrayFire.jl sometimes use different lower level libraries for BLAS, FFT, etc. For example, Julia uses OpenBLAS for BLAS operations, but ArrayFire.jl would use clBLAS for the OpenCL backend and CuBLAS for the CUDA backend, and these libraries might not always the exact same values as OpenBLAS after a certain decimal point. In light of this, users are encouraged to keep testing their codes for correctness.

A note on performance: Some operations can be slow due to Base's generic implementations. This is intentional, to enable a "make it work, then make it fast" workflow. When you're ready you can disable slow fallback methods:

julia> allowslow(AFArray, false)
julia> xs[5]
ERROR: getindex is disabled

Supported Functions

Creating AFArrays

rand, randn, convert, diagm, eye, range, zeros, ones, trues, falses
constant, getSeed, setSeed, iota

Arithmetic

+, -, *, /, ^, &, $, |
.+, .-, .*, ./, .>, .>=, .<, .<=, .==, .!=,
complex, conj, real, imag, max, min, abs, round, floor, hypot
sigmoid
signbit (works only in vectorized form on Julia v0.5 - Ref issue #109)

Linear Algebra

cholesky, svd, lu, qr, svdfact!, lufact!, qrfact!
*(matmul), A_mul_Bt, At_mul_B, At_mul_Bt, Ac_mul_B, A_mul_Bc, Ac_mul_Bc
transpose, transpose!, ctranspose, ctranspose!
det, inv, rank, norm, dot, diag, \
isLAPACKAvailable, chol!, solveLU, upper, lower

Signal Processing

fft, ifft, fft!, ifft!
conv, conv2
fftC2R, fftR2C, conv3, convolve, fir, iir, approx1, approx2

Statistics

mean, median, std, var, cov
meanWeighted, varWeighted, corrcoef

Vector Algorithms

sum, min, max, minimum, maximum, findmax, findmin
countnz, any, all, sort, union, find, cumsum, diff
sortIndex, sortByKey, diff2, minidx, maxidx

Backend Functions

get_active_backend, get_backend_count, get_available_backends, set_backend, get_backend_id, sync, get_active_backend_id

Device Functions

get_device, set_device, get_device_count

Image Processing

scale, hist
loadImage, saveImage
isImageIOAvailable
colorspace, gray2rgb, rgb2gray, rgb2hsv, rgb2ycbcr, ycbcr2rgb, hsv2rgb
regions, SAT
bilateral, maxfilt, meanshift, medfilt, minfilt, sobel, histequal
resize, rotate, skew, transform, transformCoordinates, translate
dilate, erode, dilate3d, erode3d, gaussiankernel

Computer Vision

orb, sift, gloh, diffOfGaussians, fast, harris, susan, hammingMatcher, nearestNeighbour, matchTemplate

Performance

ArrayFire was benchmarked on commonly used operations.

Another interesting benchmark is Non-negative Matrix Factorization:

CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz.

GPU: GRID K520, 4096 MB, CUDA Compute 3.0.

ArrayFire v3.4.0

The benchmark scripts are in the benchmark folder, and be run from there by doing by doing:

include("benchmark.jl")
include("nmf_benchmark.jl")

Backends

There are three backends in ArrayFire.jl:

CUDA Backend
OpenCL Backend
CPU Backend

There is yet another backend which essentially allows the user to switch backends at runtime. This is called the unified backend. ArrayFire.jl starts up with the unified backend.

If the backend selected by ArrayFire by default (depends on the available drivers) is not the desired one (depending on the available hardware), you can override the default by setting the environment variable $JULIA_ARRAYFIRE_BACKEND before starting Julia (more specifically, before loading the ArrayFire module). Possible values for $JULIA_ARRAYFIRE_BACKEND are cpu, cuda and opencl.

You may also change the backend at runtime via, e.g., set_backend(AF_BACKEND_CPU) (resp. AF_BACKEND_CUDA or AF_BACKEND_OPENCL). The unified backend isn't a computational backend by itself but represents an interface to switch between different backends at runtime. ArrayFire.jl starts up with the unified backend, but get_active_backend() will return either a particular default backend, depending on how you have installed the library. For example, if you have built ArrayFire.jl with the CUDA backend, get_active_backend() will return AF_BACKEND_CUDA backend.

Troubleshooting

ArrayFire.jl isn't working! What do I do?

Error loading libaf

Try adding the path to libaf to your LD_LIBRARY_PATH.

ArrayFire Error (998): Internal Error whenever you call rand

If you're using the CUDA backend, try checking if libcudart and libnvvm are both in your LD_LIBRARY_PATH. This is because libafcuda will try to link to these libraries when it loads into Julia. If they're not in your system, install CUDA for your platform.

ArrayFire.jl loads, but a = rand(AFArray{Float32}, 10) is stuck.

If you want to use the CUDA backend, check if you have installed CUDA for your platform. If you've installed CUDA, simply downloaded a binary and it still doens't work, try adding libnvvm, libcudart to your path.

ArrayFire.jl does not work with Atom.

Create a file in your home directory called .juliarc.jl and write ENV["LD_LIBRARY_PATH"] = "/usr/local/lib/" (or the path to libaf) in it. Atom should now be able to load it.

ERROR: ArrayFire Error (401) : Double precision not supported for this device

This error message pops up on devices that do not support double precision: a good example would be the Iris Pro on Macbooks. If you get this message, you should work with single precision. For example, if you're generating random numbers directly on the device, the correct usage in this scenario would be rand(AFArray{Float32}, 10) instead of rand(AFArray{Float64}, 10).

arrayfire.jl's People

Contributors

Stargazers

Watchers

arrayfire.jl's Issues

FFT differences between CPU and ArrayFire (OpenCL, GPU)

julia> a = rand(Float32, 100,100);

julia> b = AFArray(a);

julia> fft(a)-fft(b);
ERROR: BoundsError: attempt to access (100,100)
  at index [3]
 in - at arraymath.jl:97

Need to write tests

A lot of tests are needed.

`inv` and `det` do not support integer types

a = rand(AFArray{Int32}, 10, 10)
det(a)
inv(a)

Both det and inv return with a ArrayFire Exception(202): Invalid input argument

Windows support

If it doesn't work on Windows, would be great to add one sentence to the README stating so (and saving people time to figure it out themselves).

Longer term it would of course be great if this had first class windows support, i.e. including downloading of the arrayfire binaries via BinDeps etc.

Error showing value of type ArrayFire.AFArray{Float32,2}:

On v0.5rc2

ERROR: MethodError: no method matching print_matrix(::IOContext{Base.Terminals.TTYTerminal}, ::ArrayFire.AFArray{Float32,2}, ::String, ::String, ::String)
Closest candidates are:
  print_matrix(::IO, ::Union{AbstractArray{T,1},AbstractArray{T,2}}, ::AbstractString, ::AbstractString, ::AbstractString) at show.jl:1378
  print_matrix(::IO, ::Union{AbstractArray{T,1},AbstractArray{T,2}}, ::AbstractString, ::AbstractString, ::AbstractString, ::AbstractString) at show.jl:1378
  print_matrix(::IO, ::Union{AbstractArray{T,1},AbstractArray{T,2}}, ::AbstractString, ::AbstractString, ::AbstractString, ::AbstractString, ::AbstractString) at show.jl:1378
  ...
 in #showarray#254(::Bool, ::Function, ::IOContext{Base.Terminals.TTYTerminal}, ::ArrayFire.AFArray{Float32,2}, ::Bool) at .\show.jl:1617
 in display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::MIME{Symbol("text/plain")}, ::ArrayFire.AFArray{Float32,2}) at .\REPL.jl:132
 in display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::ArrayFire.AFArray{Float32,2}) at .\REPL.jl:135
 in display(::ArrayFire.AFArray{Float32,2}) at .\multimedia.jl:143
 in print_response(::Base.Terminals.TTYTerminal, ::Any, ::Void, ::Bool, ::Bool, ::Void) at .\REPL.jl:154
 in print_response(::Base.REPL.LineEditREPL, ::Any, ::Void, ::Bool, ::Bool) at .\REPL.jl:139
 in (::Base.REPL.##22#23{Bool,Base.REPL.##33#42{Base.REPL.LineEditREPL,Base.REPL.REPLHistoryProvider},Base.REPL.LineEditREPL,Base.LineEdit.Prompt})(::Base.LineEdit.MIState, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Bool) at .\REPL.jl:652
 in run_interface(::Base.Terminals.TTYTerminal, ::Base.LineEdit.ModalInterface) at .\LineEdit.jl:1579
 in run_frontend(::Base.REPL.LineEditREPL, ::Base.REPL.REPLBackendRef) at .\REPL.jl:903
 in run_repl(::Base.REPL.LineEditREPL, ::Base.##852#853) at .\REPL.jl:188
 in _start() at .\client.jl:360

Also https://github.com/JuliaComputing/ArrayFire.jl/blob/master/src/config.jl#L8
should be pulled out of the if statement

Should hook up gc

I should hook up AFArrays to the finalizer.

Getting an arbitrary sub-array

Is there an ability to get specific sub-arrays from an AFArray?

I know you can use ranges to get contiguous or regularly spaced sub-arrays, but I get an error when I try to get an arbitrary sub-array using a vector of coordinates, for instance if I had an AFArray A which was 10x10 and I wanted the columns [1,2,9], I would normally do

A[:,[1,2,9]]

but this doesn't work. Looking at the list of indexing functions on the ArrayFire documentation website I believe the cols function is what I want, has this not be wrapped yet, or is there a way of doing this in Julia?

a*a' not working

Where a is a vector

a bug in ones

julia> ones(AFArray{Float32},10,10)
10x10 ArrayFire.AFArray{Float32,2}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

How to run loops efficiently?

So this might be related to #27, but the only way I seem to be able to run loops which don't degrade in performance is to call gc() after every iteration of the loop. Otherwise the more loops I go through the more things slow down. For instance:

function f()
    r = AFArray(zeros(Float32, 100, 100000))
    a = AFArray(rand(Float32, 100, 100000))
    for d in 1:100:90000
        r[:,d:d+99] = a[:,d:d+99] .* a[:,d:d+99]
    end
    nothing
end

for _ in 1:15
    @time f()
end

I have to garbage collect in the for-loop inside f in order to avoid increasingly degraded performance the more the more times I run f, and since running garbage collection so often slows things down immensely, I'm not really sure what to do.

Getting active backend call fails

julia> Pkg.test("ArrayFire")
INFO: Testing ArrayFire
INFO: ArrayFire tests passed

julia> getActiveBackend()
ERROR: ccall: could not find function af_get_active_backend in library libaf
 in af_get_active_backend at /Users/tamasnagy/.julia/v0.4/ArrayFire/src/wrap.jl:955
 in getActiveBackend at /Users/tamasnagy/.julia/v0.4/ArrayFire/src/backend.jl:14

julia> AFInfo()
ArrayFire v3.2.2 (OpenCL, 64-bit Mac OSX, build 7507b61)
[0] APPLE   : Iris Pro
-1- APPLE   : GeForce GT 750M

julia> a = rand(AFArray{Float32}, 10)
10-element ArrayFire.AFArray{Float32,1}:
 0.410738
 0.822371
 0.9518
 0.179365
 0.419824
 0.00807349
 0.377542
 0.302661
 0.645568
 0.559079

julia> versioninfo()
Julia Version 0.4.5
Commit 2ac304d* (2016-03-18 00:58 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin15.4.0)
  CPU: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

I'm using a fresh install of the Homebrew version of ArrayFire and the latest version of ArrayFire.

Why is the supertype of AFAbstractArray 4-dimensional?

I really like the idea of using this package, but I'm slightly troubled that the ultimate supertype of all AFArrays (via AFAbstractArray) is a 4-dimensional AbstractArray if I'm reading the code correctly. This seems (to me) both counterintuitive and unhelpful in terms of handling it in a standard way. I'm sure there's a good reason for it (it might be helpful to know what it is), but is it really necessary?

As it is, I'm not sure how to generally write functions that (say) operate on matrices, which I would like to say require an AbstractArray{T, 2}, so that they can handle ArrayFire arrays too.

mean and var test failing sometimes

Most of the time (although not always) the mean and var tests fail:

...v0.4/ArrayFire/test(master)  >> julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-unknown-linux-gnu

julia> include("runtests.jl")
Device[0] has no support for OpenGL Interoperation
ERROR: LoadError: test failed: 0.090069480240345 == 0.090069495f0
 in expression: var(ad) == var(a)
 in error at ./error.jl:21
 in default_handler at test.jl:28
 in do_test at test.jl:53
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
while loading /home/mauro/.julia/v0.4/ArrayFire/test/runtests.jl, in expression starting on line 47

julia> include("runtests.jl")
ERROR: LoadError: test failed: 0.5107571f0 == 0.51075715f0
 in expression: mean(ad) == mean(a)
 in error at ./error.jl:21
 in default_handler at test.jl:28
 in do_test at test.jl:53
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
while loading /home/mauro/.julia/v0.4/ArrayFire/test/runtests.jl, in expression starting on line 42

Also note that it is a bit strange (wrong?) that var returns a double.

I'm running on the built-in Intel HD graphics using Beignet but it also happens using the CPU backend.

ArrayFire.jl hangs

I trying out ArrayFire.jl on a cluster with GPUs. Unfortunately, I don't have access to the CUDA Toolkit so I downloaded the compiled version of ArrayFire instead. I think the installation succeeded but I don't think I can compile the examples in ArrayFire without CUDA so I'm not completely sure. (The GPU driver should work, e.g. in MATLAB). I can load ArrayFire.jl but when running a command it just hangs, e.g.

[anoack@gpu-1 ~]$ julia/julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.6-pre+28 (2016-04-22 00:59 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 022917e* (40 days old release-0.4)
|__/                   |  x86_64-redhat-linux

julia> using ArrayFire

julia> A = rand(AFArray{Float64}, 100, 100)
^CERROR: InterruptException:
 in af_randu at /home/gridsan/anoack/.julia/v0.4/ArrayFire/src/wrap.jl:1033
 in rand at /home/gridsan/anoack/.julia/v0.4/ArrayFire/src/create.jl:7

Any ideas about how to debug this?

Is it possible to interact ArrayFire with cuBLAS?

Since the native ArrayFire dosen't support low-level BLAS operations like gemm, axpy and symv, it'll be great to glue ArrayFire and cuBLAS together. However, is this feasible?

Failed to load dynamic library.

With cuda 8.0 & 1080 I keep seeing this error message.
How could I fix this problem?

julia> setBackend(AF_BACKEND_CPU)
true

julia> setBackend(AF_BACKEND_CUDA)
ERROR: ArrayFire Error (501) : Failed to load dynamic library.
in af_set_backend at /home/bluehope/.julia/v0.4/ArrayFire/src/wrap.jl:941
in setBackend at /home/bluehope/.julia/v0.4/ArrayFire/src/backend.jl:68

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2682 G /usr/lib/xorg/Xorg 49MiB |
| 0 25894 C julia_0.4 141MiB |

README needs to say how to install

It would be nice if the README could have instructions for installation, and also examples.

getActiveBackend() fails

On my Mac using julia 0.4.6 I'm getting the following error

julia> using ArrayFire
julia> Pkg.test("ArrayFire")
INFO: Testing ArrayFire
INFO: ArrayFire tests passed
julia> getActiveBackend()
ERROR: ccall: could not find function af_get_active_backend in library libaf
 in af_get_active_backend at /Users/lruthot/.julia/v0.4/ArrayFire/src/wrap.jl:955
 in getActiveBackend at /Users/lruthot/.julia/v0.4/ArrayFire/src/backend.jl:14

Compatibility with other JuliaGPU packages

It would be nice if ArrayFire could easily interface with other JuliaGPU packages such as CUBLAS etc. for the in-place BLAS and finer lower-level control.

Dispatch issue

Julia dispatch issue:
Let’s say we have AFArray <: AbstractArray{T,4}

now there are methods that are dispatched across AFArray and AFAbstractArray

f1 = sum{T,N}(a::AbstractArray{T<:Any, T<:Any})
f2 = sum{T}(a::AFArray{T})

now suppose a is an AFArray, and I call sum(a), Julia should call f2, but doesn't.

GLFW wasn't able to initalize

What does this mean?

julia> using ArrayFire

julia> AFArray(zeros(10,10))
ERROR: GLFW wasn't able to initalize
10x10 ArrayFire.AFArray{Float64,2}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

set_device and release_array

As per the suggestion of ArrayFire folk, I'm posting some questions/answers here.
I built CUDA 8 and ArrayFire on Xenial from source, and a number of symlinks were causing issues during the build. It was probably my own fault. Notably though, I had to add /usr/local/cuda/nvvm/lib64 explicitly to ld.so.conf.d because the installer didn't do that automatically and CUDA just silently failed to load. Two ideas came up in my first minutes of testing:

Since my machine has 4 GPUs it'd be nice to be able to switch using af_set_device() from julia. For the time being setting the env variable AF_OPENCL_DEFAULT_DEVICE=3 is a workaround.
memory does run out and it would be nice to have af_release_array()

Thanks for this great package!

Can't get ArrayFire working

Ubuntu 16.04
Julia 0.4.5

lspci 01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)

sudo ldconfig -v|grep af
    libaf.so.3 -> libaf.so.3.3.2
    libafopencl.so.3 -> libafopencl.so.3.3.2
    libafcuda.so.3 -> libafcuda.so.3.3.2
    libafcpu.so.3 -> libafcpu.so.3.3.2

julia> using ArrayFire

julia> getAvailableBackends()
None

julia> Pkg.test("CUDArt")
INFO: Testing CUDArt
INFO: CUDArt tests passed

Pkg.test("ArrayFire") and any attempts to use device arrays get stuck. Ideas?

Kaj

Examples

Can you provide with a broader set of examples ?

Linear Algebra routines are slow

Some linear algebra routines are slow on ArrayFire.

a = rand(1000, 1000) #Generate double precision random values
ad = AFArray(a) #Transfer to GPU
@time svd(a); # CPU
0.487003 seconds (43 allocations: 53.529 MB, 0.69% gc time)
@time svd(ad); # GPU
5.481788 seconds (14 allocations: 336 bytes)
@time lu(a); #CPU
0.023986 seconds (38 allocations: 22.905 MB, 14.06% gc time)
@time lu(ad); # GPU
0.057869 seconds (17 allocations: 384 bytes)
@time qr(a); # CPU
0.113873 seconds (46 allocations: 31.068 MB, 2.55% gc time)
@time qr(ad); #GPU
0.891739 seconds (14 allocations: 336 bytes)

digamma and trigamma functions

Does anyone know if the digamma and trigamma functions will ever be implemented?

I fear this might be something which has to be handled by the original ArrayFire and not just by the wrapper, but I thought I'd ask since I've been unable able to design a sufficiently efficient workaround.

Installation Help

Since I have just changed from windows to linux, I'm not familiar with software installation on linux. Here's the error I encountered

ERROR: LoadError: LoadError: could not load library "libaf"
libaf: cannot open shared object file: No such file or directory
 in dlopen at ./libdl.jl:36
 in dlopen at libdl.jl:36
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
 in require at ./loading.jl:259
while loading /home/astupidbear/.julia/v0.4/ArrayFire/src/config.jl, in expression starting on line 6
while loading /home/astupidbear/.julia/v0.4/ArrayFire/src/ArrayFire.jl, in expression starting on line 5

Could you help me? Thanks in advance!

sign bug

julia> ju=[0.5,-0.5];

julia> af=AFArray(ju);

julia> sign(ju)
2-element Array{Float64,1}:
  1.0
 -1.0

julia> sign(af)
2-element ArrayFire.AFArray{Float64,1}:
 0.0
 1.0

 # if this is due to the definition of sign in ArrayFire, I cannot rescue
 # this problem
 julia> -af
2-element ArrayFire.AFArray{Float64,1}:
  0.5
 -0.5

julia> 0-af
2-element ArrayFire.AFArray{Float64,1}:
  0.5
 -0.5

julia> 2*sign(af)-1
2-element ArrayFire.AFArray{Float64,1}:
 -1.0
  1.0

julia> 1-2*sign(af)
2-element ArrayFire.AFArray{Float64,1}:
 -1.0
  1.0

Those are fatal to my project !
I think it's time to write test for all functions.

Feature Request: Extend Base.LinAlg functions

For numerical linear algebra it would be very useful to extend the implementations of Base.LinAlg.scale! and Base.LinAlg.axpy! etc. Right now, everything goes back to the implementation for AbstractArrays (which is terribly slow).

What I'm looking for is, for example:

using ArrayFire
using Base.Test
a = map(Float32,randn(128,128,128));
ad = AFArray(a)
t1d = Array( Base.LinAlg.axpy!(Float32(1.2),ad,ad) )
t1   = Base.LinAlg.axpy!(Float32(1.2),a  ,a  )
t2d = Array( Base.LinAlg.scale!(ad,Float32(2.3)))
t2   = Base.LinAlg.scale!(a, Float32,2.3))
@test_approx_eq t1 t1d
@test_approx_eq t2 t2d

How to run the benchmarks?

Is there a way to run the benchmark and generate the graph you have in the README? Would be nice to have a script that does it for all available backends.

Cannot use ArrayFire in Atom

I have added export LD_LIBRARY_PATH=/usr/local/lib/ and it works when I use ArrayFire.jl in terminal. However in Atom, I got the following errors:

In-place BLAS commands?

Are the BLAS commands axpy!, scal!, etc. implemented?

Strange pivots array in `lufact`

When one does

lu(a::AFAbstractArray)

one gets L, U, P, but P doesn't return the same values as P returned from lu(a::Array)

matrix multiplication doesn't support bit arrays.

in Julia, matrix multiplication for bits works just fine.

julia> x = [1; 2; 3; 4]
4-element Array{Int64,1}:
 1
 2
 3
 4

julia> y = x .>2
4-element BitArray{1}:
 false
 false
  true
  true

julia> x' * y
1-element Array{Int64,1}:
 7

with ArrayFire, it doesn't seem to

julia> x = AFArray([1; 2; 3; 4])
4-element ArrayFire.AFArray{Int64,1}:
 1
 2
 3
 4

julia> y = x .> 2
4-element ArrayFire.AFArray{Bool,1}:
 false
 false
  true
  true

julia> x' * y
ERROR: "ArrayFire Error (205) : Input types are not the same"
 in af_matmul at C:\Users\Blair\.julia\v0.4\ArrayFire\src\wrap.jl:978

weirdly enough, .*s work just fine.

julia> x .* y
4-element ArrayFire.AFArray{Int64,1}:
 0
 0
 3
 4

Is this a bug? How do I make matrix multiplication to work?

Cannot use clamp on AFArray

try this code

using ArrayFire
# AFArray
af=zeros(AFArray{Float32},10,10);
clamp(af,0.2,0.8)
af[1]
# Julia Array
ju=zeros(10,10);
clamp(ju,0.2,0.8);
ju[1]

You can see that the origin of this error lies in that af[1] gives an array not a number.
I could write a function

import Base.clamp!
function clamp!(x::AFArray, lo, hi)
    @inbounds for i in length(x)
        x[i] = clamp(Array(x[i])[1], lo, hi)
    end
    x
end

However, this will affect performance. Is there a better way to deal with this?

ArrayFire is unreasonably faster than julia!

Look at the following code

using Devectorize
using ArrayFire

julia(x,y)=x.^4.*y.^3+10.*x.^2
function devectorize(x,y)
  @devec r=x.^4.*y.^3+10.*x.^2
end

x=randn(10000000);y=randn(10000000);
xaf=AFArray(x);yaf=AFArray(y);
julia(x,y);devectorize(x,y);julia(xaf,yaf);
@time a=julia(x,y);@time c=julia(xaf,yaf);@time b=devectorize(x,y);
@assert a==b
@assert a==Array(c)

The result is (cpu)

julia> @time a=julia(x,y);@time c=julia(xaf,yaf);@time b=devectorize(x,y);
  5.534964 seconds (27 allocations: 457.764 MB, 2.95% gc time)
  0.000339 seconds (152 allocations: 4.250 KB)
  4.853867 seconds (8 allocations: 76.294 MB, 0.02% gc time)

How could you explain this?

Internal Error

It works fine when I use only cpu on my local machine. However, after I installed ArrayFire on a remote server with cuda support,

julia> using ArrayFire

julia> zeros(AFArray{Float32},10,10);
ERROR: "ArrayFire Error (998) : Internal error"
 in af_constant! at /home/rluser/.julia/v0.4/ArrayFire/src/wrap.jl:989
 in constant at /home/rluser/.julia/v0.4/ArrayFire/src/create.jl:41
 in zeros at /home/rluser/.julia/v0.4/ArrayFire/src/create.jl:111

ERROR

Instructions for setting up a GPU Instance on Amazon for testing

sudo apt-get update
sudo apt-get install linux-generic linux-headers-$(uname -r)
sudo apt-get install bzip2 gcc gfortran git g++ make m4 ncurses-dev cmake libedit-dev libz-dev
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install cuda
sudo reboot

Now verify that installation was successful:

cd /usr/local/cuda-7.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

To get julia/Cxx follow the instructions in the Cxx.jl README.
To install ArrayFire:

sudo add-apt-repository ppa:george-edison55/cmake-3.x
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install libboost-all-dev
sudo apt-get install libglew-dev libglewmx-dev
sudo apt-get install libfreeimage-dev
sudo apt-get install opencl-dev
cmake -G Ninja ../../src/arrayfire/ -DCMAKE_BUILD_TYPE=Release -DBUILD_CUDA=ON -DBUILD_OPENCL=ON -DBUILD_CPU=ON -DCBLAS_LIBRARIES:STRING="/home/ubuntu/julia/usr/lib/libopenblas.so" -DFFTW_ROOT:STRING="/home/ubuntu/julia/usr" -DLAPACKE_INCLUDES:STRING="/home/ubuntu/julia/deps/openblas/lapack-netlib/lapacke/include" -DLAPACK_LIB:STRING="/home/ubuntu/julia/usr/lib/libopenblas.so" -DLAPACKE_LIB:STRING="/home/ubuntu/julia/usr/lib/libopenblas.so" -DGLFW_ROOT_DIR:STRING=/home/ubuntu/.julia/v0.4/GLFW/deps/usr64/ -DBLAS_SYM_FILE=/home/ubuntu/julia/deps/openblas/exports/objcopy.def

TODO List

Things I have done so far since my update on the list:

Fix arrayfire's build system to work well for us
Integrate the subarray mechanism (this turned out to be be a lot easier than expected, but the build system turned out harder, so it balances out)
A few more library wrappers

At this point we can run the example in arrayfire's README without any trouble.

Things to do:

Figure out binary distribution (ask me if you need help, it's tricky)
Wrap the remaining ArrayFire functionality
Examples
Documentation
Testing

Things to do for me:

Fix memory management - Currently it's leaking GPU memory - Should be simple
Exceptions

config.jl bug

https://github.com/JuliaComputing/ArrayFire.jl/blob/master/src/config.jl#L8
should be pulled out

sync() barrier in BlackScholes example?

The speed comparison seen in the BlackScholes example is very impressive but shouldn't there be a sync() barrier before measuring time? Since the sum function is now called outside of the timing it seems to me the timing could be misleading.

Different `tau` array in qr

The array tau that's returned looks different from what Julia returns. Why are they different?

ERROR: ArrayFire Error (401)

I installed with success ArrayFire in my computer and in Julia, and runned the test and passed whithout problems. Then I tried to follow the Simple Usage steps in readme.md, but when I try to use the constructor AFArray() it appears the following error:

julia> Pkg.build("ArrayFire")

julia> Pkg.test("ArrayFire")
INFO: Testing ArrayFire
INFO: ArrayFire tests passed

julia> a = rand(10, 10);
10x10 Array{Float64,2}:

julia> using ArrayFire

julia> ad = AFArray(a)
ERROR: ArrayFire Error (401) : Double precision not supported for this device
 in convert at /Users/claudiopierard/.julia/v0.4/ArrayFire/src/create.jl:27
 in call at /Users/claudiopierard/.julia/v0.4/ArrayFire/src/create.jl:31

Strange issue running example

When I run the example I get the following error:

~ >> julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-unknown-linux-gnu

julia> include(".julia/v0.4/ArrayFire/examples/blackscholes.jl")
Device[0] has no support for OpenGL Interoperation
ERROR: LoadError: MethodError: `call` has no method matching call(::Void, ::Type{ArrayFire.AFArray{Float32,N}}, ::Ptr{Void})
Closest candidates are:
  BoundsError(::Any...)
  TypeVar(::Any...)
  TypeConstructor(::Any...)
  ...
 in blackscholes_serial at /home/mauro/.julia/v0.4/ArrayFire/examples/blackscholes.jl:10
 in driver at /home/mauro/.julia/v0.4/ArrayFire/examples/blackscholes.jl:57
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
while loading /home/mauro/.julia/v0.4/ArrayFire/examples/blackscholes.jl, in expression starting on line 67

it goes away if I change the variable call to call_ on lines https://github.com/JuliaComputing/ArrayFire.jl/blob/e90e9eb6b68c2a1948c70f64297e4f435fe9698e/examples/blackscholes.jl#L19-20

Maybe this is a Julia bug, so let me know if I should repost there. Also, does anyone else see this?

Fast version of `AFArray(collect(1.0f0:10))`?

Not sure if this is possible to do directly on the GPU, but I want to create a vector containing 1.0f0:10.

Any ideas? I think if this is possible to do efficiently, then an override of AFArray(::Range) would also make sense. (Creating it on the CPU and copying is really slow...)

Problem installing AF on Mac OS

Hey,
I followed the instructions in README.md to install Array Fire on my Mac. brew install arrayfire did work without an issue as did Pkg.add("ArrayFire") in julia 0.4.6. When trying to test the package I get this.

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.6-pre+41 (2016-06-03 10:12 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit bc363de* (31 days old release-0.4)
|__/                   |  x86_64-apple-darwin14.5.0

julia> Pkg.build("ArrayFire")

julia> Pkg.test("ArrayFire")
INFO: Testing ArrayFire
ERROR: LoadError: LoadError: LoadError: could not load library "libaf"
dlopen(libaf.dylib, 1): image not found
 in dlopen at ./libdl.jl:36
 in dlopen at libdl.jl:36
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
 in require at ./loading.jl:259
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
 in process_options at ./client.jl:280
 in _start at ./client.jl:378
while loading /Users/lruthot/.julia/v0.4/ArrayFire/src/config.jl, in expression starting on line 6
while loading /Users/lruthot/.julia/v0.4/ArrayFire/src/ArrayFire.jl, in expression starting on line 5
while loading /Users/lruthot/.julia/v0.4/ArrayFire/test/runtests.jl, in expression starting on line 1
===============================================[ ERROR: ArrayFire ]================================================

failed process: Process(`/Users/lruthot/Software/julia-mkl/usr/bin/julia --check-bounds=yes --code-coverage=none --color=yes /Users/lruthot/.julia/v0.4/ArrayFire/test/runtests.jl`, ProcessExited(1)) [1]

===================================================================================================================

When looking for binaries I found

lrMacBook:~ lr$ ls /usr/local/Cellar/arrayfire/3.0.2/lib/
libafcpu.3.0.2.dylib    libafcpu.dylib      libafopencl.3.dylib libforge.dylib
libafcpu.3.dylib    libafopencl.3.0.2.dylib libafopencl.dylib

Thanks for looking into this. Let me know if you need further information about my system.

`dot` doesn't have the same signature as `Base.dot`

vv = AFArray( [1.] )
typeof(dot(vv, vv))

gives

ArrayFire.AFArray{Float64,1}

But dot should always return a scalar, not a vector.

ERROR: LoadError: "ArrayFire Error (998) : Internal error" during Pkg.test

I've only tested creating AFArrays so far, but that seems to work.

julia> Pkg.test("ArrayFire")
INFO: Testing ArrayFire
ERROR: LoadError: "ArrayFire Error (998) : Internal error"
 in convert at /home/james/.julia/v0.4/ArrayFire/src/create.jl:27
while loading /home/james/.julia/v0.4/ArrayFire/test/runtests.jl, in expression starting on line 6
==============================[ ERROR: ArrayFire ]==============================

failed process: Process(`/usr/bin/julia --check-bounds=yes --code-coverage=none --color=no /home/james/.julia/v0.4/ArrayFire/test/runtests.jl`, ProcessExited(1)) [1]

================================================================================
ERROR: ArrayFire had test errors
 in test at ./pkg/entry.jl:803
 in anonymous at ./pkg/dir.jl:31
 in cd at ./file.jl:22

How can I use function find?

Since ArrayFire dosen't support find, is there any way to get around?

c=randn(AFArray{Float32},10000000)
find(Array(c).>0);
find(Array(c.>0));

could do this but it's not efficient.