yaohuaxin / aparapi
Automatically exported from code.google.com/p/aparapi
License: Other
Everything worked; I ran the examples, every one of them. Today I got 12.4, so I
figured I would rebuild it all. I also updated my headers to 1.2 from the Debian
repo.
I re-downloaded with: svn checkout http://aparapi.googlecode.com/svn/trunk aparapi
I ran ant to see if I got the same error I did yesterday, to make sure it was
the same code. I did not get the error, so I knew something had been changed on
your end. I went to edit the build file, like I did the first time.
It appears to no longer be reading that build file; no matter what I put in
there it will not even throw an error now. It just keeps telling me it needs a
path to the SDK.
Someone broke the build files; please put them back the way they were. I cannot
understand why people shoot themselves in the foot making changes like that
when you have working code and pretty good directions for how to build. Changing
build files just pisses people off and makes your directions not work, which
pisses them off more, and then they go download CUDA. AMD started this GPU code
long before the other people even got started (I used Folding@home), but all
this shooting themselves in the foot has now put them way behind. Please stop
the foot-shooting and move forward; it should be a given at this point that the
software will build.
While it was working it was super fast! Thanks for your time!
Original issue reported on code.google.com by [email protected]
on 10 May 2012 at 6:22
I noticed a comment in the Wiki describing the lack of struct support due to a
mismatch in the C and Java memory models.
On a different Java/JNI open-source project I am working with, structs are
supported using custom annotations.
Here is a link to the documentation:
https://www.alljoyn.org/sites/default/files/alljoyn-development-guide-java-sdk.pdf
5.4.2 Complex data types using the @Position annotation
Here is some example code from the documentation:
public class ImageInfo {
    @Position(0)
    public String fileName;
    @Position(1)
    public int isoValue;
    @Position(2)
    public int dateTaken;
}
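For illustration, here is a minimal sketch of how a @Position-style annotation could be declared and consumed via reflection to recover the field order a struct-layout pass would need. The annotation is defined locally for this sketch and is not the AllJoyn or Aparapi API:

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.*;

public class PositionDemo {
    // Hypothetical annotation, declared locally for illustration only.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Position {
        int value();
    }

    public static class ImageInfo {
        @Position(0) public String fileName;
        @Position(1) public int isoValue;
        @Position(2) public int dateTaken;
    }

    // Returns field names ordered by their @Position index, which is the
    // ordering a struct-layout pass would need.
    public static List<String> orderedFields(Class<?> cls) {
        List<Field> fields = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            if (f.isAnnotationPresent(Position.class)) {
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparingInt(f -> f.getAnnotation(Position.class).value()));
        List<String> names = new ArrayList<>();
        for (Field f : fields) {
            names.add(f.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(orderedFields(ImageInfo.class));
    }
}
```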
Is this something we would like to consider for Aparapi and OpenCL structs?
Original issue reported on code.google.com by [email protected]
on 29 Dec 2011 at 4:21
We have a use case which requires us to execute a single kernel and enter the
kernel from multiple different entry points.
For example, we need to perform an initial calculation on a dataset and store
the results on the GPU with one kernel call. Then we need to calculate a
secondary result based on the initial results with a second kernel call.
Currently, the following methods do not appear to be completely implemented:
Kernel.execute(Entry _entry, int _globalSize)
Kernel.execute(String _entryPoint, int _globalSize)
Kernel.execute(String _entryPoint, int _globalSize, int _passes)
Original issue reported on code.google.com by [email protected]
on 22 Nov 2011 at 10:47
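The two-phase use case above can be sketched in plain Java with no Aparapi involved; the two methods stand in for the two hypothetical kernel entry points sharing one retained buffer:

```java
// Plain-Java sketch of the two-phase use case: a first "entry point" writes
// intermediate results, a second one consumes them. Both phases reuse the
// same retained buffer, which is what separate kernel entry points would
// make possible without a host round-trip.
public class TwoPassDemo {
    // Phase 1: initial calculation (here: square each element).
    static void passOne(int[] in, int[] intermediate) {
        for (int i = 0; i < in.length; i++) {
            intermediate[i] = in[i] * in[i];
        }
    }

    // Phase 2: secondary result based on phase 1 (here: a sum reduction).
    static int passTwo(int[] intermediate) {
        int sum = 0;
        for (int v : intermediate) {
            sum += v;
        }
        return sum;
    }

    public static int run(int[] in) {
        int[] intermediate = new int[in.length];
        passOne(in, intermediate);
        return passTwo(intermediate);
    }

    public static void main(String[] args) {
        System.out.println(run(new int[]{1, 2, 3}));  // 1 + 4 + 9 = 14
    }
}
```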
What steps will reproduce the problem?
1. Create a kernel with the following run() body:
int x = 0;
x += 128;
2. Execute the kernel
What is the expected output? What do you see instead?
Expected is that the kernel runs on the GPU, without side effects in this
minimum example. What happens is the following exception, and JTP execution:
com.amd.aparapi.ClassParseException: java.lang.NullPointerException
at com.amd.aparapi.MethodModel.init(MethodModel.java:1542)
at com.amd.aparapi.MethodModel.<init>(MethodModel.java:1452)
at com.amd.aparapi.ClassModel.getMethodModel(ClassModel.java:2344)
at com.amd.aparapi.ClassModel.getEntrypoint(ClassModel.java:2377)
at com.amd.aparapi.ClassModel.getEntrypoint(ClassModel.java:2386)
at com.amd.aparapi.KernelRunner.execute(KernelRunner.java:1335)
at com.amd.aparapi.Kernel.execute(Kernel.java:1682)
at com.amd.aparapi.Kernel.execute(Kernel.java:1613)
at com.amd.aparapi.Kernel.execute(Kernel.java:1583)
at mypackage.MyClass.main(MyClass.java)
Caused by: java.lang.NullPointerException
at com.amd.aparapi.MethodModel.foldExpressions(MethodModel.java:587)
at com.amd.aparapi.MethodModel.init(MethodModel.java:1491)
... 11 more
What version of the product are you using? On what operating system?
aparapi-2012-02-15 on Linux 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:39:49
UTC 2012 x86_64 GNU/Linux, using a somewhat unorthodox setup using Maven and
Eclipse
Please provide any additional information below.
The bytecode provided by the compiler is the following:
0 iconst_0
1 istore_1 [x]
2 wide
3 iinc 1 128 [x]
8 return
Alternative, working kernels:
int x = 0;
x += 127;
0 iconst_0
1 istore_1 [x]
2 iinc 1 127 [x]
5 return
and
int x = 0, y = 128;
x -= y;
0 iconst_0
1 istore_1 [x]
2 sipush 128
5 istore_2 [y]
6 iload_1 [x]
7 iload_2 [y]
8 isub
9 istore_1 [x]
10 return
It seems clear to me that Aparapi is choking on the "wide" opcode. What
Wikipedia has to say about the "wide" opcode: "execute opcode, where opcode is
either iload, fload, aload, lload, dload, istore, fstore, astore, lstore,
dstore, or ret, but assume the index is 16 bit; or execute iinc, where the
index is 16 bits and the constant to increment by is a signed 16 bit short".
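A possible workaround sketch, assuming the parser only trips on the "wide iinc" form: spelling the increment as "x = x + 128" makes javac emit sipush/iadd instead of iinc, so no wide prefix is produced. This matches the working alternatives above, but is untested against Aparapi itself:

```java
public class WideIincDemo {
    // Compiles to iconst_0; istore; wide iinc (the constant 128 exceeds
    // iinc's signed-byte range), which is the form Aparapi chokes on.
    static int withIinc() {
        int x = 0;
        x += 128;
        return x;
    }

    // Same arithmetic, but 'x = x + 128' compiles to iload; sipush 128;
    // iadd; istore -- no iinc, so no wide prefix.
    static int withoutIinc() {
        int x = 0;
        x = x + 128;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(withIinc() == withoutIinc());  // same result either way
    }
}
```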
Original issue reported on code.google.com by [email protected]
on 28 Mar 2012 at 1:26
What steps will reproduce the problem?
1. Have a primitive annotated with @Local.
What is the expected output? What do you see instead?
Try to use the variable with the assumption that each block/group has a copy.
Notice that each thread has its own copy.
Workaround:
Declare the variable as an array of size 1.
What version of the product are you using? On what operating system?
aparapi R288, Ubuntu 11.10 amd64, Java 7
Please provide any additional information below.
I need 2-3 single variables to be available to all threads in a block/group. I
declare local memory variables and have the thread with localId() 0 read the
value from the global memory and write it there (I am not sure this is a good
idea, so any comments on that are welcome). After the assignment, I have a
localBarrier().
- If you declare the variable as a primitive, only thread 0 sees the correct
value (all others see the default value 0.0)
- If you declare the variable as an array of size 1, then the behaviour is the
one expected from @Local
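The size-1 array workaround can be illustrated with plain Java threads rather than Aparapi work-items. The analogy is loose, but it shows why a shared one-element array behaves differently from a per-thread primitive copy:

```java
public class SharedHolderDemo {
    // One writer publishes a value through a size-1 array; readers see it
    // because they all reference the same backing storage. This mirrors the
    // @Local workaround: an int[1] gives the group one shared slot, while a
    // plain primitive gives each work-item its own copy.
    public static int publishAndRead(final int value, int readers) {
        final int[] shared = new int[1];
        Thread writer = new Thread(() -> shared[0] = value);  // "localId() == 0" role
        writer.start();
        try {
            writer.join();  // stands in for localBarrier()
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        int sum = 0;
        for (int i = 0; i < readers; i++) {
            sum += shared[0];  // every reader sees the published value
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead(7, 3));
    }
}
```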
Original issue reported on code.google.com by [email protected]
on 27 Feb 2012 at 10:26
What steps will reproduce the problem?
1. Try to build/run Aparapi on Mac OS
What is the expected output? What do you see instead?
Expected to build and execute.
Current build does not support Mac OS.
Current runtime component does not support Apple's OpenCL.
Original issue reported on code.google.com by [email protected]
on 12 Oct 2011 at 9:47
What steps will reproduce the problem?
1. Run Aparapi program on AMD Linux 64bit
2. Set device to GPU.
What is the expected output? What do you see instead?
Expected to execute on GPU, runs JTP.
verboseJNI gives:
platform name 0 Advanced Micro Devices, Inc.
platform version 0 OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
platform Advanced Micro Devices, Inc. does not support requested device type
skipping!
What version of the product are you using? On what operating system?
Aparapi 2011-10-13
Linux 3.0.0-12-generic #20-Ubuntu SMP Fri Oct 7 14:56:25 UTC 2011 x86_64 x86_64
x86_64 GNU/Linux
Please provide any additional information below.
The machine has two Radeon cards, and lshw reports them as this:
*-display
description: VGA compatible controller
product: Antilles [AMD Radeon HD 6990]
vendor: ATI Technologies Inc
physical id: 0
bus info: pci@0000:0c:00.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=fglrx_pci latency=0
resources: irq:95 memory:d0000000-dfffffff memory:fe9e0000-fe9fffff ioport:e000(size=256) memory:fe9c0000-fe9dffff
Any special tricks I can use to figure out why it fails to recognize the cards?
Original issue reported on code.google.com by [email protected]
on 11 Nov 2011 at 11:28
What steps will reproduce the problem?
1. Make simple kernel
2. Run on machine with more than 1 GPU card
3. Fails with "clEnqueueNDRangeKernel() failed invalid work group size"
What is the expected output? What do you see instead?
Error message:
!!!!!!! clEnqueueNDRangeKernel() failed invalid work group size
after clEnqueueNDRangeKernel, globalSize=16 localSize=32 usingNull=0
Nov 15, 2011 4:07:37 PM com.amd.aparapi.KernelRunner executeOpenCL
WARNING: ### CL exec seems to have failed. Trying to revert to Java ###
What version of the product are you using? On what operating system?
2011-10-13 Ubuntu
Please provide any additional information below.
There is a check in KernelRunner.java:1081 that ensures that localSize <=
globalSize, but in aparapi.c:1073 it does this:
size_t globalSizeAsSizeT = (globalSize /jniContext->deviceIdc);
This is done to work on multiple devices, and the following loop enqueues the
work on multiple devices, but calls clEnqueueNDRangeKernel() with these
numbers. According to the OpenCL docs, the error code means:
"CL_INVALID_WORK_GROUP_SIZE if local_work_size is specified and the number of
work-items specified by global_work_size is not evenly divisible by the size of
work-group given by local_work_size or ..."
I am not sure how it is supposed to work, but according to the error
description "global should be evenly divisible by local", but since we have
global=16 and local=32 they are not, hence the error.
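The arithmetic behind the failure can be checked directly. Assuming two devices and the sizes from the log (a requested global size of 32 split to 16 per device, local size 32), the divisibility requirement is violated. This is only a sketch of the check, not Aparapi code:

```java
public class WorkGroupSplitDemo {
    // Per-device global size as computed in aparapi.c:
    // globalSize / deviceCount.
    static int perDeviceGlobal(int globalSize, int deviceCount) {
        return globalSize / deviceCount;
    }

    // OpenCL requires the global size passed to clEnqueueNDRangeKernel to be
    // evenly divisible by the local size.
    static boolean validCombination(int globalSize, int localSize) {
        return localSize > 0 && globalSize % localSize == 0;
    }

    public static void main(String[] args) {
        int split = perDeviceGlobal(32, 2);               // 16 per device
        System.out.println(validCombination(split, 32));  // false -> the reported error
    }
}
```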
Original issue reported on code.google.com by [email protected]
on 15 Nov 2011 at 4:01
What steps will reproduce the problem?
1. Run Aparapi program on Windows 7 64bit
2. Set device to GPU.
What is the expected output? What do you see instead?
com.amd.aparapi.KernelRunner warnFallBackAndExecute
WARNING: Reverting to Java Thread Pool (JTP) for class AparapiSample$1: initJNI
failed to return a valid handle
.....
platform name 0 Advanced Micro Devices, Inc.
platform version 0 OpenCL 1.2 AMD-APP (923.1)
platform Advanced Micro Devices, Inc. version OpenCL 1.2 AMD-APP (923.1) is not
OpenCL 1.1 skipping!
What version of the product are you using? On what operating system?
Aparapi 2012-05-06
Please provide any additional information below.
The output from clinfo:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 AMD-APP (923.1)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 3
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Board name: AMD Radeon HD 6480G
Max compute units: 3
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 444Mhz
Address bits: 32
Max memory allocation: 199753728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 000007FEE71B2A08
Name: BeaverCreek
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: CAL 1.4.1720 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (923.1)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Board name: AMD Radeon 6600M and 6700M Series
Max compute units: 6
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 444Mhz
Address bits: 32
Max memory allocation: 536870912
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 000007FEE71B2A08
Name: Turks
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: CAL 1.4.1720 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (923.1)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Board name:
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 1896Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 65536
Global memory size: 3735633920
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 539
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 000007FEE71B2A08
Name: AMD A4-3300M APU with Radeon(tm) HD Graphics
Vendor: AuthenticAMD
Device OpenCL C version: OpenCL C 1.2
Driver version: 2.0 (sse2)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (923.1)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
Original issue reported on code.google.com by [email protected]
on 7 May 2012 at 10:22
I'm trying to leverage OpenCL to calculate the Levenshtein distance between
two strings. I've altered the algorithm so only primitives and 1D arrays are
used, and no char primitive is used, but I always get:
Feb 06, 2012 8:32:45 PM com.amd.aparapi.KernelRunner warnFallBackAndExecute
WARNING: Reverting to Java Thread Pool (JTP) for class
org.quelea.SongDuplicateChecker$1: OpenCL compile failed
I've attached the code; I'm not sure if this is a bug or something I'm doing
wrong.
Original issue reported on code.google.com by [email protected]
on 6 Feb 2012 at 8:35
Attachments:
We need Aparapi to support multiple GPUs instead of just the first GPU it
finds available.
We have two use cases for this:
- A single workstation with multiple GPUs located on separate cards
- A cluster of computers with 2+ GPUs per node
It would be ideal if we did not have to specify specific information about our
environment and Aparapi/OpenCL would automatically partition the work and
distribute it out as required.
I believe both CUDA 4.0+ and OpenCL 1.1+ support multi-threaded multi-GPU
environments.
I've attached a small presentation related to the OpenCL 1.1 multi-GPU
enhancements given at SIGGRAPH 2011.
Of course, it would be nice to see an AMD presentation of this same material
highlighting Aparapi :)
Original issue reported on code.google.com by [email protected]
on 25 Nov 2011 at 1:20
Attachments:
What steps will reproduce the problem?
1. Try to run the sample code "squares"
What is the expected output? What do you see instead?
It is running in JTP mode instead of GPU mode:
WARNING: Check your environment. Failed to load aparapi native library
aparapi_x86 or possibly failed to locate opencl native library
(opencl.dll/opencl.so). Ensure that both are in your PATH (windows) or in
LD_LIBRARY_PATH (linux).
Execution mode=JTP
What version of the product are you using? On what operating system?
I am using CentOS 5 and an AMD card with the AMD-APP-SDK-v2.5-RC2-lnx64 driver.
Please provide any additional information below.
I have already set LD_LIBRARY_PATH.
Original issue reported on code.google.com by [email protected]
on 1 Dec 2011 at 11:24
I ran the latest Aparapi trunk code (as of today) through FindBugs and it
exposed 83 areas for investigation and possible improvement, including a number
of high priority bugs. I have attached the XML output.
Two suggestions for this ticket:
- Include FindBugs as an integral component of the Ant build scripts
- Either fix potential bugs or comment in the code reasons why changes are not
needed (possibly use FindBugs annotations to avoid Ant output)
Original issue reported on code.google.com by [email protected]
on 23 Nov 2011 at 7:47
Attachments:
We are at a point where we need access to a number of items returned by
clGetDeviceInfo.
Operating without knowledge of the following parameter's return value in
particular is really giving us grief:
CL_DEVICE_MAX_MEM_ALLOC_SIZE
Return type: cl_ulong
Max size of memory object allocation in bytes. The minimum value is max (1/4th
of CL_DEVICE_GLOBAL_MEM_SIZE, 128*1024*1024)
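The spec-mandated lower bound quoted above is simple to compute; a minimal sketch:

```java
public class MaxAllocDemo {
    // Spec-mandated lower bound for CL_DEVICE_MAX_MEM_ALLOC_SIZE:
    // max(globalMemSize / 4, 128 MiB).
    static long minMaxAlloc(long globalMemSize) {
        return Math.max(globalMemSize / 4, 128L * 1024 * 1024);
    }

    public static void main(String[] args) {
        // A 2 GiB device must allow at least 512 MiB in one allocation.
        System.out.println(minMaxAlloc(2L * 1024 * 1024 * 1024));
    }
}
```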
Original issue reported on code.google.com by [email protected]
on 7 May 2012 at 11:34
OpenCL allows the developer to query the underlying hardware for available
information which can then be used at runtime to determine appropriate kernel
parameters. We are specifically interested in this information in order to
properly partition our data based on the available GPU memory constraints on
the deployed hardware platform.
Ideally, this would be returned in a Map<String,String> or Map<Enum,String>.
For example:
CL_DEVICE_ADDRESS_BITS
CL_DEVICE_AVAILABLE
CL_DEVICE_COMPILER_AVAILABLE
CL_DEVICE_ENDIAN_LITTLE
CL_DEVICE_ERROR_CORRECTION_SUPPORT
CL_DEVICE_EXECUTION_CAPABILITIES
CL_DEVICE_EXTENSIONS
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE
CL_DEVICE_GLOBAL_MEM_SIZE
CL_DEVICE_HOST_UNIFIED_MEMORY
CL_DEVICE_IMAGE2D_MAX_HEIGHT
CL_DEVICE_IMAGE2D_MAX_WIDTH
CL_DEVICE_IMAGE3D_MAX_DEPTH
CL_DEVICE_IMAGE3D_MAX_HEIGHT
CL_DEVICE_IMAGE3D_MAX_WIDTH
CL_DEVICE_IMAGE_SUPPORT
CL_DEVICE_LOCAL_MEM_SIZE
CL_DEVICE_LOCAL_MEM_TYPE
CL_DEVICE_MAX_CLOCK_FREQUENCY
CL_DEVICE_MAX_COMPUTE_UNITS
CL_DEVICE_MAX_CONSTANT_ARGS
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE
CL_DEVICE_MAX_MEM_ALLOC_SIZE
CL_DEVICE_MAX_PARAMETER_SIZE
CL_DEVICE_MAX_READ_IMAGE_ARGS
CL_DEVICE_MAX_SAMPLERS
CL_DEVICE_MAX_WORK_GROUP_SIZE
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
CL_DEVICE_MAX_WORK_ITEM_SIZES
CL_DEVICE_MAX_WRITE_IMAGE_ARGS
CL_DEVICE_MEM_BASE_ADDR_ALIGN
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE
CL_DEVICE_NAME
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT
CL_DEVICE_OPENCL_C_VERSION
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT
CL_DEVICE_PROFILE
CL_DEVICE_PROFILING_TIMER_RESOLUTION
CL_DEVICE_QUEUE_PROPERTIES
CL_DEVICE_SINGLE_FP_CONFIG
CL_DEVICE_TYPE
CL_DEVICE_VENDOR
CL_DEVICE_VENDOR_ID
CL_DEVICE_VERSION
CL_DRIVER_VERSION
CL_PLATFORM_EXTENSIONS
CL_PLATFORM_NAME
CL_PLATFORM_PROFILE
CL_PLATFORM_VENDOR
CL_PLATFORM_VERSION
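Assuming the proposed Map<String,String> existed (it does not yet; the map below is hand-built for the sketch), partitioning data against CL_DEVICE_MAX_MEM_ALLOC_SIZE might look like:

```java
import java.util.*;

public class DevicePartitionDemo {
    // Given a device-info map of the kind proposed above (all values as
    // strings, keyed by the CL_* parameter name), compute how many chunks a
    // dataset must be split into to respect the max allocation size.
    static int chunksNeeded(Map<String, String> deviceInfo, long datasetBytes) {
        long maxAlloc = Long.parseLong(deviceInfo.get("CL_DEVICE_MAX_MEM_ALLOC_SIZE"));
        return (int) ((datasetBytes + maxAlloc - 1) / maxAlloc);  // ceiling division
    }

    public static void main(String[] args) {
        Map<String, String> info = new HashMap<>();
        // Value taken from the clinfo dump earlier in this tracker.
        info.put("CL_DEVICE_MAX_MEM_ALLOC_SIZE", "199753728");
        System.out.println(chunksNeeded(info, 1_000_000_000L));
    }
}
```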
Original issue reported on code.google.com by [email protected]
on 14 Feb 2012 at 8:50
What steps will reproduce the problem?
Try to do a fresh checkout from svn and compile code.
What is the expected output? What do you see instead?
Missing file during compilation.
I guess com.amd.aparapi.jni/include directory is missing in trunk.
Original issue reported on code.google.com by [email protected]
on 28 Sep 2011 at 7:28
What steps will reproduce the problem?
Create a simple kernel like this:
class MyKernel extends Kernel {
    int[] a = new int[1];
    int[] b = new int[1];
    public void run() {
        a[b[0]++] = 1;
    }
}
What is the expected output? What do you see instead?
As this is valid Java (and bytecode) I expected it to be parsed.
Instead I get:
com.amd.aparapi.ClassParseException: @16 IASTORE Detected an non-reducable operand consumer/producer mismatch
at com.amd.aparapi.MethodModel.applyTransformations(MethodModel.java:1320)
at com.amd.aparapi.MethodModel.foldExpressions(MethodModel.java:606)
at com.amd.aparapi.MethodModel.init(MethodModel.java:1493)
at com.amd.aparapi.MethodModel.<init>(MethodModel.java:1454)
at com.amd.aparapi.ClassModel.getMethodModel(ClassModel.java:2344)
at com.amd.aparapi.ClassModel.getEntrypoint(ClassModel.java:2377)
What version of the product are you using? On what operating system?
OSX, Eclipse
Please provide any additional information below.
The bytecode looks like this (annotated with stack state after instruction
execution):
public void run();
Code:
0: aload_0 -- this
1: getfield -- a
4: aload_0 -- a, this
5: getfield -- a, b
8: iconst_0 -- a, b, 0
9: dup2 -- a, b, 0, b, 0
10: iaload -- a, b, 0, int
11: dup_x2 -- a, int, b, 0, int
12: iconst_1 -- a, int, b, 0, int, 1
13: iadd -- a, int, b, 0, int+1
14: iastore -- a, int
15: iconst_1 -- a, int, 1
16: iastore --> Error, "int" is produced @11, a comes from @1
17: return
The parser detects that @14 is not a producer, causing it to activate the
transform, but no suitable transform is found, hence the exception.
I do not understand the code parser design well enough to suggest a fix, but
the above example is the smallest I can think of that looks like a real version
in a test application.
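A workaround sketch, assuming the parser only fails on the post-increment inside the index expression: hoisting the index into a local produces plain loads and stores with the same observable effect (untested against Aparapi itself):

```java
public class IastoreWorkaroundDemo {
    // Original form: the post-increment inside the array index produces the
    // dup2/dup_x2 bytecode pattern that Aparapi cannot reduce.
    static void original(int[] a, int[] b) {
        a[b[0]++] = 1;
    }

    // Workaround: hoist the index into a local and increment separately.
    // The bytecode becomes a plain sequence of loads and stores.
    static void hoisted(int[] a, int[] b) {
        int i = b[0];
        b[0] = i + 1;
        a[i] = 1;
    }

    // Both forms leave identical values behind.
    public static boolean sameEffect() {
        int[] a1 = new int[2], b1 = new int[1];
        int[] a2 = new int[2], b2 = new int[1];
        original(a1, b1);
        hoisted(a2, b2);
        return a1[0] == a2[0] && b1[0] == b2[0];
    }

    public static void main(String[] args) {
        System.out.println(sameEffect());
    }
}
```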
Original issue reported on code.google.com by [email protected]
on 9 Feb 2012 at 3:19
What steps will reproduce the problem?
1. Create a simple kernel like "data[getGlobalId()]=0"
2. Execute it with mode=JTP and an array of 5 elements
What is the expected output? What do you see instead?
Expected kernel to complete, but instead it hangs forever.
What version of the product are you using? On what operating system?
2011-10-13, or r89 on Win7 x64, but 32bit Java
Please provide any additional information below.
It looks as if the barrier is placed inside the loop that spawns the threads.
If numThreadSets is more than 1, an extra iteration with thrSetId (line 690) is
needed, but the synchronization happens inside the loop, so it blocks waiting
for threads that have not yet been created, hence a deadlock.
I have attached a patch that moves the synchronization outside the loop, and
that seems to work for me. It should not cause problems, because it is
apparently just waiting for all threads to complete before starting the next
pass.
Original issue reported on code.google.com by [email protected]
on 18 Oct 2011 at 1:53
Attachments:
What steps will reproduce the problem?
1. Code a static method in the kernel class
2. Attempt to call this method from the kernel
What is the expected output? What do you see instead?
Expect kernel to execute
Please use labels and text to provide additional information.
The static call is trapped during bytecode parsing.
Here is the email from Witold Bolt
Hi,
Finally I had some time last evening to hack on Aparapi.
Original issue reported on code.google.com by [email protected]
on 20 Nov 2011 at 5:22
Attachments:
What steps will reproduce the problem?
1. set -Dcom.amd.aparapi.enable.GETSTATIC=true
2. Run a kernel with a "final static int" field
What is the expected output? What do you see instead?
Should run or fallback, instead it throws "Field not found".
What version of the product are you using? On what operating system?
r89, Win7 x86
Please provide any additional information below.
This is caused by incomplete native code that does not read the static fields
correctly.
Attached is a patch that fixes this.
I assume that static fields are not supported because they may be shared
across threads, which means an array should be used instead. But in the simple
case where the "final" keyword is applied, the field is essentially a const.
Removing the "static" keyword will in theory allocate storage on a per-object
basis.
Original issue reported on code.google.com by [email protected]
on 28 Oct 2011 at 11:25
Attachments:
What steps will reproduce the problem?
1. Open BlackScholes example code
2. Change all floats to double
3. Compile and run (nvidia quadro 600)
What is the expected output? What do you see instead?
Q600 has native fp64 support, but the compiler generates some errors:
clBuildProgram failed
************************************************
:1:26: warning: unknown '#pragma OPENCL EXTENSION' - ignored
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
^
:4:13: error: must specify '#pragma OPENCL EXTENSION cl_khr_fp64: enable'
before using 'double'
__global double *randArray;
^
:14:16: error: use of undeclared identifier 'NaN'
double c2 = NaN;
^
:17:16: error: use of undeclared identifier 'NaN'
double c5 = NaN;
^
:25:103: error: use of undeclared identifier 'NaN'
double y = 1.0 - (((0.3989422917366028 * exp(((-X * X) / 2.0))) * t) * (0.3193815350532532 + (t * (NaN + (t * (1.781477928161621 + (t * (-1.8212559223175049 + (t * NaN)))))))));
^
:48:53: error: use of undeclared identifier 'NaN'
double R = (0.009999999776482582 * inRand) + (NaN * (1.0 - inRand));
^
:49:60: error: use of undeclared identifier 'NaN'
double sigmaVal = (0.009999999776482582 * inRand) + (NaN * (1.0 - inRand));
What version of the product are you using? On what operating system?
aparapi-2011-10-13, windows XP
Please provide any additional information below.
FWIW floats work fine.
Original issue reported on code.google.com by [email protected]
on 6 Dec 2011 at 3:48
For some of our use cases, we've been trying to find ways to avoid the
expensive initialization costs of using Aparapi.
For example, one of our tests is taking ~8ms to complete the kernel execution,
but the initial Aparapi execution and OpenCL generation is taking ~250-300ms.
This cost really adds up over multiple different kernels or even re-executions
of the same kernel outside of a loop (different execution scopes).
One solution would be the following:
- Allow the user to specify that Aparapi should serialize the generated OpenCL
code to a local .cl file during regular execution
- Allow the user to specify that Aparapi should deserialize a user-defined .cl
file instead of generating OpenCL from Java code
- Allow Aparapi to follow all of its existing auto-fallback options if the .cl
file cannot be found, is invalid, etc.
- Log an error
- Revert to existing behavior
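A minimal sketch of the save/load half of this proposal using plain java.nio; the file naming and cache-lookup policy here are made up, not an existing Aparapi feature:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class ClCacheDemo {
    // Serialize generated OpenCL source to a .cl file.
    static void save(Path file, String openClSource) throws IOException {
        Files.write(file, openClSource.getBytes(StandardCharsets.UTF_8));
    }

    // Deserialize a previously saved .cl file; returns null on a cache miss,
    // at which point the caller would fall back to normal code generation.
    static String load(Path file) throws IOException {
        if (!Files.isReadable(file)) {
            return null;
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Demonstrates that save followed by load recovers the source unchanged.
    public static String roundTrip(String source) {
        try {
            Path file = Files.createTempFile("kernel", ".cl");
            try {
                save(file, source);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("__kernel void run() {}"));
    }
}
```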
Original issue reported on code.google.com by [email protected]
on 27 Nov 2011 at 7:32
What steps will reproduce the problem?
Follow the build instruction here:
https://code.google.com/p/aparapi/wiki/DevelopersGuideLinux
What is the expected output? What do you see instead?
According to the wiki the expected output includes 4 things:
aparapi.jar containing Aparapi classes for all platforms.
the shared library for your platform (aparapi_x86.so or aparapi_x86_64.so).
an /api subdirectory containing the 'public' javadoc for Aparapi.
a samples directory containing the source and binaries for the mandel and squares sample projects.
I get everything except the shared library.
What version of the product are you using? On what operating system?
aparapi r388, Ubuntu 11.10 64-bit, Java 7
g++ v4.6.1, ant v1.8.2
Original issue reported on code.google.com by [email protected]
on 2 Apr 2012 at 10:12
As we begin to integrate Aparapi into more generalized and production-ready
projects, it is becoming obvious that we need a way for Aparapi to bundle all
of its JNI libraries into the Aparapi JAR and load them automatically, instead
of relying on the java.library.path to be set by the calling code.
This is entirely possible and has been done by a number of other projects, but
will require both changes to how we load native libraries and how we build and
deploy Aparapi.
Original issue reported on code.google.com by [email protected]
on 3 Apr 2012 at 5:16
When writing the value of a float, the code generator just emits:
f.toString() + "f"
When f == Float.POSITIVE_INFINITY the output becomes:
Infinityf
Which obviously fails to compile.
The attached patch fixes this.
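The failure mode, and one possible guarded emission, can be shown in a few lines. INFINITY and NAN are the macro spellings OpenCL C accepts for non-finite values, but the guard below is only a sketch, not the attached patch:

```java
public class FloatLiteralDemo {
    // Naive emission, as described above: Float.toString plus an "f" suffix.
    static String naiveLiteral(float f) {
        return Float.toString(f) + "f";
    }

    // Guarded emission: map the non-finite values to spellings OpenCL C
    // accepts, and fall back to the suffix form for ordinary values.
    static String guardedLiteral(float f) {
        if (Float.isNaN(f)) {
            return "NAN";
        }
        if (f == Float.POSITIVE_INFINITY) {
            return "INFINITY";
        }
        if (f == Float.NEGATIVE_INFINITY) {
            return "-INFINITY";
        }
        return Float.toString(f) + "f";
    }

    public static void main(String[] args) {
        System.out.println(naiveLiteral(Float.POSITIVE_INFINITY));    // "Infinityf" -- not valid OpenCL
        System.out.println(guardedLiteral(Float.POSITIVE_INFINITY));  // "INFINITY"
    }
}
```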
Original issue reported on code.google.com by [email protected]
on 28 Oct 2011 at 10:15
Attachments:
What steps will reproduce the problem?
1. While trying to compile or run example nbody
2.
3.
What is the expected output? What do you see instead?
Buildfile: C:\Aparapi\Aparapi\examples\nbody\build.xml
clean:
[delete] Deleting directory C:\Aparapi\Aparapi\examples\nbody\classes
clean:
check:
build:
[mkdir] Created dir: C:\Aparapi\Aparapi\examples\nbody\classes
[javac] Compiling 1 source file to C:\Aparapi\Aparapi\examples\nbody\classes
[javac] C:\Aparapi\Aparapi\examples\nbody\src\com\amd\aparapi\examples\nbody\Main.java:290: error: method enable in class Texture ca
nnot be applied to given types;
[javac] texture.enable();
[javac] ^
[javac] required: GL
[javac] found: no arguments
[javac] reason: actual and formal argument lists differ in length
[javac] 1 error
BUILD FAILED
C:\Aparapi\Aparapi\examples\nbody\build.xml:59: Compile failed; see the
compiler error output for details.
What version of the product are you using? On what operating system?
I am using the current trunk on Windows 7 x64 with an AMD GPU supporting
OpenCL.
Please provide any additional information below.
Running the examples also fails with:
Error: Could not find or load main class com.amd.aparapi.examples.nbody.Main
Original issue reported on code.google.com by [email protected]
on 7 Oct 2011 at 4:07
During our testing of Aparapi, we've encountered a number of instances where
the OpenCL code has been successfully created, but fails to execute on the
targeted GPU.
But we have also noticed that if we force Aparapi to use CPU mode instead of
JTP mode, the OpenCL code generally executes with 2x or greater performance on
the CPU than in JTP.
Since OpenCL is intended to support GPU/CPU/FPGA with the same code, we would
like to request the following change:
(pseudo code)
if opencl_generated_successfully {
    try GPU
    if GPU fails
        try CPU
    if CPU fails
        try JTP
} else {
    try JTP
}
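The requested chain can be sketched generically in plain Java. The supplier names are hypothetical stand-ins for the real execution attempts, not Aparapi API:

```java
import java.util.function.BooleanSupplier;

public class FallbackChain {
    // Each supplier attempts one execution mode and returns true on success.
    // Returns the name of the mode that ended up running the kernel.
    static String execute(boolean openclGenerated,
                          BooleanSupplier gpu, BooleanSupplier cpu, BooleanSupplier jtp) {
        if (openclGenerated) {
            if (gpu.getAsBoolean()) return "GPU";
            if (cpu.getAsBoolean()) return "CPU";
        }
        jtp.getAsBoolean(); // JTP is the last resort and always runs
        return "JTP";
    }

    public static void main(String[] args) {
        // GPU fails, CPU succeeds: falls through to CPU.
        System.out.println(execute(true, () -> false, () -> true, () -> true)); // CPU
    }
}
```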
Original issue reported on code.google.com by [email protected]
on 10 Nov 2011 at 6:13
One thing that I think would be extremely useful and valuable would be if
Aparapi supplied a library of pre-configured and optimized kernels for end-user
use. For example, it would be nice to have a library of kernels with
functionality similar to the library of example code available for CUDA, except
for OpenCL via Aparapi. This could also help to augment any documentation in
the Wiki needed to explain each use case.
My main motivation behind this request is the fact that even though all of the
Aparapi examples appear to use single class files with kernels defined as inner
classes, in my experience most production use of Aparapi will define kernels in
separate classes which are then instantiated and executed from somewhere in the
application code. There have been a number of times it would have been nice if
there was a pre-configured XYZ kernel to use instead of writing one from
scratch (after investigating the necessary logic in OpenCL or CUDA).
Original issue reported on code.google.com by [email protected]
on 23 Jan 2012 at 2:10
What steps will reproduce the problem?
Here is a small example of the problem:
Java-code:
213 gauss[ zz ] = 1.0e-10 * rint( 1.0e10 * v0 * q );
The resulting bytecode from the Eclipse Class File Editor:
918 aload_0 [this]
919 getfield de.nsa_gmbh.hs.sixin.SxKernel.gauss : double[] [40]
922 iload 39 [zz]
924 ldc2_w <Double 1.0E-10> [142]
927 aload_0 [this]
928 ldc2_w <Double 1.0E10> [68]
931 dload 40 [v0]
933 dmul
934 dload 44 [q]
936 dmul
937 invokevirtual de.nsa_gmbh.hs.sixin.SxKernel.rint(double) : double [64]
940 dmul
941 dastore
The resulting bytecode from my favorite Bytecode-Plugin for Eclipse:
LINENUMBER 213 L69
ALOAD 0
GETFIELD de/nsa_gmbh/hs/sixin/SxKernel.gauss : [D
ILOAD 39
LDC 1.0E-10
ALOAD 0
LDC 1.0E10
DLOAD 40
DMUL
DLOAD 44
DMUL
INVOKEVIRTUAL de/nsa_gmbh/hs/sixin/SxKernel.rint(D)D
DMUL
DASTORE
The resulting openCL code for the GPU from Aparapi:
this->gauss[zz] = NAN * rint(((1.0E10 * v0) * q));
Another example (without the bytecode):
Java:
tryAgain = ( min( q, 1D - q ) < 1.0e-8 );
The resulting openCL code for the GPU from Aparapi:
tryAgain = (fmin(q, (1.0 - q))<NAN)?1:0)
Or, even simpler:
Java:
double th = 1.0e-8;
The resulting openCL code for the GPU from Aparapi:
double th = NAN;
What is the expected output? What do you see instead?
expected: 1.0E-10 instead: NAN
What version of the product are you using? On what operating system?
It occurs on a Dell Latitude E6520 under MS Windows 7 SP1 with
the Eclipse SDK (Version: 3.7.2, Build id: M20120208-0800).
The graphics card is an NVIDIA Quadro NVS 4200M with the most up-to-date
driver version and CUDA 4.2
installed.
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 16 May 2012 at 10:00
Aparapi won't work with the "Intel OpenCL SDK 1.5", even though the latter
provides support for OpenCL 1.1.
The error message in verbose mode is:
platform name 1 Intel(R) Corporation
platform version 1 OpenCL 1.1 LINUX
platform Intel(R) Corporation does not support requested device type skipping!
Wondering if this is just an overly strict check, or if the device type
information is actually fundamental for Aparapi to work?
Original issue reported on code.google.com by [email protected]
on 27 Oct 2011 at 1:22
What steps will reproduce the problem?
Well, I went through the forum page http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=141035
So I coded bitonic sort and tried to achieve the best performance.
The code is attached and does somewhat better than the posted one.
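For context, a minimal sequential bitonic sort (the classic network, not the attached kernel) looks like this in plain Java; on the GPU each iteration of the inner loop over i would be one work-item:

```java
import java.util.Arrays;

public class BitonicSketch {
    // Sequential bitonic sort for power-of-two lengths.
    static void bitonicSort(int[] a) {
        int n = a.length;
        for (int k = 2; k <= n; k <<= 1) {          // size of the bitonic sequences
            for (int j = k >> 1; j > 0; j >>= 1) {  // compare-exchange stride
                for (int i = 0; i < n; i++) {       // one work-item per i on a GPU
                    int ixj = i ^ j;
                    if (ixj > i) {
                        boolean up = (i & k) == 0;  // direction of this sub-sequence
                        if ((up && a[i] > a[ixj]) || (!up && a[i] < a[ixj])) {
                            int t = a[i]; a[i] = a[ixj]; a[ixj] = t;
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 2};
        bitonicSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 4]
    }
}
```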
What is the expected output? What do you see instead?
I was expecting the GPU to do better than the CPU, but in the end the CPU still beats the GPU by 4x.
What version of the product are you using? On what operating system?
I am using Windows 7 x64. Java JDK 7 and latest Aparapi.
Hardware used are:-
Intel(R) Core(TM) i3 CPU M 370 @ 2.40GHz ( or Intel Core i3 370M)
ATI Mobility Radeon HD 5400 Series(1 GB Memory) GPU
4GB DDR3 RAM
Please provide any additional information below.
The same code runs on the CPU as well as the GPU, with array size = 4194304 and each element less than 1000000.
I got results in 2.2 seconds on the CPU, while the GPU takes 11 seconds.
Vivek Kumar Chaubey
Original issue reported on code.google.com by [email protected]
on 17 Dec 2011 at 2:31
Attachments:
Install and run Mandel sample. Runs but with JNI fallback.
18-Oct-2011 11:17:03 com.amd.aparapi.KernelRunner warnFallBackAndExecute
WARNING: Reverting to Java Thread Pool (JTP) for class
com.amd.aparapi.sample.mandel.Main$MandelKernel: initJNI failed to return a
valid handle
Execution mode=JTP;
Vista 32 bit
java version "1.6.0_26"
Intel CPU, GeForce GTX 460.
I have the latest AMD APP SDK.
Our own custom Java/openCL code using Jocl runs on either GPU or CPU. Device
information from JOCL below in case that helps.
Number of platforms: 2
Number of devices in platform NVIDIA CUDA: 1
Number of devices in platform AMD Accelerated Parallel Processing: 1
--- Info for device GeForce GTX 460: ---
CL_DEVICE_NAME: GeForce GTX 460
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 275.33
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
...
--- Info for device Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
CL_DEVICE_NAME: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
CL_DRIVER_VERSION: 2.0
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
Original issue reported on code.google.com by [email protected]
on 18 Oct 2011 at 10:28
I am trying to implement a basic reduction algorithm, like "max".
Using the basic approach, each kernel will handle two elements and gradually
compose the result. This approach requires that the kernels run in perfect
lockstep otherwise the combined result will not be correct.
Sample implementation idea:
http://developer.apple.com/library/mac/#samplecode/OpenCL_Parallel_Reduction_Example/Listings/reduce_float_kernel_cl.html#//apple_ref/doc/uid/DTS40008188-reduce_float_kernel_cl-DontLinkElementID_7
Pseudo-code:
int id = getGlobalId();
for (int scan = 0; scan < scans; scan++) {
    int other = (1 << scan) + id;
    if (other < length)
        shared[id] = Math.max(shared[id], shared[other]);
}
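A race-free sequential model of this pairwise max reduction (plain Java, folding disjoint pairs per scan rather than the overlapping reads above) might look like:

```java
public class ReductionSketch {
    // Sequential model of the pairwise max reduction. Each "scan" folds
    // elements one stride apart into the lower index; on a GPU every id
    // would run in lockstep, which is exactly the assumption the report
    // is asking about.
    static float reduceMax(float[] shared) {
        int length = shared.length;
        int scans = 32 - Integer.numberOfLeadingZeros(length - 1); // ceil(log2(length))
        for (int scan = 0; scan < scans; scan++) {
            int stride = 1 << scan;
            for (int id = 0; id < length; id += 2 * stride) { // one work-item per id
                int other = id + stride;
                if (other < length) {
                    shared[id] = Math.max(shared[id], shared[other]);
                }
            }
        }
        return shared[0]; // result accumulates at index 0
    }

    public static void main(String[] args) {
        float[] data = {3f, 9f, 1f, 7f, 5f};
        System.out.println(reduceMax(data)); // 9.0
    }
}
```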
Is it possible to explicitly emit lock values?
One solution is to use the "passes" to issue the "scan" number, but this seems
fairly wasteful as it will re-execute the kernels rather than keep them
running.
In my case the reduction happens after a larger number of operations, so I need
to produce a separate kernel to do the reduction as I cannot just run multiple
passes on the entire kernel. Using the extra reducer kernel means that I need
to replicate both support code and copy large arrays between the two.
Are there any thoughts on how this should be solved with Aparapi?
Original issue reported on code.google.com by [email protected]
on 28 Oct 2011 at 12:04
The Java Math library only has Math.log(double), which means that floats must
be handled with a cast to float.
For a simple kernel, this looks like:
public class LogKernel extends Kernel {
    private float[] data;
    private int offset;

    @Override public void run() {
        int i = getGlobalId();
        data[offset + i] = (float) Math.log(data[offset + i]);
    }
}
This works fine if the hardware supports double precision, otherwise it fails.
Is there any way to rewrite the kernel to instruct Aparapi to use the single
precision version of log when executed on a GPU ?
Original issue reported on code.google.com by [email protected]
on 25 Oct 2011 at 9:19
What steps will reproduce the problem?
I'm trying to use aparapi in one of my projects. When running the example apps
I get the error:
Apr 04, 2012 5:41:12 PM com.amd.aparapi.KernelRunner warnFallBackAndExecute
WARNING: Reverting to Java Thread Pool (JTP) for class com.amd.aparapi.sample.sq
uares.Main$1: Range workgroup size 256 > device 128
com.amd.aparapi.RangeException: Range workgroup size 256 > device 128
at com.amd.aparapi.KernelRunner.executeOpenCL(KernelRunner.java:1239)
at com.amd.aparapi.KernelRunner.execute(KernelRunner.java:1513)
at com.amd.aparapi.Kernel.execute(Kernel.java:1682)
at com.amd.aparapi.Kernel.execute(Kernel.java:1613)
at com.amd.aparapi.Kernel.execute(Kernel.java:1583)
at com.amd.aparapi.sample.squares.Main.main(Main.java:82)
Execution mode=JTP
After creating a project in Eclipse I added aparapi.jar to the build path,
added aparapi_x86.dll and opencl.dll to the PATH, and installed the newest
version of OpenCL 1.1, but when trying to recreate the squares example I still get:
com.amd.aparapi.Kernel$EXECUTION_MODE <clinit>
WARNING: Check your environment. Failed to load aparapi native library
aparapi_x86 or possibly failed to locate opencl native library
(opencl.dll/opencl.so). Ensure that both are in your PATH (windows) or in
LD_LIBRARY_PATH (linux).
Where am I going wrong? Both java and system are 32-bit
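One way to avoid the RangeException half of this report is to pick a group size the device accepts (assuming Range.create(globalSize, localSize) is then used to build the range; this helper is a hypothetical sketch, not Aparapi API):

```java
public class WorkgroupSize {
    // Pick the largest group size that divides the global size evenly and
    // does not exceed the device limit (128 in the report, vs the 256 default).
    static int legalLocalSize(int globalSize, int deviceMax) {
        for (int s = Math.min(globalSize, deviceMax); s >= 1; s--) {
            if (globalSize % s == 0) return s;
        }
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(legalLocalSize(512, 128)); // 128
        System.out.println(legalLocalSize(6, 4));     // 3
    }
}
```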
Original issue reported on code.google.com by [email protected]
on 4 Apr 2012 at 4:45
What steps will reproduce the problem?
1. Make a simple kernel
2. Execute with execute(n, 2)
What is the expected output? What do you see instead?
Kernel should run two passes, it only runs one.
What version of the product are you using? On what operating system?
Latest, r258, on OSX
Please provide any additional information below.
I think the new Range calculations have somehow made the "passes" go away.
I can see that the value is passed along, but there is no loop using the value.
Searching for "passid" inside KernelRunner.java shows that it is only used with
SEQ mode (line 682).
Original issue reported on code.google.com by [email protected]
on 15 Feb 2012 at 12:30
What steps will reproduce the problem?
1. Execute an aparapi enabled application on platform supporting OpenCL 1.1
What is the expected output? What do you see instead?
Expect application to execute.
Instead I see a 'fall back' message and the application runs in a thread pool
instead of on the GPU.
Original issue reported on code.google.com by [email protected]
on 12 Oct 2011 at 4:27
When OpenCL execution throws an exception due to an uninitialized method local
variable, Aparapi does not automatically failover to JTP but instead appears to
hang.
This is a continuation of http://code.google.com/p/aparapi/issues/detail?id=34
and original attached code.
Original issue reported on code.google.com by [email protected]
on 10 Feb 2012 at 11:04
All of the documentation needs to be updated to reflect the new support for Mac
OS X.
It is also important to note that while OS X 10.6 will execute Aparapi, OpenCL
1.0 is not officially supported. OS X 10.7 is the first version to support
OpenCL 1.1.
Original issue reported on code.google.com by [email protected]
on 8 Nov 2011 at 5:19
I have successfully completed work implementing Aparapi in an Applet + JNLP
environment. This required a few small changes to the Aparapi source code and
build scripts.
In summary:
- Added support for sun.jnlp.applet.launcher in Kernel
- Added support for org.jdesktop.applet.util.JNLPAppletLauncher in Kernel
- Changed build.xml to output .jnilib files in appropriate /dist directories
instead of platform specific binaries (.so, .dll, .dylib) dropped in the root
folder
Once I complete preparation of the Applet+JNLP Eclipse projects I will work on
checking those in under separate issue requests.
Original issue reported on code.google.com by [email protected]
on 11 Feb 2012 at 12:15
Attachments:
If you take a look at http://www.khronos.org/opencl/resources under the section
"Java Bindings to OpenCL" you will see that only JOCL is listed.
Aparapi should be submitted to Khronos as a first-class Java OpenCL binding to
be listed under the above link.
Original issue reported on code.google.com by [email protected]
on 2 Jan 2012 at 4:02
In our Kernel code, we need to have a way to access local shared memory buffers
for thread access within an individual group.
This would allow us to perform calculations on data stored in memory that is
local to each thread group, for example in reduction phases of map/reduce and
would also allow us to use the available Kernel.localBarrier() effectively.
It appears that right now, all variables exist solely in global memory.
Original issue reported on code.google.com by [email protected]
on 17 Nov 2011 at 10:29
What steps will reproduce the problem?
1. Create a small kernel that uses float[][]
2. Execute it
What is the expected output? What do you see instead?
Either an error message like "not supported" or a "FallBackWarning".
I see this message instead:
Failed to execute: null
java.lang.NullPointerException
at com.amd.aparapi.KernelWriter.convertType(KernelWriter.java:111)
at com.amd.aparapi.KernelWriter.write(KernelWriter.java:260)
at com.amd.aparapi.KernelRunner.execute(KernelRunner.java:1197)
at com.amd.aparapi.Kernel.execute(Kernel.java:1523)
at com.amd.aparapi.Kernel.execute(Kernel.java:1469)
What version of the product are you using? On what operating system?
r89, Windows x86, Oracle JDK 1.6.0_20
Please provide any additional information below.
The problem is in the field type parser. Because the field is float[][], the
type name is "[[F".
The code extracts the first '[' and then assumes that the next char is the
type name, but it is another '['.
It would be very nice if multidimensional arrays were supported. I do not need
jagged arrays, so my solution is to rewrite the array as a single float[] and
then do the index calculations manually. However, this results in a lot of
memory copying, because I copy outside Aparapi, then Aparapi copies to the
device, and the same on the way back.
I would be interested in having a go at implementing this for non-jagged
arrays inside Aparapi; could you give me some hints as to where I should look
in the code?
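The flattening workaround described above can be sketched as a self-contained example:

```java
public class FlattenDemo {
    // Flatten a rectangular (non-jagged) float[][] into a float[] using
    // row-major layout, as the workaround describes.
    static float[] flatten(float[][] src) {
        int rows = src.length, cols = src[0].length;
        float[] flat = new float[rows * cols];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(src[r], 0, flat, r * cols, cols);
        }
        return flat;
    }

    // The manual index calculation that would replace src[r][c] in the kernel.
    static float get(float[] flat, int cols, int r, int c) {
        return flat[r * cols + c];
    }

    public static void main(String[] args) {
        float[][] m = {{1f, 2f, 3f}, {4f, 5f, 6f}};
        float[] flat = flatten(m);
        System.out.println(get(flat, 3, 1, 2)); // 6.0
    }
}
```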
Original issue reported on code.google.com by [email protected]
on 28 Oct 2011 at 8:20
What steps will reproduce the problem?
1. Any kernel with double values used
What is the expected output? What do you see instead?
I am testing with the kernel found in the users guide, the one that takes an
array of floats and squares each element. The kernel works fine for floats, but
for doubles, I get:
************************************************
:1:26: warning: unknown '#pragma OPENCL EXTENSION' - ignored
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
^
:4:13: error: must specify '#pragma OPENCL EXTENSION cl_khr_fp64: enable'
before using 'double'
__global double *val$out;
^
************************************************
What version of the product are you using? On what operating system?
- Ubuntu 11.10 64-bit
- Aparapi 2012-02-15 (latest version in Downloads at the time I write this)
- NVidia GTX480 with drivers v295.20 (latest at the time I write this)
Please provide any additional information below.
I assume the problem is that I am using an NVidia card? I am available for any
testing required.
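Judging from the quoted compiler error, the generated prologue enables only the AMD fp64 extension, while the NVIDIA compiler demands the Khronos pragma. A hypothetical fix would be to emit both in the generated OpenCL:

```
/* Hypothetical generated prologue: enable the Khronos fp64 pragma (which the
   quoted NVIDIA error demands) in addition to the AMD one. */
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_amd_fp64 : enable

__kernel void square(__global double *val) {
    int i = get_global_id(0);
    val[i] = val[i] * val[i];
}
```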
Original issue reported on code.google.com by [email protected]
on 19 Feb 2012 at 8:57
While building the com.amd.aparapi.jni project in OS X the following folder is
created locally and needs to be added to svn:ignore
libaparapi_x86_64.dylib.dSYM
Additionally, the build.xml "clean" target should be configured to delete the
following:
libaparapi_x86_64.dylib.dSYM
libaparapi_${x86_or_x86_64}.dylib
It also appears that the "build" target was incorrectly calling the "check"
target instead of the "clean" target (which already depends on check). I have
included that change as well so now "build" cleans up beforehand correctly.
Please find attached the patch with all of these changes.
Original issue reported on code.google.com by [email protected]
on 4 Jan 2012 at 5:11
Attachments:
What steps will reproduce the problem?
1.Execute a single Kernel instance from many threads
2.
3.
What is the expected output? What do you see instead?
Expect to see each thread to execute correctly (although we can't expect data
integrity)
Please use labels and text to provide additional information.
Instead we get an JVM crash.
This is the latest branch for supporting Multi-Dim kernel access. But I
suspect the same issue will be in the main branch.
My guess is that we are inadvertently sharing JNIEnv* data across threads. I
just checked in a potential fix (in the multi-dim branch) but can't really
test it until Monday.
Original issue reported on code.google.com by [email protected]
on 15 Jan 2012 at 9:36
Add support for Java's char-type by mapping it to an unsigned short.
Java doesn't support unsigned numeric value but char happens to map precisely
to an unsigned short. It would therefore be convenient to be able to use it as
a numeric value.
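A quick plain-Java check of the claimed mapping:

```java
public class CharDemo {
    public static void main(String[] args) {
        // Java char is a 16-bit unsigned value: it covers exactly the
        // 0..65535 range of an OpenCL ushort, with no sign extension
        // when widened to int.
        char c = (char) 65535;
        int asInt = c;
        System.out.println(asInt);   // 65535

        // A signed short cannot represent the same value.
        short s = (short) c;
        System.out.println(s);       // -1
    }
}
```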
Original issue reported on code.google.com by [email protected]
on 10 Oct 2011 at 9:10
What steps will reproduce the problem?
1. Check the LD_LIBRARY_PATH
2. Check all .so files to see whether they are corrupted or not
3.
What is the expected output? What do you see instead?
It is supposed to run on the GPU; instead it runs in JTP mode.
What version of the product are you using? On what operating system?
aparapi_2011_09_13 and centoS 5.5
Please provide any additional information below.
This warning appears:
WARNING: Check your environment. Failed to load aparapi native library
aparapi_x86 or possibly failed to locate opencl native library
(opencl.dll/opencl.so). Ensure that both are in your PATH (windows) or in
LD_LIBRARY_PATH (linux).
Original issue reported on code.google.com by [email protected]
on 1 Dec 2011 at 11:16
When we try to use a new AMD Radeon HD 7970 GPU, which ships with OpenCL 1.2,
we receive the following exception:
May 2, 2012 12:04:29 PM com.amd.aparapi.KernelRunner warnFallBackAndExecute
WARNING: Reverting to Java Thread Pool (JTP) for class abc.def.Kernel: initJNI
failed to return a valid handle
Original issue reported on code.google.com by [email protected]
on 3 May 2012 at 3:03
I believe that according to
http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/get_global_id.html
the method getGlobalId() and correspondingly kernel.execute() should support an
array of ints. This would allow us to navigate a table structure using
different index values from the same global id.
The current work-around for this is to encode the table values in the single
global id similar to:
final int i = this.getGlobalId() / some_variable;
final int j = this.getGlobalId() % some_variable;
Ideally we could do something similar to:
final int i = this.getGlobalId()[0]
final int j = this.getGlobalId()[1]
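The div/mod workaround can be sketched as a self-contained example, where width stands in for some_variable:

```java
public class GlobalIdDemo {
    // Decode a 2D (i, j) coordinate from a single global id using the
    // div/mod workaround described in the report.
    static int[] decode(int globalId, int width) {
        int i = globalId / width; // row index
        int j = globalId % width; // column index
        return new int[]{i, j};
    }

    public static void main(String[] args) {
        int width = 10;
        int id = 3 * width + 7;        // encodes row 3, column 7
        int[] ij = decode(id, width);
        System.out.println(ij[0] + "," + ij[1]); // 3,7
    }
}
```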
Original issue reported on code.google.com by [email protected]
on 8 Nov 2011 at 5:15