
cf4ocl's Introduction


News

6 Jan. 2024

  • This repository is now archived and will not be further developed by the original author. It's hard to make a case for using OpenCL in pure C nowadays, given the existence of excellent wrappers for other programming languages. If someone wants to pick up where I left off, let me know.

4 July 2016

  • Version 2.1.0 is available for download in the releases page.

Summary

The C Framework for OpenCL, cf4ocl, is a cross-platform pure C object-oriented framework for developing and benchmarking OpenCL projects. It aims to:

  1. Promote the rapid development of OpenCL host programs in C (with support for C++) and avoid the tedious and error-prone boilerplate code usually required.
  2. Assist in the benchmarking of OpenCL events, such as kernel execution and data transfers. Profiling comes for free with cf4ocl.
  3. Simplify the analysis of the OpenCL environment and of kernel requirements.
  4. Allow for all levels of integration with existing OpenCL code: use as much or as little of cf4ocl as your project requires, with full access to the underlying OpenCL objects and functions at all times.

Features

  • Object-oriented interface to the OpenCL API
    • New/destroy functions, no direct memory alloc/free
    • Easy (and extensible) device selection
    • Simple event dependency mechanism
    • User-friendly error management
  • OpenCL version and platform independent
  • Integrated profiling
  • Advanced device query utility
  • Offline kernel compiler and linker
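The integrated profiler summarizes events in an "Aggregate times by event" table, as seen in the profiling output of the canon example issue report further down. The absolute times there sum per event name, and each relative time is that sum as a percentage of the grand total. The sketch below is a hypothetical, simplified illustration of that arithmetic, not cf4ocl's code. The types event_time and agg_time and both function names are made up for the example, and each event's duration is assumed to be its END minus START profiling timestamp, in nanoseconds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one profiled event and its duration in ns
 * (assumed to be CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_START). */
typedef struct {
    const char *name;
    unsigned long long duration_ns;
} event_time;

/* Hypothetical aggregate: total time of all events sharing a name. */
typedef struct {
    const char *name;
    unsigned long long total_ns;
} agg_time;

/* Sum the durations of events with the same name into aggs[].
 * Returns the number of distinct names stored (at most max_aggs);
 * events with new names that do not fit are skipped. */
size_t aggregate_by_name(const event_time *evts, size_t n,
                         agg_time *aggs, size_t max_aggs) {
    size_t n_aggs = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < n_aggs; j++)
            if (strcmp(aggs[j].name, evts[i].name) == 0) break;
        if (j == n_aggs) {
            if (n_aggs == max_aggs) continue; /* no room for a new name */
            aggs[n_aggs].name = evts[i].name;
            aggs[n_aggs].total_ns = 0;
            n_aggs++;
        }
        aggs[j].total_ns += evts[i].duration_ns;
    }
    return n_aggs;
}

/* Relative time of aggregate k, in percent of the sum of all aggregates. */
double relative_time_pct(const agg_time *aggs, size_t n_aggs, size_t k) {
    unsigned long long total = 0;
    for (size_t i = 0; i < n_aggs; i++) total += aggs[i].total_ns;
    return total ? 100.0 * (double) aggs[k].total_ns / (double) total : 0.0;
}
```

Feeding it the four absolute times from that table (462, 180, 137 and 6 µs) gives about 58.85% for WRITE_BUFFER, which matches the reported 58.8535.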

Documentation

Feedback and collaboration

Download or clone cf4ocl, build and install it, and code a small example, such as the one below, which shows a clean and fast way to create an OpenCL context with a user-selected device:

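#include <stdlib.h>
#include <cf4ocl2.h>

int main(void) {

    /* Variables. */
    CCLContext * ctx = NULL;

    /* Code: let the user pick a device from a menu (error details
     * are not requested, hence NULL). */
    ctx = ccl_context_new_from_menu(NULL);
    if (ctx == NULL) exit(EXIT_FAILURE);

    /* Destroy context wrapper. */
    ccl_context_destroy(ctx);

    return 0;
}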

If you like this project and want to contribute, take a look at the existing issues. We also need help with binary packaging for different OSes. Other improvements or suggestions are, of course, welcome. We appreciate any feedback.

Not yet integrated

A few OpenCL API calls, most of which were introduced with OpenCL 2.1, are not yet integrated with cf4ocl. However, this functionality is still available to client code, because cf4ocl can be used alongside raw OpenCL objects and functions.

Reference

License

Library code is licensed under LGPLv3, while the remaining code (utilities, examples and tests) is licensed under GPLv3.

Other useful C frameworks/utilities for OpenCL

If cf4ocl does not meet your requirements, take a look at the following projects:

cf4ocl's People

Contributors

danielnachun, ljramalho, ngaloppo, nunofachada, sebastient


cf4ocl's Issues

Create SVM module

Create a module/class with wrapper methods for the shared virtual memory functionality of OpenCL 2.0. The clEnqueueSVMMigrateMem API call (OpenCL 2.1) should be integrated at the same time.

In demo code, show CL_PROGRAM_BUILD_LOG if ccl_program_build fails

Many people will probably use the example code as the basis for their first projects, which will likely contain kernel bugs.

Make life easier for them by replacing, e.g., this code

	ccl_program_build(prg, NULL, &err);
	HANDLE_ERROR(err);

with this code:

	ccl_program_build(prg, NULL, &err);
	if (err != NULL)
	{
		/* Separate error object for the build log query, so the
		 * original build error is not overwritten. */
		CCLErr* err2 = NULL;
		fprintf(stderr, "\n%s\nBuild Error: Log Start\n", err->message);
		char* log = ccl_program_get_build_info_array(
			prg, dev, CL_PROGRAM_BUILD_LOG, char*, &err2);
		/* Check the query before printing, so a NULL log is never used. */
		HANDLE_ERROR(err2);
		fprintf(stderr, "\n%s\nBuild Log End\n", log);
	}
	HANDLE_ERROR(err);

Tests fail with OCL stub if device > 0

Specifically, test_kernel fails in the /wrappers/kernel/native test, inside the stub's clEnqueueNativeKernel() function (ocl_enqueue.c, at the line *arg_loc = mem_list[i]->mem;).

Create OpenGL sharing module

Although it is perfectly possible to use OpenGL sharing with cf4ocl, it's necessary to use the appropriate OpenCL functions directly (e.g. clCreateFromGLBuffer, etc.), losing the benefits offered by cf4ocl such as error management, event wrapping or integrated profiling.

Implement a wrapper for the clEnqueueNativeKernel() function

A possible prototype for such wrapper function could be:

CCLEvent* ccl_kernel_enqueue_native_full(CCLQueue* cq,
	void (CL_CALLBACK *user_func)(void *), CCLKernelNativeArgs args,
	CCLEventWaitList* evt_wait_lst, GError** err);
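Any such wrapper has to reproduce the trickiest part of clEnqueueNativeKernel(): the runtime copies cb_args bytes of args. Then, for each of the num_mem_objects entries of mem_list, it replaces the cl_mem handle stored at the matching args_mem_loc location with a pointer to that object's global memory, and the location is translated from the original args buffer into the copy. The plain-C simulation below sketches only that patching step, which is also what the stub line quoted in the test-failure issue above performs. It is not the OpenCL runtime, and the native_args type and the patch_native_args() name are hypothetical. The mem_ptrs array stands in for the global memory addresses a real runtime would supply.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical native-kernel argument block: two slots that hold cl_mem
 * handles when enqueued (NULL placeholders here) plus a scalar. */
typedef struct {
    void *in;
    void *out;
    size_t n;
} native_args;

/* Simulates the runtime's handling of args/args_mem_loc:
 * 1. copy cb_args bytes of args;
 * 2. for each memory object i, compute the offset of args_mem_loc[i]
 *    inside the ORIGINAL args and write mem_ptrs[i] at that offset in
 *    the COPY.
 * Returns the patched copy (caller frees it), or NULL on error. */
void *patch_native_args(const void *args, size_t cb_args,
                        void *const *mem_ptrs, const void **args_mem_loc,
                        size_t num_mem_objects) {
    void *copy = malloc(cb_args);
    if (copy == NULL) return NULL;
    memcpy(copy, args, cb_args);
    for (size_t i = 0; i < num_mem_objects; i++) {
        size_t off = (size_t) ((const char *) args_mem_loc[i]
                               - (const char *) args);
        if (off + sizeof(void *) > cb_args) { /* slot outside args */
            free(copy);
            return NULL;
        }
        memcpy((char *) copy + off, &mem_ptrs[i], sizeof(void *));
    }
    return copy;
}
```

Because the slots are located by offset, the caller's original args block is left untouched, and the user function sees only the patched copy.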

Some links regarding the use of clEnqueueNativeKernel():

canon example randomly crashes on my i.MX6Q device

I downloaded the latest version from the master branch and built it successfully against the OpenCL 1.1 embedded-profile library on a Freescale (now NXP) i.MX6Q device. The utilities run fine and show the device information correctly:

* Platform #0: Vivante OpenCL Platform (Vivante Corporation)
               OpenCL 1.1 , EMBEDDED_PROFILE

    [ Device #0: Vivante OpenCL Device ]

        TYPE                                 | GPU
        VENDOR                               | Vivante Corporation
        OPENCL_C_VERSION                     | OpenCL C 1.1
        MAX_COMPUTE_UNITS                    | 4
        GLOBAL_MEM_SIZE                      | 128.0 MiB (134217728 bytes)
        MAX_MEM_ALLOC_SIZE                   | 64.0 MiB (67108864 bytes)
        LOCAL_MEM_SIZE                       | 1.0 KiB (1024 bytes)
        LOCAL_MEM_TYPE                       | GLOBAL
        MAX_WORK_GROUP_SIZE                  | 1024 work-items

However, the canon example sometimes works but most of the time crashes (note that I added more debug output, shown below):

Working case:

  =========================== Device Selection ============================

              0. Vivante OpenCL Device [Vivante OpenCL Platform]

 * Global worksize: 16
 * Local worksize : 16

device buffers created.
device buffer write events added to wait list...
kernel termination event added to wait list...
kernel done, wait list empty.
device result back.
Kernel exec passed result check.

perform profiling...

   =========================== Timming/Profiling ===========================

     Aggregate times by event  :
       ------------------------------------------------------------------
       | Event name                     | Rel. time (%) | Abs. time (s) |
       ------------------------------------------------------------------
       | WRITE_BUFFER                   |       58.8535 |    4.6200e-04 |
       | READ_BUFFER                    |       22.9299 |    1.8000e-04 |
       | MARKER                         |       17.4522 |    1.3700e-04 |
       | NDRANGE_KERNEL                 |        0.7643 |    6.0000e-06 |
       ------------------------------------------------------------------
                                        |         Total |    7.8500e-04 |
                                        ---------------------------------
     Event overlaps            : None
profiling result in out.tsv
host buffers destroyed.
device buffers destroyed.
memory leak check passed.

Crash case:

   =========================== Device Selection ============================

              0. Vivante OpenCL Device [Vivante OpenCL Platform]

 * Global worksize: 16
 * Local worksize : 16

device buffers created.
device buffer write events added to wait list...
kernel termination event added to wait list...
kernel done, wait list empty.
Segmentation fault (core dumped)

The crash appears to occur at the result-checking stage. I am not sure whether this is a cf4ocl issue or an FSL OpenCL library issue. Please advise if there are ways to identify the cause.

Make examples optional

Currently the examples have a (heavyweight) dependency on OpenMP. It would be nice to avoid this dependency by making the build of the examples optional in the CMake process.

Compilation issue on ARM (aarch64)

In file included from /home/ubuntu/amd/opencl/compiler/llvm/tools/clang/include/clang/Frontend/ASTUnit.h:18:0,
from /home/ubuntu/amd/opencl/compiler/llvm/tools/clang/include/clang/Frontend/FrontendAction.h:24,
from /home/ubuntu/amd/opencl/compiler/llvm/tools/clang/include/clang/CodeGen/CodeGenAction.h:13,
from /home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp:47:
/home/ubuntu/amd/opencl/compiler/llvm/tools/clang/include/clang/AST/ASTContext.h: In member function ‘const clang::ast_type_traits::DynTypedNode* clang::ASTContext::DynTypedNodeList::begin() const’:
/home/ubuntu/amd/opencl/compiler/llvm/tools/clang/include/clang/AST/ASTContext.h:558:13: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
->begin();
^
/home/ubuntu/amd/opencl/compiler/llvm/tools/clang/include/clang/AST/ASTContext.h: In member function ‘const clang::ast_type_traits::DynTypedNode* clang::ASTContext::DynTypedNodeList::end() const’:
/home/ubuntu/amd/opencl/compiler/llvm/tools/clang/include/clang/AST/ASTContext.h:565:13: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
->end();
^
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp: At global scope:
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp:108:6: error: prototype for ‘bool amd::opencl_driver::File::WriteData(const char*, size_t)’ does not match any in class ‘amd::opencl_driver::File’
bool File::WriteData(const char* ptr, size_t size)
^
In file included from /home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp:1:0:
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.h:85:8: error: candidate is: bool amd::opencl_driver::File::WriteData(const char*, int)
bool WriteData(const char* ptr, size_t size);
^
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp: In member function ‘virtual amd::opencl_driver::FileReference* amd::opencl_driver::BufferReference::ToInputFile(amd::opencl_driver::Compiler*, amd::opencl_driver::File*)’:
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp:170:26: error: ‘size’ was not declared in this scope
if (!f->WriteData(ptr, size)) { return 0; }
^
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp: At global scope:
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp:315:20: error: ‘amd::opencl_driver::BufferReference* amd::opencl_driver::AMDGPUCompiler::NewBufferReference(amd::opencl_driver::DataType, const char*, size_t, const string&)’ marked ‘override’, but does not override
BufferReference* NewBufferReference(DataType type, const char* ptr, size_t size, const std::string& id) override;
^
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp: In member function ‘virtual bool amd::opencl_driver::AMDGPUCompiler::DumpExecutableAsText(amd::opencl_driver::Buffer*, amd::opencl_driver::File*)’:
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp:710:40: error: ‘class amd::opencl_driver::Buffer’ has no member named ‘Size’
StringRef execRef(exec->Ptr(), exec->Size());
^
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp: In member function ‘amd::opencl_driver::Compiler* amd::opencl_driver::CompilerFactory::CreateAMDGPUCompiler(const string&)’:
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp:761:36: error: invalid new-expression of abstract class type ‘amd::opencl_driver::AMDGPUCompiler’
return new AMDGPUCompiler(llvmBin);
^
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp:266:7: note: because the following virtual functions are pure within ‘amd::opencl_driver::AMDGPUCompiler’:
class AMDGPUCompiler : public Compiler {
^
In file included from /home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.cpp:1:0:
/home/ubuntu/amd/opencl/compiler/driver/src/driver/AmdCompiler.h:194:28: note: virtual amd::opencl_driver::BufferReference* amd::opencl_driver::Compiler::NewBufferReference(amd::opencl_driver::DataType, const char*, int, const string&)
virtual BufferReference* NewBufferReference(DataType type, const char* ptr, size_t size, const std::string& id = "") = 0;
^
compiler/driver/src/driver/CMakeFiles/opencl_driver.dir/build.make:62: recipe for target 'compiler/driver/src/driver/CMakeFiles/opencl_driver.dir/AmdCompiler.cpp.o' failed
make[2]: *** [compiler/driver/src/driver/CMakeFiles/opencl_driver.dir/AmdCompiler.cpp.o] Error 1
CMakeFiles/Makefile2:19547: recipe for target 'compiler/driver/src/driver/CMakeFiles/opencl_driver.dir/all' failed
make[1]: *** [compiler/driver/src/driver/CMakeFiles/opencl_driver.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

Create Pipes module

Create a module/class with wrapper methods for the Pipes functionality of OpenCL 2.0.

Create DirectX 9, 10 and 11 sharing modules

Although it is perfectly possible to use DirectX sharing with cf4ocl, it's necessary to use the appropriate OpenCL functions directly (e.g. clGetDeviceIDsFromD3D10KHR, etc.), losing the benefits offered by cf4ocl such as error management, event wrapping or integrated profiling.

Wrap clGetHostTimer and clGetDeviceAndHostTimer API calls (OpenCL 2.1) and integrate them in the profiler module

Wrap the clGetHostTimer and clGetDeviceAndHostTimer API calls (OpenCL 2.1). The former returns the current value of the host clock as seen by the device, while the latter returns a reasonably synchronized pair of timestamps from the device timer and the host timer as seen by the device.

This functionality should be integrated in the cf4ocl profiler module, with care taken to consider the new CL_PLATFORM_HOST_TIMER_RESOLUTION platform info query.
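One way the profiler could use clGetDeviceAndHostTimer() is to place device-side event timestamps on the host timeline. Both values it returns are in nanoseconds. The sketch below assumes two synchronized (device, host) pairs, taken before and after the profiled work, and maps a device timestamp onto host time by linear interpolation, which also compensates for drift between the clocks. With a single pair it falls back to a constant offset. The timer_sync type and device_to_host_ns() are hypothetical names, and the interpolation is an illustrative choice, not something cf4ocl or the OpenCL specification prescribes.

```c
#include <stdint.h>

/* Hypothetical type: one (device, host) timestamp pair in ns, e.g. as
 * filled in by clGetDeviceAndHostTimer(). */
typedef struct {
    uint64_t device_ns;
    uint64_t host_ns;
} timer_sync;

/* Map device_ns onto the host timeline using sync pairs s0 and s1.
 * Falls back to a constant offset (rate 1) if s1 adds no information. */
uint64_t device_to_host_ns(uint64_t device_ns, timer_sync s0, timer_sync s1) {
    /* Signed distance from the first sync point (may be negative). */
    int64_t since_s0 = (int64_t) (device_ns - s0.device_ns);
    if (s1.device_ns <= s0.device_ns || s1.host_ns < s0.host_ns)
        return s0.host_ns + (uint64_t) since_s0;
    /* Host nanoseconds elapsed per device nanosecond. */
    double rate = (double) (s1.host_ns - s0.host_ns)
                / (double) (s1.device_ns - s0.device_ns);
    double delta = (double) since_s0 * rate;
    /* Round to the nearest nanosecond. */
    int64_t rounded = (int64_t) (delta >= 0.0 ? delta + 0.5 : delta - 0.5);
    return s0.host_ns + (uint64_t) rounded;
}
```

The result can only be as precise as the coarser of the two timers, which is where CL_PLATFORM_HOST_TIMER_RESOLUTION would come into play.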

In demo code, demonstrate zero-copy memory alignment

For many GPUs (IoT devices using unified memory models, Intel iGPUs, and GPUs with pinned memory), zero-copy buffers are really useful. The examples currently use malloc(), which typically guarantees only a small alignment (8 or 16 bytes). Intel requires 4096-byte boundaries, ARM usually requires 64-byte boundaries, etc. Unfortunately, Windows uses different aligned-allocation functions than Linux (_aligned_malloc()/_aligned_free() instead of posix_memalign()/free()), but these Stack Overflow answers show how to wrap a posix_memalign() equivalent in #ifdef blocks: https://stackoverflow.com/a/33696858 and https://stackoverflow.com/a/38291021

To make life easier for users, you could create a function that queries CL_DEVICE_MEM_BASE_ADDR_ALIGN (note that this value is reported in bits, not bytes) and uses it to align buffers correctly, together with a matching free function that abstracts the Windows/Linux differences.
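A minimal sketch of such a helper pair follows. It assumes posix_memalign() on POSIX systems and the MSVC CRT's _aligned_malloc()/_aligned_free() on Windows. The names aligned_host_alloc(), aligned_host_free() and is_aligned() are hypothetical and not part of cf4ocl. The alignment is taken from the CL_DEVICE_MEM_BASE_ADDR_ALIGN value (converted from bits to bytes). A min_align argument lets the caller demand more, such as the 4096 bytes mentioned above for Intel.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign() */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h> /* _aligned_malloc(), _aligned_free() */
#endif

/* Allocate size bytes aligned to the larger of align_bits / 8 (the
 * CL_DEVICE_MEM_BASE_ADDR_ALIGN value, in bits) and min_align bytes.
 * Returns NULL on failure or if the alignment is not a power of two. */
void *aligned_host_alloc(size_t size, unsigned int align_bits,
                         size_t min_align) {
    size_t align = align_bits / 8;
    if (align < min_align) align = min_align;
    /* posix_memalign() needs a multiple of sizeof(void *). */
    if (align < sizeof(void *)) align = sizeof(void *);
    if ((align & (align - 1)) != 0) return NULL;
#ifdef _WIN32
    return _aligned_malloc(size, align);
#else
    void *ptr = NULL;
    return posix_memalign(&ptr, align, size) == 0 ? ptr : NULL;
#endif
}

/* Release memory obtained from aligned_host_alloc(). */
void aligned_host_free(void *ptr) {
#ifdef _WIN32
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}

/* Nonzero if ptr is a multiple of align bytes. */
int is_aligned(const void *ptr, size_t align) {
    return align != 0 && ((uintptr_t) ptr % align) == 0;
}
```

A pointer obtained this way would then typically be passed to clCreateBuffer() with CL_MEM_USE_HOST_PTR, and must be released with aligned_host_free(), never plain free(), so that the Windows path stays correct.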

Add profiler support for CL_PROFILING_COMMAND_COMPLETE (OpenCL 2.0)

The CL_PROFILING_COMMAND_COMPLETE, introduced in OpenCL 2.0, describes the current device time counter in nanoseconds when the command identified by event and any child commands enqueued by this command on the device have finished execution.

Add support for this profiling info to the cf4ocl profiler.
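With CL_PROFILING_COMMAND_COMPLETE, an event carries five timestamps (QUEUED, SUBMIT, START, END, COMPLETE), all device-counter values in nanoseconds read via clGetEventProfilingInfo(). The profiler could then report, next to the usual END minus START execution time, the time until the command and its device-enqueued children finished. The sketch below shows that arithmetic. The evt_prof struct and the function names are hypothetical. Guarding against COMPLETE being earlier than END is a defensive assumption here, not something taken from the specification.

```c
#include <stdint.h>

/* Hypothetical struct: the five profiling timestamps of an event (ns). */
typedef struct {
    uint64_t queued, submit, start, end, complete;
} evt_prof;

/* Execution time of the command itself (END - START). */
uint64_t exec_ns(const evt_prof *p) {
    return p->end - p->start;
}

/* Time until the command AND any child commands it enqueued on the
 * device finished (COMPLETE - START); falls back to END if COMPLETE
 * is unexpectedly earlier. */
uint64_t exec_with_children_ns(const evt_prof *p) {
    uint64_t last = p->complete >= p->end ? p->complete : p->end;
    return last - p->start;
}

/* Extra time after the parent command ended, spent on its children. */
uint64_t child_tail_ns(const evt_prof *p) {
    return p->complete > p->end ? p->complete - p->end : 0;
}
```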
