tugrul512bit / libgpgpu
Multi-GPU & CPU OpenCL kernel executor with load-balancing as if there is one big GPU.
License: MIT License
The internal data of parameter objects should not be destroyed before the computer object that uses them. The same applies to worker objects and their threads.
Test with different scenarios:
Some devices cannot allocate arrays bigger than 128MB. With this feature, all free VRAM (for example 16GB) can be used at once. It only requires power-of-2 sized sub-arrays and log(n) indexing steps to select an element. As long as all workitems do similar indexing, it should be fast.
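A minimal host-side sketch of the indexing idea, not the library's actual implementation. It assumes four sub-arrays of 2^4 elements each (both numbers chosen only to keep the example small), and the names `SplitArray4`, `kShift`, and `at` are hypothetical. The global index is split into a sub-array number and an offset with a shift and a mask, and the sub-array is then picked by a binary decision tree, which takes log2(4) = 2 steps here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: one logical array split into 4 sub-arrays,
// each holding 2^kShift elements (tiny values, for illustration only).
constexpr unsigned kShift = 4;
constexpr std::size_t kChunk = std::size_t{1} << kShift;  // 16 elements
constexpr std::size_t kMask = kChunk - 1;

struct SplitArray4 {
    std::vector<float> a0, a1, a2, a3;  // stand-ins for separate device buffers

    SplitArray4() : a0(kChunk), a1(kChunk), a2(kChunk), a3(kChunk) {}

    // Caller must keep i < 4 * kChunk.
    float& at(std::size_t i) {
        const std::size_t chunk = i >> kShift;   // which sub-array
        const std::size_t offset = i & kMask;    // position inside it
        // Binary selection: one branch per bit of the sub-array number.
        if ((chunk & 2) == 0) {
            return (chunk & 1) == 0 ? a0[offset] : a1[offset];
        }
        return (chunk & 1) == 0 ? a2[offset] : a3[offset];
    }
};
```

In a kernel, each sub-array would be its own buffer argument, and the same shift, mask, and branch pattern would apply. Neighboring workitems usually land in the same sub-array, so the branches mostly go the same way.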
I don't know how else an OpenCL 1.2 kernel can access whole VRAM from a single buffer parameter.
If a buffer has CL_MEM_USE_HOST_PTR, remove it from the worker's buffer array and use it directly on the HostParameter for mapping/unmapping.
If that is not fast enough, add an explicit pinning option for fast device I/O.
Backing store: SSD with 3GB/s bandwidth, with access serialized across all threads.
L2: combined VRAM of all devices, reached at PCIe bandwidth (so four Titans can make a good cache layer) but with high latency; LRU eviction.
L1: RAM, with 60+ GB/s for DDR5 (even more if the data fits into CPU cache); direct-mapped.
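The three layers above can be sketched as a read-only lookup chain. This is only an illustration under stated assumptions: all three layers are simulated in host memory, the class names `BackingStore`, `LruCache`, and `DirectMappedCache` are hypothetical, and write-back and thread serialization are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the SSD backing store (a real one would do file I/O).
struct BackingStore {
    std::unordered_map<std::uint64_t, int> data;
    std::size_t reads = 0;
    int load(std::uint64_t key) { ++reads; return data[key]; }
};

// Stand-in for the L2 layer (combined VRAM): LRU eviction. Capacity must be >= 1.
class LruCache {
public:
    LruCache(std::size_t capacity, BackingStore& store)
        : capacity_(capacity), store_(store) {}

    int get(std::uint64_t key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        const int value = store_.load(key);
        if (order_.size() == capacity_) {
            // Full: evict the least recently used entry (back of the list).
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, value);
        index_[key] = order_.begin();
        return value;
    }

private:
    using Entry = std::pair<std::uint64_t, int>;
    std::size_t capacity_;
    BackingStore& store_;
    std::list<Entry> order_;
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> index_;
};

// Stand-in for the L1 layer (RAM): direct-mapped, one slot per (key % slots).
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t slots, LruCache& next)
        : slots_(slots), next_(next) {}

    int get(std::uint64_t key) {
        Slot& s = slots_[key % slots_.size()];
        if (!s.valid || s.key != key) {
            // Miss or conflict: refill this slot from the next layer.
            s.value = next_.get(key);
            s.key = key;
            s.valid = true;
            ++misses;
        }
        return s.value;
    }

    std::size_t misses = 0;

private:
    struct Slot {
        std::uint64_t key = 0;
        int value = 0;
        bool valid = false;
    };
    std::vector<Slot> slots_;
    LruCache& next_;
};
```

The direct-mapped layer needs no bookkeeping beyond one tag per slot, which suits the fastest layer. The LRU layer spends more work per access to keep its hit rate high, which matters more the slower the layer behind it is.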
This works only with dynamic load balancing that uses a static chunk size, together with atomic signaling from the kernel. The signaling is possible either on RAM-sharing devices, or on normal devices with periodic buffer copies that express the memory-region request made within the kernel. In both cases it requires OpenCL 2.x.
Good for:
Reduction algorithms.
Complex algorithms where the output of one kernel is used directly by another kernel.
State machines with temporary/non-host arrays.
Good for initialization performance and development time.
It should work like this:
because if a device writes its result before another device reads its input (or worse, while a RAM-sharing device is working directly on the host buffer), the results will be undefined.
But when a buffer is read-only or write-only, all non-RAM-sharing devices can work independently (overlapping read-only, or non-overlapping write-only/read-only). Reading and writing the same buffer is also fine as long as the readAll flag is not set.
If 2 devices share RAM, their mappings should be unified on a single device buffer to avoid OpenCL-side undefined behavior. (todo)
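The rule for non-RAM-sharing devices can be written as a small predicate. This is a sketch of the reasoning only: `BufferAccess` and `canRunIndependently` are hypothetical names, and `readAll` is assumed to mean that every device reads the whole buffer rather than only its own chunk.

```cpp
// Hypothetical description of how a kernel uses one buffer.
struct BufferAccess {
    bool read;     // the kernel reads the buffer
    bool write;    // the kernel writes the buffer
    bool readAll;  // assumption: each device reads the whole buffer, not just its chunk
};

// True when devices that do not share RAM can process their chunks
// without ordering their reads and writes against each other.
inline bool canRunIndependently(const BufferAccess& b) {
    if (!(b.read && b.write)) {
        return true;  // read-only or write-only buffers never conflict
    }
    // Read + write is safe only if each device touches just its own chunk.
    return !b.readAll;
}
```

The RAM-sharing case is deliberately not covered, since it depends on the unified mapping that is still marked as a todo.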