Comments (9)
Interesting. I'll take a look at oclgrind to get a better understanding of what you want. I'll come back to you soon. Perhaps the Collective Knowledge framework might also be of some help: https://github.com/ctuning/ck
from cltune.
Wow, I wasn't even aware that something like ck existed ... I'll have to do some reading now.
I found some time and I think I understand your idea. By the way, in your first post you meant "emulate an OpenCL device", not "emulate an OpenCL kernel", right?
I am not sure if CLTune is what you are looking for though. What is your use-case exactly? I can interpret your goal in two ways:
- You are trying to machine-learn a kernel optimiser/tuner based on previous kernels it has seen. So you'll need a lot of kernels and some static and run-time information (that's where oclgrind comes into play). Then you can learn what optimisations are a good choice given static and run-time information of a previously unseen kernel.
- You are trying to optimise a single kernel but the optimisation-space is too vast. In that case you'll hope that some static and run-time information (oclgrind again here) can help you guide a machine-learned model faster towards a good (or the best) solution.
In the first case CLTune is really not your choice: it can only perform 'optimisations' that are pre-programmed into a kernel using pre-processor variables. CLTune is a tool to help you explore those options, optionally using machine learning to guide you faster towards a decision. Better to hook this up in the compiler itself, I would say.
For the second case it might be a better fit, but I am not so sure this extra information will be helpful to train a model. With the extra data we might also need to look at larger models that can capture this new information. Keep in mind that I am currently not even using the static data that is readily available (number of instructions of some sort, number of branches, vector width, architecture details); I am only using the current user-defined 'configuration'. So perhaps it is better to start there, instead of using run-time information from device emulation?
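To picture what "pre-programmed using pre-processor variables" means, here is a minimal sketch in plain Python. It is not CLTune's API: the kernel, the parameter names `WPT` and `VW`, the constraint and the helper functions are all hypothetical, chosen only to show how a user-defined configuration space turns into a set of kernel variants.

```python
from itertools import product

# Hypothetical kernel: WPT (work per thread) is a tunable pre-processor
# variable. VW is defined for every variant but unused in this toy kernel.
KERNEL_SOURCE = """
__kernel void scale(__global float* x) {
  const int base = get_global_id(0) * WPT;
  for (int i = 0; i < WPT; ++i) { x[base + i] *= 2.0f; }
}
"""

# Hypothetical parameter space: each name maps to its allowed values.
PARAMETERS = {"WPT": [1, 2, 4], "VW": [1, 2, 4, 8]}


def is_valid(config):
    """Hypothetical constraint: vector width may not exceed work per thread."""
    return config["VW"] <= config["WPT"]


def configurations(parameters, constraint):
    """Yield every combination of parameter values that passes the constraint."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        config = dict(zip(names, values))
        if constraint(config):
            yield config


def specialise(source, config):
    """Prepend one #define per parameter to get the source of one candidate."""
    defines = "".join(f"#define {name} {value}\n" for name, value in config.items())
    return defines + source


candidates = list(configurations(PARAMETERS, is_valid))
variants = [specialise(KERNEL_SOURCE, config) for config in candidates]
```

Each entry in `variants` is one point in the search space. A tuner can only choose among these points; it cannot invent a transformation that was not already written into the kernel behind a `#define`.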
Yes, I meant "device" like you said. Your second interpretation describes the idea pretty well, i.e. it has more to do with kernel-specific runtime information and using that to come up with or guide different transformations.
I will have to do some reading to see if this is really feasible, for all the reasons you mentioned - however, I did reference a few papers that basically describe doing this sort of thing.
So it really is more about narrowing down and guiding the search space based on kernel-specific information that can be gathered via emulated execution.
OK! Which papers are those? I'm interested as well to see what's possible.
I basically worked through the referenced paper and its references section: http://arxiv.org/pdf/1506.00842v1.pdf
We have developed and validated a machine learning based auto-tuning framework for OpenCL. The framework measures the performance of several candidate implementations from a parameter configuration space and uses this result to build a artificial neural network, which works as a performance model. This model is then used to find interesting parts of the configuration space, which are explored exhaustively to find good candidate implementations. Our neural network model achieves a mean relative error as low as 6.1% for three different benchmarks executed on three different devices, a Intel i7 3770 CPU, an Nvidia K40 GPU and a AMD Radeon HD 7970. The autotuner is able to find good configurations, at best only 1.3% slower than the best configuration.
Future work includes enhancing the performance of the model, in particular with regard to invalid configurations, evaluating the model on novel hardware architectures, beyond just CPUs and GPUs, and integrating problem parameters into the performance model. Incorporating advanced new features specific to a given architecture[39] will remain challenging. However, studying multi-GPU systems[40] and looking into multi-variate analysis[41] may also be interesting avenues of inquiry.
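The workflow in the quoted abstract (measure some candidates, fit a performance model, let the model pick the interesting region, then measure that region exhaustively) can be sketched roughly as below. This is not the paper's implementation: a k-nearest-neighbour average is swapped in for the neural network, `measure` is a made-up cost function standing in for a real kernel timing, and the mean relative error is computed with the usual definition, which is assumed to match the paper's.

```python
import random
from itertools import product


def measure(config):
    """Stand-in for timing a kernel: a made-up cost with its minimum at (8, 4)."""
    workgroup, work_per_thread = config
    return (workgroup - 8) ** 2 + 3 * (work_per_thread - 4) ** 2 + 1.0


def predict(config, samples, k=3):
    """Performance model: mean measured time of the k closest sampled configs."""
    def distance(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], config))
    nearest = sorted(samples, key=distance)[:k]
    return sum(time for _, time in nearest) / len(nearest)


def mean_relative_error(space, samples):
    """Average of |predicted - measured| / measured over the whole space."""
    errors = [abs(predict(c, samples) - measure(c)) / measure(c) for c in space]
    return sum(errors) / len(errors)


def tune(space, n_samples, n_candidates, seed=0):
    rng = random.Random(seed)
    # Step 1: measure a random subset of the configuration space.
    samples = [(c, measure(c)) for c in rng.sample(space, n_samples)]
    # Step 2: rank every configuration by its predicted time.
    ranked = sorted(space, key=lambda c: predict(c, samples))
    # Step 3: measure the most promising configurations for real.
    measured = dict(samples)
    for config in ranked[:n_candidates]:
        measured[config] = measure(config)
    # Return the best configuration among everything that was measured.
    best = min(measured, key=measured.get)
    return best, measured[best], samples


# Hypothetical two-parameter space: (workgroup size, work per thread).
space = list(product([1, 2, 4, 8, 16, 32], [1, 2, 4, 8]))
best, best_time, samples = tune(space, n_samples=8, n_candidates=6)
error = mean_relative_error(space, samples)
```

Only `n_samples + n_candidates` configurations are ever measured, which is the point of the approach when each measurement means compiling and running a kernel. Extra static or run-time information about the kernel would enter as additional inputs to `predict`.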
This is exactly what CLTune is also doing. I mainly wrote CLTune because the other paper's authors did not make any tool available. However, I did not evaluate the machine learning part much; the CLTune paper actually doesn't include it at all (http://www.cedricnugteren.nl/downloads/Nugteren2015a.pdf). Perhaps someone should do some more experiments using CLTune and a small neural network?
Or are you perhaps referring to the future work part of the paper:
and integrating problem parameters into the performance model
I am not sure exactly what the authors mean by this, but could it be what you are referring to? In that case I would also contact the authors and ask whether they have already done this.
A really interesting thread. /cc @hughperkins
See also http://chriscummins.cc/pub/2016-adapt.pdf