
Comments (9)

CNugteren commented on July 17, 2024

Interesting. I'll take a look at oclgrind to get a better understanding of what you want. I'll come back to you soon. Perhaps the Collective Knowledge framework might also be of some help: https://github.com/ctuning/ck


UniqueFool commented on July 17, 2024

wow, I wasn't even aware that something like ck existed ... gotta do some reading now.


CNugteren commented on July 17, 2024

I found some time and I think I understand your idea. By the way, in your first post you meant "emulate an OpenCL device", not "emulate an OpenCL kernel", right?

I am not sure if CLTune is what you are looking for though. What is your use-case exactly? I can interpret your goal in two ways:

  1. You are trying to machine-learn a kernel optimiser/tuner based on previous kernels it has seen. So you'll need a lot of kernels and some static and run-time information (that's where oclgrind comes into play). Then you can learn what optimisations are a good choice given static and run-time information of a previously unseen kernel.
  2. You are trying to optimise a single kernel but the optimisation-space is too vast. In that case you'll hope that some static and run-time information (oclgrind again here) can help you guide a machine-learned model faster towards a good (or the best) solution.

In the first case CLTune is really not what you are looking for: it can only perform 'optimisations' that are pre-programmed into a kernel using pre-processor variables (see the sketch below). CLTune is a tool to help you explore those options, optionally using machine learning to guide you faster through the decision space. Better to hook this up in the compiler itself, I would say.

For the second case it might be a better fit, but I am not so sure this extra information will be helpful for training a model. With the extra data we might also need to look at larger models that can capture this new information. Keep in mind that I am currently not even using the static data that is readily available (number of instructions of some sort, number of branches, vector width, architecture details); I am only using the current user-defined 'configuration'. So perhaps it is better to start there, instead of using run-time information from device emulation?
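
To make the pre-processor mechanism concrete, here is a rough sketch in the style of CLTune's samples. The kernel, file name, parameter names and sizes are all made up for illustration, and the exact API signatures may differ slightly from what is shown; see the samples shipped with CLTune for the authoritative version.

```cpp
// Sketch only: a hypothetical kernel file "vector_scale.opencl" whose tunable
// parameters (WPT, GROUP_SIZE) are plain pre-processor symbols. CLTune injects
// a value for each of them when it compiles a candidate configuration:
//
//   __kernel void vector_scale(__global const float* x, __global float* y,
//                              const float alpha, const int n) {
//     const int base = get_global_id(0) * WPT;       // WPT = work per thread
//     for (int w = 0; w < WPT; ++w) {
//       const int i = base + w;
//       if (i < n) { y[i] = alpha * x[i]; }
//     }
//   }

#include <vector>
#include <cltune.h>

int main() {
  const auto n = size_t{1024 * 1024};
  std::vector<float> x(n, 1.0f), y(n, 0.0f);

  cltune::Tuner tuner(0, 0);  // platform 0, device 0
  const auto id = tuner.AddKernel({"vector_scale.opencl"}, "vector_scale", {n}, {1});

  // The tunable parameters and the values the tuner is allowed to try:
  tuner.AddParameter(id, "WPT", {1, 2, 4, 8});
  tuner.AddParameter(id, "GROUP_SIZE", {64, 128, 256});

  // How each parameter affects the thread configuration:
  tuner.DivGlobalSize(id, {"WPT"});        // fewer threads when each does more work
  tuner.MulLocalSize(id, {"GROUP_SIZE"});  // work-group size follows GROUP_SIZE

  // Kernel arguments; outputs are verified against a reference configuration.
  tuner.AddArgumentInput(x);
  tuner.AddArgumentOutput(y);
  tuner.AddArgumentScalar(2.0f);
  tuner.AddArgumentScalar(static_cast<int>(n));

  tuner.Tune();           // compiles and benchmarks every valid configuration
  tuner.PrintToScreen();  // reports the best configuration found
  return 0;
}
```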


UniqueFool commented on July 17, 2024

Yes, I meant "device" like you said. Your 2) describes the idea pretty well, i.e. it has more to do with kernel-specific runtime information and using that to come up with / guide different transformations.

I will have to do some reading to see if this is really feasible, for all the reasons you mentioned. However, I did reference a few papers that basically describe doing this sort of thing.

So it really is more about narrowing-down and guiding the search space based on kernel-specific information that can be gathered via emulated execution.
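
As a purely hypothetical illustration of what such kernel-specific information could look like as model input, alongside the configuration that CLTune already uses: the feature names below are invented, and the counts would come from an emulated run rather than from any real API.

```cpp
// Sketch: the model input would grow from "configuration only" to
// "configuration + kernel-specific features". The static features are the
// kind mentioned above (instruction count, branch count, vector width); the
// run-time features are the kind an emulated execution could report.
#include <vector>

struct TuningFeatures {
  // What CLTune currently uses: the user-defined parameter values.
  std::vector<double> configuration;   // e.g. {WPT, GROUP_SIZE}

  // Static information, available without running anything.
  double instruction_count = 0;
  double branch_count = 0;
  double vector_width = 0;

  // Run-time information gathered by executing the kernel on an emulated device.
  double global_memory_loads = 0;
  double global_memory_stores = 0;
  double barrier_count = 0;
};

// Flatten into the input vector of whatever model is used
// (neural network, nearest neighbour, ...).
std::vector<double> ToModelInput(const TuningFeatures& f) {
  std::vector<double> input = f.configuration;
  input.insert(input.end(), {f.instruction_count, f.branch_count, f.vector_width,
                             f.global_memory_loads, f.global_memory_stores,
                             f.barrier_count});
  return input;
}
```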


CNugteren commented on July 17, 2024

OK! Which papers are those? I'm interested as well to see what's possible.


UniqueFool commented on July 17, 2024

I basically worked through the referenced paper and its references section: http://arxiv.org/pdf/1506.00842v1.pdf

We have developed and validated a machine learning based auto-tuning framework for OpenCL. The framework measures the performance of several candidate implementations from a parameter configuration space and uses this result to build a artificial neural network, which works as a performance model. This model is then used to find interesting parts of the configuration space, which are explored exhaustively to find good candidate implementations. Our neural network model achieves a mean relative error as low as 6.1% for three different benchmarks executed on three different devices, a Intel i7 3770 CPU, an Nvidia K40 GPU and a AMD Radeon HD 7970. The autotuner is able to find good configurations, at best only 1.3% slower than the best configuration.

Future work includes enhancing the performance of the model, in particular with regard to invalid configurations, evaluating the model on novel hardware architectures, beyond just CPUs and GPUs, and integrating problem parameters into the performance model. Incorporating advanced new features specific to a given architecture [39] will remain challenging. However, studying multi-GPU systems [40] and looking into multi-variate analysis [41] may also be interesting avenues of inquiry.
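
For what it is worth, the loop the abstract describes looks roughly like the sketch below. This is only an illustration: the paper trains an artificial neural network as the performance model, while here a trivial nearest-neighbour regressor and a stubbed benchmark stand in for it, and all names and numbers are invented.

```cpp
#include <algorithm>
#include <cstdio>
#include <limits>
#include <random>
#include <vector>

using Configuration = std::vector<double>;  // e.g. {WPT, GROUP_SIZE, ...}

// Stand-in performance model: predicts the runtime of the nearest
// already-measured configuration (the paper trains a neural network instead).
class PerformanceModel {
 public:
  void Fit(std::vector<Configuration> configs, std::vector<double> runtimes) {
    configs_ = std::move(configs);
    runtimes_ = std::move(runtimes);
  }
  double Predict(const Configuration& c) const {
    double best_dist = std::numeric_limits<double>::max();
    double best_runtime = 0.0;
    for (size_t i = 0; i < configs_.size(); ++i) {
      double dist = 0.0;
      for (size_t j = 0; j < c.size(); ++j) {
        dist += (c[j] - configs_[i][j]) * (c[j] - configs_[i][j]);
      }
      if (dist < best_dist) { best_dist = dist; best_runtime = runtimes_[i]; }
    }
    return best_runtime;
  }
 private:
  std::vector<Configuration> configs_;
  std::vector<double> runtimes_;
};

// Stub: in reality this compiles the kernel with the given parameter values
// and measures its runtime on the device (what a CLTune-style evaluation does).
double BenchmarkOnDevice(const Configuration& c) {
  return c[0] * 0.5 + c[1] * 0.01;  // fake runtime in ms
}

std::vector<Configuration> MachineLearnedSearch(
    std::vector<Configuration> full_space,
    size_t num_training_samples, size_t top_k) {
  // 1. Benchmark a small random sample of the configuration space.
  std::shuffle(full_space.begin(), full_space.end(), std::mt19937{42});
  const size_t n_train = std::min(num_training_samples, full_space.size());
  std::vector<Configuration> sampled(full_space.begin(), full_space.begin() + n_train);
  std::vector<double> measured;
  for (const auto& c : sampled) measured.push_back(BenchmarkOnDevice(c));

  // 2. Train the performance model on the measured samples.
  PerformanceModel model;
  model.Fit(sampled, measured);

  // 3. Rank the entire space by predicted runtime.
  std::sort(full_space.begin(), full_space.end(),
            [&](const Configuration& a, const Configuration& b) {
              return model.Predict(a) < model.Predict(b);
            });

  // 4. Return the top_k predicted-fastest configurations; these are the
  //    "interesting parts" that are then explored exhaustively on the device.
  full_space.resize(std::min(top_k, full_space.size()));
  return full_space;
}

int main() {
  // A tiny hypothetical space: WPT in {1,2,4,8}, GROUP_SIZE in {64,128,256}.
  std::vector<Configuration> space;
  for (double wpt : {1.0, 2.0, 4.0, 8.0})
    for (double gs : {64.0, 128.0, 256.0}) space.push_back({wpt, gs});

  for (const auto& c : MachineLearnedSearch(space, /*num_training_samples=*/4, /*top_k=*/3))
    std::printf("candidate: WPT=%.0f GROUP_SIZE=%.0f\n", c[0], c[1]);
  return 0;
}
```

The point of the scheme is that the model is cheap to query, so the whole configuration space can be ranked and only the predicted-fastest candidates are actually benchmarked on the device.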


CNugteren commented on July 17, 2024

This is exactly what CLTune is also doing. I mainly wrote CLTune because the other paper's authors did not make any tool available. However, I did not evaluate the machine learning part much; the paper actually doesn't include it at all (http://www.cedricnugteren.nl/downloads/Nugteren2015a.pdf). Perhaps someone should do some more experiments using CLTune and a small neural network?

Or are you perhaps referring to the future work part of the paper:

and integrating problem parameters into the performance model

I am not sure exactly what the authors mean by this, but it could be that this is what you are referring to? In that case I would also contact the authors and see if they haven't already done this.


bhack commented on July 17, 2024

A really interesting thread. /cc @hughperkins


bhack commented on July 17, 2024

See also http://chriscummins.cc/pub/2016-adapt.pdf

