Git Product home page Git Product logo

cuda's Introduction

CUDA

Fundametal concepts

How to declare functions that will run on GPU

The keyword to declare kernel functions (the ones that can be run on gpu) is __global__. This will tell the CUDA C++ compiler that the function will run on the GPU and it can be called from CPU code. For example the code below shows how to use the keyword

__global__
void myProcedure()
{
    .
    .
    . some code here
    .
}

How to allocate memory on the GPU

We can allocate memory on the GPU (device) but it can be also accessible from the CPU (host). The command is similar to the c function malloc. The syntax is cudaMallocManaged(&pointer, size_to_allocate) We have to pass a pointer's address and the size that we want to allocate. The function will return the inizialized pointer.

int *x;
cudaMallocManaged(&x, sizeof(int));
// x will point to a memory location of the first allocated byte

How to deallocate memory on the GPU

We can also deallocate memory by using cudaFree(what_to_deallocate)

int *x;
cudaMallocManaged(&x, sizeof(int));
// now we have memory allocated

cudaFree(x);
// now we don't have memory allocated

How the GPU splits the work

The GPU can achive massive parallelism performance if we take advantage of the number of cores. We split the work in the following way. Grid -> Block -> Thread.

  • The Grid is the major block. We can have many grids, depending of what GPU we are using. The important thing is that the number of grids must be a power of 2.
  • In every grid we have blocks that must also be a power of 2.
  • And finally in every block we have threads, also a power of 2.

The picture below shows a division of the work on an array and how to retrive the index of the elements. alt text

How to call funtions that will run on the GPU

We have a special syntax to run functions on GPU which is myFunction<<< #blocks , #threads >>>(parameters_of_the_function) For example the code below will run the function myFunction using 16 blocks of 128 threads each with no parameters.

myFunction<<< 16, 128 >>>()

If we want to know how many blocks we need given the number of threads (or the dimension of the block) we can use

int blockSize = 256;
int numBlocks = (len_of_array + blockSize - 1) / blockSize;

How to synchronize the host and the device

When lunching the CUDA kernel will not block the calling CPU thread, so we have to say the CPU to wait for the GPU to finish the computation before accessing the result. The funcion to do that is cudaDeviceSynchronize()

cuda's People

Contributors

cmancio00 avatar lorenzofelletti avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.