
Hands-On GPU Programming with Python and CUDA

This is the code repository for Hands-On GPU Programming with Python and CUDA, published by Packt.

Explore high-performance parallel computing with CUDA

What is this book about?

Hands-On GPU Programming with Python and CUDA hits the ground running: you’ll start by learning how to apply Amdahl’s Law, use a code profiler to identify bottlenecks in your Python code, and set up an appropriate GPU programming environment. You’ll then see how to “query” the GPU’s features and copy arrays of data to and from the GPU’s own memory.
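Amdahl's Law itself is a one-line formula: if a fraction p of a program's runtime can be parallelized across N processors, the overall speedup is bounded by 1 / ((1 - p) + p / N). A minimal sketch in Python (the function name here is illustrative, not taken from the book):

```python
# Amdahl's Law: upper bound on speedup when a fraction p of the
# runtime is parallelizable across n processors.
def amdahl_speedup(p, n):
    """p: parallelizable fraction (0..1); n: number of processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 1000 GPU cores, a program that is 75% parallelizable
# is capped by its serial 25%:
print(round(amdahl_speedup(0.75, 1000), 2))  # → 3.99
```

As n grows without bound, the speedup approaches 1 / (1 - p), which is why profiling for serial bottlenecks comes before any GPU work.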

This book covers the following exciting features:

  • Launch GPU code directly from Python
  • Write effective and efficient GPU kernels and device functions
  • Use libraries such as cuFFT, cuBLAS, and cuSolver
  • Debug and profile your code with Nsight and Visual Profiler
  • Apply GPU programming to data science problems
  • Build a GPU-based deep neural network from scratch
  • Explore advanced GPU hardware features, such as warp shuffling

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

cublas.cublasDestroy(handle)
print 'cuBLAS returned the correct value: %s' % np.allclose(np.dot(A,x), y_gpu.get())
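Note that the snippet above uses the Python 2 print statement, matching the Anaconda 5 (Python 2.7) environment listed below. For illustration only, here is a CPU-only sketch of the same correctness check under Python 3, with NumPy standing in for the GPU result (A, x, and y are placeholder names, not the book's actual variables, and no cuBLAS call is made here):

```python
import numpy as np

# CPU stand-in for the chapter's cuBLAS check: y plays the role of
# y_gpu.get(), the result copied back from the GPU after the
# matrix-vector product.
A = np.random.rand(4, 4).astype(np.float32)
x = np.random.rand(4).astype(np.float32)
y = A @ x

print('cuBLAS returned the correct value: %s' % np.allclose(np.dot(A, x), y))  # → True
```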

Following is what you need for this book: Hands-On GPU Programming with Python and CUDA is for developers and data scientists who want to learn the basics of effective GPU programming to improve performance using Python code. You should have an understanding of first-year college or university-level engineering mathematics and physics, and have some experience with Python as well as with a C-based programming language such as C, C++, Go, or Java.

With the following software and hardware list you can run all the code files present in the book (Chapters 1-12).

Software and Hardware List

Chapter | Software required               | OS required
1-11    | Anaconda 5 (Python 2.7 version) | Windows, Linux
2-11    | CUDA 9.2, CUDA 10.x             | Windows, Linux
2-11    | PyCUDA (latest)                 | Windows, Linux
7       | Scikit-CUDA (latest)            | Windows, Linux
2-11    | Visual Studio Community 2015    | Windows
2-11    | GCC, GDB, Eclipse               | Linux

Chapter | Hardware required               | OS required
1-11    | 64-bit Intel/AMD PC             | Windows, Linux
1-11    | 4 gigabytes RAM                 | Windows, Linux
2-11    | NVIDIA GPU (GTX 1050 or better) | Windows, Linux

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Related products

Get to Know the Author

Dr. Brian Tuomanen has been working with CUDA and general-purpose GPU programming since 2014. He received his bachelor of science in electrical engineering from the University of Washington in Seattle, and briefly worked as a software engineer before switching to mathematics for graduate school. He completed his PhD in mathematics at the University of Missouri in Columbia, where he first encountered GPU programming as a means for studying scientific problems. Dr. Tuomanen has spoken at the US Army Research Lab about general-purpose GPU programming and has recently led GPU integration and development at a Maryland-based start-up company. He currently works as a machine learning specialist (Azure CSI) for Microsoft in the Seattle area.

Suggestions and Feedback

Click here if you have any feedback or suggestions.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781788993913


Issues

Question about ElementwiseKernel speed performance

I have a question about ElementwiseKernel speed performance, specifically in the example code of simple_element_kernel_example0.py.

I tweaked this sample code to run in a Python 3 environment just by changing the print statements, as follows:

import numpy as np
import pycuda.autoinit    # initialize pycuda
from pycuda import gpuarray
from time import time
from pycuda.elementwise import ElementwiseKernel

host_data = np.float32( np.random.random(50000000) )

gpu_2x_ker = ElementwiseKernel(
"float *in, float *out",
"out[i] = 2*in[i];",
"gpu_2x_ker")

def speedcomparison():
    t1 = time()
    host_data_2x =  host_data * np.float32(2)
    t2 = time()
    print('total time to compute on CPU: %f' % (t2 - t1))
    device_data = gpuarray.to_gpu(host_data)
    # allocate memory for output
    device_data_2x = gpuarray.empty_like(device_data)
    t1 = time()
    gpu_2x_ker(device_data, device_data_2x)
    t2 = time()
    from_device = device_data_2x.get()
    print('total time to compute on GPU: %f' % (t2 - t1))
    print('Is the host computation the same as the GPU computation? : {}'.format(np.allclose(from_device, host_data_2x) ))
    

if __name__ == '__main__':
    speedcomparison()

From reading the book, I know there is a kernel compilation overhead, so the first run takes much more time. For that reason I ran this code several times, but even after ten runs my result doesn't look good.

total time to compute on CPU: 0.076065
total time to compute on GPU: 1.330138
Is the host computation the same as the GPU computation? : True

With the time_calc0.py example, I got the result I expected; the code and its output are as follows:

import numpy as np
import pycuda.autoinit
from pycuda import gpuarray
from time import time

host_data = np.float32( np.random.random(50000000) )

t1 = time()
host_data_2x =  host_data * np.float32(2)
t2 = time()

print('total time to compute on CPU: %f' % (t2 - t1))


device_data = gpuarray.to_gpu(host_data)

t3 = time()
device_data_2x =  device_data * np.float32(2)
t4 = time()

from_device = device_data_2x.get()


print('total time to compute on GPU: %f' % (t4 - t3))
print('GPU is %f times faster than CPU'  % ((t2-t1)/(t4 - t3)))
print('Is the host computation the same as the GPU computation? : {}'.format(np.allclose(from_device, host_data_2x) ))

total time to compute on CPU: 0.097082
total time to compute on GPU: 0.010009
GPU is 9.699898 times faster than CPU
Is the host computation the same as the GPU computation? : True

Could you point out what I missed or did wrong? Thanks.

For your information, my environment is as follows:

  • OS: Windows 10
  • CPU: AMD 3950X
  • GPU: GTX 1080 Ti
  • Python: Anaconda 3.7 64-bit
  • PyCUDA: 2019.1.2
  • Visual C++ compiler: 19.12.25835
  • CUDA: 10.2

cuda_cores_per_mp KeyError: 7.5

I got the following error when running the python deviceQuery.py command.

CUDA device query (PyCUDA version)

Detected 1 CUDA Capable device(s)

Device 0: Quadro T1000 with Max-Q Design
         Compute Capability: 7.5
         Total Memory: 4096 megabytes
Traceback (most recent call last):
  File "deviceQuery.py", line 35, in <module>
    cuda_cores_per_mp = { 5.0 : 128, 5.1 : 128, 5.2 : 128, 6.0 : 64, 6.1 : 128, 6.2 : 128}[compute_capability]
KeyError: 7.5

Can you update the code to include key 7.5 in the dict?
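One possible fix (an assumption based on NVIDIA's published per-SM core counts, not an official patch to the repository): compute capability 7.0 (Volta) and 7.5 (Turing) devices have 64 CUDA cores per multiprocessor, so the lookup table can simply be extended:

```python
# Extended lookup table: CUDA cores per multiprocessor, keyed by
# compute capability. Entries 7.0 (Volta) and 7.5 (Turing), both
# with 64 cores per SM, are added to the book's original dict.
cuda_cores_per_mp = {5.0: 128, 5.1: 128, 5.2: 128,
                     6.0: 64, 6.1: 128, 6.2: 128,
                     7.0: 64, 7.5: 64}

compute_capability = 7.5  # as reported for the Quadro T1000 above
print(cuda_cores_per_mp[compute_capability])  # → 64, no more KeyError
```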
