
Hands-On GPU Programming with Python and CUDA

This is the code repository for Hands-On GPU Programming with Python and CUDA, published by Packt.

Explore high-performance parallel computing with CUDA

What is this book about?

Hands-On GPU Programming with Python and CUDA hits the ground running: you’ll start by learning how to apply Amdahl’s Law, use a code profiler to identify bottlenecks in your Python code, and set up an appropriate GPU programming environment. You’ll then see how to “query” the GPU’s features and copy arrays of data to and from the GPU’s own memory.
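Amdahl's Law itself is a one-line formula: if a fraction p of a program's runtime can be parallelized across N processors, the overall speedup is bounded by 1 / ((1 - p) + p / N). A minimal sketch in Python (the function name here is illustrative, not taken from the book):

```python
# Amdahl's Law: upper bound on speedup when a fraction p of the
# runtime is parallelizable across n processors.
def amdahl_speedup(p, n):
    """p: parallelizable fraction (0..1); n: number of processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 1000 GPU cores, a program that is 75% parallelizable
# is capped by its serial 25%:
print(round(amdahl_speedup(0.75, 1000), 2))  # → 3.99
```

As n grows without bound, the speedup approaches 1 / (1 - p), which is why profiling for serial bottlenecks comes before any GPU work.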

This book covers the following exciting features:

  • Launch GPU code directly from Python
  • Write effective and efficient GPU kernels and device functions
  • Use libraries such as cuFFT, cuBLAS, and cuSolver
  • Debug and profile your code with Nsight and Visual Profiler
  • Apply GPU programming to data science problems
  • Build a GPU-based deep neural network from scratch
  • Explore advanced GPU hardware features, such as warp shuffling

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

cublas.cublasDestroy(handle)
print 'cuBLAS returned the correct value: %s' % np.allclose(np.dot(A,x), y_gpu.get())
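Note that the snippet above uses the Python 2 print statement, matching the Anaconda 5 (Python 2.7) environment listed below. For illustration only, here is a CPU-only sketch of the same correctness check under Python 3, with NumPy standing in for the GPU result (A, x, and y are placeholder names, not the book's actual variables, and no cuBLAS call is made here):

```python
import numpy as np

# CPU stand-in for the chapter's cuBLAS check: y plays the role of
# y_gpu.get(), the result copied back from the GPU after the
# matrix-vector product.
A = np.random.rand(4, 4).astype(np.float32)
x = np.random.rand(4).astype(np.float32)
y = A @ x

print('cuBLAS returned the correct value: %s' % np.allclose(np.dot(A, x), y))  # → True
```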

Following is what you need for this book: Hands-On GPU Programming with Python and CUDA is for developers and data scientists who want to learn the basics of effective GPU programming to improve performance using Python code. You should have an understanding of first-year college or university-level engineering mathematics and physics, and have some experience with Python as well as with a C-based programming language such as C, C++, Go, or Java.

With the following software and hardware list you can run all the code files present in the book (Chapters 1-12).

Software and Hardware List

Chapter | Software required               | OS required
1-11    | Anaconda 5 (Python 2.7 version) | Windows, Linux
2-11    | CUDA 9.2, CUDA 10.x             | Windows, Linux
2-11    | PyCUDA (latest)                 | Windows, Linux
7       | Scikit-CUDA (latest)            | Windows, Linux
2-11    | Visual Studio Community 2015    | Windows
2-11    | GCC, GDB, Eclipse               | Linux

Chapter | Hardware required               | OS required
1-11    | 64-bit Intel/AMD PC             | Windows, Linux
1-11    | 4 gigabytes RAM                 | Windows, Linux
2-11    | NVIDIA GPU (GTX 1050 or better) | Windows, Linux

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Related products

Get to Know the Author

Dr. Brian Tuomanen has been working with CUDA and general-purpose GPU programming since 2014. He received his bachelor of science in electrical engineering from the University of Washington in Seattle, and briefly worked as a software engineer before switching to mathematics for graduate school. He completed his PhD in mathematics at the University of Missouri in Columbia, where he first encountered GPU programming as a means for studying scientific problems. Dr. Tuomanen has spoken at the US Army Research Lab about general-purpose GPU programming and has recently led GPU integration and development at a Maryland-based start-up company. He currently works as a machine learning specialist (Azure CSI) for Microsoft in the Seattle area.

Suggestions and Feedback

Click here if you have any feedback or suggestions.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781788993913


Issues

Question about ElementwiseKernel speed performance

I have a question about ElementwiseKernel speed performance, specifically in the example code of simple_element_kernel_example0.py.

I tweaked this sample code to run in a Python 3 environment just by changing the print statements, as follows:

import numpy as np
import pycuda.autoinit    # initialize pycuda
from pycuda import gpuarray
from time import time
from pycuda.elementwise import ElementwiseKernel

host_data = np.float32( np.random.random(50000000) )

gpu_2x_ker = ElementwiseKernel(
"float *in, float *out",
"out[i] = 2*in[i];",
"gpu_2x_ker")

def speedcomparison():
    t1 = time()
    host_data_2x =  host_data * np.float32(2)
    t2 = time()
    print('total time to compute on CPU: %f' % (t2 - t1))
    device_data = gpuarray.to_gpu(host_data)
    # allocate memory for output
    device_data_2x = gpuarray.empty_like(device_data)
    t1 = time()
    gpu_2x_ker(device_data, device_data_2x)
    t2 = time()
    from_device = device_data_2x.get()
    print('total time to compute on GPU: %f' % (t2 - t1))
    print('Is the host computation the same as the GPU computation? : {}'.format(np.allclose(from_device, host_data_2x) ))
    

if __name__ == '__main__':
    speedcomparison()

From reading the book, I know there is a kernel compilation overhead, so the first run takes much more time. For that reason I ran this code several times, but even after ten runs my result doesn't look good.

total time to compute on CPU: 0.076065
total time to compute on GPU: 1.330138
Is the host computation the same as the GPU computation? : True

With the time_calc0.py example, I got the result I expected; the code and its output are as follows:

import numpy as np
import pycuda.autoinit
from pycuda import gpuarray
from time import time

host_data = np.float32( np.random.random(50000000) )

t1 = time()
host_data_2x =  host_data * np.float32(2)
t2 = time()

print('total time to compute on CPU: %f' % (t2 - t1))


device_data = gpuarray.to_gpu(host_data)

t3 = time()
device_data_2x =  device_data * np.float32(2)
t4 = time()

from_device = device_data_2x.get()


print('total time to compute on GPU: %f' % (t4 - t3))
print('GPU is %f times faster than CPU'  % ((t2-t1)/(t4 - t3)))
print('Is the host computation the same as the GPU computation? : {}'.format(np.allclose(from_device, host_data_2x) ))

total time to compute on CPU: 0.097082
total time to compute on GPU: 0.010009
GPU is 9.699898 times faster than CPU
Is the host computation the same as the GPU computation? : True

Could you point out what I missed or did wrong? Thanks.

For your information, my environment is as follows:

  • OS: Windows 10
  • CPU: AMD 3950X
  • GPU: GTX 1080 Ti
  • Python: Anaconda 3.7 64-bit
  • PyCUDA: 2019.1.2
  • Visual C++ compiler: 19.12.25835
  • CUDA: 10.2

cuda_cores_per_mp KeyError: 7.5

I got the following error when running the python deviceQuery.py command.

CUDA device query (PyCUDA version)

Detected 1 CUDA Capable device(s)

Device 0: Quadro T1000 with Max-Q Design
         Compute Capability: 7.5
         Total Memory: 4096 megabytes
Traceback (most recent call last):
  File "deviceQuery.py", line 35, in <module>
    cuda_cores_per_mp = { 5.0 : 128, 5.1 : 128, 5.2 : 128, 6.0 : 64, 6.1 : 128, 6.2 : 128}[compute_capability]
KeyError: 7.5

Can you update the code to include key 7.5 in the dict?
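One possible fix (an assumption based on NVIDIA's published per-SM core counts, not an official patch to the repository): compute capability 7.0 (Volta) and 7.5 (Turing) devices have 64 CUDA cores per multiprocessor, so the lookup table can simply be extended:

```python
# Extended lookup table: CUDA cores per multiprocessor, keyed by
# compute capability. Entries 7.0 (Volta) and 7.5 (Turing), both
# with 64 cores per SM, are added to the book's original dict.
cuda_cores_per_mp = {5.0: 128, 5.1: 128, 5.2: 128,
                     6.0: 64, 6.1: 128, 6.2: 128,
                     7.0: 64, 7.5: 64}

compute_capability = 7.5  # as reported for the Quadro T1000 above
print(cuda_cores_per_mp[compute_capability])  # → 64, no more KeyError
```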
