cuda-profiler's Introduction

Tools and extensions for CUDA profiling

| Extension | Extends tool | Description |
|---|---|---|
| one-hop profiling | NVIDIA Visual Profiler | Remotely profile a CUDA program when the machine actually running it is not accessible from the machine running the NVIDIA Visual Profiler. |
| NVTX MPI Wrappers | nvprof | Inserts NVTX ranges for many common Message Passing Interface (MPI) functions. |

cuda-profiler's People

Contributors

jefflarkin

cuda-profiler's Issues

nvtx_pmpi Fortran interface crashes when using MPI_IN_PLACE

nvtx_pmpi maps Fortran MPI_* calls to C PMPI_* calls itself, rather than leaving that step to the underlying MPI library.
Unfortunately, it gets some things wrong in the process. In particular, it mishandles the special constants that are passed in place of data pointers, such as MPI_IN_PLACE and MPI_BOTTOM.

I have a workaround for OpenMPI/SpectrumMPI, but it is not general, and I'm not sure it is possible to do this generically in the first place. The first question is whether there is interest in addressing this issue. If so, it would be worth discussing options for how to do it.
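To make the failure mode concrete, here is a toy model of the sentinel translation a Fortran-to-C wrapper layer has to perform. It is only a sketch: every name in it (`F_MPI_IN_PLACE`, `C_MPI_IN_PLACE`, `f2c_buffer`, `wrapped_allreduce`) is a hypothetical stand-in, no real MPI library is involved, and it does not show how a real wrapper would discover the Fortran sentinel addresses, which is the implementation-specific part the report describes as hard to do generically.

```python
# Toy model of sentinel translation in a Fortran-to-C MPI wrapper layer.
# All names are hypothetical stand-ins; this does not call any MPI library.


class Sentinel:
    """A unique marker object, standing in for a special constant's address."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


# Stand-ins for the addresses the Fortran side passes for the constants.
F_MPI_IN_PLACE = Sentinel("F_MPI_IN_PLACE")
F_MPI_BOTTOM = Sentinel("F_MPI_BOTTOM")

# Stand-ins for the values the C interface expects for the same constants.
C_MPI_IN_PLACE = Sentinel("C_MPI_IN_PLACE")
C_MPI_BOTTOM = Sentinel("C_MPI_BOTTOM")

# Lookup keyed by object identity, which models comparing raw addresses.
_F2C = {
    id(F_MPI_IN_PLACE): C_MPI_IN_PLACE,
    id(F_MPI_BOTTOM): C_MPI_BOTTOM,
}


def f2c_buffer(buf):
    """Map a Fortran sentinel to its C counterpart; pass real buffers through."""
    return _F2C.get(id(buf), buf)


def wrapped_allreduce(sendbuf, recvbuf, c_allreduce):
    """Translate both buffer arguments before forwarding to the C-side call."""
    return c_allreduce(f2c_buffer(sendbuf), f2c_buffer(recvbuf))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    # A fake C-side function that just echoes what it received.
    print(wrapped_allreduce(F_MPI_IN_PLACE, data, lambda s, r: (s, r)))
```

A wrapper that skips the `f2c_buffer` step forwards the Fortran sentinel as if it were an ordinary data pointer, which is consistent with the crash described in this issue.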

dlprof tools

DLProf analyzed my deep learning model and reported that the data shapes did not meet the requirements for Tensor Cores. The original script defines five fully connected layers with shapes (8,1024), (1024,1024), (1024,512), and (512,1), and uses batch=128. When I set batch=64, the issue reported by DLProf went away. Why? I did not change the shape (512,1) to (512,8).

My code is on GitHub: https://github.com/fenfaqingnian/dlprof_v100/tree/master/Profiler_DLprof_TF1-master
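The shape question can be checked by hand with a small script. The sketch below assumes the commonly cited guideline that FP16 Tensor Core matrix multiplies work best when every GEMM dimension is a multiple of 8. That rule is an assumption here: the actual requirement depends on the GPU, the data type, and the library version, and this is not DLProf's own logic. The helper names `misaligned_dims` and `report` are hypothetical.

```python
# Sketch: flag fully connected layers whose GEMM dimensions are not
# multiples of 8. The multiple-of-8 rule is an assumed FP16 guideline,
# not something taken from DLProf itself.

ALIGNMENT = 8  # assumption; other data types and library versions differ


def misaligned_dims(batch, in_features, out_features, alignment=ALIGNMENT):
    """Return the names of the dimensions that are not multiples of `alignment`."""
    dims = {
        "batch": batch,
        "in_features": in_features,
        "out_features": out_features,
    }
    return [name for name, size in dims.items() if size % alignment != 0]


def report(batch, layers):
    """Map each (in_features, out_features) layer to its misaligned dimensions."""
    return {shape: misaligned_dims(batch, *shape) for shape in layers}


# Layer shapes (in_features, out_features) as listed in the issue.
layers = [(8, 1024), (1024, 1024), (1024, 512), (512, 1)]

if __name__ == "__main__":
    for batch in (128, 64):
        for shape, bad in report(batch, layers).items():
            status = "ok" if not bad else "misaligned: " + ", ".join(bad)
            print(f"batch={batch} layer={shape} {status}")
```

Under this simple rule the (512,1) layer is flagged for both batch sizes, so the rule alone does not explain why the message disappears at batch=64.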

compile failure using PGI

git reflog
5a6577f (HEAD -> master, origin/master, origin/HEAD) HEAD@{0}: clone: from https://github.com/NVIDIA/cuda-profiler.git

pgcc --version
pgcc 19.5-0 LLVM 64-bit target on x86-64 Linux -tp sandybridge
PGI Compilers and Tools
Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.

65> make
python2.7 wrap/wrap.py -f -o nvtx_pmpi.c nvtx.w
mpicc -I/nasa/cuda/10.1/include -DPIC -fPIC -c nvtx_pmpi.c -o nvtx_pmpi.o
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 627)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 729)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 729)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 825)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 825)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 921)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 921)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1017)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1017)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1017)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1017)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1065)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1065)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1065)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1065)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1065)
PGC-S-0094-Illegal type conversion required (nvtx_pmpi.c: 1065)
PGC/x86-64 Linux 19.5-0: compilation completed with severe errors
Makefile:5: recipe for target 'nvtx_pmpi.o' failed
make: *** [nvtx_pmpi.o] Error 2

It compiles fine with gcc-8.2.0.

Nvprof event resampling

Dear developers:

How can I reduce the size of the nvvp output files from nvprof by re-sampling events? One option I found in the nvprof help that I thought might work is --continuous-sampling-interval. When I set it to 10 ms (the default is 2 ms according to the documentation), the output files are still the same size as when I do not specify it. Is this a bug, or am I missing something?

Thanks,
Shelton

MPI annotation option does not output any MPI information

Dear Nvprof developers:

I want to use nvprof to profile my CUDA+MPI application. However, a small test shows that the option --annotate-mpi openmpi does not produce any information about MPI calls, contrary to what the nvprof documentation describes. The details of the test are as follows:

Sample Test:
From Link: http://geco.mines.edu/tesla/cuda_tutorial_mio/
Source Files: mpi_hello_gpu.cu, vecadd.cu
OpenMPI Version: 4.0.2
Cuda Version: 10.1
Command: $ mpirun -np 2 nvprof --annotate-mpi openmpi ./mpi_cuda

Output ( using 2 mpi processes):
rank 0 of 2 on p3dev02 received bcastme[3]=3 [gpu 0]
rank 1 of 2 on p3dev02 received bcastme[3]=3 [gpu 1]
==70253== NVPROF is profiling process 70253, command: ./mpi_cuda
==70254== NVPROF is profiling process 70254, command: ./mpi_cuda
rank 0: cudaGetDevice()=0
rank 1: cudaGetDevice()=1
rank 1: C[0]=0.000000
ranksum= 1
==70253== Profiling application: ./mpi_cuda
==70253== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 62.58% 3.1040us 2 1.5520us 1.3440us 1.7600us [CUDA memcpy HtoD]
37.42% 1.8560us 1 1.8560us 1.8560us 1.8560us [CUDA memcpy DtoH]
API calls: 86.74% 352.44ms 3 117.48ms 10.267us 352.42ms cudaMalloc
5.39% 21.910ms 582 37.645us 258ns 2.0794ms cuDeviceGetAttribute
4.75% 19.303ms 50000 386ns 303ns 102.73us cudaLaunchKernel
2.07% 8.3917ms 6 1.3986ms 1.1406ms 1.4661ms cuDeviceTotalMem
0.68% 2.7607ms 1 2.7607ms 2.7607ms 2.7607ms cudaGetDeviceProperties
0.34% 1.3713ms 6 228.55us 215.41us 247.59us cuDeviceGetName
0.02% 66.319us 3 22.106us 14.092us 30.931us cudaMemcpy
0.01% 20.708us 3 6.9020us 1.8690us 16.755us cudaFree
0.00% 12.278us 6 2.0460us 1.3700us 4.3850us cuDeviceGetPCIBusId
0.00% 7.5770us 12 631ns 375ns 973ns cuDeviceGet
0.00% 6.6190us 1 6.6190us 6.6190us 6.6190us cudaSetDevice
0.00% 6.2070us 4 1.5510us 867ns 2.3670us cuPointerGetAttributes
0.00% 2.3390us 6 389ns 354ns 461ns cuDeviceGetUuid
0.00% 1.8280us 3 609ns 437ns 780ns cuDeviceGetCount
0.00% 1.5210us 1 1.5210us 1.5210us 1.5210us cudaGetDevice
0.00% 1.2300us 1 1.2300us 1.2300us 1.2300us cudaGetDeviceCount
==70254== Profiling application: ./mpi_cuda
==70254== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 100.00% 179.83ms 50000 3.5960us 3.5510us 4.0640us vecAdd(float*, float*, float*)
0.00% 3.0400us 2 1.5200us 1.3440us 1.6960us [CUDA memcpy HtoD]
0.00% 2.0480us 1 2.0480us 2.0480us 2.0480us [CUDA memcpy DtoH]
API calls: 68.49% 884.64ms 50000 17.692us 16.647us 1.4335ms cudaLaunchKernel
28.85% 372.61ms 3 124.20ms 15.212us 372.57ms cudaMalloc
1.55% 20.003ms 582 34.368us 453ns 1.2518ms cuDeviceGetAttribute
0.76% 9.7675ms 6 1.6279ms 1.6077ms 1.6602ms cuDeviceTotalMem
0.25% 3.2029ms 1 3.2029ms 3.2029ms 3.2029ms cudaGetDeviceProperties
0.10% 1.2356ms 6 205.93us 135.78us 224.53us cuDeviceGetName
0.01% 103.42us 3 34.473us 19.464us 60.273us cudaMemcpy
0.00% 60.895us 3 20.298us 4.2420us 51.665us cudaFree
0.00% 16.364us 4 4.0910us 2.0370us 9.1220us cuPointerGetAttributes
0.00% 14.154us 6 2.3590us 1.9510us 3.1620us cuDeviceGetPCIBusId
0.00% 11.338us 12 944ns 580ns 1.5080us cuDeviceGet
0.00% 7.3840us 1 7.3840us 7.3840us 7.3840us cudaSetDevice
0.00% 3.8410us 6 640ns 592ns 673ns cuDeviceGetUuid
0.00% 2.7020us 3 900ns 699ns 1.0970us cuDeviceGetCount
0.00% 1.9360us 1 1.9360us 1.9360us 1.9360us cudaGetDevice
0.00% 1.2750us 1 1.2750us 1.2750us 1.2750us cudaGetDeviceCount

Hope you can reproduce the issue.

Best,
Shelton

Remote profiling - failed to Create a new session (Ctrl + N)

I'm running Visual Profiler on Windows and trying to remotely profile an Ubuntu machine.
I don't have an NVIDIA GPU on my Windows machine.
When I try to create a new session, I get the following message and Visual Profiler exits:
"Unable to locate CUDA libraries and establish connection with CUDA driver"