Lab3_CSS_555

CUDA Lab CSS535

Lab File setup:

Clone the project into a folder.

IDE Setup:

Visual Studio 2022

  1. Create a new CUDA project (in a separate folder).

  2. Solution -> Add external file (add all files from the cloned folder).

NOTE: This keeps all the source files in the cloned folder. Because we all have different development environments, each of us works in a separate environment of our choosing.

  1. After making changes, save the files and push from the cloned folder.

  2. If you create any other .h files, put them in the cloned folder and link them into your chosen environment, so only one copy of each file exists.

Please add to this section if you use nvcc or another IDE, so it stays up to date.

Linux (local or remote):

  1. nvcc -arch=sm_86 -lcublas kernel.cu -o lab3

Note: -arch=sm_86 sets the target compute capability, in this case 8.6. Change it to match your GPU.

For host debugging:

nvcc -g -arch=sm_86 -lcublas kernel.cu -o lab3

For device debugging:

nvcc -g -G -arch=sm_86 -lcublas kernel.cu -o lab3

Profile:

  1. ncu -o profile ./lab3 (saves the report as profile.ncu-rep)

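nvprof command line to get the L1 cache hit rates: nvprof --metrics l1_cache_global_hit_rate,l1_cache_local_hit_rate ./lab3

nvprof --query-metrics lists all available metrics; the two above are just a couple of them. See also: https://simoncblyth.bitbucket.io/env/notes/cuda/cuda_profiling/

Note: NVIDIA lists Volta as the last architecture on which nvprof is fully supported, so on newer GPUs (such as the compute capability 8.6 target above) use ncu instead.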

Git Workflow:

  1. clone to local machine: git clone "https://somerepo"
  2. update: git pull
  3. Make working branch: git branch "name of feature"
  4. checkout branch: git checkout "name of branch"
  5. Push branch to repo (so we can all see it): git push --set-upstream origin "name of branch"

NOTE: to check which branch you are on: git status

Merging:

Go to the repository on GitHub, pick the branch you want to merge from the branch drop-down (the one that shows main), click Contribute, and fill out the pull request.

Parameter Operations:

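Note on this section: exactly one of two macros selects the input data. TESTPARAM sets every vector and matrix element to 2 to check basic operation. REALDATA uses random values; with it, the results show significant divergence because of the large number of floating-point operations.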

    // NOTE: one but not both of these should be defined
    // Test parameters all 2's to check
    //#define TESTPARAM
    // Random values for vector and matrix
    #define REALDATA
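A minimal host-side sketch of how the two modes might be wired up, assuming a row-major float matrix A (rows × cols) and a vector x. The helper names (fillConstant, fillRandom, fillInputs, hostMatVec, maxRelError) and the seeds are hypothetical, not taken from kernel.cu. The preprocessor guard enforces "one but not both". With TESTPARAM every output element should be exactly 2 × 2 × cols = 4 × cols, so an exact comparison catches indexing bugs; with REALDATA, compare the GPU output against a CPU reference with a relative tolerance instead, since a different summation order alone changes the low bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Same default as the snippet above when nothing is passed with -D.
#if !defined(TESTPARAM) && !defined(REALDATA)
#define REALDATA
#endif
#if defined(TESTPARAM) && defined(REALDATA)
#error "Define TESTPARAM or REALDATA, but not both"
#endif

// TESTPARAM mode: every element gets the same value (2.0f in the lab).
void fillConstant(std::vector<float>& v, float value) {
    for (float& e : v) e = value;
}

// REALDATA mode: uniform random values in [0, 1); a fixed seed keeps runs repeatable.
void fillRandom(std::vector<float>& v, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    for (float& e : v) e = dist(gen);
}

// Fill pre-sized A and x according to whichever macro is defined.
void fillInputs(std::vector<float>& A, std::vector<float>& x) {
#ifdef TESTPARAM
    fillConstant(A, 2.0f);
    fillConstant(x, 2.0f);
#else  // REALDATA
    fillRandom(A, 1234u);
    fillRandom(x, 5678u);
#endif
}

// CPU reference y = A * x, accumulated in double to limit rounding error.
std::vector<float> hostMatVec(const std::vector<float>& A, const std::vector<float>& x,
                              std::size_t rows, std::size_t cols) {
    assert(A.size() == rows * cols && x.size() == cols);
    std::vector<float> y(rows);
    for (std::size_t i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < cols; ++j) sum += double(A[i * cols + j]) * x[j];
        y[i] = static_cast<float>(sum);
    }
    return y;
}

// Largest relative difference between GPU and CPU results (absolute when cpu[i] is ~0).
double maxRelError(const std::vector<float>& gpu, const std::vector<float>& cpu) {
    double worst = 0.0;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        double denom = std::fabs(cpu[i]) > 1e-30 ? std::fabs(cpu[i]) : 1.0;
        double err = std::fabs(double(gpu[i]) - cpu[i]) / denom;
        if (err > worst) worst = err;
    }
    return worst;
}
```

For example, with cols = 5 in TESTPARAM mode, every element of y should be 20.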

Using CMake:

  1. mkdir build
  2. cd build
  3. cmake ..

To build the project:

  1. cd build
  2. make

Contributors:

f-sossi, margaret3991, amaoyake, nicolasjposey


Issues:

Need to add part 3

Part 3: registers

Try a modification of the naïve version related to registers (try loop unrolling and inspect the number of registers per thread). Then try to overflow the number of registers available and see how this may (or may not) impact your performance, since registers will spill to local memory. Note that you would have to use very large vectors.
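The unrolling idea can be shown on one row of the naive matrix-vector product. This is plain C++ so it runs on the host; the function names are hypothetical, and in the CUDA kernel the same body would sit inside each thread's loop over its row. Each extra accumulator typically needs its own register, so wider unrolling tends to raise registers per thread. In device code, `#pragma unroll` asks nvcc to unroll a loop for you; compiling with `nvcc -Xptxas -v` reports the registers used (and any spill stores/loads) per kernel, and `-maxrregcount=N` caps registers per thread, which is one way to force spilling to local memory.

```cpp
#include <cassert>
#include <cstddef>

// One row of the naive mat-vec: a plain loop with a single accumulator.
float rowDotSimple(const float* row, const float* x, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t j = 0; j < n; ++j) sum += row[j] * x[j];
    return sum;
}

// Same row, manually unrolled by 4 with four independent accumulators.
// On the GPU, s0..s3 and the loaded operands would usually each occupy a register.
float rowDotUnrolled4(const float* row, const float* x, std::size_t n) {
    assert(n == 0 || (row != nullptr && x != nullptr));
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        s0 += row[j]     * x[j];
        s1 += row[j + 1] * x[j + 1];
        s2 += row[j + 2] * x[j + 2];
        s3 += row[j + 3] * x[j + 3];
    }
    for (; j < n; ++j) s0 += row[j] * x[j];  // remainder when n is not a multiple of 4
    return (s0 + s1) + (s2 + s3);
}
```

The unrolled version adds in a different order, so with REALDATA its result can differ from the simple loop in the last few bits.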

Run and Profile

Report results of one or two runs where you observed interesting changes, and explain WHY these differences occur.

Finish part 2

Part 2: shared memory

Try a modification of the naïve version related to shared memory use. This one might be a little trickier. Recall that for shared memory we take advantage of locality. Discuss in your report if and how you did this modification, and if not, why. For instance: could you try out very large vectors processed by just a few blocks, where the block size is not a multiple of 32, and use partially empty blocks?
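The launch arithmetic behind "very large vectors processed by just a few blocks, where the block size is not a multiple of 32" can be sketched on the host. Warps are 32 threads and are scheduled as a unit, so a 48-thread block occupies two warps and leaves 16 lanes idle. The helpers below are hypothetical: elementsPerThread assumes a grid-stride loop (each thread handles i, i + blocks·blockSize, ...), and sharedTileBytes assumes a tiling scheme in which each block stages blockSize floats of x in shared memory.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // threads per warp on NVIDIA GPUs

// Summary of a 1-D launch covering n elements.
struct LaunchShape {
    std::size_t blocks;                // ceil(n / blockSize)
    std::size_t warpsPerBlock;         // ceil(blockSize / 32)
    std::size_t idleLanesPerBlock;     // lanes in the last warp with no thread behind them
    std::size_t idleThreadsLastBlock;  // threads in the last block past the end of the data
};

LaunchShape launchShape(std::size_t n, std::size_t blockSize) {
    assert(blockSize > 0);
    LaunchShape s{};
    s.blocks = (n + blockSize - 1) / blockSize;
    s.warpsPerBlock = (blockSize + kWarpSize - 1) / kWarpSize;
    s.idleLanesPerBlock = s.warpsPerBlock * kWarpSize - blockSize;
    s.idleThreadsLastBlock = s.blocks * blockSize - n;
    return s;
}

// With a fixed, small grid and a grid-stride loop, each thread handles at most this many elements.
std::size_t elementsPerThread(std::size_t n, std::size_t blocks, std::size_t blockSize) {
    assert(blocks > 0 && blockSize > 0);
    std::size_t threads = blocks * blockSize;
    return (n + threads - 1) / threads;
}

// Shared memory per block under the assumed tiling: blockSize floats of x.
std::size_t sharedTileBytes(std::size_t blockSize) {
    return blockSize * sizeof(float);
}
```

For n = 1000 and blockSize = 48, launchShape gives 21 blocks of 2 warps each, 16 idle lanes per block, and 8 out-of-range threads in the last block (which the kernel's bounds check must skip).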

Run and Profile

Report results of one or two runs where you observed interesting changes, and explain WHY these differences occur.
