Learning CUDA. This repo collects all my scratchwork from the resources I'm working through.
I take a top-down approach to learning. I usually start with a broad goal and then narrow down to specific topics. These goals aren't necessarily anything grand; I simply like thinking about the possibilities that open up once I have something in my toolkit. This time around, I want to train a smaller version of CLIP from scratch so I can dive into the engineering challenges of training large models.
"Plus Iโm steadily working on these essays on running, though nobody in particular has asked me to. Just like a silent village blacksmith, tinkering away." - from What I Talk About When I Talk About Running by Haruki Murakami
I'm limiting myself to roughly $10,000 in AWS compute (the max I can get one way or another without actually paying), so I have to optimize a lot of things on my own. I first looked at OpenAI's Triton, but I ran into two problems: there are few resources for learning it since it's still fairly new, and I did not know what Triton improves over CUDA. Seeing that a CUDA MODE Discord started up just this year, I'm excited to jump in and learn with their help.
- An Even Easier Introduction to CUDA
- Programming Massively Parallel Processors (Textbook)