thakkarV commented on June 10, 2024

This is expected on H100: we have optimized H100 kernels for dense GEMM but not for sparse yet. The sparse kernels you're running are almost certainly Ampere kernels recompiled for Hopper. Hopper sparse kernels are on our roadmap for sometime around mid-2024.

from cutlass.

jimwu6 commented on June 10, 2024

Thanks for the response, that makes sense. I've rerun the benchmarks on A100 and the results seem to be closer, but there are still some problem sizes where the sparse kernels are about 30-50% slower than the dense ones.

For example, with the settings above and M=768, N=4096, K=4096, my best runtime for dense is 0.105734 and my best runtime for sparse is 0.150016, both on A100. Is this still expected?
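
To put those two numbers on a common scale, here is a small sketch that computes the runtime ratio and the effective throughput. It assumes the values are the cutlass_profiler runtime in milliseconds and counts a GEMM as 2*M*N*K operations (the dense-equivalent count, so both kernels are credited with the same work); the helper names `slowdown` and `tflops` are hypothetical.

```shell
# Hypothetical helper: ratio of sparse to dense runtime (>1 means sparse is slower).
slowdown() {
  awk -v dense="$1" -v sparse="$2" 'BEGIN { printf "%.2f\n", sparse / dense }'
}

# Hypothetical helper: effective TFLOP/s for an MxNxK GEMM that took $4 milliseconds,
# counting 2*M*N*K operations (assumption: runtime is reported in ms).
tflops() {
  awk -v m="$1" -v n="$2" -v k="$3" -v ms="$4" \
    'BEGIN { printf "%.1f\n", 2 * m * n * k / (ms * 1e-3) / 1e12 }'
}

echo "sparse/dense runtime: $(slowdown 0.105734 0.150016)x"
echo "dense:  $(tflops 768 4096 4096 0.105734) TFLOP/s"
echo "sparse: $(tflops 768 4096 4096 0.150016) TFLOP/s"
```

With these inputs the ratio is about 1.42, i.e. the sparse kernel takes roughly 42% longer, or roughly 244 vs 172 TFLOP/s of dense-equivalent work.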

thakkarV commented on June 10, 2024

@hwu36

hwu36 commented on June 10, 2024

Could you please paste the logs for the best sparse and dense kernels on A100?

jimwu6 commented on June 10, 2024

I've put the profiler results for the best kernels at M=768, N=4096, K=4096 in a CSV in this gist.

hwu36 commented on June 10, 2024

Your sparse kernel is running a small 64x128 tile, while your dense kernel is running a big 256x128 tile. Have you tried 256x128, 128x256, or 128x128 for sparse?
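
One way to try those tile shapes is to run the profiler once per shape. The sketch below only prints the commands; it reuses the problem size and data types from this thread and assumes the profiler's `--cta_m` and `--cta_n` options select the threadblock shape for sparse GEMM the same way they do elsewhere. The helper name `profiler_cmds` is hypothetical.

```shell
# Hypothetical helper: print one cutlass_profiler command per candidate tile.
# Each argument is a tile written as <cta_m>x<cta_n>, e.g. 256x128.
profiler_cmds() {
  for tile in "$@"; do
    cta_m=${tile%x*}   # text before the 'x'
    cta_n=${tile#*x}   # text after the 'x'
    echo "../tools/profiler/cutlass_profiler --operation=spgemm" \
         "--m=768 --n=4096 --k=4096 --A=bf16 --B=bf16 --C=bf16 --D=bf16" \
         "--cta_m=$cta_m --cta_n=$cta_n"
  done
}

# The tiles suggested above; pipe the output to `sh` to actually run them.
profiler_cmds 256x128 128x256 128x128
```

If a given tile was not built into the library, the corresponding command should simply find no matching kernel to run.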

jimwu6 commented on June 10, 2024

The tile size refers to cta_{m, n, k}, correct? If so, my first thought is that the cutlass_profiler should run every configuration it has for the settings I selected, since I didn't explicitly specify any of those, but it doesn't. I went back and tried setting one manually, e.g. --cta_m=128, but that ended up not running anything (which I believe means the kernel doesn't exist or can't be found?). The same happens for f16 and bf16.

I believe all possible kernels should have been built, since my build process was the following:

cmake .. -DCUTLASS_NVCC_ARCHS=80
make cutlass_profiler -j12

and I didn't select any specific kernels to exclude/include.

hwu36 commented on June 10, 2024

You could add -DCUTLASS_LIBRARY_KERNELS=bf16_s16832spgemm_bf16_*_tt_align8 to the cmake command to enable all sparse GEMM tile sizes.
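
As a rough sanity check of what that filter value covers, the sketch below approximates it with shell glob matching, treating `*` as "any characters" and allowing any prefix such as `cutlass_tensorop_`. That matching rule is an assumption for illustration only; CUTLASS's kernel generator applies its own filter logic. The helper `matches_filter` is hypothetical.

```shell
# Filter value suggested above.
pattern='bf16_s16832spgemm_bf16_*_tt_align8'

# Hypothetical helper: print "yes" if a kernel name matches the pattern,
# assuming glob semantics with an implicit '*' on both ends.
matches_filter() {
  case "$1" in
    *$pattern*) echo yes ;;   # unquoted $pattern keeps '*' as a wildcard
    *) echo no ;;
  esac
}

matches_filter cutlass_tensorop_bf16_s16832spgemm_bf16_256x128_64x3_tt_align8   # prints yes
matches_filter cutlass_tensorop_bf16_s16832spgemm_bf16_256x128_64x3_tn_align8   # hypothetical tn name: prints no
```

Under this approximation every tile size in the tt layout matches, while a name with a different layout does not. When passing the value on a shell command line, quoting it (e.g. -DCUTLASS_LIBRARY_KERNELS='bf16_s16832spgemm_bf16_*_tt_align8') keeps the shell from trying to expand the `*` against files in the build directory.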

jimwu6 commented on June 10, 2024

Using that flag produces no new kernels; the cutlass_profiler shows no additional results.

hwu36 commented on June 10, 2024

What do you see when you run this command?

ls $BUILD/tools/library/generated/gemm/80/bf16_s16832spgemm_bf16/

This is what I have:

[haichengw@computelab-build-2 build_90a]$ ls tools/library/generated/gemm/80/bf16_s16832spgemm_bf16/
all_sm80_bf16_s16832spgemm_bf16_gemm_operations.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_128x128_128x3_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_128x128_64x3_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_128x256_64x3_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_128x64_128x4_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_128x64_64x3_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_256x128_64x3_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_256x64_128x3_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_256x64_64x3_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_64x128_128x3_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_64x128_64x6_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_64x256_64x4_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_64x64_128x3_tt_align8.cu
cutlass_tensorop_bf16_s16832spgemm_bf16_64x64_64x4_tt_align8.cu
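
Each generated file name encodes its tile configuration. The sketch below decodes the `<M>x<N>_<K>x<stages>` field, assuming it follows CUTLASS's usual naming of threadblock M x N followed by threadblock K x pipeline stages; `decode_tile` is a hypothetical helper, and `$BUILD` is the build directory as in the `ls` command above.

```shell
# Hypothetical helper: turn a generated kernel name (with or without .cu) into
# its CTA shape and stage count. Prints nothing if the name has no tile field.
decode_tile() {
  printf '%s\n' "${1%.cu}" |
    sed -n 's/.*_\([0-9][0-9]*\)x\([0-9][0-9]*\)_\([0-9][0-9]*\)x\([0-9][0-9]*\)_.*/cta_m=\1 cta_n=\2 cta_k=\3 stages=\4/p'
}

# Decode every generated sparse bf16 kernel in the build tree, if present.
dir=${BUILD:-.}/tools/library/generated/gemm/80/bf16_s16832spgemm_bf16
for f in "$dir"/cutlass_*.cu; do
  [ -e "$f" ] || continue   # glob did not match: directory missing or empty
  echo "$(basename "$f"): $(decode_tile "$(basename "$f")")"
done
```

For example, the 256x128_64x3 kernel decodes to cta_m=256 cta_n=128 cta_k=64 stages=3, and the 64x128_64x6 kernel to cta_m=64 cta_n=128 cta_k=64 stages=6.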

jimwu6 commented on June 10, 2024

I have the same contents as you. However, the profiler only seems to run the 64x128 tile, even when I don't specify a CTA shape.
I am running exactly:

../tools/profiler/cutlass_profiler --operation=spgemm --m=768 --n=4096 --k=4096 --A=bf16 --B=bf16 --C=bf16 --D=bf16 --batch_count=1 --profiling-iterations=5 --warmup-iterations=1 --output=report.csv

as per my original post

hwu36 commented on June 10, 2024

What happens if you just run:

../tools/profiler/cutlass_profiler --m=768 --n=4096 --k=4096

jimwu6 commented on June 10, 2024

When I run that, the only sparse GEMMs that don't use --cta_m=64 --cta_n=128 are those with f32, s4, or s8 inputs.

hwu36 commented on June 10, 2024

You need to use the tn (row x col) layout for f32, s4, and s8.
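
The two letters in a kernel name such as `_tt_align8` or `_tn_align8` give the layouts of A and B. The sketch below spells them out, assuming the convention that `t` means row-major and `n` means column-major, which is consistent with "tn (row x col)" above; `describe_layout` and `layout_of` are hypothetical helpers.

```shell
# Hypothetical helper: map one layout letter to the layout it denotes.
layout_of() {
  case "$1" in
    t) echo row ;;      # assumed: 't' = row-major
    n) echo column ;;   # assumed: 'n' = column-major
    *) echo unknown ;;
  esac
}

# Hypothetical helper: describe a two-letter layout code for operands A and B.
describe_layout() {
  a=${1%?}   # first letter (drop the last character)
  b=${1#?}   # second letter (drop the first character)
  echo "A=$(layout_of "$a") B=$(layout_of "$b")"
}

describe_layout tt   # layout of the bf16 sparse kernels listed above
describe_layout tn   # layout needed for f32, s4 and s8
```

So the bf16 sparse kernels built here are row x row (A=row B=row), while the f32, s4 and s8 sparse kernels need A=row B=column.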

mnicely commented on June 10, 2024

@jimwu6 have your issues been resolved?
