apnn-tc's Issues
Question about the bit-width of outputs of int1 Tensor Core
Hello, thanks for your wonderful work and the available code! While reading the paper, I had the following questions.
In the paper, the authors state that "the int1 Tensor Core compute primitive can only generate 32 outputs". A 32-bit output seems relatively wide for an int1 matrix multiply-accumulate (MMA), so I want to check whether we can control the output bit-width of the int1 MMA.
First, I tried to access the white paper in reference [32], https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-amperearchitecture-whitepaper.pdf, but got an error saying "Even AI can't find this page!".
Then I accessed https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf and searched for related information in that white paper, but I still could not find any description of how the Tensor Core output bit-width is set.
So, my questions are:
(1) Where can I find a description, in a white paper or other documentation, that supports the claim that "the int1 Tensor Core compute primitive can only generate 32 outputs"?
(2) Is there any way to control the output bit-width of the int1 Tensor Core compute primitive?
Looking forward to your reply.
Best wishes to you!
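For readers with the same question: in CUDA's WMMA API, the 1-bit (b1) MMA path hard-codes a 32-bit signed integer accumulator, which is where the fixed output width comes from. Below is a minimal sketch, assuming an sm_75+ GPU and CUDA's experimental sub-byte WMMA API (host launch code omitted); it is an illustration, not code from the APNN-TC repository.

```cuda
#include <mma.h>
using namespace nvcuda;

// One 8x8x128 binary (1-bit) MMA tile. The only accumulator type the
// API accepts for b1 operands is int (32-bit) -- there is no narrower
// option, which matches the paper's statement about int1 outputs.
__global__ void b1_mma(const unsigned *A, const unsigned *B, int *C) {
    wmma::fragment<wmma::matrix_a, 8, 8, 128,
                   wmma::experimental::precision::b1, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 8, 8, 128,
                   wmma::experimental::precision::b1, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 8, 8, 128, int> c;  // int = 32-bit

    wmma::fill_fragment(c, 0);
    wmma::load_matrix_sync(a, A, 128);  // leading dimension in bits
    wmma::load_matrix_sync(b, B, 128);
    // XOR + popcount accumulation, as typically used for binary networks.
    wmma::bmma_sync(c, a, b, c,
                    wmma::experimental::bmmaBitOpXOR,
                    wmma::experimental::bmmaAccumulateOpPOPC);
    wmma::store_matrix_sync(C, c, 8, wmma::mem_row_major);
}
```

So, as far as the public WMMA (and underlying PTX mma) interfaces go, the answer to (2) appears to be no: the b1 accumulator is fixed at 32-bit int, and any narrower output would have to be produced by truncating or re-packing after the MMA.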
Compile out a bin file. How to run it?
Hi! I am learning from your code. First, I downloaded the whole zip file; then I put the cutlass package into the cutlass directory here. Next I cd into cutlass_kernel and run "make all", and it works! It shows something like this:
(base) C:\trash_can\APNN-TC-main\cutlass_kernel>make all
nvcc -I../cutlass/include -I../cutlass/tools/util/include -I../cutlass/examples/common -std=c++11 -O3 -w -arch=sm_86 bench_gemm.cu -o bench_gemm.bin
nvcc warning : The -std=c++11 flag is not supported with the configured host compiler. Flag will be ignored.
bench_gemm.cu
Creating library bench_gemm.lib and object bench_gemm.exp
And I get a bench_gemm.bin. I am not sure how to run this .bin file. Previously I have seen a.exe (Windows) and a.out (Linux) as nvcc's output files, but never .bin, and I can't find anything about it on Google.
Thank you!!!!