Comments (9)

jgong5 commented on September 25, 2024

In the oneDNN log I observed that only 2D matmuls use quantized kernels, while 3D matmuls use FP32 kernels. How can we enable int8 kernels for these?

The 3D matmuls are the bmms in attention, right? First of all, enabling these ops depends on the quantization recipe: in the model conversion phase of quantization, quant ops need to be inserted before these bmms. I'm not sure about "graviton3", but for x86 we are enabling this. cc @leslie-fang-intel @Valentine233
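To make "insert quant ops before these bmms" concrete, here is a minimal pure-Python sketch of what a quantized bmm computes: quantize both inputs, multiply in integers, then rescale. It assumes symmetric per-tensor scales chosen by hand; the helper names (`quantize`, `bmm_int`, `quantized_bmm`) are hypothetical, and this is not the recipe or the kernel that PyTorch or oneDNN actually uses.

```python
def quantize(batch, scale):
    """Symmetric per-tensor quantization of a 3D nested list to the int8 range."""
    return [
        [[max(-128, min(127, round(v / scale))) for v in row] for row in mat]
        for mat in batch
    ]


def bmm_int(a, b):
    """Batched matmul on integer nested lists, accumulating in Python ints."""
    return [
        [[sum(x * y for x, y in zip(row, col)) for col in zip(*mb)] for row in ma]
        for ma, mb in zip(a, b)
    ]


def quantized_bmm(a, b, scale_a, scale_b):
    """Quantize both operands, multiply in integers, then dequantize the result."""
    qa = quantize(a, scale_a)  # the "quant op" placed before the bmm
    qb = quantize(b, scale_b)
    acc = bmm_int(qa, qb)      # integer accumulation
    out_scale = scale_a * scale_b
    return [[[v * out_scale for v in row] for row in mat] for mat in acc]


if __name__ == "__main__":
    a = [[[0.5, -1.0], [1.5, 2.0]]]  # batch of one 2x2 matrix
    b = [[[1.0, 0.0], [0.5, -0.5]]]
    print(quantized_bmm(a, b, 0.5, 0.5))
```

If the recipe does not place quant ops in front of a bmm, the integer path in the middle is never taken and the op stays in FP32, which matches the behaviour described in the question.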

from pytorch.

leslie-fang-intel commented on September 25, 2024

Hi @akote123, thanks for the question. Yes, the matmul quantization recipe is supported in X86InductorQuantizer (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantizer/x86_inductor_quantizer.py#L786) with the PT2E quantization flow (refer to this tutorial for details on how to use this flow).

As for the backend optimization: if you are talking about the bmm optimization in attention, we are actually working on a customized SDPA kernel which will run the bmm with the int8 data type. It depends on the oneDNN BRGEMM optimization, so it is still under active development.

vadimkantorov commented on September 25, 2024

Somewhat related on dynamic quantization support - but on CUDA:

akote123 commented on September 25, 2024

@leslie-fang-intel

As for the backend optimization: if you are talking about the bmm optimization in attention, we are actually working on a customized SDPA kernel which will run the bmm with the int8 data type. It depends on the oneDNN BRGEMM optimization, so it is still under active development.

Thank you. In non-compile mode, these mm and bmm ops are handled by MKL and are not routed to oneDNN BRGEMM. So once the SDPA kernels are enabled, will these ops be handled by oneDNN instead of MKL?

leslie-fang-intel commented on September 25, 2024

Yes, once the SDPA kernels are enabled, these ops will be handled by oneDNN BRGEMM.

akote123 commented on September 25, 2024

@leslie-fang-intel, will these ops (matmul and bmm) also be directed to oneDNN BRGEMM in non-compile mode in the future?

leslie-fang-intel commented on September 25, 2024

@leslie-fang-intel, will these ops (matmul and bmm) also be directed to oneDNN BRGEMM in non-compile mode in the future?

We don't have a plan to support non-compile mode yet. Can you give more background or details for the request? I can sync with the team and give feedback here.

akote123 commented on September 25, 2024

@leslie-fang-intel, I just wanted to understand why the oneDNN BRGEMM path is not used in PyTorch for matmul. Is it because of the reorder overhead? In TensorFlow, the matmuls are handled by oneDNN.

leslie-fang-intel commented on September 25, 2024

Hi @akote123, we previously evaluated the oneDNN quantized matmul path (cc @Xia-Weiwen). It has some performance overhead because:

  • The B matrix incurs packing overhead at runtime.
  • U8U8 activation is only supported on the latest x86 CPUs, such as SPR, which means that on other CPUs we need to convert the B matrix from U8 to S8 at runtime.
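As a rough illustration of why the runtime U8-to-S8 conversion costs extra work, the sketch below shifts B by 128 and then adds a compensation term so the result matches the original U8 product. This is only one common way to express the conversion; the helper names are hypothetical, and it is an assumption that the real oneDNN or PyTorch path resembles it.

```python
def matmul_int(a, b):
    """Plain integer matmul on nested lists, accumulating in Python ints."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def u8_to_s8(m):
    """Shift unsigned 8-bit values in [0, 255] into the signed range [-128, 127]."""
    return [[v - 128 for v in row] for row in m]


def matmul_u8_via_s8(a_u8, b_u8):
    """Compute A @ B for a U8 matrix B using only an S8 copy of B."""
    b_s8 = u8_to_s8(b_u8)  # extra pass over B at runtime
    acc = matmul_int(a_u8, b_s8)
    # Since B = B_s8 + 128, A @ B = A @ B_s8 + 128 * rowsum(A).
    comp = [128 * sum(row) for row in a_u8]
    return [[v + c for v in row] for row, c in zip(acc, comp)]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[200, 5], [0, 255]]
    print(matmul_u8_via_s8(a, b) == matmul_int(a, b))
```

Both the conversion pass over B and the compensation term are work that a native U8U8 kernel would not need.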
