Can you add an example of BitNet from Microsoft: https://github.com/kyegomez…


feature : implementing BitNet about mlx-examples HOT 6 CLOSED

thegodone commented on August 24, 2024

feature : implementing BitNet

from mlx-examples.

Comments (6)

RonanKMcGovern commented on August 24, 2024 1

Thanks Awni, my goals were probably ill-conceived. After seeing the BitNet and 1.58-bit papers, I had thought there could be merit in using smaller 1-2 bit kernels, both for a) reducing VRAM and b) reducing FLOPs. However, it appears that:

1. Hardware doesn't support anything below fp8 (although Blackwell will support fp4), so you're always up/down casting during inference, which costs time.
2. Training isn't actually stable in any of these smaller formats. Even Nvidia TransformerEngine upcasts a lot, so there isn't much gain in training speed (maybe a tiny bit in the forward pass).
3. So one is left with two options:
   a. Quantize a pre-trained model to 1/1.58 bits to save on VRAM. Quality will be bad doing just that, though, and further training will be required.
   b. Train a 1/1.58-bit model from scratch. But who is going to do that at the 7B scale, when it's possibly slower than training in bf16 (even if the BitNet results do scale and quality is good in the 1/1.58 format)?
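To put the VRAM point in numbers, here is a rough back-of-the-envelope sketch. It only counts weight storage and ignores scales, activations, and the KV cache; the helper name `weight_memory_gb` and the "2-bit packed" row are illustrative assumptions, not something from the papers or from MLX.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10**9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 7e9  # the 7B scale mentioned above

formats = [
    ("bf16", 16),
    ("fp8", 8),
    ("ternary packed in 2 bits", 2),   # assumed practical packing
    ("ternary at 1.58 bits", 1.58),    # information-theoretic lower bound
]

for label, bits in formats:
    print(f"{label:>26}: {weight_memory_gb(n_params, bits):6.2f} GB")
```

The saving on storage is large on paper, which is why option (a) is tempting; the points about casting cost and training stability are what limit the gain in practice.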

On Wed, Mar 20, 2024 at 5:03 AM Awni Hannun wrote:

> @RonanKMcGovern you have two options:
> - Simulate the bitnet ops with casting and quantization / dequantization before matmuls.
> - Implement the quantized kernels themselves with custom gradients
>
> Maybe you could say more about your goals here though.. in either case training will probably be a lot slower (but the first case would be way slower).


mzbac commented on August 24, 2024

I would love to see an MLX example for BitNet as well, but I would be very cautious about using references from unofficial implementations, especially those from kyegomez. Just a heads up: https://www.reddit.com/r/LocalLLaMA/comments/15spxn3/potential_scammer_on_github/


awni commented on August 24, 2024

I don't think we are going to add this to MLX examples. It's still a bit niche. I would love to see a community contribution though!


RonanKMcGovern commented on August 24, 2024

@awni would you be able to point me in the right direction on how to approach this? Basically, the key is supporting a linear layer whose weights are either binary or ternary.
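For the binary case, a minimal sketch of what "binary weights" could mean, assuming the scheme described in the BitNet paper: center the weights by their mean, take the sign, and keep one full-precision scale (the mean absolute value) per matrix. This is plain Python on nested lists for illustration only; `binarize_weights` is a hypothetical helper, not an MLX function.

```python
def binarize_weights(weights):
    """Return (sign matrix in {-1, +1}, scale beta) for a 2-D list of floats."""
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)                 # centering term
    beta = sum(abs(w) for w in flat) / len(flat) # per-matrix scale
    # Sign convention assumed here: +1 if the centered weight is > 0, else -1.
    signs = [[1 if (w - mean) > 0 else -1 for w in row] for row in weights]
    return signs, beta


signs, beta = binarize_weights([[0.3, -0.2], [0.5, -0.1]])
print(signs, beta)
```

The layer output would then be computed with the sign matrix and rescaled by `beta`; the ternary variant swaps the sign function for round-and-clip to {-1, 0, 1}.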


awni commented on August 24, 2024

@RonanKMcGovern you have two options:

- Simulate the bitnet ops with casting and quantization / dequantization before matmuls.
- Implement the quantized kernels themselves with custom gradients.

Maybe you could say more about your goals here, though. In either case, training will probably be a lot slower (but the first case would be way slower).
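The first option (simulating with quantization / dequantization before the matmul) can be sketched roughly as follows. This assumes the absmean ternary scheme from the 1.58-bit paper, uses plain Python lists rather than MLX arrays, and leaves out activation quantization and gradients entirely; `ternary_quantize` and `simulated_bitlinear` are hypothetical names.

```python
def ternary_quantize(weights, eps=1e-8):
    """Return (weights rounded and clipped to {-1, 0, 1}, absmean scale)."""
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat) + eps
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return q, scale


def simulated_bitlinear(x, weights):
    """Compute x @ (scale * q).T, with weights shaped (out_features, in_features).

    The matmul still runs in full precision; only the weight values are
    restricted to ternary levels, which is why this path is slow.
    """
    q, scale = ternary_quantize(weights)
    return [
        [scale * sum(xi * qi for xi, qi in zip(x_row, q_row)) for q_row in q]
        for x_row in x
    ]


w = [[0.9, -0.05, -1.1], [0.02, 0.5, -0.4]]
print(ternary_quantize(w)[0])
print(simulated_bitlinear([[1.0, 2.0, 3.0]], w))
```

For training, the rounding step has zero gradient almost everywhere, so a real implementation would need something like a straight-through estimator or the custom gradients mentioned in the second option.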


awni commented on August 24, 2024

Yea I agree with your assessment

