Comments (12)

awni avatar awni commented on June 29, 2024 1

Interestingly, the smaller 270M model seems to work fine in 8-bit.

QueryType avatar QueryType commented on June 29, 2024

I am just a beginner in this project; however, a quick debug reveals that the test string "hello" isn't getting encoded properly by the tokenizer. I will need to check more.

awni avatar awni commented on June 29, 2024

When I try to load this model for quantization, I get the following error:

  File "/Users/awnihannun/mlx-examples/llms/mlx_lm/tokenizer_utils.py", line 327, in load_tokenizer
    AutoTokenizer.from_pretrained(model_path, **tokenizer_config_extra),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/awnihannun/miniconda3/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 891, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.011986e180a41be8d1972cba11929b1174295f8c.configuration_openelm.OpenELMConfig'> to build an AutoTokenizer.

I'm wondering how you were even able to quantize it, @Blaizzy?

Blaizzy avatar Blaizzy commented on June 29, 2024

Yeah, you get that error because none of the OpenELM models ship with a tokenizer, but the generate file says they use the Llama-2-7B tokenizer:

https://huggingface.co/apple/OpenELM-3B/blob/main/generate_openelm.py#L16

I tweeted at you about this.

https://x.com/prince_canuma/status/1783155293943214406?s=46

The solution I found was to hardcode the tokenizer name in tokenizer_utils.py.
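
A minimal sketch of that workaround (not the actual patch; the Llama-2 repo is gated, so this assumes an accepted license / HF token):

    # Load the Llama-2 tokenizer that generate_openelm.py points at, instead
    # of looking for one inside the OpenELM repo.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    print(tokenizer.encode("hello"))  # sanity-check the encoding QueryType mentioned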

awni avatar awni commented on June 29, 2024

So some findings:

  • The model doesn't work in float16 either, with or without quantization
  • Works in bf16 without quantization but not so well with quantization
  • Works in fp32 without quantization and sort of with quantization

Doesn't seem like a quantization issue per se; the model seems very sensitive to precision.

@Blaizzy for these types of issues, sometimes there is a place in the model that is particularly sensitive and you can upcast/downcast around it. But if it's just a general issue across all layers, then it will be very difficult to fix.
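
For reference, a minimal sketch of that upcast/downcast trick (a hypothetical wrapper, not an mlx-lm API):

    import mlx.core as mx
    import mlx.nn as nn

    class FP32Block(nn.Module):
        """Run a precision-sensitive submodule in fp32 inside an fp16 model."""

        def __init__(self, inner: nn.Module):
            super().__init__()
            inner.apply(lambda p: p.astype(mx.float32))  # keep its weights in fp32
            self.inner = inner

        def __call__(self, x: mx.array) -> mx.array:
            # Upcast on the way in, downcast to the caller's dtype on the way out.
            return self.inner(x.astype(mx.float32)).astype(x.dtype)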

awni avatar awni commented on June 29, 2024

The numbers get to be quite large and are overflowing the range of fp16. Once that happens, all is lost. I think most models are trained with some regularization to keep values from getting so big, but I don't think this model will be very amenable to low-precision inference.
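
For intuition, fp16's largest finite value is about 65504, while bf16 keeps fp32's exponent range, which lines up with bf16 working where fp16 does not:

    import mlx.core as mx

    x = mx.array(60000.0, dtype=mx.float16)
    print(x * 2)                      # inf: 120000 exceeds fp16's max (~65504)
    print(x.astype(mx.bfloat16) * 2)  # ~120000: bf16 shares fp32's exponent range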

I think you could get it to work with 8-bit or maybe 4-bit quantization plus fp32 as the accumulation type. Right now that's not a supported option in the conversion script, but it is easy to change: just set the dtype there: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/utils.py#L592
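
With that one-line change in place, the conversion would look something like this (a sketch, assuming a convert() that exposes the dtype; signatures differ across mlx-lm versions):

    from mlx_lm import convert

    # Quantize the weights to 8-bit but keep the non-quantized parameters, and
    # hence the accumulation, in fp32 instead of the default fp16.
    convert(
        "apple/OpenELM-3B",
        mlx_path="openelm-3b-8bit-fp32",
        quantize=True,
        q_bits=8,          # or 4
        dtype="float32",   # the dtype set at the linked line
    )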

Otherwise I don't think we can do much for this model, so I will close this issue as won't-fix. Sorry for the not-so-helpful outcome.

awni avatar awni commented on June 29, 2024

@sacmehta I'm curious if these findings make sense to you? We're finding that the larger OpenELM models don't handle reduced precision well. Usually models are trained with some L2 regularization to prevent weights from getting too large (so they quantize well). Maybe OpenELM was not trained that way, or the regularization parameter is not high enough?

Could be useful to work on that for future versions so we can quantize the larger models.

Blaizzy avatar Blaizzy commented on June 29, 2024

> Doesn't seem like a quantization issue per se; the model seems very sensitive to precision.
>
> ...sometimes there is a place in the model that is particularly sensitive and you can upcast/downcast around it. But if it's just a general issue across all layers, then it will be very difficult to fix.

Thanks for the update; this is a super helpful reference for the future!

In this case, all layers are affected. In the report, they describe the technique as a "layer-wise scaling strategy":

https://arxiv.org/pdf/2404.14619

awni avatar awni commented on June 29, 2024

> In the report, they describe the technique as a "layer-wise scaling strategy"

That's something different. That is the strategy used to determine the number of hidden units per layer.
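
As I read the paper, that scaling linearly interpolates the per-layer FFN multipliers and head counts across depth; an illustrative sketch (made-up endpoint values, not OpenELM's actual config):

    import numpy as np

    def layerwise_scaling(num_layers, ffn_lo=0.5, ffn_hi=4.0, heads_lo=12, heads_hi=20):
        """Linearly interpolate per-layer FFN multipliers and head counts."""
        t = np.linspace(0.0, 1.0, num_layers)
        ffn_multipliers = ffn_lo + t * (ffn_hi - ffn_lo)
        num_q_heads = np.round(heads_lo + t * (heads_hi - heads_lo)).astype(int)
        return ffn_multipliers, num_q_heads

    mults, heads = layerwise_scaling(num_layers=36)  # e.g. a 36-layer model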

awni avatar awni commented on June 29, 2024

> We use a weight decay of 0.1 and gradient clipping of 1.0.

I see that in the paper, so there should be some weight decay. I don't know if the value is comparable to other models of the same size. Maybe there is something else going on.
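
For reference, that recipe in a generic PyTorch-style training step (a sketch, not OpenELM's actual training code):

    import torch

    model = torch.nn.Linear(8, 8)  # stand-in for the real model
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

    loss = model(torch.randn(4, 8)).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip at 1.0
    opt.step()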

Blaizzy avatar Blaizzy commented on June 29, 2024

@awni

My hypothesis is that the layer-wise scaling is perhaps not effective for larger models, and that's what's breaking.

The only difference between the 1.1B, which works well when quantized, and the 3B is the number of layers and their scaling (ffn_multipliers, num_kv_heads, and num_q_heads).

Blaizzy avatar Blaizzy commented on June 29, 2024

> I don't know if the value is comparable to other models of the same size

It is; here are the values used for a model of comparable size (see Section 4.2):

https://static1.squarespace.com/static/6213c340453c3f502425776e/t/6601c5713150412edcd56f8e/1711392114564/Stable_Code_TechReport_release.pdf
