
Comments (4)

awni commented on June 25, 2024

A few comments / questions:

55.8/64.0GB - swap:75.9/77.0GB

  • That is a lot of RAM for a 7B model! The swap is especially concerning. How did you measure that?

  • How did you convert the model? It would be good to make sure that the dtype of the non-quantized layers is fp16 and not bf16 or fp32 (see the quick check after this list).

  • What version of MLX are you using? We have made some improvements that will help with RAM, so make sure you use the latest.

  • Try using the lora in MLX LM instead of the lora/ example. It compiles the norms by default, which slightly reduces the memory requirements for the non-LoRA layers. Otherwise, it's basically the same but with a few additional features.
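
If it helps, here's a quick way to sanity-check the version and the dtypes mentioned above. This is a minimal sketch; the weights path is hypothetical, so point it at whatever file your conversion produced:

import mlx.core as mx

print(mx.__version__)  # confirm you are on the latest release

# Hypothetical path; use the file your conversion actually wrote.
weights = mx.load("mlx_model/weights.npz")
for name, w in weights.items():
    # Quantized weights are packed uint32; everything else should be
    # float16. Any bfloat16 or float32 entries cost extra memory.
    if w.dtype not in (mx.uint32, mx.float16):
        print(name, w.dtype)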

I don't want to compromise on the quality of the fine tune

  • I assume this means you don't want to reduce the maximum sequence length or the number of LoRA layers? Either would help a lot.

There are some other things we/you can do to reduce memory:

  • Checkpointing. This is an experimental / undocumented feature, but you can look at our transformer implementation to see how to use it (sketched after this list). It will slow things down but should also reduce memory use.
  • Compile the full graph. This may slow things down, especially if your input sequences vary in length, but it may also substantially reduce memory use.
  • Disable caching. In the latest MLX you can disable the buffer cache (mx.set_cache_limit(0)); you have to build from source for that. I don't expect it to help much for you, since we usually try to clear the cache before we start swapping.
  • We have some additional improvements on the roadmap (flash attention, avoiding copies into matmul, ...) which should help in the future.
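
To make those concrete, here is a rough sketch of all three knobs together. This is not the actual lora.py code: the checkpoint transform is the experimental one mentioned above (written as mx.checkpoint here), the set_cache_limit call follows the name used in this thread, and the two-layer block just stands in for a transformer layer:

import mlx.core as mx
import mlx.nn as nn

l1, l2 = nn.Linear(512, 512), nn.Linear(512, 512)

def loss_fn(x, y):
    # Gradient checkpointing: intermediates inside the block are
    # recomputed during the backward pass instead of being stored.
    h = mx.checkpoint(lambda z: l2(nn.relu(l1(z))))(x)
    return nn.losses.mse_loss(h, y)

# Compile the full graph; it retraces whenever input shapes change,
# which is why bucketing sequence lengths matters (see below).
grad_fn = mx.compile(mx.grad(loss_fn))

# Disable the buffer cache entirely (requires a source build for now).
mx.set_cache_limit(0)

x = mx.random.normal((4, 512))
y = mx.random.normal((4, 512))
mx.eval(grad_fn(x, y))  # gradient w.r.t. x, just to exercise the graph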

I plan to update the mlx lm lora to do bucketing + compilation, with an option to checkpoint. But in the meantime you can experiment with those if you are comfortable digging into the Python.
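
For reference, bucketing just means padding every batch up to one of a few fixed lengths, so the compiled step only ever sees a handful of shapes. A hypothetical helper:

import mlx.core as mx

def bucket_pad(batch: list[list[int]], pad_id: int) -> mx.array:
    # Pad to the next power-of-two length so mx.compile only has to
    # retrace for a few distinct shapes instead of one per length.
    longest = max(len(seq) for seq in batch)
    bucket = 1 << (longest - 1).bit_length()
    return mx.array([seq + [pad_id] * (bucket - len(seq)) for seq in batch])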


romansky commented on June 25, 2024

Hey! Thanks for the quick reply.

  • I had one ssh session running asitop; it's one of the stats measured there. It's actually really nice for monitoring.
  • I used the import utility like so (from the model page it looks like an FP16 one):
python convert.py \
  --hf-path NurtureAI/OpenHermes-2.5-Mistral-7B-16k \
  --quantize \
  --q-bits 8
  • I have the latest (0.5) installed but was using the scripts (./lora.py) from 0.3. Re-importing and running everything again to see if it was something from an older version.
  • Doing that now.
  • Yup, I understand using fewer layers may help, but for my use case I want the highest quality.
  • I am using checkpointing, so a question about that: if I run a session and it crashes and I run it again, does it start from the beginning, assuming the model has not gotten past some checkpoint? (Just to make sure I understand what this does.)
  • Will try some of this stuff and report back!


awni commented on June 25, 2024

Well that's a surprise and a delight, great to hear!

I am using checkpointing, so a question about that: if I run a session and it crashes and I run it again, does it start from the beginning, assuming the model has not gotten past some checkpoint? (Just to make sure I understand what this does.)

So the term "checkpointing" is overloaded. In mlx-lm you can resume from the stored adapters in the checkpoints/ directory; you have to explicitly point to the correct adapter file there when you set the --resume-adapter-file flag.

What I was referring to with checkpointing is gradient checkpointing, which is a way to reduce memory use at the cost of extra computation. That's a totally different thing and is not currently used in any of our LoRA examples.


romansky commented on June 25, 2024

@awni Thanks a lot, just re-ran it and it went smooth. Great success!

