
Comments (4)

awni commented on June 25, 2024

A few comments / questions:

55.8/64.0GB - swap:75.9/77.0GB

  • That is a lot of RAM for a 7B model! The swap is especially concerning. How did you measure that?

  • How did you convert the model? It would be good to make sure that the dtype of the non-quantized layers is fp16 and not bf16 or fp32 (see the quick check after this list).

  • What version of MLX are you using? We have made some improvements that will help with RAM, so make sure you use the latest.

  • Try using the lora in MLX LM instead of the lora/ example. It compiles the norms by default, which slightly reduces the memory requirements for the non-LoRA layers. Otherwise, it's basically the same but with a few additional features.
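
If it helps, here's a quick way to sanity-check the version and the dtypes mentioned above. This is a minimal sketch; the weights path is hypothetical, so point it at whatever file your conversion produced:

import mlx.core as mx

print(mx.__version__)  # confirm you are on the latest release

# Hypothetical path; use the file your conversion actually wrote.
weights = mx.load("mlx_model/weights.npz")
for name, w in weights.items():
    # Quantized weights are packed uint32; everything else should be
    # float16. Any bfloat16 or float32 entries cost extra memory.
    if w.dtype not in (mx.uint32, mx.float16):
        print(name, w.dtype)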

I don't want to compromise on the quality of the fine tune

  • I assume this means you don't want to reduce the maximum sequence length or the number of LoRA layers? Either would help a lot.

There are some other things we/you can do to reduce memory:

  • Checkpointing. This is an experimental / undocumented feature, but you can look at our transformer implementation to see how to use it (sketched after this list). It will slow things down but should also reduce memory use.
  • Compile the full graph. This may slow things down, especially if your input sequences vary in length, but it may also substantially reduce memory use.
  • Disable caching. In the latest MLX you can disable the buffer cache (mx.set_cache_limit(0)); you have to build from source for that. I don't expect it to help much for you, since we usually try to clear the cache before we start swapping.
  • We have some additional improvements on the roadmap (flash attention, avoiding copies into matmul, ...) which should help in the future.
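
To make those concrete, here is a rough sketch of all three knobs together. This is not the actual lora.py code: the checkpoint transform is the experimental one mentioned above (written as mx.checkpoint here), the set_cache_limit call follows the name used in this thread, and the two-layer block just stands in for a transformer layer:

import mlx.core as mx
import mlx.nn as nn

l1, l2 = nn.Linear(512, 512), nn.Linear(512, 512)

def loss_fn(x, y):
    # Gradient checkpointing: intermediates inside the block are
    # recomputed during the backward pass instead of being stored.
    h = mx.checkpoint(lambda z: l2(nn.relu(l1(z))))(x)
    return nn.losses.mse_loss(h, y)

# Compile the full graph; it retraces whenever input shapes change,
# which is why bucketing sequence lengths matters (see below).
grad_fn = mx.compile(mx.grad(loss_fn))

# Disable the buffer cache entirely (requires a source build for now).
mx.set_cache_limit(0)

x = mx.random.normal((4, 512))
y = mx.random.normal((4, 512))
mx.eval(grad_fn(x, y))  # gradient w.r.t. x, just to exercise the graph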

I plan to update the mlx lm lora to do bucketing + compilation, with an option to checkpoint. But in the meantime you can experiment with those if you are comfortable digging into the Python.
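
For reference, bucketing just means padding every batch up to one of a few fixed lengths, so the compiled step only ever sees a handful of shapes. A hypothetical helper:

import mlx.core as mx

def bucket_pad(batch: list[list[int]], pad_id: int) -> mx.array:
    # Pad to the next power-of-two length so mx.compile only has to
    # retrace for a few distinct shapes instead of one per length.
    longest = max(len(seq) for seq in batch)
    bucket = 1 << (longest - 1).bit_length()
    return mx.array([seq + [pad_id] * (bucket - len(seq)) for seq in batch])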


romansky commented on June 25, 2024

Hey! Thanks for the quick reply.

  • I had one ssh session running asitop; it's one of the stats measured there. It's actually really nice for monitoring.
  • I used the import utility like so (from the model page it looks like an FP16 one):
python convert.py \
  --hf-path NurtureAI/OpenHermes-2.5-Mistral-7B-16k \
  --quantize \
  --q-bits 8
  • I have the latest (0.5) installed but was using the scripts (./lora.py) from 0.3. Re-importing and running everything again to see if it was something from an older version.
  • Doing that now.
  • Yup, I understand using fewer layers may help, but for my use case I want the highest quality.
  • I am using checkpointing, so a question about that: if I run a session and it crashes and I run it again, does it start from the beginning, assuming the model has not gotten past some checkpoint? (Just to make sure I understand what this does.)
  • Will try some of this stuff and report back!


awni commented on June 25, 2024

Well that's a surprise and a delight, great to hear!

I am using checkpointing, so a question about that: if I run a session and it crashes and I run it again, does it start from the beginning, assuming the model has not gotten past some checkpoint? (Just to make sure I understand what this does.)

So the term "checkpointing" is overloaded. In mlx-lm you can resume from the stored adapters in the checkpoints/ directory; you have to explicitly point to the correct adapter file there when you set the --resume-adapter-file flag.

What I was referring to with checkpointing is gradient checkpointing, which is a way to reduce memory use at the cost of extra computation. That's a totally different thing and is not currently used in any of our LoRA examples.


romansky commented on June 25, 2024

@awni Thanks a lot, just re-ran it and it went smooth. Great success!

