Comments (4)
A few comments / questions:
"55.8/64.0GB - swap: 75.9/77.0GB"

- That is a lot of RAM for a 7B model! The swap is especially concerning. How did you measure that?
- How did you convert the model? It would be good to be sure that the dtype of the non-quantized layers is fp16, not bf16 or fp32.
- What version of MLX are you using? We have made some improvements that will help with RAM, so make sure you use the latest.
- Try using LoRA in MLX LM instead of the lora/ example. It compiles the norms by default, which slightly reduces memory requirements for the non-LoRA layers. Otherwise it's basically the same, but with a few additional features.
"I don't want to compromise on the quality of the fine tune"

- I assume this means you don't want to reduce the maximum sequence length or the number of LoRA layers? Either would help a lot.
There are some other things we/you can do to reduce memory:
- Checkpointing. This is an experimental / undocumented feature, but you can look at our transformer implementation to see how to use it. It will slow things down but should also reduce memory use.
- Compile the full graph. This may slow things down, especially if your input sequences vary in length, but it may also substantially reduce memory use.
- Disable caching. In the latest MLX you can disable the buffer cache (mx.set_cache_limit(0)). You have to build from source for that. I don't expect it to help much in your case, since we usually try to clear the cache before we start swapping.
- We have some additional improvements on the roadmap (flash attention, avoiding copies into matmul, ...) which should help in the future.
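The gradient checkpointing idea above can be sketched in plain Python (a toy illustration, not the MLX API): instead of keeping every intermediate activation for the backward pass, keep only one every few layers and recompute the rest from the nearest saved checkpoint when they are needed.

```python
def run_layers(layers, x):
    """Forward pass that stores every intermediate activation (baseline)."""
    acts = [x]
    for f in layers:
        acts.append(f(acts[-1]))
    return acts

def run_layers_checkpointed(layers, x, segment):
    """Store only one activation per `segment` layers (the checkpoints)."""
    saved = {0: x}
    h = x
    for i, f in enumerate(layers):
        h = f(h)
        if (i + 1) % segment == 0:
            saved[i + 1] = h
    return h, saved

def recompute(layers, saved, index, segment):
    """Recover the activation entering layer `index` from the nearest checkpoint."""
    start = (index // segment) * segment
    h = saved[start]
    for f in layers[start:index]:
        h = f(h)
    return h

# 8 toy "layers": each doubles its input and adds its index.
layers = [lambda v, k=k: v * 2 + k for k in range(8)]
full = run_layers(layers, 1.0)                                # stores 9 values
out, saved = run_layers_checkpointed(layers, 1.0, segment=4)  # stores 3 values
```

The memory saving is the ratio of stored activations (here 3 vs. 9); the cost is re-running up to `segment - 1` layers per recomputation.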
I plan to update the mlx-lm LoRA to do bucketing + compilation, with an option to checkpoint. But in the meantime you can experiment with those if you are comfortable digging into the Python.
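The bucketing mentioned above can be sketched framework-agnostically: pad each sequence to the smallest of a few fixed lengths, so a compiled graph only ever sees a handful of shapes instead of one per distinct sequence length. The bucket sizes here are illustrative.

```python
BUCKETS = [256, 512, 1024, 2048]  # illustrative bucket sizes

def bucket_length(n, buckets=BUCKETS):
    """Smallest bucket that holds a sequence of length n."""
    for b in buckets:
        if n <= b:
            return b
    raise ValueError(f"sequence of length {n} exceeds largest bucket")

def pad_to_bucket(tokens, pad_id=0):
    """Right-pad a token list to its bucket's length."""
    target = bucket_length(len(tokens))
    return tokens + [pad_id] * (target - len(tokens))
```

With four buckets, compilation happens at most four times per batch shape, at the cost of some wasted compute on padding tokens.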
from mlx-examples.
Hey! Thanks for the quick reply.
- I had one ssh session running asitop; it's one of the stats measured there. It's actually really nice for monitoring.
- I used the import utility like so (from the model page it looks like an FP16 one):

    python convert.py \
        --hf-path NurtureAI/OpenHermes-2.5-Mistral-7B-16k \
        --quantize \
        --q-bits 8
- I have the latest (0.5) installed but was using the scripts (./lora.py) from 0.3; re-importing and running everything again to see if it's something from an older version.
- doing that now.
- yup, I understand using fewer layers may help, but for my use case I want the highest quality..
- I am using checkpointing, so a question about that: if I run a session and it crashes and I run it again, does it start from the beginning, assuming the model has not passed some checkpoint? (To make sure I understand what this does.)
- will try some of this stuff and report!
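As an aside on the conversion step above: one way to confirm the non-quantized layers really are fp16 is to inspect the dtypes in the converted weights file. A self-contained sketch follows; it writes a tiny fake weights.npz so it can run anywhere, and the uint32-for-packed-quantized-weights convention is my assumption, so point `path` at your actual converted model and adjust the allowed dtypes as needed.

```python
import numpy as np

path = "weights.npz"  # illustrative; use your converted model's weights file
np.savez(path,
         **{"layers.0.norm.weight": np.ones(4, dtype=np.float16),
            "layers.0.attn.wq.weight": np.ones(2, dtype=np.uint32)})

data = np.load(path)
dtypes = {name: str(data[name].dtype) for name in data.files}
# Anything that is not float16 (or the packed uint32 of quantized weights)
# would bloat memory -- e.g. a float32 norm would be a red flag.
unexpected = [n for n, d in dtypes.items() if d not in ("float16", "uint32")]
```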
Well that's a surprise and a delight, great to hear!
"I am using checkpointing, so a question about that: if I run a session and it crashes and I run it again, does it start from the beginning, assuming the model has not passed some checkpoint? (To make sure I understand what this does.)"
So the term "checkpointing" is overloaded. In mlx-lm you can resume from the stored adapters in the checkpoints/ directory. You have to explicitly point to the correct adapter file there when you set the --resume-adapter-file flag.
What I was referring to is gradient checkpointing, which is a way to reduce memory use at the cost of computation. That's a totally different thing and is not currently used in any of our LoRA examples.
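Resuming from a stored adapter might look roughly like this; the model path and adapter filename are illustrative, and only the --resume-adapter-file flag itself is confirmed above, so check your mlx-lm version's help output for the exact options.

```shell
python -m mlx_lm.lora \
    --model ./mlx_model \
    --train \
    --resume-adapter-file checkpoints/adapters.npz
```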
@awni thanks a lot, just re-ran and it went smooth.. great success!