Comments (9)
Seems to be related to gradient checkpointing, but that's all I know so far. With gradient checkpointing turned off, the peak memory is constant for a fixed sequence length, as expected.
from mlx-examples.
@mzbac, a bit off topic, so please let me know if I'm breaking any rules and I'll delete the comment.
But can you share an example of the dataset you used for finetuning llama3 instruct? I am a bit confused by its template.
Thanks
@Satyam7166-tech You can check the data here at https://huggingface.co/datasets/mzbac/function-calling-llama-3-format-v1.1/viewer and feel free to create a discussion there if you have any questions about the dataset.
@awni I can confirm that the issue is related to gradient checkpointing. With gradient checkpointing disabled, I trained the model for 8000 iterations and the peak memory stayed constant, as expected.
Just FYI, the leak should be fixed now with gradient checkpointing enabled.
After disabling gradient checkpointing, the peak memory has stabilized. I will continue training for a few thousand iterations to see if everything is running as expected. Thanks @awni 🚀
Thanks for the fix!
Very strange... seems like there is a leak of some sort.
and the crash logs:
Iter 7320: Train loss 0.216, Learning Rate 1.000e-06, It/sec 0.116, Tokens/sec 40.214, Trained Tokens 3893259, Peak mem 342.467 GB
libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 554319936 bytes.
zsh: abort mlx_lm.lora --config lora_config.yaml
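To check whether peak memory keeps climbing before a crash like the one above, the training log can be scanned for the `Iter` and `Peak mem` fields. The following is only a minimal sketch: it assumes every log line follows the format of the line shown above, and the helper names `parse_peak_memory` and `peak_memory_growth` are hypothetical, not part of mlx_lm.

```python
import re

# Assumed log format, taken from the training log line quoted above:
# "Iter <n>: ... Peak mem <x> GB"
LOG_RE = re.compile(r"Iter (\d+):.*?Peak mem ([\d.]+) GB")


def parse_peak_memory(lines):
    """Return (iteration, peak_mem_gb) pairs for every line that matches."""
    points = []
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return points


def peak_memory_growth(points):
    """Return the change in peak memory (GB) between the first and last point."""
    if len(points) < 2:
        return 0.0
    return points[-1][1] - points[0][1]


sample = [
    "Iter 7320: Train loss 0.216, Learning Rate 1.000e-06, It/sec 0.116, "
    "Tokens/sec 40.214, Trained Tokens 3893259, Peak mem 342.467 GB",
]
print(parse_peak_memory(sample))  # [(7320, 342.467)]
```

Since the reported value is a peak, it never decreases. For a fixed sequence length it would be expected to plateau early, so steady growth over thousands of iterations would be consistent with a leak rather than proof of one.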