Comments (5)
Can this be related to model size? 70B?
Oh yes, it's almost certainly related to the model size and the amount of RAM it requires. There seems to be a performance cliff for very large models. It shouldn't be swapping, because it's still not using all the RAM on the machine, but it does seem related to memory page demand. Still debugging; this one might take a little while to iron out.
from mlx-examples.
I tested on an M3 Max and got the same result, with even lower power consumption and GPU frequency. Really strange.
It's pretty slow for me in 8-bit too (and presumably in 16-bit as well, though I didn't test it). Not sure why yet.
@ivanfioravanti just curious: you phrased this issue as if it used to be faster in previous MLX versions. Is that the case?
Sorry for the delay; I was out the whole week and I'm happy to be back playing with MLX 🛝
I thought the issue affected all models using q8, but that's not the case. It seems the issue is present only with Llama 70B:
- Mistral-7B-Instruct-v0.2 works like a charm
- mlx-community/dolphin-2.9.1-yi-1.5-34b-8bit works like a charm
Can this be related to model size? 70B? I will try comparing with another large model.
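The size hypothesis can be made a bit more concrete with back-of-envelope arithmetic on the weights alone. This is a rough sketch, not a measurement: the parameter counts are the nominal ones from the model names (7B, 34B, 70B), and it assumes a quantized layout that stores one 16-bit scale and one 16-bit bias per group of 64 weights, which is an assumption about how the 8-bit conversion packs weights. It ignores the KV cache, activations, and any layers left unquantized, so real memory use will be higher. `approx_weight_gb` is a hypothetical helper, not part of MLX.

```python
def approx_weight_gb(
    n_params_billion: float,
    bits: int = 8,
    group_size: int = 64,
    scale_bits: int = 16,
) -> float:
    """Rough weight-only memory estimate in GB (1 GB = 1e9 bytes).

    Assumption: each group of `group_size` weights carries one scale and
    one bias of `scale_bits` bits each. Pass scale_bits=0 for unquantized
    weights. KV cache and activations are not included.
    """
    # Effective bits per weight = payload bits + per-group scale/bias overhead.
    bits_per_weight = bits + 2 * scale_bits / group_size
    # billions of params * bits per weight / 8 bits per byte -> GB
    return n_params_billion * bits_per_weight / 8


# Nominal sizes taken from the model names mentioned in this thread.
models = {"Mistral-7B": 7, "Yi-1.5-34B": 34, "Llama-70B": 70}
for name, size_b in models.items():
    print(f"{name}: ~{approx_weight_gb(size_b):.1f} GB of 8-bit weights")
```

Under these assumptions the 70B model's weights alone come to roughly 74 GB at 8-bit, about twice the 34B model, which would be consistent with a memory-related cliff showing up only at the largest size.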
Related Issues (20)
- Package 'mlx_whisper.assets' is absent from the `packages` configuration
- [Feature Request] Add support for logprobs to the mlx_lm server
- Request for Example on Full Parameter and Training for LLM Model
- Phi-3 128K Context Variants' `su` RoPE Scaling
- [REGRESSION] Some MoE models display 0% GPU utilization with mlx-ops 0.14.0
- I would like to inquire about a solution to the following problem.
- link to Phi-2 example in readme broken
- SPMStreamingDetokenizer sometimes outputs incorrect multi-byte characters
- Why change the module decomposition of whisper
- A simple enhancement, in dataset creation time
- [Question]about creating the 'adapters.npz' file
- [QUESTION] Is there a way to provide a Huggingface access token for downloading models that are private?
- [Model Request] Add support for IBM's Granite model
- [Feature] Export Lora Adapters as GGML
- Error when running inference on newly converted OpenELM MLX model, ValueError(f"Received parameters not in model: {extras}.")
- LLMEvaluator : libc++abi: terminating due to uncaught exception of type std::invalid_argument: [matmul] Last dimension of first input with shape (1,916,2048) must match second to last dimension of second input with shape (256,32000)
- Unable to allocate memory
- Proposal: Add mypy to .pre-commit-config.yml
- Fusing adapters with llama3 cause bad performances
- Struggling to convert models to MLX