Comments (15)
So, not sure if you have a 13 or a 13 Pro. It makes a big difference, since the 13 has 4GB of RAM and the 13 Pro has 6GB.
My 14 Pro has 6GB, and what I've found works is a 7B model quantized to Q3 (K_M or K_S), like TheBloke's quant of OpenHermes 2 Mistral on Hugging Face. Then try Metal, MLock, and MMap all on, and limit the context to 1024 or 2048 to start, maybe trying 3072.
It'll take a while to load the first time you type and may crash once or twice (especially at 3072), but it should settle down. MLock in particular forces everything into memory, but it makes inference a lot faster.
But there's no way a 13B model will work, or even a 7B at full 16-bit or even Q8. I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.
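A rough back-of-envelope estimate shows why those limits fall where they do. This is a sketch, not exact: the bits-per-weight figures are approximate averages for llama.cpp's quant types, and the layer/embedding shape assumes a Mistral-7B-like architecture:

```python
# Ballpark RAM needed to run a GGUF model on a phone.
# Bits-per-weight values are approximate averages for each quant type;
# model shape assumes a Mistral-7B-like architecture (32 layers, 4096 embd).

GIB = 1024 ** 3

def model_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers, n_ctx, n_embd, bytes_per_elem=2):
    """Upper bound on the fp16 KV cache: K and V per layer per token.
    (GQA models like Mistral store a smaller KV dim, so real usage is lower.)"""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elem

n_params = 7.24e9  # Mistral 7B
for name, bpw in [("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"{name:6s} weights: ~{model_bytes(n_params, bpw) / GIB:.1f} GiB")

for n_ctx in (1024, 2048, 3072):
    print(f"ctx={n_ctx}: KV cache <= {kv_cache_bytes(32, n_ctx, 4096) / GIB:.2f} GiB")
```

At Q3_K_M a 7B model is roughly 3.3 GiB of weights plus up to ~1 GiB of KV cache at 2048 context, which just squeaks into 6GB alongside iOS itself; Q8 (~7 GiB) or F16 (~13.5 GiB) clearly can't fit.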
from llmfarm.
I've attached the ipa file to the release. If you know how to install it, you don't have to wait for the TestFlight version.
Appreciate you putting the ipa up. I tried using one of those online services to host it with an install link, but ran into it having the same name as the TestFlight version. I'm sure I could have opened the file and edited it on my computer (or killed the TestFlight version), but I just ended up waiting. I guess Apple just dropped 0.8.0, but today I confirmed the 3B models like Rocket and Zephyr are working on 0.8.1.
I just realized I probably could also have backed up the data folder, deleted the TestFlight version, installed the ipa, and copied the data back in. Next time...
I unchecked MLock and it seems to be working now, with MMap and Metal still checked.
I sent the new version to TestFlight three days ago; apparently, due to the upcoming holidays, it's taking more time to review.
Sadly mine is not a 13 Pro; it's a 13 with 4GB of RAM. While I do believe that 4GB of RAM is a huge disadvantage, this particular issue doesn't look RAM-related, since my entire phone crashes (I need to perform a hard restart), and only when Metal is enabled.
I will try limiting the context and enabling MLock and MMap along with Metal.
Happening to me as well on a 13 Pro with Mistral 7B Instruct, though for me it crashes the phone regardless of whether the Metal option is on or off. It seems MLock was causing the crash.
Unfortunately I don't have any experience with iOS development and ipa files. I think I will wait for the TestFlight version.
Is it working with other models for you?
MLock, I think, tries to hold everything in RAM, so it can really slow down the phone, but it may improve inference speed. Make sure the model is heavily quantized, the context length is low, and all other apps are closed. Usually for me, even if the phone seemingly froze, it gets out of it eventually after a few minutes, once it can get the RAM settled again.
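For context on that "hold everything in RAM" behavior: the MLock switch corresponds to the POSIX mlock() call, which pins pages in physical RAM so the OS can't swap or evict them. A minimal ctypes sketch of the idea (POSIX only; a one-page buffer stands in for the multi-gigabyte weights):

```python
import ctypes
import mmap

# dlopen(NULL) exposes libc symbols on POSIX systems
libc = ctypes.CDLL(None, use_errno=True)

def pin_pages(buf, size):
    """Try to lock `size` bytes of `buf` into physical RAM; True on success."""
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    return libc.mlock(ctypes.c_void_p(addr), ctypes.c_size_t(size)) == 0

size = mmap.PAGESIZE                 # one page stands in for model weights
buf = mmap.mmap(-1, size)            # anonymous mapping, like an mmap'd GGUF

if pin_pages(buf, size):
    # Pinned pages can't be swapped out, so inference never stalls on I/O...
    print("pages locked in RAM")
else:
    # ...but on a 4GB phone, pinning gigabytes of weights is exactly what
    # makes the OS freeze up or kill the app under memory pressure.
    print("mlock refused, errno:", ctypes.get_errno())
```

That is also why unchecking MLock stops the hard crashes: the weights stay mmap'd and the kernel is free to page them in and out, slower but survivable.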
I have an iPhone 11, which has 4GB of RAM, and I hit the same crashing issue. I turned off MLock and at least it responds now, although it took 221 seconds to respond to "hi". I don't know if that's just because it was the first load, or because I have too many apps in the background; I'll have to experiment. It still takes a long time, minutes, to respond. Is there any way I can help troubleshoot this? I am a software developer by trade, but not an iOS developer, sorry.
Are you using Metal?
Because for me, I can run the models without crashing, just at a very slow speed, if I disable Metal; but if I enable Metal it crashes immediately. (iPhone 13, 4GB of RAM)
Yes, I leave (left) Metal on. I tried turning it off just now (with MLock still disabled) and it seems faster, though it still takes a really long time, as before. Maybe we need a different model/quant for us poor 4-giggers? What's the recommendation?
Maybe try one of the 3B models based on Stability? For example, here are quantized versions of Rocket-3B: https://huggingface.co/TheBloke/rocket-3B-GGUF
I'd start with one of the smaller ones and see how that goes (it seems Q3_K_M may fit in 4GB of RAM). I found that on mine (14 Pro) it didn't seem to matter for speed whether I used MLock or not (unlike with a 7B model), but I'd experiment both ways on yours.
I don't know what fixed it, but now it is working perfectly with good performance. I haven't tried any 7B model, but all the 3B models I have tested (Q4_K_M versions) are working perfectly with Metal enabled.
It was not working a few days ago, but I recently updated to iOS 17.2.1 and now it seems fine.