Comments (15)

ShawnFumo commented on August 16, 2024

So, I'm not sure if you have a 13 or a 13 Pro. It makes a big difference, since the 13 has 4 GB of RAM and the 13 Pro has 6 GB.

My 14 Pro has 6 GB, and what I've found works is a 7B model quantized to Q3 (K_M or K_S), like the OpenHermes 2 Mistral quants from TheBloke on Hugging Face. Then try Metal, MLock, and MMap all on, and limit the context to 1024 or 2048 to start, and maybe try 3072.

It'll take a while to load the first time you type and may crash once or twice (especially at 3072), but it should settle down. MLock in particular forces everything into memory, but it makes inference a lot faster.

But there's no way a 13B model will work, or even a 7B at full 16-bit or even Q8. I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.
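For a rough sense of why the quantization level matters so much, here's a back-of-envelope sketch in Python. The bits-per-weight figures and the Mistral-7B-style dimensions (32 layers, 8 KV heads of 128 dims) are approximations I'm assuming, not exact GGUF file sizes:

```python
# Rough memory estimate for a 7B model at different llama.cpp quant levels.
# Bits-per-weight values are approximate, not exact GGUF sizes.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8, "Q3_K_M": 3.9}

def weight_gb(n_params_b: float, quant: str) -> float:
    """Approximate weight size in GB for a given quantization."""
    return n_params_b * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

def kv_cache_gb(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """FP16 key+value cache size in GB for a given context length."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_val / 1e9

for quant in BITS_PER_WEIGHT:
    total = weight_gb(7.0, quant) + kv_cache_gb(2048)
    print(f"7B {quant:7s} @ 2048 ctx ~ {total:.1f} GB")
# F16 (~14 GB) and Q8 (~7.7 GB) clearly can't fit in a 6 GB phone;
# Q3_K_M lands around 3.7 GB, which is roughly why it's the size that works.
```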

guinmoon commented on August 16, 2024

... I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.

I've attached the IPA file to the release. If you know how to install it, you don't have to wait for the TestFlight version.

ShawnFumo commented on August 16, 2024

... I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.

I've attached the IPA file to the release. If you know how to install it, you don't have to wait for the TestFlight version.

I appreciate you putting the IPA up. I tried using one of those online services to host it with an install link, but then ran into it having the same name as the TestFlight version. I'm sure I could have opened the file and edited it on my computer (or deleted the TestFlight version), but I just ended up waiting. I guess Apple just dropped 0.8.0, but today I confirmed that the 3B models like Rocket and Zephyr are working on 0.8.1.

I just realized I probably could also have backed up the data folder, deleted the TestFlight version, installed the IPA, and copied the data back in. Next time...

sytelus commented on August 16, 2024

I unchecked MLock and it seems to be working now, with MMap and Metal both checked.
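For what it's worth, those same switches exist in llama.cpp itself. Here's a minimal sketch of the equivalent configuration using the llama-cpp-python bindings on a desktop; the model path is hypothetical, and this is only an analog to LLM Farm's toggles, not the app's actual code:

```python
from llama_cpp import Llama

# Hypothetical local path to a Q3_K_M GGUF file.
llm = Llama(
    model_path="models/openhermes-2-mistral-7b.Q3_K_M.gguf",
    n_ctx=2048,        # keep the context small, as suggested above
    n_gpu_layers=-1,   # offload all layers to the GPU (Metal on Apple hardware)
    use_mmap=True,     # MMap on: map the file and let the OS page it in
    use_mlock=False,   # MLock off: don't pin the weights in RAM
)

out = llm("Q: What does mmap do? A:", max_tokens=64)
print(out["choices"][0]["text"])
```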

guinmoon commented on August 16, 2024

... I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.

I sent the new version to TestFlight 3 days ago; apparently, due to the upcoming holidays, it's taking more time to review it.

rahulvk007 commented on August 16, 2024

So, I'm not sure if you have a 13 or a 13 Pro. It makes a big difference, since the 13 has 4 GB of RAM and the 13 Pro has 6 GB.

My 14 Pro has 6 GB, and what I've found works is a 7B model quantized to Q3 (K_M or K_S), like the OpenHermes 2 Mistral quants from TheBloke on Hugging Face. Then try Metal, MLock, and MMap all on, and limit the context to 1024 or 2048 to start, and maybe try 3072.

It'll take a while to load the first time you type and may crash once or twice (especially at 3072), but it should settle down. MLock in particular forces everything into memory, but it makes inference a lot faster.

But there's no way a 13B model will work, or even a 7B at full 16-bit or even Q8. I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.

Sadly, mine is not a 13 Pro. It's a 13 with 4 GB of RAM. While I do believe 4 GB of RAM is a huge disadvantage, this particular issue doesn't look like it's related to my RAM, as my entire phone is crashing (I need to perform a hard restart), and only when Metal is enabled.

I will try limiting the context and enabling MLock and MMap along with Metal.

davidmokos commented on August 16, 2024

This is happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.

rahulvk007 commented on August 16, 2024

... I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.

I've attached the IPA file to the release. If you know how to install it, you don't have to wait for the TestFlight version.

Unfortunately, I don't have any experience with iOS development and IPA files. I think I will wait for the TestFlight version.

rahulvk007 commented on August 16, 2024

This is happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.

Is it working with other models for you?

ShawnFumo commented on August 16, 2024

This is happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.

Is it working with other models for you?

MLock, I think, tries to hold everything in RAM, so it can really slow down the phone, but it may improve inference speed. Make sure the model is heavily quantized, the context length is low, and all other apps are closed. Usually for me, even if the phone seemingly freezes, it gets out of it eventually after a few minutes, once the RAM settles again.
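To see that trade-off concretely, here's a rough sketch (again llama-cpp-python on a desktop with a hypothetical model path, not LLM Farm itself) that times model load and a short generation with MLock off and then on:

```python
import time
from llama_cpp import Llama

MODEL = "models/openhermes-2-mistral-7b.Q3_K_M.gguf"  # hypothetical path

def run(use_mlock: bool) -> None:
    t0 = time.time()
    # mlock pins the mapped weights in RAM so they can't be paged out;
    # with it off, the OS may evict pages under memory pressure.
    llm = Llama(model_path=MODEL, n_ctx=1024, use_mmap=True,
                use_mlock=use_mlock, verbose=False)
    t1 = time.time()
    llm("Hello", max_tokens=32)
    t2 = time.time()
    print(f"mlock={use_mlock}: load {t1 - t0:.1f}s, generate {t2 - t1:.1f}s")

run(False)
run(True)
```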

simon0117 commented on August 16, 2024

This is happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.

Is it working with other models for you?

MLock, I think, tries to hold everything in RAM, so it can really slow down the phone, but it may improve inference speed. Make sure the model is heavily quantized, the context length is low, and all other apps are closed. Usually for me, even if the phone seemingly freezes, it gets out of it eventually after a few minutes, once the RAM settles again.

I have an iPhone 11, which has 4 GB of RAM, and I had the same crashing issue. I turned off MLock and at least it responds now, although it took 221 seconds to respond to "hi". I don't know if that's just because it's the first time loading, or because I have too many apps in the background; I'll have to experiment, which itself takes a long time since each response takes minutes. Is there any way I can help troubleshoot this? I am a software developer by trade, but not an iOS developer, sorry.

rahulvk007 commented on August 16, 2024

This is happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.

Is it working with other models for you?

MLock, I think, tries to hold everything in RAM, so it can really slow down the phone, but it may improve inference speed. Make sure the model is heavily quantized, the context length is low, and all other apps are closed. Usually for me, even if the phone seemingly freezes, it gets out of it eventually after a few minutes, once the RAM settles again.

I have an iPhone 11, which has 4 GB of RAM, and I had the same crashing issue. I turned off MLock and at least it responds now, although it took 221 seconds to respond to "hi". I don't know if that's just because it's the first time loading, or because I have too many apps in the background; I'll have to experiment, which itself takes a long time since each response takes minutes. Is there any way I can help troubleshoot this? I am a software developer by trade, but not an iOS developer, sorry.

Are you using Metal?

Because for me, I can run the models without crashing, but at a very slow speed, if I disable Metal. If I enable Metal, it immediately crashes. (iPhone 13, 4 GB RAM)

simon0117 commented on August 16, 2024

Are you using Metal?

Because for me, I can run the models without crashing, but at a very slow speed, if I disable Metal. If I enable Metal, it immediately crashes. (iPhone 13, 4 GB RAM)

Yes, I leave (left) Metal on. I tried turning it off just now (with MLock still disabled) and it seems faster, though it still takes a really long time, as before. Maybe we need a different model/quant for us poor 4 GB folks? What's the recommendation?

ShawnFumo commented on August 16, 2024

Maybe try one of the 3B models based on Stability? For example, here are quantized versions of Rocket-3B: https://huggingface.co/TheBloke/rocket-3B-GGUF

I'd start with one of the smaller ones and see how that goes (it seems Q3_K_M may fit in 4 GB of RAM). For me (14 Pro), it didn't seem to matter for speed whether I used MLock or not (unlike with a 7B model), but I'd experiment both ways on yours.
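As a sketch of trying that model on a desktop first, something like the following should work; the Q3_K_M filename is assumed from TheBloke's usual naming convention, so check the repo's file list for the exact name:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Filename is an assumption based on TheBloke's typical naming;
# verify it in the "Files" tab of the rocket-3B-GGUF repo.
path = hf_hub_download(repo_id="TheBloke/rocket-3B-GGUF",
                       filename="rocket-3b.Q3_K_M.gguf")

# A small 3B quant plus a 2048 context should stay well under 4 GB.
llm = Llama(model_path=path, n_ctx=2048, use_mmap=True)
print(llm("Hello!", max_tokens=32)["choices"][0]["text"])
```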

rahulvk007 commented on August 16, 2024

I don't know what fixed it, but now it's working perfectly fine with good performance. I haven't tried any 7B models, but all the 3B models I've tested (Q4_K_M versions) are working perfectly with Metal enabled.

It was not working a few days ago, but I recently updated to iOS 17.2.1 and now it seems to be fine.
