
LLMFarm's Introduction

LLMFarm

Install Stable · Install Latest

Wiki


LLMFarm is an iOS and macOS app for working with large language models (LLMs). It allows you to load different LLMs with certain parameters. With LLMFarm, you can test the performance of different LLMs on iOS and macOS and find the most suitable model for your project.
Based on ggml and llama.cpp by Georgi Gerganov.

It also uses sources from:

Features

Inferences

Multimodal

Note: For Falcon, Alpaca, GPT4All, Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2, Vigogne (French), Vicuna, Koala, OpenBuddy (Multilingual), Pygmalion/Metharme, WizardLM, Baichuan 1 & 2 + derivations, Aquila 1 & 2, Mistral AI v0.1, Refact, Persimmon 8B, MPT, and Bloom, select the llama inference in model settings.

Sampling methods

Getting Started

You can find answers to some questions in the FAQ section.

Inference options

When creating a chat, a JSON file is generated in which you can specify additional inference options. The chat files are located in the "chats" directory. You can see all inference options here.
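
For illustration, a chat file might look like the minimal sketch below. The key names here are hypothetical and only meant to show the general shape of the file; see the wiki for the authoritative list of options and their exact names.

    {
        "title": "llama-2-chat",
        "model": "llama-2-7b-chat.Q3_K_S.gguf",
        "model_inference": "llama",
        "prompt_format": "[INST] {{prompt}} [/INST]",
        "context": 2048,
        "temp": 0.9,
        "top_k": 40,
        "top_p": 0.95,
        "repeat_penalty": 1.1,
        "use_metal": true
    }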

Models

You can download some of the supported models here.

Development

llmfarm_core has been moved to a separate repository. To build LLMFarm, you need to clone this repository recursively:

git clone --recurse-submodules https://github.com/guinmoon/LLMFarm
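
If you have already cloned the repository without submodules, you can fetch them afterwards with standard git:

git submodule update --init --recursive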

LLMFarm's People

Contributors

guinmoon


LLMFarm's Issues

AWQ support?

Hello there! Is there any possibility to get AWQ support?

Thank you and all the best

Multi-line input

Would it be possible to allow entering new lines? I'm on the iPad, and pressing Enter/Shift+Enter just closes the keyboard.
Thanks

The development build either crashes or produces incorrect output

Hi Guinmoon,

I am trying to build the LLMFarm project with Xcode on my end, but it crashes when I load many of the models, while a few models load successfully but produce incorrect output. Could you please help me take a look?

Here is my environment:

Device: iPhone 15 Pro, iOS 17.2
Models:
tinyllama-1.1b (Crashes immediately)
orca-mini-3b (Produces incorrect output)
phi-2 (Crashes immediately)

I didn't modify any code; I just triggered the build and ran the project. I've confirmed that the entitlements for memory and VM are already added. I also tried several versions, like 0.9 and the latest version, but I still get the same results.

Thank you.

14 Pro Max crashes

I can't run it; it crashes. The phone freezes completely, and after about 30 seconds it starts working again.

Fine-tuning does not work

Hi, I'm using LLMFarm testflight on M1 Max with 64 GB RAM. I cannot get fine-tuning to run. When I click the run button, nothing happens in the UI. Is there something I'm missing?

For reference, I'm trying to train mistral-7b-instruct-v0.2 on 1,000 rows of a *.JSONL file. I have also tried a basic *.TXT file and *.JSON array file, but neither worked. Changing the params or toggling "Metal" didn't help either.

Any ideas?

iOS 15 support

I have an older iPhone. Would it be possible to run this on a lower iOS such as iOS 15?

Request: Optimize GGUF Models for Apple Neural Engine

To help with computation time on iOS devices, the models should run on the Apple Neural Engine.
Combined with ANE computation and model chunking, implementing dynamic embeddings from EELBERT: Tiny Models through Dynamic Embeddings would really optimize models for these devices.
(This can make a model up to 15x smaller while staying within 4% of the non-optimized model's GLUE benchmark.)

Another developer has a great ANE project; you two should try to collaborate: More Neural Engine Transformers.

I am not sure if this is possible in Swift, but Core ML offers multiple types of weight compression. Combining these with the technologies above could really bring large 34B+ parameter models to mobile devices:

  • Pruning
  • Palettization
  • Linear 8-bit Quantization
  • Mixed-Bit Palettization

A few feature requests..

Firstly, I'd love the ability to save downloaded LLMs to external drives. I tried downloading an AI to my SSD, but when I loaded it into LLMFarm, it copied the file and placed it into On My iPhone >> LLM Farm >> Models. (In other words, I want to choose where the LLM is downloaded and saved, instead of it being fixed to LLM Farm >> Models.) Secondly, I would love to see iOS Siri Shortcuts integration. Lastly, I couldn't seem to get the Qwen AI model to work; no matter how hard I tried and how much I looked, it kept giving errors.

A few bugs with template saving

I started with the Stability 3b template and modified it for use with Zephyr. I changed the prompt and gave it 4096 context and selected Metal (but kept MLock off since it seems fine without it).

After saving that under a new template and saving the chat settings, there are several kinds of issues.

First, if you re-open the chat settings, it says Custom again. And if you start typing to change the template name, it changes some of the settings (inference to llama, prompt back to default, context to 2048, Metal off), so you have to change them again before re-saving.

Another issue: if I try to switch from Custom on an existing or new chat to the new template I made, inference goes to llama, Metal turns off, and temperature changes to 512 (this one really threw me off since I didn't notice it at first!). But the prompt and context are correct.

Not sure if there are different issues for different inferences, or whether I had other sampling options set differently from the defaults, but those are what I've run into so far.

Clear chat, edit previous messages

Hi, thanks for all your work on this project! There are cases where it'd be nice to be able to clear the entire chat history (without having to create a whole new chat from scratch) and/or edit/delete previous messages in a conversation.

Btw, there is that button at the top with a counter-clockwise circling arrow (touching it gives a check mark), but I'm not sure what it actually does, if anything?

Edit: Or if there was a way to duplicate the current chat setup or some other way to easily have multiple threads going at once, that could help for chat management.

How to debug?

First of all, thank you.
I depend on the llmfarm_core.swift package.
Code

    import Foundation
    import llmfarm_core

    // Token callback passed to predict: prints each token as it streams in.
    // Return true to stop generation early.
    func mainCallback(_ str: String, _ time: Double) -> Bool {
        print(str, terminator: "")
        return false
    }

    print("Hello.")
    let input_text = "State the meaning of life."
    let ai = AI(_modelPath: "/Users/guyanhua/llama-2-7b-chat.Q3_K_S.gguf", _chatName: "chat")
    let modelInference: ModelInference = .LLama_gguf
    var params: ModelAndContextParams = .default
    params.context = 4095
    params.n_threads = 14
    params.use_metal = false

    do {
        try ai.loadModel(modelInference, contextParams: params)
        var output = ""
        try ExceptionCather.catchException {
            // try? instead of try! so a Swift-level predict error
            // doesn't crash the process.
            output = (try? ai.model.predict(input_text, mainCallback)) ?? ""
        }
        print(output)
    } catch {
        print(error)
    }

Output

Hello.
AI init
llama_model_load: error loading model: llama_model_loader: failed to load model from /Users/guyanhua/llama-2-7b-chat.Q3_K_S.gguf

llama_load_model_from_file: failed to load model
modelLoadError
modelLoadError

And the model file does exist. How can I debug this? Can you provide some ideas, please?

Thanks!

Add list of downloadable models to the app

When I press "add model" it only lets me add one from a file.

It would be better if you had the list of models from the website in the app, because I originally thought I was supposed to go to huggingface and figure out which models can run on my device myself, which is a lot of work since I want to run the best model... and each experiment involves downloading a 3GB+ file.

You should keep the Q/K values from TheBloke's file names in the model name so that I can know which model quantization levels my phone is capable of running.

Issue with LLAVA 1.5

Hey again, I downloaded a quantized GGUF file for LLaVA 1.5 7B. What prompt format and settings should I turn on to use the multimodal features? I can't seem to find it no matter how hard I look.

App crashes right away

Cloned, built, and ran on an iPhone 15 Pro with iOS 17.2; it crashes right away.
There are no crash logs in the Logs/CoreSimulator directory.

Eval error often appears when using Llama 2

When I ask Llama 2 something after it has successfully generated an answer, or when it is in the middle of generating a long answer, it always shows "Eval error" and I need to restart the app to make Llama 2 work again.

Entire phone crashes whenever Use Metal is enabled

I have an iPhone 13. I tried a few models, including Llama 2 and Orca.

Both immediately crash the entire phone, requiring a force restart, if Use Metal is enabled. When Use Metal is disabled, they run, but extremely slowly.

Bugs and suggestions

iPhone 15 Pro Max, latest LLMFarm version (1.0.0).

Starting with the bugs:
(Solved) 1: I downloaded the MobileVLM 3B model and a CLIP model, connected them together, and set the MobileVLM template. I import a picture and write "describe this picture" in the prompt, and get a correct answer. I import another picture, ask the same question, and get an answer mixing both pictures. I expected this, because the model remembers the history. But after cleaning the chat history with the eraser icon, inserting another picture, and asking the same question, I again get an answer mixed with the previous picture.

Expected behavior: it should look only at the text in the window. But it still "remembers" the history; you must reload the whole model.

2: When you generate text and long-press the generated text, there is a Copy option, but if you tap it and paste back into the prompt line, nothing is pasted (because nothing was copied).

3: When I edit the VLM model where I added CLIP, the CLIP model doesn't show there and the CLIP switch is turned off. But when you tap the switch, it loads the CLIP model. So it's there; only the switch is bugged.

4: Same with the settings template: you set something and save it, but when you return, Custom is shown.

(Solved) 5: When downloading a model from the in-app list, I tried OpenHermes Mistral 7B and selected Q4_K_S from the list, but the model downloaded was Q4_K_M. I didn't check whether other models have the same problem.

6: When a portrait picture is imported, it's shown as a landscape picture after being sent to the app. (iPhone screenshots don't do this, but photos from the back camera do.)

Now a suggestion:
1: It would be nice if, while the text is still generating, we could scroll up and read the beginning. Currently it auto-scrolls and you cannot interrupt it.

Crash when running on my iPhone 14 Pro Max

I am using the model "ggml-model-gpt-2-117M.bin"; it runs well on Mac.

But it crashes on iPhone:

2023-07-19 18:09:20.556462+0800 LLMFarm[8938:1493311] [DocumentManager] The view service did terminate with error: Error Domain=_UIViewServiceErrorDomain Code=1 "(null)" UserInfo={Terminated=disconnect method}
ggml-model-gpt-2-117M_1689761363.json

reload

AI init
gpt_neox_model_load: loading model from '/var/mobile/Containers/Data/Application/0B2FC557-3AEA-4E95-8FC5-667C71221240/Documents/models/ggml-model-gpt-2-117M.bin' - please wait ...
gpt_neox_model_load: n_vocab = 50257
gpt_neox_model_load: n_ctx = 1024
gpt_neox_model_load: n_embd = 768
gpt_neox_model_load: n_head = 12
gpt_neox_model_load: n_layer = 12
gpt_neox_model_load: n_rot = 1
gpt_neox_model_load: par_res = 50257
gpt_neox_model_load: model_size = UNKNOWN
gpt_neox_model_load: ftype = 1
gpt_neox_model_load: qntvr = 0
2023-07-19 18:09:29.353317+0800 LLMFarm[8938:1493518] [ServicesDaemonManager] interruptionHandler is called. -[FontServicesDaemonManager connection]_block_invoke
2023-07-19 18:09:29.354747+0800 LLMFarm[8938:1493518] [xpc] <PKDaemonClient: 0x2834fdcc0>: XPC error talking to pkd: Connection interrupted

Additional Support for Vision Based Models Like Llava?

I've attempted to get this working with the out-of-the-box Llava models, providing local URLs, remote URLs, and base64-encoded images, to no avail. The model runs and chats, but I'm not sure how best to feed images to it...

Request: Add support for Qwen models

Hello,

As more people jump on board with using and developing Qwen's open-source models, we've seen a bunch of variants popping up. I think it'd be really cool if this project could support them. Right now, there's a lot of buzz around a variant from a Japanese LLM developer named Rinna, specifically Nekomata-7B/14B, based on Qwen, and it would be great if these worked easily on mobile devices.

I'm not totally sure how tough it would be to add this, but Qwen's already up and running in the original llama.cpp repo here, so this might help a bit.

(Also, English isn't my first language, so sorry for any odd bits🙏)

Thank you!

Performance/benchmarks

Hey, first of all, great repo! Especially the TestFlight part, so we can test it without having to set up the dev environment. For instance, I downloaded the app and had Llama 2 7B running in a minute! Kudos 👏

My daily driver is quite weak (iPhone 12 mini), so the inference speed is something like 5-10 seconds per word.

I'm curious if you have a performance benchmark for each model on the different phones. It could be as simple as a video demonstration.

That would greatly help developers/designers build an intuition as to what products would be feasible at the moment!

Apple Model Available

Is anyone willing to list the Apple models and corresponding hardware details (e.g. RAM) that can run the listed LLM models using LLMFarm? If so, that would be great.

Many thanks!

Small bugs/unexpected behavior and feedback with suggestions

Hello Artem. First, I want to thank you for the great app! It's exactly what I've been searching for over the last week! It's cool that we can import GGUF models and that you let us customize some settings!

My device: iPhone 15 Pro Max, latest iOS.

For the bugs:

I found that in the latest beta (0.9.0) there is a problem with closing the keyboard. When you write and the model answers, the keyboard doesn't close, and you can't even close it with the swipe-down gesture.

When I create a few chats and then delete them by swiping left, they reappear after going to settings and back. I tested it with only one model/chat: create it, swipe to delete it, open settings, and it reappears. But when you force-close the app, it comes back clean, without the deleted chat.

And now some suggestions.

It would be nice to have a setting to turn off text scrolling, so you can read the beginning of the text while the app is still generating.

I think it would also be nice to see (or have an option to turn on) on-screen memory usage while generating, to see whether we're reaching the device's memory limit.

In the settings template, add more prompt formats, for example ChatML, Alpaca, Vicuna, and CodeLlama (I only checked a few from TheBloke).

Maybe add a description for each setting (like tapping a small question mark to see what the setting does), for inexperienced users.

The clear-chat button could be in the chat interface, for easy access.

And now a few more suggestions. I get inspiration from other apps (not in the App Store).

It would be nice to have a toggle to switch between instruct mode, story mode, chat mode, and maybe some adventure mode. I get a lot of inspiration from:
https://github.com/LostRuins/koboldcpp
https://github.com/SillyTavern/SillyTavern

It would be nice if we could customize the AI bot's answers. For inspiration, check:
https://github.com/KoboldAI/KoboldAI-Client/wiki/Memory,-Author's-Note-and-World-Info
Here you can set these things to teach the model how to speak with you. You can select different scenarios, so it can act like a chatbot or something else, like I wrote above (instruct, chat, story, adventure). Check the memory, author's note, etc., so you can create a character like in SillyTavern. It would be super cool, and you already provide most of the settings for it in your app.

You can check their Horde website; look for scenarios under the hamburger menu and for memory above the chat input. It works great, and you can customize the AI for whatever you want!
https://lite.koboldai.net/#

I want to thank you for your app; it's already great, and I like the customization and custom model import!

And sorry for the long message.

And if you are curious about image generation, there is an iOS/macOS app called Draw Things, where I'm helping the developer with his Discord server. It's free and has a lot of customization: model importing, LoRA training, ControlNet, infinite canvas, etc. I wish there were an app like this for LLM text generation on iOS! And yours isn't far off!

https://apps.apple.com/cz/app/draw-things-ai-generation/id6444050820?l=cs

I'm sorry I didn't add labels 🙄. Bug, enhancement.

Dolphin models problem (modelLoadError)

They just don't seem to work. So far I've tested Phi-2 Dolphin and Mistral 7B Dolphin, and neither works. Maybe I'm just dumb; if that's the case, can someone give me a step-by-step guide on how to use Phi-2 Dolphin? (I'm using the TestFlight version on an iPhone XR.)

Crash when running with RWKV 5 (Raven)

When trying to load Raven (RWKV 5), the application crashes and exits.

There are no error messages displayed.

Is it possible that the RWKV package simply needs updating?

Crashes on macOS with Llama 2 model

Let me know if you don't have access to the crash logs via TestFlight. Latest stable macOS release, your Llama 2 7B model file, default settings. After chatting a few times in a row, it crashes.

Sampling stuck in greedy

The sampling reverts back to greedy if I try to change it. Device: iPhone 15 Pro Max.

In greedy mode the app is unusable; the LLM keeps generating without stopping.

Always crashes at the 2nd user input

iPhone 13 Pro
iOS 16.3.1
Orca Mini 3B downloaded from the README link

Tried different params, with Metal on or off; it always crashes. Creating a new chat for each input works, but the 2nd input in the same chat will crash.

How to install on an iPhone?

Hi, this looks like an interesting project! However, I'm not entirely sure how to install it on an iOS device. I tried via diawi.com, but it complained about not being able to verify integrity, and the tutorials I found referenced profiles, but I didn't find any in the device management tab in Settings.
Any idea what went wrong?

Can't compile in Xcode

Assertion failed: (extras.otherInstrOffset != 0 && "Kind::arm64_adrp_ldr missing extra info"), function applyFixup, file Fixup.cpp, line 793.

Environment:
Xcode Version 15.0.1 (15A507)
iOS 16 SDK
iPhone 14 Pro
MacBook Air M2
macOS 14.1.2 (23B92)
