
LLMFarm's Introduction

LLMFarm

Install Stable · Install Latest

Wiki


LLMFarm is an iOS and macOS app for working with large language models (LLMs). It allows you to load different LLMs with certain parameters. With LLMFarm, you can test the performance of different LLMs on iOS and macOS and find the most suitable model for your project.
Based on ggml and llama.cpp by Georgi Gerganov.

It also uses sources from:

Features

Inferences

Multimodal

Note: For Falcon, Alpaca, GPT4All, Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2, Vigogne (French), Vicuna, Koala, OpenBuddy (Multilingual), Pygmalion/Metharme, WizardLM, Baichuan 1 & 2 + derivations, Aquila 1 & 2, Mistral AI v0.1, Refact, Persimmon 8B, MPT, and Bloom, select the llama inference in model settings.

Sampling methods

Getting Started

You can find answers to some questions in the FAQ section.

Inference options

When creating a chat, a JSON file is generated in which you can specify additional inference options. The chat files are located in the "chats" directory. You can see all inference options here.
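
For illustration, a chat file might look like the minimal sketch below. The key names here are hypothetical and only meant to show the general shape of the file; see the wiki for the authoritative list of options and their exact names.

    {
        "title": "llama-2-chat",
        "model": "llama-2-7b-chat.Q3_K_S.gguf",
        "model_inference": "llama",
        "prompt_format": "[INST] {{prompt}} [/INST]",
        "context": 2048,
        "temp": 0.9,
        "top_k": 40,
        "top_p": 0.95,
        "repeat_penalty": 1.1,
        "use_metal": true
    }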

Models

You can download some of the supported models here.

Development

llmfarm_core has been moved to a separate repository. To build LLMFarm, you need to clone this repository recursively:

git clone --recurse-submodules https://github.com/guinmoon/LLMFarm
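
If you have already cloned the repository without submodules, you can fetch them afterwards with standard git:

git submodule update --init --recursive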

LLMFarm's People

Contributors

guinmoon


LLMFarm's Issues

AWQ support?

Hello there! Is there any possibility to get AWQ support?

Thank you and all the best

Multi-line input

Would it be possible to allow entering new lines? I'm on the iPad, and pressing Enter/Shift+Enter just closes the keyboard.
Thanks

The development build either crashes or produces incorrect output

Hi Guinmoon,

I am trying to build the LLMFarm project with Xcode on my end, but it crashes when I load many of the models, while a few models load successfully but produce incorrect output. Could you please help me take a look?

Here is my environment:

Device: iPhone 15 Pro, iOS 17.2
Models:
tinyllama-1.1b (Crashes immediately)
orca-mini-3b (Produces incorrect output)
phi-2 (Crashes immediately)

I didn't modify any code; I just triggered the build and ran the project. I've confirmed that the entitlements for memory and VM are already added. I also tried several versions, like 0.9 and the latest version, but I still get the same results.

Thank you.

14 Pro Max crashes

I can't run it; it crashes. The phone freezes completely, and after about 30 seconds it starts working again.

Fine-tuning does not work

Hi, I'm using LLMFarm testflight on M1 Max with 64 GB RAM. I cannot get fine-tuning to run. When I click the run button, nothing happens in the UI. Is there something I'm missing?

For reference, I'm trying to train mistral-7b-instruct-v0.2 on 1,000 rows of a *.JSONL file. I have also tried a basic *.TXT file and *.JSON array file, but neither worked. Changing the params or toggling "Metal" didn't help either.

Any ideas?

iOS 15 support

I have an older iPhone. Would it be possible to run this on a lower iOS such as iOS 15?

Request: Optimize GGUF Models for Apple Neural Engine

To help with computation time on iOS devices, the models should run on the Apple Neural Engine.
Combined with ANE computation and model chunking, implementing dynamic embeddings from EELBERT: Tiny Models through Dynamic Embeddings would really optimize models for these devices.
(This can make a model up to 15x smaller while staying within 4% of the non-optimized model's GLUE benchmark.)

Another developer has a great ANE project; you two should try to collaborate: More Neural Engine Transformers.

I am not sure if this is possible in Swift, but Core ML offers multiple types of weight compression. Combining these with the technologies above could really bring large 34B+ parameter models to mobile devices:

  • Pruning
  • Palettization
  • Linear 8-bit Quantization
  • Mixed-Bit Palettization

A few feature requests..

Firstly, I'd love the ability to save downloaded LLMs to external drives. I tried downloading an AI to my SSD, but when I loaded it into LLMFarm, it copied the file and placed it into On My iPhone >> LLM Farm >> Models. (In other words, I want to choose where the LLM is downloaded and saved, instead of it being fixed to LLM Farm >> Models.) Secondly, I would love to see iOS Siri Shortcuts integration. Lastly, I couldn't seem to get the Qwen AI model to work; no matter how hard I tried and how much I looked, it kept giving errors.

A few bugs with template saving

I started with the Stability 3b template and modified it for use with Zephyr. I changed the prompt and gave it 4096 context and selected Metal (but kept MLock off since it seems fine without it).

After saving that under a new template and saving the chat settings, there are several kinds of issues.

First, if you re-open the chat settings, it says Custom again. And if you start typing to change the template name, it changes some of the settings (inference to llama, prompt back to default, context to 2048, Metal off), so you have to change them again before re-saving.

Another issue: if I try to switch from Custom on an existing or new chat to the new template I made, inference goes to llama, Metal turns off, and temperature changes to 512 (this one really threw me off since I didn't notice it at first!). But the prompt and context are correct.

Not sure if there are different issues for different inferences, or whether I had other sampling options set differently from the defaults, but those are what I've run into so far.

Clear chat, edit previous messages

Hi, thanks for all your work on this project! There are cases where it'd be nice to be able to clear the entire chat history (without having to create a whole new chat from scratch) and/or edit/delete previous messages in a conversation.

Btw, there is that button at the top with a counter-clockwise circling arrow (touching it gives a check mark), but I'm not sure what it actually does, if anything?

Edit: Or if there was a way to duplicate the current chat setup or some other way to easily have multiple threads going at once, that could help for chat management.

How to debug?

First of all, thank you.
I depend on the llmfarm_core.swift package.
Code

    import Foundation
    import llmfarm_core

    // Token callback passed to predict: prints each token as it streams in.
    // Return true to stop generation early.
    func mainCallback(_ str: String, _ time: Double) -> Bool {
        print(str, terminator: "")
        return false
    }

    print("Hello.")
    let input_text = "State the meaning of life."
    let ai = AI(_modelPath: "/Users/guyanhua/llama-2-7b-chat.Q3_K_S.gguf", _chatName: "chat")
    let modelInference: ModelInference = .LLama_gguf
    var params: ModelAndContextParams = .default
    params.context = 4095
    params.n_threads = 14
    params.use_metal = false

    do {
        try ai.loadModel(modelInference, contextParams: params)
        var output = ""
        try ExceptionCather.catchException {
            // try? instead of try! so a Swift-level predict error
            // doesn't crash the process.
            output = (try? ai.model.predict(input_text, mainCallback)) ?? ""
        }
        print(output)
    } catch {
        print(error)
    }

Output

Hello.
AI init
llama_model_load: error loading model: llama_model_loader: failed to load model from /Users/guyanhua/llama-2-7b-chat.Q3_K_S.gguf

llama_load_model_from_file: failed to load model
modelLoadError
modelLoadError

And the model file does exist. How can I debug this? Can you provide some ideas, please?

Thanks!

Add list of downloadable models to the app

When I press "add model" it only lets me add one from a file.

It would be better if you had the list of models from the website in the app, because I originally thought I was supposed to go to huggingface and figure out which models can run on my device myself, which is a lot of work since I want to run the best model... and each experiment involves downloading a 3GB+ file.

You should keep the Q/K values from TheBloke's file names in the model name so that I can know which model quantization levels my phone is capable of running.

Issue with LLAVA 1.5

Hey again, I downloaded a quantized GGUF file for LLaVA 1.5 7B. What prompt format and settings should I turn on to use the multimodal features? I can't seem to find it no matter how hard I look.

App crashes right away

Cloned, built, and ran on an iPhone 15 Pro with iOS 17.2; it crashes right away.
There are no crash logs in the Logs/CoreSimulator directory.

Eval error often appears when using Llama 2

When I ask Llama 2 something after it has successfully generated an answer, or when it is in the middle of generating a long answer, it always shows "Eval error" and I need to restart the app to make Llama 2 work again.

Entire phone crashes whenever Use Metal is enabled

I have an iPhone 13. I tried a few models, including Llama 2 and Orca.

Both immediately crash the entire phone, requiring a force restart, if Use Metal is enabled. When Use Metal is disabled, they run, but extremely slowly.

Bugs and suggestions

iPhone 15 Pro Max, latest LLMFarm version (1.0.0).

Starting with the bugs:
(Solved) 1: I downloaded the MobileVLM 3B model and a CLIP model, connected them together, and set the MobileVLM template. I import a picture and write "describe this picture" in the prompt, and get a correct answer. I import another picture, ask the same question, and get an answer mixing both pictures. I expected this, because the model remembers the history. But after cleaning the chat history with the eraser icon, inserting another picture, and asking the same question, I again get an answer mixed with the previous picture.

Expected behavior: it should look only at the text in the window. But it still "remembers" the history; you must reload the whole model.

2: When you generate text and long-press the generated text, there is a Copy option, but if you tap it and paste back into the prompt line, nothing is pasted (because nothing was copied).

3: When I edit the VLM model where I added CLIP, the CLIP model doesn't show there and the CLIP switch is turned off. But when you tap the switch, it loads the CLIP model. So it's there; only the switch is bugged.

4: Same with the settings template: you set something and save it, but when you return, Custom is shown.

(Solved) 5: When downloading a model from the in-app list, I tried OpenHermes Mistral 7B and selected Q4_K_S from the list, but the model downloaded was Q4_K_M. I didn't check whether other models have the same problem.

6: When a portrait picture is imported, it's shown as a landscape picture after being sent to the app. (iPhone screenshots don't do this, but photos from the back camera do.)

Now a suggestion:
1: It would be nice if, while the text is still generating, we could scroll up and read the beginning. Currently it auto-scrolls and you cannot interrupt it.

Crash when running on my iPhone 14 Pro Max

I am using the model "ggml-model-gpt-2-117M.bin"; it runs well on Mac.

But it crashes on iPhone:

2023-07-19 18:09:20.556462+0800 LLMFarm[8938:1493311] [DocumentManager] The view service did terminate with error: Error Domain=_UIViewServiceErrorDomain Code=1 "(null)" UserInfo={Terminated=disconnect method}
ggml-model-gpt-2-117M_1689761363.json

reload

AI init
gpt_neox_model_load: loading model from '/var/mobile/Containers/Data/Application/0B2FC557-3AEA-4E95-8FC5-667C71221240/Documents/models/ggml-model-gpt-2-117M.bin' - please wait ...
gpt_neox_model_load: n_vocab = 50257
gpt_neox_model_load: n_ctx = 1024
gpt_neox_model_load: n_embd = 768
gpt_neox_model_load: n_head = 12
gpt_neox_model_load: n_layer = 12
gpt_neox_model_load: n_rot = 1
gpt_neox_model_load: par_res = 50257
gpt_neox_model_load: model_size = UNKNOWN
gpt_neox_model_load: ftype = 1
gpt_neox_model_load: qntvr = 0
2023-07-19 18:09:29.353317+0800 LLMFarm[8938:1493518] [ServicesDaemonManager] interruptionHandler is called. -[FontServicesDaemonManager connection]_block_invoke
2023-07-19 18:09:29.354747+0800 LLMFarm[8938:1493518] [xpc] <PKDaemonClient: 0x2834fdcc0>: XPC error talking to pkd: Connection interrupted

Additional Support for Vision Based Models Like Llava?

I've attempted to get this working with the out-of-the-box Llava models, providing local URLs, remote URLs, and base64-encoded images, to no avail. The model runs and chats, but I'm not sure how best to feed images to it...

Request: Add support for Qwen models

Hello,

As more people jump on board with using and developing Qwen's open-source models, we've seen a bunch of variants popping up. I think it'd be really cool if this project could support them. Right now, there's a lot of buzz around a variant from a Japanese LLM developer named Rinna, specifically Nekomata-7B/14B, based on Qwen, and it would be great if these worked easily on mobile devices.

I'm not totally sure how tough it would be to add this, but Qwen's already up and running in the original llama.cpp repo here, so this might help a bit.

(Also, English isn't my first language, so sorry for any odd bits🙏)

Thank you!

Performance/benchmarks

Hey, first of all, great repo! Especially the TestFlight part, so we can test it without having to set up the dev environment. For instance, I downloaded the app and had Llama 2 7B running in a minute! Kudos 👏

My daily driver is quite weak (iPhone 12 mini), so the inference speed is something like 5-10 seconds per word.

I'm curious if you have a performance benchmark for each model on the different phones. It could be as simple as a video demonstration.

That would greatly help developers/designers build an intuition as to what products would be feasible at the moment!

Apple Model Available

Is anyone willing to list the Apple models and corresponding hardware details (e.g. RAM) that can run the listed LLM models using LLMFarm? If so, that would be great.

Many thanks!

Small bugs/unexpected behavior and feedback with suggestions

Hello Artem. First, I want to thank you for the great app! It's exactly what I've been searching for over the last week! It's cool that we can import GGUF models and that you let us customize some settings!

My device: iPhone 15 Pro Max, latest iOS.

For the bugs:

I found that in the latest beta (0.9.0) there is a problem with closing the keyboard. When you write and the model answers, the keyboard doesn't close, and you can't even close it with the swipe-down gesture.

When I create a few chats and then delete them by swiping left, they reappear after going to settings and back. I tested it with only one model/chat: create it, swipe to delete it, open settings, and it reappears. But when you force-close the app, it comes back clean, without the deleted chat.

And now some suggestions.

It would be nice to have a setting to turn off text scrolling, so you can read the beginning of the text while the app is still generating.

I think it would also be nice to see (or have an option to turn on) on-screen memory usage while generating, to see whether we're reaching the device's memory limit.

In the settings template, add more prompt formats, for example ChatML, Alpaca, Vicuna, and CodeLlama (I only checked a few from TheBloke).

Maybe add a description for each setting (like tapping a small question mark to see what the setting does), for inexperienced users.

The clear-chat button could be in the chat interface, for easy access.

And now a few more suggestions. I get inspiration from other apps (not in the App Store).

It would be nice to have a toggle to switch between instruct mode, story mode, chat mode, and maybe some adventure mode. I get a lot of inspiration from:
https://github.com/LostRuins/koboldcpp
https://github.com/SillyTavern/SillyTavern

It would be nice if we could customize the AI bot's answers. For inspiration, check:
https://github.com/KoboldAI/KoboldAI-Client/wiki/Memory,-Author's-Note-and-World-Info
Here you can set these things to teach the model how to speak with you. You can select different scenarios, so it can act like a chatbot or something else, like I wrote above (instruct, chat, story, adventure). Check the memory, author's note, etc., so you can create a character like in SillyTavern. It would be super cool, and you already provide most of the settings for it in your app.

You can check their Horde website; look for scenarios under the hamburger menu and for memory above the chat input. It works great, and you can customize the AI for whatever you want!
https://lite.koboldai.net/#

I want to thank you for your app; it's already great, and I like the customization and custom model import!

And sorry for the long message.

And if you are curious about image generation, there is an iOS/macOS app called Draw Things, where I'm helping the developer with his Discord server. It's free and has a lot of customization: model importing, LoRA training, ControlNet, infinite canvas, etc. I wish there were an app like this for LLM text generation on iOS! And yours isn't far off!

https://apps.apple.com/cz/app/draw-things-ai-generation/id6444050820?l=cs

I'm sorry I didn't add labels 🙄. Bug, enhancement.

Dolphin models problem (modelLoadError)

They just don't seem to work. So far I've tested Phi-2 Dolphin and Mistral 7B Dolphin, and neither works. Maybe I'm just dumb; if that's the case, can someone give me a step-by-step guide on how to use Phi-2 Dolphin? (I'm using the TestFlight version on an iPhone XR.)

Crash when running with RWKV 5 (Raven)

When trying to load Raven (RWKV 5), the application crashes and exits.

There are no error messages displayed.

Is it possible that the RWKV package simply needs updating?

Crashes on macOS with Llama 2 model

Let me know if you don't have access to the crash logs via TestFlight. Latest stable macOS release, your Llama 2 7B model file, default settings. After chatting a few times in a row, it crashes.

Sampling stuck in greedy

The sampling reverts back to greedy if I try to change it. Device: iPhone 15 Pro Max.

In greedy mode the app is unusable; the LLM keeps generating without stopping.

Always crashes at the 2nd user input

iPhone 13 Pro
iOS 16.3.1
Orca Mini 3B downloaded from the README link

Tried different params, with Metal on or off; it always crashes. Creating a new chat for each input works, but the 2nd input in the same chat will crash.

How to install on an iPhone?

Hi, this looks like an interesting project! However, I'm not entirely sure how to install it on an iOS device. I tried via diawi.com, but it complained about not being able to verify integrity, and the tutorials I found referenced profiles, but I didn't find any in the device management tab in Settings.
Any idea what went wrong?

Can't compile in Xcode

Assertion failed: (extras.otherInstrOffset != 0 && "Kind::arm64_adrp_ldr missing extra info"), function applyFixup, file Fixup.cpp, line 793.

Environment:
Xcode Version 15.0.1 (15A507)
iOS 16 SDK
iPhone 14 Pro
MacBook Air M2
macOS 14.1.2 (23B92)
