Comments (7)
development version
from llmfarm.
Make sure Metal=on, BOS=on, EOS=off, and try setting the context size to 1024. I got 8-9 tokens/sec.
Officially, Phi-3 is only supported starting with llama.cpp release b2717. The latest LLMFarm commit uses b2692. The TestFlight version uses b2135, which officially supports only Phi-2.
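The build comparison above can be checked mechanically. This is a minimal sketch, assuming llama.cpp release tags always have the form `b` followed by an integer build number. The helper names `build_number` and `supports_phi3` are hypothetical and are not part of llama.cpp or LLMFarm.

```python
# Cutoff taken from the comment above: b2717 is the first release with official Phi-3 support.
PHI3_MIN_BUILD = 2717


def build_number(tag: str) -> int:
    """Parse a release tag such as 'b2717' into its integer build number."""
    if not tag.startswith("b") or not tag[1:].isdigit():
        raise ValueError(f"unexpected release tag: {tag!r}")
    return int(tag[1:])


def supports_phi3(tag: str) -> bool:
    """Return True if the build is at or past the Phi-3 cutoff (hypothetical helper)."""
    return build_number(tag) >= PHI3_MIN_BUILD


# The two builds mentioned in the thread.
for name, tag in [("latest LLMFarm commit", "b2692"), ("TestFlight", "b2135")]:
    print(f"{name} ({tag}): official Phi-3 support = {supports_phi3(tag)}")
```

Both builds named in the thread fall below the cutoff, which is consistent with Phi-3 behaving unreliably on them.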
from llmfarm.
Hi. It works normally with this template:
<|user|>
{{prompt}}<|end|>
<|assistant|>
And with the BOS option enabled.
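To make the substitution concrete, here is a minimal sketch of how the `{{prompt}}` placeholder in the template above could be filled in. The function `render_prompt` is hypothetical and is not LLMFarm's implementation. It assumes a plain text replacement and no trailing newline after `<|assistant|>`, since the thread does not say whether one is needed.

```python
# Template as posted in the thread, one tag per line.
PHI3_TEMPLATE = "<|user|>\n{{prompt}}<|end|>\n<|assistant|>"


def render_prompt(template: str, prompt: str) -> str:
    """Fill the {{prompt}} placeholder with the user's message (hypothetical helper)."""
    return template.replace("{{prompt}}", prompt)


# BOS is not written into the template here; in the thread it is a separate app option.
print(render_prompt(PHI3_TEMPLATE, "What is the capital of France?"))
```

The rendered text ends with `<|assistant|>`, so the model continues from the start of the assistant turn.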
from llmfarm.
Hi. How can I make it generate until EOS? If I select the option, the app crashes.
from llmfarm.
Hi. It works normally with this template:
<|user|>
{{prompt}}<|end|>
<|assistant|>
And with the BOS option enabled.
BOS is enabled and I have set that prompt, but I am getting an error as the reply to every message:
Load Model Error: [Error]
modelLoad Error
Load Model Error: [Done]
from llmfarm.
@guinmoon when you say "works normal", are you referring to the development version or the version in the App Store?
The stable version from the app store isn't honoring the end token and the app crashes if you try enabling EOS.
from llmfarm.
In the TestFlight version I’m using ‘Phi-3-mini-4k-instruct-q4.gguf’.
When setting up, I used the “Phi 2” settings template and then entered the recommended prompt. On my iPhone 14 Pro I’m getting around 2-5 tokens per second.
Sometimes the <|end|> tag isn’t handled correctly: the model skips over it and starts a new answer.
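One way to picture a workaround for the behaviour described above is to cut the generated text at the first literal `<|end|>`. This is only a sketch of post-processing on text that has already been generated. It is not how LLMFarm or llama.cpp handles stop tokens, and it assumes the tag shows up as literal text in the output. The function name `truncate_at_end_tag` is hypothetical.

```python
END_TAG = "<|end|>"


def truncate_at_end_tag(text: str, end_tag: str = END_TAG) -> str:
    """Keep only the text before the first end tag (hypothetical helper)."""
    head, _sep, _tail = text.partition(end_tag)
    return head.rstrip()


# Example of output that ran past the end tag and started a new turn.
raw = "Paris is the capital of France.<|end|>\n<|user|>\nAnd Germany?"
print(truncate_at_end_tag(raw))
```

With streamed output the tag may arrive split across chunks, so a real implementation would need to buffer the most recent characters before checking for it.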
from llmfarm.