Comments (5)
instruct isn't a valid flag because that behavior is encompassed in the API itself – ChatCompletion will simulate a chat response and Completion will simulate a plain completion specifically. So the flag isn't necessary (the app using the OpenAI API should already be doing the right "instruct" mode when needed)
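The distinction above can be sketched as the two request shapes an OpenAI-compatible server accepts (a minimal sketch; the endpoint paths are OpenAI's, but the helper names here are illustrative, not part of gpt-llama.cpp):

```javascript
// Sketch of the two OpenAI-style request shapes the comment refers to.
// No "instruct" flag exists: the chosen endpoint itself selects the mode.

// Chat-style: the server renders the messages array into a dialogue template.
function chatRequest(model, userText) {
  return {
    path: "/v1/chat/completions",
    body: { model, messages: [{ role: "user", content: userText }] },
  };
}

// Completion-style: the caller supplies the raw prompt verbatim.
function completionRequest(model, prompt) {
  return { path: "/v1/completions", body: { model, prompt } };
}

console.log(chatRequest("gpt-3.5-turbo", "hi").path);       // "/v1/chat/completions"
console.log(completionRequest("gpt-3.5-turbo", "hi").path); // "/v1/completions"
```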
For the model, you want to pass that into the GPT app instead (like chatbot-ui or auto-gpt), typically in its .env
file, so it'd look something like OPENAI_API_KEY=../llama.cpp/models/wizardLM-7B-GGML/wizardLM-7B.ggml.q5_1.bin
from gpt-llama.cpp.
That would be weird abuse of a variable. It would be much better to have a LOCAL_MODEL_PATH variable, and if no local model path is set, then use OpenAI's API, for example. I would favor trying to use a de facto standard local API such as text-generation-webui's API, rather than trying to reinvent the wheel by running local models directly, though. For one thing, sharing one local API means that multiple tools can use it. For another, there's a LOT of complexity in supporting local acceleration hardware and different model types and so on. Just using a standard local API makes it a lot simpler.
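The fallback suggested here could look something like this (a minimal sketch: LOCAL_MODEL_PATH is the variable name proposed in the comment, not one either project defines, and the local API URL/port is a placeholder assumption):

```javascript
// Sketch of the proposed env-based dispatch: use a local OpenAI-compatible
// API when a local model is configured, otherwise fall back to OpenAI.
// LOCAL_MODEL_PATH and the localhost URL are illustrative assumptions.
function resolveBackend(env) {
  if (env.LOCAL_MODEL_PATH) {
    // A local model is configured: point at a local OpenAI-compatible API
    // (e.g. one exposed by text-generation-webui; port is an assumption).
    return { baseUrl: "http://localhost:5000/v1", model: env.LOCAL_MODEL_PATH };
  }
  // No local model: use OpenAI's hosted API with the real key.
  return { baseUrl: "https://api.openai.com/v1", apiKey: env.OPENAI_API_KEY };
}

console.log(resolveBackend({ LOCAL_MODEL_PATH: "./models/wizardLM-7B.bin" }).baseUrl);
console.log(resolveBackend({ OPENAI_API_KEY: "sk-..." }).baseUrl);
```

The design point is that the key keeps its normal meaning in both branches, instead of smuggling a file path through OPENAI_API_KEY.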
@keldenl
Sorry, I think I'm missing something. How do I get it to follow the ### Instruction / ### Response template for Alpaca and similar models? When I use ChatCompletion, it seems to use a User: / Assistant: template, which isn't working for WizardLM – the LLM doesn't follow my instructions.
When I use the Completions endpoint and add the Instruction/Response template into the prompt, the server seems to hang and no response is generated.
It processes the prompt, then the ===== RESPONSE ===== line appears, and that's it.
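For reference, the Alpaca-style template described here can be built for the plain Completions endpoint roughly like this (the header strings below follow the common Alpaca convention; individual fine-tunes such as WizardLM may expect slightly different markers):

```javascript
// Builds an Alpaca-style prompt for the Completions endpoint.
// The "### Instruction:" / "### Response:" markers are the common
// Alpaca convention; specific fine-tunes may vary.
function alpacaPrompt(instruction) {
  return [
    "Below is an instruction that describes a task. " +
      "Write a response that appropriately completes the request.",
    "",
    "### Instruction:",
    instruction,
    "",
    "### Response:",
    "",
  ].join("\n");
}

console.log(alpacaPrompt("List three colors."));
```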
That would be weird abuse of a variable. It would be much better to have a LOCAL_MODEL_PATH variable, and if no local model path is set, then use OpenAI's API, for example. I would favor trying to use a de facto standard local API such as text-generation-webui's API, rather than trying to reinvent the wheel by running local models directly, though. For one thing, sharing one local API means that multiple tools can use it. For another, there's a LOT of complexity in supporting local acceleration hardware and different model types and so on. Just using a standard local API makes it a lot simpler.
The thing about this is that the end goal for this project is to plug 'n' play with any GPT-powered project – the fewer changes to the code (even zero changes, like with chatbot-ui), the better. LOCAL_MODEL_PATH
is something apps would have to account for explicitly (e.g. langchain supporting local models), but this project aims to solve, for all the other GPT apps out there: how can we leverage the work folks have already done but run a local model against it? That's the goal.
@keldenl
Sorry, I think I'm missing something. How do I get it to follow the ### Instruction / ### Response template for Alpaca and similar models? When I use ChatCompletion, it seems to use a User: / Assistant: template, which isn't working for WizardLM – the LLM doesn't follow my instructions.
When I use the Completions endpoint and add the Instruction/Response template into the prompt, the server seems to hang and no response is generated.
It processes the prompt, then the ===== RESPONSE ===== line appears, and that's it.
@regstuff it sounds like you might be running into a different issue – any chance you could post what's showing up in your terminal and what the request is? (Where are you using the server? chatbot-ui?)
Also, I just merged some changes that should give you better error logging, so maybe pull and then post here?
Related Issues (20)
- TypeError: Window.fetch: HEAD or GET Request cannot have a body.
- npm error on gpt-llama.cpp
- Slow speed Vicuna - 7B Help plz
- llama.cpp GPU support
- Are there different specific instructions for running Red Pajama?
- no response message with Readable Stream: CLOSED
- Error: spawn ..\llama.cpp\main ENOENT at ChildProcess._handle.onexit
- SERVER BUSY, REQUEST QUEUED
- Cannot POST /V1/embeddings
- Bearer Token vs Model parameter?
- Why is a default chat being forced?
- Every Other Chat Response
- Finding last messages?
- "Internal Server Error" on a remote server
- Change listening ip to public ip?
- gguf supported?
- llama.cpp unresponsive for 20 seconds
- Module not found: Package path ./lite/tiktoken_bg.wasm?module is not exported from package
- node:events:491 throw er; // Unhandled 'error' event Error: spawn YOUR_KEY=../llama.cpp/main ENOENT
- How to create a single binary