petals-infra / chat.petals.dev Goto Github PK
💬 Chatbot web app + HTTP and WebSocket endpoints for LLM inference with the Petals client
Home Page: https://chat.petals.dev
Hi,
I've played a little with the chatbot this morning, and I got it stuck in a monologue for a few exchanges:
I think the expected behaviour would be to stop text generation when "Human:" appears in the generated output. However, this might lead to a premature stop (maybe there should be an option to override the stop and continue if a "Human:" is detected).
I tried to install your solution on a Linux Ubuntu server, but it did not work.
After that, the server stopped responding after 15 minutes of inactivity, and the sites hosted on it have now stopped working.
How do I remove your solution from the server and restore its normal operation?
Llama 2 supports a context length of 4096, but it looks like chat.petals is limited to 2048:
ValueError: Maximum length exceeded: prefix 0 + current 4094 exceeds pre-allocated maximum 2048
Could the full 4096-token context length be supported?
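For context, here is a minimal client-side sketch (assuming the Petals client API; not the server's actual configuration) showing where the "pre-allocated maximum" in the error comes from: the inference session's max_length. Supporting 4096 on chat.petals.dev would presumably mean raising this value server-side.

```python
# Minimal sketch, assuming the Petals client API. The "pre-allocated maximum"
# in the error is the inference session's max_length, so on the client side
# the fix is to pre-allocate the full 4096 tokens.
# (meta-llama repos are gated; a Hugging Face token may be required.)
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-70b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Good morning!", return_tensors="pt")["input_ids"]
with model.inference_session(max_length=4096) as session:  # pre-allocate 4096
    outputs = model.generate(inputs, max_new_tokens=16, session=session)
print(tokenizer.decode(outputs[0]))
```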
Hi,
I'm Mr. Rushmore, a very friendly guy.
Can I test your HTTP API at http://chat.petals.dev/api1 using curl?
Can you show me an example of the curl syntax for just saying good morning to Llama 2 and receiving the output?
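For what it's worth, the endpoint path appears to be /api/v1/generate rather than /api1 (see the curl example later in this thread). A minimal sketch of the equivalent call in Python, assuming the form-encoded field names (model, inputs, max_new_tokens) used by that example:

```python
# Minimal sketch, assuming the form-encoded /api/v1/generate endpoint shown
# elsewhere in this thread; the exact field names are an assumption and may
# differ in the current API.
import requests

resp = requests.post(
    "https://chat.petals.dev/api/v1/generate",
    data={
        "model": "meta-llama/Llama-2-70b-chat-hf",  # assumed model id
        "inputs": "Good morning, Llama!",
        "max_new_tokens": 32,
    },
)
print(resp.json())  # expected shape: {"ok": true, "outputs": "..."}
```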
Is it possible to provide an API that mimics the functionality of the OpenAI API?
Anyone got rough numbers?
I'm getting a DNS resolution error on chat.petals.ml
After a long convo, I'm getting the following error:
Request failed. Retry
Traceback (most recent call last):
File "/home/borzunov/chat.petals.ml/app.py", line 127, in generate
outputs = model.generate(
File "/home/borzunov/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/remote_generation.py", line 182, in generate
hidden_state = session.step(hidden_state, prompts=intermediate_prompts, hypo_ids=hypo_ids)[:, -1]
File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/inference_session.py", line 233, in step
raise ValueError(
ValueError: Maximum length exceeded: prefix 0 + current 784 exceeds pre-allocated maximum 768
A more helpful error message, ideally paired with a button to reset the conversation, would be better.
You can replicate it by navigating to the endpoint https://health.petals.dev/api/v1/state in a Firefox browser.
See #31 (review).
Hey folks, I am trying to run the chat locally, on a system without a GPU. The first issue I have run into is:
OSError: meta-llama/Llama-2-70b-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
Any suggestions?
Requests to api/v2/generate (websocket_api.py) can fail to detect the stop sequence in the generated response and will continue generating well past it. This problem becomes even more apparent if max_new_tokens is greater than 1.
I have a change I can commit that works around this issue by scanning for the stop sequence in the last delta appended to the tail of the previous deltas, then stopping and returning a truncated new delta if the stop sequence is found.
If you'd like a PR for this, it will also require merging #31.
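For illustration, a minimal sketch of that workaround (hypothetical function and names, not the actual patch): scan the tail of the text emitted so far plus the new delta, so a stop sequence that straddles a delta boundary is still caught, then truncate the delta at the match.

```python
# Hypothetical sketch of the workaround: a stop sequence can straddle a delta
# boundary, so scan the tail of the previously emitted text plus the new delta
# instead of the new delta alone.
def check_stop(previous_text: str, new_delta: str, stop_sequence: str):
    """Return (delta_to_emit, should_stop)."""
    # Keep just enough of the old tail for a straddling match to be visible.
    tail = previous_text[-(len(stop_sequence) - 1):] if len(stop_sequence) > 1 else ""
    combined = tail + new_delta
    idx = combined.find(stop_sequence)
    if idx == -1:
        return new_delta, False
    # Truncate so nothing at or after the stop sequence is emitted.
    return new_delta[:max(idx - len(tail), 0)], True

# e.g. a "\n\n" stop sequence split across two deltas is still detected:
assert check_stop("...end of answer\n", "\nHuman:", "\n\n") == ("", True)
```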
Trying to use ModelInfo(repo="petals-team/StableBeluga2", name="stabilityai/StableBeluga2") in my chat.petals.dev colab, I'm getting:
ModelInfo(repo="petals-team/StableBeluga2", name="stabilityai/StableBeluga2")
TypeError: ModelInfo.__init__() got an unexpected keyword argument 'name'
Hi,
Thank you for making this enormous open-source effort! It's truly remarkable work.
There is one thing that puzzles me, and I couldn't find any reliable source to confirm it: is there any difference between the BLOOMZ-176B accessed through the WebSocket API (as described in your readme file) and the original BLOOMZ-176B released here: https://huggingface.co/bigscience/bloomz?
I'm inclined to believe they're the same model and the Petals chatbot just uses the WebSocket API internally as well, but I would really appreciate someone confirming this.
Could you consider adding the YaLM-100B open-source model by Yandex? I think it would be a nice addition.
Hello,
Do you think it would make sense to integrate petals somehow in the Oobabooga web UI?
Regards,
Vincent
Just wanted to add this here.
Ideally we would write it into LangChain in a way that lets the user choose the URL of the endpoint they want to use (since it's not recommended to use chat.petals.dev); see the sketch below.
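A minimal sketch of that idea, assuming LangChain's custom-LLM interface (langchain_core) and the form-encoded /api/v1/generate endpoint shown elsewhere in this thread; the class name and field names here are illustrative assumptions.

```python
# Minimal sketch: a LangChain LLM with a user-configurable endpoint URL,
# assuming the chat.petals.dev-style form-encoded /api/v1/generate API.
from typing import List, Optional

import requests
from langchain_core.language_models.llms import LLM

class PetalsEndpointLLM(LLM):
    endpoint_url: str = "http://localhost:5000/api/v1/generate"  # user-chosen
    model: str = "meta-llama/Llama-2-70b-chat-hf"
    max_new_tokens: int = 64

    @property
    def _llm_type(self) -> str:
        return "petals_endpoint"

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs) -> str:
        resp = requests.post(self.endpoint_url, data={
            "model": self.model,
            "inputs": prompt,
            "max_new_tokens": self.max_new_tokens,
        })
        result = resp.json()
        if not result.get("ok"):
            raise RuntimeError(result.get("traceback", "generation failed"))
        return result["outputs"]
```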
Linux, Python 3.11.3
Installation:
git clone https://github.com/borzunov/chat.petals.ml.git
python -m venv /home/mkw/yhteinen/chat.petals.ml
cd chat.petals.ml/
source bin/activate
pip install -r requirements.txt
I've tried to find a solution on the net, but no luck. A log file is attached:
lokitiedosto_2.txt
The HTTP API fails while the chat option works
curl http://chat.petals.ml/api/v1/generate -H "Content-Type: application/x-www-form-urlencoded" -d 'inputs="A cat in French is "&max_new_tokens=3'
{"ok":false,"traceback":"Traceback (most recent call last):\n File "/home/borzunov/chat.petals.ml/http_api.py", line 40, in http_api_generate\n outputs = model.generate(\n File "/home/borzunov/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context\n return func(*args, **kwargs)\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/remote_generation.py", line 171, in generate\n hidden_state = session.step(hidden_state, prompts=intermediate_prompts, hypo_ids=hypo_ids)[:, -1]\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/inference_session.py", line 301, in step\n self._update_sequence(server_idx, block_idx, attempt_no)\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/inference_session.py", line 343, in _update_sequence\n updated_spans = self._sequence_manager.make_sequence(\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/routing/sequence_manager.py", line 154, in make_sequence\n span_sequence = self._make_sequence_with_min_latency(\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/routing/sequence_manager.py", line 182, in _make_sequence_with_min_latency\n raise MissingBlocksError(missing_blocks)\npetals.client.routing.sequence_manager.MissingBlocksError: No servers holding blocks [0, 1, 2, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] are online. You can check the public swarm's state at http://health.petals.ml If there are not enough servers, please connect your GPU: https://github.com/bigscience-workshop/petals#connect-your-gpu-and-increase-petals-capacity \n"}
Force
Hi,
I'm new to Petals, with a fairly low-bandwidth connection. Every time the download fails, it seems the shard has to be re-downloaded from the start.
Couldn't Petals support resumable downloads? What's being used? LFS? I'm seeing:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.
Happy to look into this if you point me in the right direction.
When I run the daemon and then stop it with a keyboard interrupt, the server stops responding, but the port is kept occupied by a hanging worker process. This prevents the app from starting again, throwing this error:
$ gunicorn app:app --bind localhost:8004 --threads 5 --timeout 900
....
[2023-01-22 11:09:02 +0000] [357078] [ERROR] Connection in use: ('localhost', 8004)
[2023-01-22 11:09:02 +0000] [357078] [ERROR] Retrying in 1 second.
Netstat shows gunicorn, even though the process was already killed:
$ netstat -anp | grep 8004
tcp 0 0 127.0.0.1:8004 0.0.0.0:* LISTEN 357022/gunicorn: wo
tcp 0 0 127.0.0.1:8004 0.0.0.0:* LISTEN 176873/gunicorn: wo
And indeed, ps aux shows a hanging worker process:
$ ps aux | grep 357022
dev 357022 0.0 0.4 2980364 289556 pts/4 S 11:08 0:00 gunicorn: worker [app:app]
Changing the port works and the server starts. But I already have about 20 ports occupied by zombie workers :-).
It's impossible to select part of the text while the AI is still generating an answer. When I select part of the text and the AI appends more text to the incomplete answer, the selection start jumps to the start of the message.
Are there specific network ports that need to be open on my firewall in order to allow my system to join the public cluster? If so, which ones?
Every time I run this code:
git clone https://github.com/petals-infra/chat.petals.dev.git
cd chat.petals.dev
pip install -r requirements.txt
flask run --host=0.0.0.0 --port=5000
I get this error
from cpufeature import CPUFeature
ModuleNotFoundError: No module named 'cpufeature'
If I then run pip install cpufeature
I get this error.
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for cpufeature
Running setup.py clean for cpufeature
Failed to build cpufeature
ERROR: Could not build wheels for cpufeature, which is required to install pyproject.toml-based projects
I've tried upgrading and reinstalling pip, setuptools, and wheel. I've tried reinstalling. I've tried changing my virtual environment so no other packages would conflict and interfere. How can I get past this?
The chat gets much slower than it should be in this case.
Hypothesis: this may happen because all tensors are serialized/deserialized in one process.
Hello, this is a very cool project. Is there a way to leverage Petals in a Flask app I already have? I want to move away from OpenAI and its usage costs. I'm using an Orange Pi 5, but I have never been able to utilize the NPU. I also looked at the documents and thought my head was going to explode. Basically, I want to take my code, switch out OpenAI for Petals, use my NPU/GPU to help the cluster and myself, and then access the chatbot within my HTML.
Here's my code; any advice would be awesome, as I'm getting tired of it and just want it to work. (A sketch of the OpenAI-to-Petals swap follows the code.)
```python
import openai
import threading
import time
import sys
import chat_commands
from gtts import gTTS
import os
import tkinter as tk
from tkinter import filedialog
from tkinter import messagebox
from tkinter import *
from flask import Flask, request, render_template
from PIL import Image, ImageTk
import torch
import torchvision.models as models
import speech_recognition as sr
import pygame
import webbrowser
import re
import subprocess
#import guifunc
#import pythonide
import pipgui
#from tkinter import *
#from tkinter.filedialog import asksaveasfilename, askopenfilename

# OPENAI API KEY
openai.api_key = "api_key"

doListenToCommand = True
listening = False

# List with common farewells to end the while loop
despedida = ["Goodbye", "goodbye", "bye", "Bye", "See you later", "see you later"]

# Create the GUI window
window = tk.Tk()
window.title("Computer:AI")
window.geometry("400x400")

# Create the text entry box
text_entry = tk.Entry(window, width=50)
text_entry.pack(side=tk.BOTTOM)

# Create the submit button
submit_button = tk.Button(window, text="Submit", command=lambda: submit())
submit_button.pack(side=tk.BOTTOM)

# Create the text output box
text_output = tk.Text(window, height=300, width=300)
text_output.pack(side=tk.BOTTOM)


def submit(event=None, text_input=None):
    global doListenToCommand
    global listening

    # Get the user input and check if the input matches the list of goodbyes
    if text_input:  # was `text_input != ""`, which let None through
        usuario = text_input
    else:
        usuario = text_entry.get()

    if usuario in despedida:
        on_closing()
    else:
        prompt = f"You are ChatGPT and answer my following message: {usuario}"

        # Getting responses using the OpenAI API
        response = openai.Completion.create(
            engine="text-davinci-003",
            prompt=prompt,
            max_tokens=2049
        )
        respuesta = response["choices"][0]["text"]

        # Converting text to audio
        texto = str(respuesta)
        tts = gTTS(texto, lang='en', tld='ie')
        tts.save("audio.mp3")

        # Displaying the answer on the screen
        text_output.insert(tk.END, "ChatGPT: " + respuesta + "\n")

        # Clear the input text
        text_entry.delete(0, tk.END)

        # Playing the audio
        doListenToCommand = False
        time.sleep(1)
        os.system("play audio.mp3")
        doListenToCommand = True

        # Call function to listen to the user
        # if listening == False:
        #     listen_to_command()


# Bind the Enter key to the submit function
window.bind("<Return>", submit)

#pygame.mixer.music.load("audio.mp3")
#pygame.mixer.music.play()
#termux audio
#os.system("mpg123 audio.mp3")

# Flask app
app = Flask(__name__, template_folder='templates')


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        file = request.files["file"]
        file.save(file.filename)
        openai.api_key = request.form["apikey"]
        return "Model file and API key saved."
    return render_template("index.html")


def run_as_normal_app():
    window.update()


def run_on_flask():
    app.run()


def listen_to_command():
    global doListenToCommand
    global listening

    # If we are not to be listening then exit the function.
    if doListenToCommand == True:
        # Initialize the recognizer
        r = sr.Recognizer()

        # Use the default microphone as the audio source
        with sr.Microphone() as source:
            print("Listening...")
            listening = True
            audio = r.listen(source)
            listening = False

        try:
            # Use speech recognition to convert speech to text
            command = r.recognize_google(audio)
            print("You said:", command)
            text_output.insert(tk.END, "You: " + command + "\n")
            text_entry.delete(0, tk.END)

            # Check if the command is a "generate image" instruction
            # if "generate image" in command.lower():
            #     # Call the function to generate the image
            #     generate_image()

            # Process the commands: prepare object to be passed.
            class passed_commands:
                tk = tk
                text_output = text_output
                submit = submit

            chat_commands.process_commands(passed_commands, command)
        except sr.UnknownValueError:
            print("Speech recognition could not understand audio.")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service:", str(e))

    listen_to_command()
    listening = False


def on_closing():
    if messagebox.askokcancel("Quit", "Do you want to quit?"):
        window.destroy()


def pythonide():
    subprocess.run(["python3", "pythonide.py"])


window.protocol("WM_DELETE_WINDOW", on_closing)

if __name__ == "__main__":
    # Create the menu bar
    menu_bar = tk.Menu(window)

    # Create the "File" menu
    file_menu = tk.Menu(menu_bar, tearoff=0)
    file_menu.add_command(label="Open LLM", command=lambda: filedialog.askopenfilename())
    file_menu.add_command(label="Save LLM", command=lambda: filedialog.asksaveasfilename())
    file_menu.add_separator()
    #file_menu.add_command(label="Exit", command=window.quit)
    file_menu.add_command(label="Exit", command=on_closing)
    menu_bar.add_cascade(label="File", menu=file_menu)

    # Create the "Run" menu
    run_menu = tk.Menu(menu_bar, tearoff=0)
    run_menu.add_command(label="Run as normal app", command=run_as_normal_app)
    run_menu.add_command(label="Run on Flask", command=run_on_flask)
    run_menu.add_command(label='Python Ide', command=pythonide)
    menu_bar.add_cascade(label="Run", menu=run_menu)

    # Set the menu bar
    window.config(menu=menu_bar)

    # Start the listening thread and the main program loop
    start_listening_thread = threading.Thread(target=listen_to_command)
    start_listening_thread.daemon = True
    start_listening_thread.start()
    window.mainloop()
```
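A minimal sketch of the OpenAI-to-Petals swap asked about above, assuming a chat.petals.dev-style HTTP endpoint and its form-encoded field names (the helper name is hypothetical, and this is not a drop-in replacement for the OpenAI SDK):

```python
# Minimal sketch: replace the openai.Completion.create(...) call in submit()
# with a request to a chat.petals.dev-style HTTP endpoint (assumed field names).
import requests

def petals_complete(prompt: str,
                    endpoint: str = "https://chat.petals.dev/api/v1/generate",
                    model: str = "meta-llama/Llama-2-70b-chat-hf") -> str:
    resp = requests.post(endpoint, data={
        "model": model,
        "inputs": prompt,
        "max_new_tokens": 128,
    })
    result = resp.json()
    if not result.get("ok"):
        raise RuntimeError(result.get("traceback", "generation failed"))
    return result["outputs"]

# In submit(), instead of openai.Completion.create(...):
#     respuesta = petals_complete(prompt)
```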
Seems Unicode characters in responses for all models on chat.petals.dev are generally butchered.
Ask it to "show an emoji", translate something to Japanese, etc., and it will usually return a fair amount of \ufffd characters instead of the correct Unicode.
The Hugging Face Llama 2 chat demo does respond correctly:
https://huggingface.co/blog/llama2#demo
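One plausible cause (an assumption, not confirmed here): if each streamed chunk's bytes are decoded to text independently, a multi-byte UTF-8 character split across two deltas decodes to U+FFFD on both sides. A minimal sketch of the failure mode and a buffering fix:

```python
# Hypothetical illustration: decoding each streamed chunk's bytes independently
# splits multi-byte UTF-8 sequences and yields U+FFFD replacement characters.
import codecs

emoji = "🙂".encode("utf-8")      # 4 bytes: f0 9f 99 82
chunks = [emoji[:2], emoji[2:]]    # arrives split across two deltas
print([c.decode("utf-8", errors="replace") for c in chunks])  # replacement chars

# Buffering incomplete sequences with an incremental decoder fixes it:
dec = codecs.getincrementaldecoder("utf-8")(errors="replace")
print("".join(dec.decode(c) for c in chunks))  # 🙂
```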
Currently, BLOOMZ behaves well only for the first output in the few-shot mode, then outputs </s> and forgets everything. This is visible in the English-to-Spanish translation example.
We need to use stop_sequence = "\n\n" and extra_stop_sequences = ["</s>"] to fix this.
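For reference, a minimal sketch of passing both parameters over the WebSocket API (assuming the /api/v2/generate protocol and field names described in the repo's readme; verify against the current API before relying on this):

```python
# Minimal sketch, assuming the chat.petals.dev /api/v2/generate WebSocket
# protocol: open an inference session, then stream with both stop settings.
import asyncio
import json
import websockets  # pip install websockets

async def main():
    async with websockets.connect("wss://chat.petals.dev/api/v2/generate") as ws:
        await ws.send(json.dumps({
            "type": "open_inference_session",
            "model": "bigscience/bloomz",
            "max_length": 1024,
        }))
        assert json.loads(await ws.recv())["ok"]

        await ws.send(json.dumps({
            "type": "generate",
            "inputs": "Translate to Spanish: cat ->",
            "max_new_tokens": 1,
            "stop_sequence": "\n\n",           # stop at the blank line between few-shot examples
            "extra_stop_sequences": ["</s>"],  # also stop at BLOOMZ's end-of-sequence token
        }))
        outputs = ""
        while True:
            reply = json.loads(await ws.recv())
            assert reply["ok"], reply
            outputs += reply["outputs"]
            if reply["stop"]:
                break
        print(outputs)

asyncio.run(main())
```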
Is there any way to provide a system prompt instead of just the user prompt?
Or is that hardcoded somewhere along the way (and if so, where)?
Thanks!