petals-infra / chat.petals.dev Goto Github PK
💬 Chatbot web app + HTTP and WebSocket endpoints for LLM inference with the Petals client
Home Page: https://chat.petals.dev
Hi,
I've played a little with the chatbot this morning, and I got it stuck in a monologue for a few exchanges:
I think the expected behaviour would be to stop text generation when "Human:" appears in the generated output. However, this might lead to a premature stop (maybe there should be an option to override the stop and continue if a "Human:" is detected).
I tried to install your solution on a Linux Ubuntu server, but it did not work.
After that, the server stopped responding after 15 minutes of inactivity, and the sites hosted on it have now stopped working.
How do I remove your solution from the server and restore its normal operation?
Llama 2 supports a context length of 4096, but it looks like chat.petals is limited to 2048:
ValueError: Maximum length exceeded: prefix 0 + current 4094 exceeds pre-allocated maximum 2048
Could the full 4096-token context length be supported?
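For context, here is a minimal client-side sketch (assuming the Petals client API; not the server's actual configuration) showing where the "pre-allocated maximum" in the error comes from: the inference session's max_length. Supporting 4096 on chat.petals.dev would presumably mean raising this value server-side.

```python
# Minimal sketch, assuming the Petals client API. The "pre-allocated maximum"
# in the error is the inference session's max_length, so on the client side
# the fix is to pre-allocate the full 4096 tokens.
# (meta-llama repos are gated; a Hugging Face token may be required.)
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-70b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Good morning!", return_tensors="pt")["input_ids"]
with model.inference_session(max_length=4096) as session:  # pre-allocate 4096
    outputs = model.generate(inputs, max_new_tokens=16, session=session)
print(tokenizer.decode(outputs[0]))
```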
Hi,
I'm Mr. Rushmore, a very friendly guy.
Can I test your HTTP API at http://chat.petals.dev/api1 using curl?
Can you show me an example of the curl syntax for just saying good morning to Llama 2 and receiving the output?
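For what it's worth, the endpoint path appears to be /api/v1/generate rather than /api1 (see the curl example later in this thread). A minimal sketch of the equivalent call in Python, assuming the form-encoded field names (model, inputs, max_new_tokens) used by that example:

```python
# Minimal sketch, assuming the form-encoded /api/v1/generate endpoint shown
# elsewhere in this thread; the exact field names are an assumption and may
# differ in the current API.
import requests

resp = requests.post(
    "https://chat.petals.dev/api/v1/generate",
    data={
        "model": "meta-llama/Llama-2-70b-chat-hf",  # assumed model id
        "inputs": "Good morning, Llama!",
        "max_new_tokens": 32,
    },
)
print(resp.json())  # expected shape: {"ok": true, "outputs": "..."}
```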
Is it possible to provide an API that mimics the functionality of the OpenAI API?
Anyone got rough numbers?
I'm getting a DNS resolution error on chat.petals.ml
After a long convo, I'm getting the following error:
Request failed. Retry
Traceback (most recent call last):
File "/home/borzunov/chat.petals.ml/app.py", line 127, in generate
outputs = model.generate(
File "/home/borzunov/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/remote_generation.py", line 182, in generate
hidden_state = session.step(hidden_state, prompts=intermediate_prompts, hypo_ids=hypo_ids)[:, -1]
File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/inference_session.py", line 233, in step
raise ValueError(
ValueError: Maximum length exceeded: prefix 0 + current 784 exceeds pre-allocated maximum 768
A more helpful error message, ideally paired with a button to reset the conversation, would be better.
You can replicate it by navigating to the endpoint https://health.petals.dev/api/v1/state in a Firefox browser.
See #31 (review).
Hey folks, I am trying to run the chat locally, on a system without a GPU. The first issue I have run into is:
OSError: meta-llama/Llama-2-70b-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
Any suggestions?
Requests to api/v2/generate (websocket_api.py) can fail to detect the stop sequence in the generated response and will continue generating well past it. This problem becomes even more apparent if max_new_tokens is greater than 1.
I have a change I can commit that works around this issue by scanning for the stop sequence in the last delta appended to the tail of the previous deltas, then stopping and returning a truncated new delta if the stop sequence is found.
If you'd like a PR for this, it will also require merging #31.
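For illustration, a minimal sketch of that workaround (hypothetical function and names, not the actual patch): scan the tail of the text emitted so far plus the new delta, so a stop sequence that straddles a delta boundary is still caught, then truncate the delta at the match.

```python
# Hypothetical sketch of the workaround: a stop sequence can straddle a delta
# boundary, so scan the tail of the previously emitted text plus the new delta
# instead of the new delta alone.
def check_stop(previous_text: str, new_delta: str, stop_sequence: str):
    """Return (delta_to_emit, should_stop)."""
    # Keep just enough of the old tail for a straddling match to be visible.
    tail = previous_text[-(len(stop_sequence) - 1):] if len(stop_sequence) > 1 else ""
    combined = tail + new_delta
    idx = combined.find(stop_sequence)
    if idx == -1:
        return new_delta, False
    # Truncate so nothing at or after the stop sequence is emitted.
    return new_delta[:max(idx - len(tail), 0)], True

# e.g. a "\n\n" stop sequence split across two deltas is still detected:
assert check_stop("...end of answer\n", "\nHuman:", "\n\n") == ("", True)
```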
Trying to use ModelInfo(repo="petals-team/StableBeluga2", name="stabilityai/StableBeluga2") in my chat.petals.dev colab, I'm getting:
ModelInfo(repo="petals-team/StableBeluga2", name="stabilityai/StableBeluga2")
TypeError: ModelInfo.__init__() got an unexpected keyword argument 'name'
Hi,
Thank you for making this enormous open-source effort! It's truly remarkable work.
There is one thing that puzzles me, and I couldn't find any reliable source to confirm it: is there any difference between the BLOOMZ-176B accessed through the WebSocket API (as described in your readme file) and the original BLOOMZ-176B released here: https://huggingface.co/bigscience/bloomz?
I'm inclined to believe they're the same model and the Petals chatbot just uses the WebSocket API internally as well, but I would really appreciate someone confirming this.
Could you consider adding the YaLM-100B open-source model by Yandex? I think it would be a nice addition.
Hello,
Do you think it would make sense to integrate petals somehow in the Oobabooga web UI?
Regards,
Vincent
Just wanted to add this here.
Ideally we would write it into LangChain in a way that lets the user choose the URL of the endpoint they want to use (since it's not recommended to use chat.petals.dev); see the sketch below.
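A minimal sketch of that idea, assuming LangChain's custom-LLM interface (langchain_core) and the form-encoded /api/v1/generate endpoint shown elsewhere in this thread; the class name and field names here are illustrative assumptions.

```python
# Minimal sketch: a LangChain LLM with a user-configurable endpoint URL,
# assuming the chat.petals.dev-style form-encoded /api/v1/generate API.
from typing import List, Optional

import requests
from langchain_core.language_models.llms import LLM

class PetalsEndpointLLM(LLM):
    endpoint_url: str = "http://localhost:5000/api/v1/generate"  # user-chosen
    model: str = "meta-llama/Llama-2-70b-chat-hf"
    max_new_tokens: int = 64

    @property
    def _llm_type(self) -> str:
        return "petals_endpoint"

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs) -> str:
        resp = requests.post(self.endpoint_url, data={
            "model": self.model,
            "inputs": prompt,
            "max_new_tokens": self.max_new_tokens,
        })
        result = resp.json()
        if not result.get("ok"):
            raise RuntimeError(result.get("traceback", "generation failed"))
        return result["outputs"]
```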
Linux, Python 3.11.3
Installation:
git clone https://github.com/borzunov/chat.petals.ml.git
python -m venv /home/mkw/yhteinen/chat.petals.ml
cd chat.petals.ml/
source bin/activate
pip install -r requirements.txt
I've tried to find a solution on the net, but no luck. A log file is attached:
lokitiedosto_2.txt
The HTTP API fails while the chat option works
curl http://chat.petals.ml/api/v1/generate -H "Content-Type: application/x-www-form-urlencoded" -d 'inputs="A cat in French is "&max_new_tokens=3'
{"ok":false,"traceback":"Traceback (most recent call last):\n File "/home/borzunov/chat.petals.ml/http_api.py", line 40, in http_api_generate\n outputs = model.generate(\n File "/home/borzunov/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context\n return func(*args, **kwargs)\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/remote_generation.py", line 171, in generate\n hidden_state = session.step(hidden_state, prompts=intermediate_prompts, hypo_ids=hypo_ids)[:, -1]\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/inference_session.py", line 301, in step\n self._update_sequence(server_idx, block_idx, attempt_no)\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/inference_session.py", line 343, in _update_sequence\n updated_spans = self._sequence_manager.make_sequence(\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/routing/sequence_manager.py", line 154, in make_sequence\n span_sequence = self._make_sequence_with_min_latency(\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/routing/sequence_manager.py", line 182, in _make_sequence_with_min_latency\n raise MissingBlocksError(missing_blocks)\npetals.client.routing.sequence_manager.MissingBlocksError: No servers holding blocks [0, 1, 2, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] are online. You can check the public swarm's state at http://health.petals.ml If there are not enough servers, please connect your GPU: https://github.com/bigscience-workshop/petals#connect-your-gpu-and-increase-petals-capacity \n"}
Force
Hi,
I'm new to Petals, with a fairly low-bandwidth connection. Every time the download fails, it seems the shard has to be re-downloaded from the start.
Couldn't Petals support resumable downloads? What's being used? LFS? I'm seeing:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.
Happy to look into this if you point me in the right direction.
When I run the daemon and then stop it with a keyboard interrupt, the server stops responding, but the port is kept occupied by a hanging worker process. This prevents the app from starting again, throwing this error:
$ gunicorn app:app --bind localhost:8004 --threads 5 --timeout 900
....
[2023-01-22 11:09:02 +0000] [357078] [ERROR] Connection in use: ('localhost', 8004)
[2023-01-22 11:09:02 +0000] [357078] [ERROR] Retrying in 1 second.
Netstat shows gunicorn, even though the process was already killed:
$ netstat -anp | grep 8004
tcp 0 0 127.0.0.1:8004 0.0.0.0:* LISTEN 357022/gunicorn: wo
tcp 0 0 127.0.0.1:8004 0.0.0.0:* LISTEN 176873/gunicorn: wo
And indeed, ps aux shows a hanging worker process:
$ ps aux | grep 357022
dev 357022 0.0 0.4 2980364 289556 pts/4 S 11:08 0:00 gunicorn: worker [app:app]
Changing the port works and the server starts. But I already have about 20 ports occupied by zombie workers :-).
It's impossible to select part of the text while the AI is still generating an answer. When I select part of the text and the AI appends more text to the incomplete answer, the selection start jumps to the start of the message.
Are there specific network ports that need to be open on my firewall in order to allow my system to join the public cluster? If so, which ones?
Every time I run this code:
git clone https://github.com/petals-infra/chat.petals.dev.git
cd chat.petals.dev
pip install -r requirements.txt
flask run --host=0.0.0.0 --port=5000
I get this error
from cpufeature import CPUFeature
ModuleNotFoundError: No module named 'cpufeature'
If I then run pip install cpufeature
I get this error.
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for cpufeature
Running setup.py clean for cpufeature
Failed to build cpufeature
ERROR: Could not build wheels for cpufeature, which is required to install pyproject.toml-based projects
I've tried upgrading and reinstalling pip, setuptools, and wheel. I've tried reinstalling. I've tried changing my virtual environment so no other packages would conflict and interfere. How can I get past this?
The chat gets much slower than it should be in this case.
Hypothesis: this may happen because all tensors are serialized/deserialized in one process.
Hello, this is a very cool project. Is there a way to leverage Petals in a Flask app I already have? I want to move away from OpenAI and its usage costs. I'm using an Orange Pi 5, but I have never been able to utilize the NPU. I also looked at the documents and thought my head was going to explode. Basically, I want to take my code, switch out OpenAI for Petals, use my NPU/GPU to help the cluster and myself, and then access the chatbot within my HTML.
Here's my code; any advice would be awesome, as I'm getting tired of it and just want it to work. (A sketch of the OpenAI-to-Petals swap follows the code.)
```python
import openai
import threading
import time
import sys
import chat_commands
from gtts import gTTS
import os
import tkinter as tk
from tkinter import filedialog
from tkinter import messagebox
from tkinter import *
from flask import Flask, request, render_template
from PIL import Image, ImageTk
import torch
import torchvision.models as models
import speech_recognition as sr
import pygame
import webbrowser
import re
import subprocess
#import guifunc
#import pythonide
import pipgui
#from tkinter import *
#from tkinter.filedialog import asksaveasfilename, askopenfilename

# OPENAI API KEY
openai.api_key = "api_key"

doListenToCommand = True
listening = False

# List with common farewells to end the while loop
despedida = ["Goodbye", "goodbye", "bye", "Bye", "See you later", "see you later"]

# Create the GUI window
window = tk.Tk()
window.title("Computer:AI")
window.geometry("400x400")

# Create the text entry box
text_entry = tk.Entry(window, width=50)
text_entry.pack(side=tk.BOTTOM)

# Create the submit button
submit_button = tk.Button(window, text="Submit", command=lambda: submit())
submit_button.pack(side=tk.BOTTOM)

# Create the text output box
text_output = tk.Text(window, height=300, width=300)
text_output.pack(side=tk.BOTTOM)


def submit(event=None, text_input=None):
    global doListenToCommand
    global listening

    # Get the user input and check if the input matches the list of goodbyes
    if text_input:  # was `text_input != ""`, which let None through
        usuario = text_input
    else:
        usuario = text_entry.get()

    if usuario in despedida:
        on_closing()
    else:
        prompt = f"You are ChatGPT and answer my following message: {usuario}"

        # Getting responses using the OpenAI API
        response = openai.Completion.create(
            engine="text-davinci-003",
            prompt=prompt,
            max_tokens=2049
        )
        respuesta = response["choices"][0]["text"]

        # Converting text to audio
        texto = str(respuesta)
        tts = gTTS(texto, lang='en', tld='ie')
        tts.save("audio.mp3")

        # Displaying the answer on the screen
        text_output.insert(tk.END, "ChatGPT: " + respuesta + "\n")

        # Clear the input text
        text_entry.delete(0, tk.END)

        # Playing the audio
        doListenToCommand = False
        time.sleep(1)
        os.system("play audio.mp3")
        doListenToCommand = True

        # Call function to listen to the user
        # if listening == False:
        #     listen_to_command()


# Bind the Enter key to the submit function
window.bind("<Return>", submit)

#pygame.mixer.music.load("audio.mp3")
#pygame.mixer.music.play()
#termux audio
#os.system("mpg123 audio.mp3")

# Flask app
app = Flask(__name__, template_folder='templates')


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        file = request.files["file"]
        file.save(file.filename)
        openai.api_key = request.form["apikey"]
        return "Model file and API key saved."
    return render_template("index.html")


def run_as_normal_app():
    window.update()


def run_on_flask():
    app.run()


def listen_to_command():
    global doListenToCommand
    global listening

    # If we are not to be listening then exit the function.
    if doListenToCommand == True:
        # Initialize the recognizer
        r = sr.Recognizer()

        # Use the default microphone as the audio source
        with sr.Microphone() as source:
            print("Listening...")
            listening = True
            audio = r.listen(source)
            listening = False

        try:
            # Use speech recognition to convert speech to text
            command = r.recognize_google(audio)
            print("You said:", command)
            text_output.insert(tk.END, "You: " + command + "\n")
            text_entry.delete(0, tk.END)

            # Check if the command is a "generate image" instruction
            # if "generate image" in command.lower():
            #     # Call the function to generate the image
            #     generate_image()

            # Process the commands: prepare object to be passed.
            class passed_commands:
                tk = tk
                text_output = text_output
                submit = submit

            chat_commands.process_commands(passed_commands, command)
        except sr.UnknownValueError:
            print("Speech recognition could not understand audio.")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service:", str(e))

    listen_to_command()
    listening = False


def on_closing():
    if messagebox.askokcancel("Quit", "Do you want to quit?"):
        window.destroy()


def pythonide():
    subprocess.run(["python3", "pythonide.py"])


window.protocol("WM_DELETE_WINDOW", on_closing)

if __name__ == "__main__":
    # Create the menu bar
    menu_bar = tk.Menu(window)

    # Create the "File" menu
    file_menu = tk.Menu(menu_bar, tearoff=0)
    file_menu.add_command(label="Open LLM", command=lambda: filedialog.askopenfilename())
    file_menu.add_command(label="Save LLM", command=lambda: filedialog.asksaveasfilename())
    file_menu.add_separator()
    #file_menu.add_command(label="Exit", command=window.quit)
    file_menu.add_command(label="Exit", command=on_closing)
    menu_bar.add_cascade(label="File", menu=file_menu)

    # Create the "Run" menu
    run_menu = tk.Menu(menu_bar, tearoff=0)
    run_menu.add_command(label="Run as normal app", command=run_as_normal_app)
    run_menu.add_command(label="Run on Flask", command=run_on_flask)
    run_menu.add_command(label='Python Ide', command=pythonide)
    menu_bar.add_cascade(label="Run", menu=run_menu)

    # Set the menu bar
    window.config(menu=menu_bar)

    # Start the listening thread and the main program loop
    start_listening_thread = threading.Thread(target=listen_to_command)
    start_listening_thread.daemon = True
    start_listening_thread.start()
    window.mainloop()
```
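A minimal sketch of the OpenAI-to-Petals swap asked about above, assuming a chat.petals.dev-style HTTP endpoint and its form-encoded field names (the helper name is hypothetical, and this is not a drop-in replacement for the OpenAI SDK):

```python
# Minimal sketch: replace the openai.Completion.create(...) call in submit()
# with a request to a chat.petals.dev-style HTTP endpoint (assumed field names).
import requests

def petals_complete(prompt: str,
                    endpoint: str = "https://chat.petals.dev/api/v1/generate",
                    model: str = "meta-llama/Llama-2-70b-chat-hf") -> str:
    resp = requests.post(endpoint, data={
        "model": model,
        "inputs": prompt,
        "max_new_tokens": 128,
    })
    result = resp.json()
    if not result.get("ok"):
        raise RuntimeError(result.get("traceback", "generation failed"))
    return result["outputs"]

# In submit(), instead of openai.Completion.create(...):
#     respuesta = petals_complete(prompt)
```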
Seems Unicode characters in responses for all models on chat.petals.dev are generally butchered.
Ask it to "show an emoji", translate something to Japanese, etc., and it will usually return a fair amount of \ufffd characters instead of the correct Unicode.
The Hugging Face Llama 2 chat demo does respond correctly:
https://huggingface.co/blog/llama2#demo
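One plausible cause (an assumption, not confirmed here): if each streamed chunk's bytes are decoded to text independently, a multi-byte UTF-8 character split across two deltas decodes to U+FFFD on both sides. A minimal sketch of the failure mode and a buffering fix:

```python
# Hypothetical illustration: decoding each streamed chunk's bytes independently
# splits multi-byte UTF-8 sequences and yields U+FFFD replacement characters.
import codecs

emoji = "🙂".encode("utf-8")      # 4 bytes: f0 9f 99 82
chunks = [emoji[:2], emoji[2:]]    # arrives split across two deltas
print([c.decode("utf-8", errors="replace") for c in chunks])  # replacement chars

# Buffering incomplete sequences with an incremental decoder fixes it:
dec = codecs.getincrementaldecoder("utf-8")(errors="replace")
print("".join(dec.decode(c) for c in chunks))  # 🙂
```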
Currently, BLOOMZ behaves well only for the first output in the few-shot mode, then outputs </s> and forgets everything. This is visible in the English-to-Spanish translation example.
We need to use stop_sequence = "\n\n" and extra_stop_sequences = ["</s>"] to fix this.
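For reference, a minimal sketch of passing both parameters over the WebSocket API (assuming the /api/v2/generate protocol and field names described in the repo's readme; verify against the current API before relying on this):

```python
# Minimal sketch, assuming the chat.petals.dev /api/v2/generate WebSocket
# protocol: open an inference session, then stream with both stop settings.
import asyncio
import json
import websockets  # pip install websockets

async def main():
    async with websockets.connect("wss://chat.petals.dev/api/v2/generate") as ws:
        await ws.send(json.dumps({
            "type": "open_inference_session",
            "model": "bigscience/bloomz",
            "max_length": 1024,
        }))
        assert json.loads(await ws.recv())["ok"]

        await ws.send(json.dumps({
            "type": "generate",
            "inputs": "Translate to Spanish: cat ->",
            "max_new_tokens": 1,
            "stop_sequence": "\n\n",           # stop at the blank line between few-shot examples
            "extra_stop_sequences": ["</s>"],  # also stop at BLOOMZ's end-of-sequence token
        }))
        outputs = ""
        while True:
            reply = json.loads(await ws.recv())
            assert reply["ok"], reply
            outputs += reply["outputs"]
            if reply["stop"]:
                break
        print(outputs)

asyncio.run(main())
```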
Is there any way to provide a system prompt instead of just the user prompt?
Or is that hardcoded somewhere along the way (and if so, where)?
Thanks!