
chat.petals.dev's People

Contributors

borzunov, justheuristic, tijszwinkels, vadi2, webifi, xu-song


chat.petals.dev's Issues

BLOOMZ generates the questions and can enter a monologue loop

Hi,

I've played a little with the chatbot this morning, and I got it stuck in a monologue for a few exchanges:

[Screenshot of the monologue loop]

I think the expected behaviour would be to stop text generation as soon as "Human:" appears in the generated output. However, this might lead to a premature stop, so there could also be an option to override the stop and keep generating when a "Human:" is detected.
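For illustration, a minimal sketch of such a check in plain Python (not the app's actual code), assuming the generated text is available as a string:

STOP_MARKER = "Human:"

def truncate_at_stop(generated_text: str, stop_marker: str = STOP_MARKER) -> str:
    # Return the text up to (but not including) the stop marker, if present.
    idx = generated_text.find(stop_marker)
    return generated_text if idx == -1 else generated_text[:idx]

# Example: the model starts writing the human's next turn on its own.
print(truncate_at_stop("It will be sunny tomorrow.\nHuman: And the day after?"))
# -> "It will be sunny tomorrow.\n"

An override option could simply skip this truncation when the user explicitly asks to continue.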

How to remove the installation

I tried to install your solution on a Linux Ubuntu server, but it did not work.
After that, the server stopped responding after 15 minutes of inactivity, and the sites hosted on it no longer work.
How can I remove your solution from the server and restore its normal operation?

4096 context length

Llama 2 supports a context length of 4096, but it looks like chat.petals is limited to 2048:

ValueError: Maximum length exceeded: prefix 0 + current 4094 exceeds pre-allocated maximum 2048

Could the increased context length be supported?
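For context, in the Petals client the pre-allocated maximum is fixed when the inference session is opened, so supporting the full Llama 2 context presumably means raising that value for Llama 2 models. A rough sketch, assuming Petals 2.x class names and access to the gated repo (the chat backend's own configuration may differ):

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-70b-chat-hf"  # assumed model; any Llama 2 repo works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A very long prompt ...", return_tensors="pt")["input_ids"]
# Pre-allocate room for the full Llama 2 context instead of the 2048 default.
with model.inference_session(max_length=4096) as session:
    outputs = model.generate(inputs, session=session, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))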

Exception after a dialogue reaches a certain length

After a long convo, I'm getting the following error:

Request failed. Retry
Traceback (most recent call last):
  File "/home/borzunov/chat.petals.ml/app.py", line 127, in generate
    outputs = model.generate(
  File "/home/borzunov/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/remote_generation.py", line 182, in generate
    hidden_state = session.step(hidden_state, prompts=intermediate_prompts, hypo_ids=hypo_ids)[:, -1]
  File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/inference_session.py", line 233, in step
    raise ValueError(
ValueError: Maximum length exceeded: prefix 0 + current 784 exceeds pre-allocated maximum 768

A more helpful error message, ideally paired with a button to reset the conversation, would improve this.

[Screenshot of the error]
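One possible direction, sketched here with illustrative names (not the actual app.py code): catch the length error around the generate call and return a readable message that the frontend can turn into a reset prompt.

def generate_or_explain(model, inputs, **gen_kwargs):
    # Returns (ok, payload): the model outputs on success, a human-readable message on overflow.
    try:
        return True, model.generate(inputs, **gen_kwargs)
    except ValueError as exc:
        if "Maximum length exceeded" in str(exc):
            return False, ("This conversation has reached the pre-allocated session length. "
                           "Please reset the dialogue and try again.")
        raise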

OSError: meta-llama/Llama-2-70b-hf is not a local folder and is not a valid model identifier

Hey folks, I am trying to run the chat locally, on a system without a GPU. The first issue I have run into is:

OSError: meta-llama/Llama-2-70b-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

Any suggestions?
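As the error itself suggests, meta-llama/Llama-2-70b-hf is a gated repository, so this usually means logging in with huggingface-cli login or passing an access token explicitly. A sketch (the argument name depends on your transformers version: newer releases use token=, older ones use_auth_token=):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    token="hf_...",  # a Hugging Face token that has been granted access to the gated repo
)

The same token handling applies when loading the model itself.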

Stop sequence missed

Requests to api/v2/generate (websocket_api.py) can fail to detect the stop sequence in the generated response and will keep generating well past it. The problem becomes even more apparent when max_new_tokens is greater than 1.

I have a change I can commit that works around this by scanning for the stop sequence in the last delta appended to the tail of the previous deltas, then stopping and returning a truncated new delta if the stop sequence is found (sketched below).

If you'd like a PR for this, it will also require merging #31.
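For reference, a sketch of that workaround with illustrative names (not the actual patch): scan for the stop sequence across the boundary between the text already sent and the newest delta, so a marker split across two deltas is still caught.

def check_stop(previous_text: str, new_delta: str, stop_sequence: str):
    # Returns (delta_to_send, stopped).
    tail = previous_text[-(len(stop_sequence) - 1):] if len(stop_sequence) > 1 else ""
    combined = tail + new_delta
    idx = combined.find(stop_sequence)
    if idx == -1:
        return new_delta, False
    # Truncate the delta so nothing at or beyond the stop sequence is sent out.
    return combined[:idx][len(tail):], True

# Example: "Human:" is split across two deltas and is still detected.
print(check_stop("Assistant: Sure!\nHum", "an: next question?", "Human:"))
# -> ('', True)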

ModelInfo.__init__() got an unexpected keyword argument 'name'

Trying to use ModelInfo(repo="petals-team/StableBeluga2", name="stabilityai/StableBeluga2") in my chat.petals.dev colab, I'm getting:

ModelInfo(repo="petals-team/StableBeluga2", name="stabilityai/StableBeluga2")
TypeError: ModelInfo.__init__() got an unexpected keyword argument 'name'

Is there any difference between the BLOOMZ-176B accessible through WebSocket API and the original BLOOMZ-176B?

Hi,

Thank you for making this enormous open-source effort! It is truly remarkable work.

There is one thing that puzzles me, and I couldn't find any reliable source to confirm it: is there any difference between the BLOOMZ-176B accessed through the WebSocket API (as described in your README) and the original BLOOMZ-176B released here: https://huggingface.co/bigscience/bloomz?

I'm inclined to believe they are the same model and that the Petals chatbot simply uses the WebSocket API internally as well, but it would be really appreciated if someone could confirm this.

Add YaLM-100B

Could you consider adding the YaLM-100B open-source model by Yandex? I think it would be a nice addition.

HTTP API fails

The HTTP API fails while the chat interface works:

curl http://chat.petals.ml/api/v1/generate -H "Content-Type: application/x-www-form-urlencoded" -d 'inputs="A cat in French is "&max_new_tokens=3'
{"ok":false,"traceback":"Traceback (most recent call last):\n File "/home/borzunov/chat.petals.ml/http_api.py", line 40, in http_api_generate\n outputs = model.generate(\n File "/home/borzunov/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context\n return func(*args, **kwargs)\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/remote_generation.py", line 171, in generate\n hidden_state = session.step(hidden_state, prompts=intermediate_prompts, hypo_ids=hypo_ids)[:, -1]\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/inference_session.py", line 301, in step\n self._update_sequence(server_idx, block_idx, attempt_no)\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/inference_session.py", line 343, in _update_sequence\n updated_spans = self._sequence_manager.make_sequence(\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/routing/sequence_manager.py", line 154, in make_sequence\n span_sequence = self._make_sequence_with_min_latency(\n File "/home/borzunov/.local/lib/python3.10/site-packages/petals/client/routing/sequence_manager.py", line 182, in _make_sequence_with_min_latency\n raise MissingBlocksError(missing_blocks)\npetals.client.routing.sequence_manager.MissingBlocksError: No servers holding blocks [0, 1, 2, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] are online. You can check the public swarm's state at http://health.petals.ml If there are not enough servers, please connect your GPU: https://github.com/bigscience-workshop/petals#connect-your-gpu-and-increase-petals-capacity \n"}

resume model downloads

Hi,

I'm new to Petals, with a fairly low-bandwidth connection. Every time the download fails, it seems the shard has to be re-downloaded from scratch.

Could Petals support resumable downloads? What is used for the downloads (Git LFS?)? I'm seeing:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.

Happy to look into this if you point me in the right direction.
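For what it's worth, the files come from the Hugging Face Hub (the cdn-lfs.huggingface.co host in the timeout above), and huggingface_hub can resume partially downloaded files. A sketch with illustrative repo and file names, assuming Petals' loader could expose or honour the same option:

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bigscience/bloom",       # illustrative repo id
    filename="pytorch_model.bin",     # illustrative shard filename
    resume_download=True,             # continue from the partial file instead of restarting
)
print(path)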

Server doesn't stop properly and hangs indefinitely

When I run the daemon and then stop it with a keyboard interrupt, the server stops responding, but the port remains occupied by a hanging worker process. This prevents the app from starting again, throwing this error:

$ gunicorn app:app --bind localhost:8004 --threads 5 --timeout 900
....
[2023-01-22 11:09:02 +0000] [357078] [ERROR] Connection in use: ('localhost', 8004)
[2023-01-22 11:09:02 +0000] [357078] [ERROR] Retrying in 1 second.

netstat still shows gunicorn, even though the process was already killed:

$ netstat -anp | grep 8004
tcp        0      0 127.0.0.1:8004          0.0.0.0:*               LISTEN      357022/gunicorn: wo 
tcp        0      0 127.0.0.1:8004          0.0.0.0:*               LISTEN      176873/gunicorn: wo 

And ps aux confirms the hanging worker process:

$ ps aux | grep 357022
dev       357022  0.0  0.4 2980364 289556 pts/4  S    11:08   0:00 gunicorn: worker [app:app]

Changing the port works and the server starts, but I already have about 20 ports occupied by zombie workers :-).

Cpufeature requirement not allowing backend access of petals

Every time I run these commands:
git clone https://github.com/petals-infra/chat.petals.dev.git
cd chat.petals.dev
pip install -r requirements.txt
flask run --host=0.0.0.0 --port=5000
I get this error:
from cpufeature import CPUFeature
ModuleNotFoundError: No module named 'cpufeature'
If I then run pip install cpufeature, I get this error:
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for cpufeature
Running setup.py clean for cpufeature
Failed to build cpufeature
ERROR: Could not build wheels for cpufeature, which is required to install pyproject.toml-based projects
I've tried upgrading and reinstalling pip, setuptools, and wheel. I've tried reinstalling everything. I've tried changing my virtual environment so no other packages would conflict or interfere. How can I get past this?

(question) Switching from OpenAI, and NPU usage on the Orange Pi 5?

Hello, this is a very cool project. Is there a way to leverage Petals in a Flask app I already have? I want to move away from OpenAI. I'm using an Orange Pi 5, but I have never been able to utilize its NPU. I also looked at the documentation and thought my head was going to explode. Basically, I want to take my code, switch out OpenAI for Petals, use my NPU/GPU to help the cluster (and myself), and access the chatbot from my HTML (see the sketch after the code below).
Here's my code; any advice would be awesome, as I'm getting tired of fighting it and just want it to work. Hopefully the code markdown sticks; I always have problems with it.


import openai
import threading
import time
import sys
import chat_commands
from gtts import gTTS
import os
import tkinter as tk
from tkinter import filedialog
from tkinter import messagebox
from tkinter import *
from flask import Flask, request, render_template
from PIL import Image, ImageTk
import torch
import torchvision.models as models
import speech_recognition as sr
import pygame
import webbrowser
import re
import subprocess
#import guifunc
#import pythonide
import pipgui
#from tkinter import *
#from tkinter.filedialog import asksaveasfilename, askopenfilename

# OPENAI API KEY
openai.api_key = "api_key"

doListenToCommand = True
listening = False

# List with common departures to end the while loop
despedida = ["Goodbye", "goodbye", "bye", "Bye", "See you later", "see you later"]

# Create the GUI window
window = tk.Tk()
window.title("Computer:AI")
window.geometry("400x400")

# Create the text entry box
text_entry = tk.Entry(window, width=50)
text_entry.pack(side=tk.BOTTOM)

# Create the submit button
submit_button = tk.Button(window, text="Submit", command=lambda: submit())
submit_button.pack(side=tk.BOTTOM)

# Create the text output box
text_output = tk.Text(window, height=300, width=300)
text_output.pack(side=tk.BOTTOM)

def submit(event=None, text_input=None):
    global doListenToCommand
    global listening

    # Get the user input and check if the input matches the list of goodbyes
    if text_input != "":
        usuario = text_input
    else:
        usuario = text_entry.get()

    if usuario in despedida:
        on_closing()
        return  # without this, 'prompt' would be undefined below when the user says goodbye
    else:
        prompt = f"You are ChatGPT and answer my following message: {usuario}"

    # Getting responses using the OpenAI API
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=2049
    )

    respuesta = response["choices"][0]["text"]

    # Converting text to audio
    texto = str(respuesta)
    tts = gTTS(texto, lang='en', tld='ie')
    tts.save("audio.mp3")

    # Displaying the answer on the screen
    text_output.insert(tk.END, "ChatGPT: " + respuesta + "\n")

    # Clear the input text
    text_entry.delete(0, tk.END)

    # Playing the audio
    doListenToCommand = False
    time.sleep(1)
    os.system("play audio.mp3")
    doListenToCommand = True

    # Call function to listen to the user
    # if listening == False:
    #     listen_to_command()



# Bind the Enter key to the submit function
window.bind("<Return>", submit)
#pygame.mixer.music.load("audio.mp3")
#pygame.mixer.music.play()
#termux audio
#os.system("mpg123 audio.mp3")
# Flask app
app = Flask(__name__, template_folder='templates')

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        file = request.files["file"]
        file.save(file.filename)
        openai.api_key = request.form["apikey"]
        return "Model file and API key saved."
    return render_template("index.html")

def run_as_normal_app():
    window.update()

def run_on_flask():
    app.run()

def listen_to_command():
    global doListenToCommand
    global listening

    # If we are not to be listening then exit the function.
    if doListenToCommand == True:
        # Initialize the recognizer
        r = sr.Recognizer()

        # Use the default microphone as the audio source
        with sr.Microphone() as source:
            print("Listening...")
            listening = True
            audio = r.listen(source)
            listening = False

        try:
            # Use speech recognition to convert speech to text
            command = r.recognize_google(audio)
            print("You said:", command)
            text_output.insert(tk.END, "You: " + command + "\n")
            text_entry.delete(0, tk.END)
            
            # Check if the command is a "generate image" instruction
            # if "generate image" in command.lower():
            #   # Call the function to generate the image
            #   generate_image()

            # Process the commands
            # Prepare object to be passed.
            class passed_commands:
                tk = tk
                text_output = text_output
                submit = submit

            chat_commands.process_commands(passed_commands,command)

        except sr.UnknownValueError:
            print("Speech recognition could not understand audio.")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service:", str(e))


        listen_to_command()
        listening = False

def on_closing():
    if tk.messagebox.askokcancel("Quit", "Do you want to quit?"):
        window.destroy()

def pythonide():
    subprocess.run(["python3", "pythonide.py"])

window.protocol("WM_DELETE_WINDOW", on_closing)

if __name__ == "__main__":
    # Create the menu bar
    menu_bar = tk.Menu(window)

    # Create the "File" menu
    file_menu = tk.Menu(menu_bar, tearoff=0)
    file_menu.add_command(label="Open LLM", command=lambda: filedialog.askopenfilename())
    file_menu.add_command(label="Save LLM", command=lambda: filedialog.asksaveasfilename())
    file_menu.add_separator()
    #file_menu.add_command(label="Exit", command=window.quit)
    file_menu.add_command(label="Exit", command=on_closing)
    menu_bar.add_cascade(label="File", menu=file_menu)

    # Create the "Run" menu
    run_menu = tk.Menu(menu_bar, tearoff=0)
    run_menu.add_command(label="Run as normal app", command=run_as_normal_app)
    run_menu.add_command(label="Run on Flask", command=run_on_flask)
    run_menu.add_command(label='Python Ide', command=pythonide)
    menu_bar.add_cascade(label="Run", menu=run_menu)

    # Set the menu bar
    window.config(menu=menu_bar)


    # Start the main program loop
    start_listening_thread = threading.Thread(target=listen_to_command)
    start_listening_thread.daemon = True
    start_listening_thread.start()
    window.mainloop()
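A minimal sketch of the OpenAI-to-Petals swap asked about above, assuming the public HTTP endpoint from the repo's README (the same api/v1/generate used in the curl example in an earlier issue) and that a successful response carries the generated text in an outputs field:

import requests

def petals_complete(prompt: str, max_new_tokens: int = 128) -> str:
    # POST the prompt to the Petals chat backend instead of calling openai.Completion.create.
    resp = requests.post(
        "https://chat.petals.dev/api/v1/generate",
        data={"inputs": prompt, "max_new_tokens": max_new_tokens},
        timeout=600,
    )
    payload = resp.json()
    if not payload.get("ok"):
        raise RuntimeError(payload.get("traceback", "generation failed"))
    return payload["outputs"]

# In submit(), the OpenAI block could then become something like:
# respuesta = petals_complete(prompt)

The GUI, gTTS playback, and Flask routes would stay the same. Whether the Orange Pi 5's NPU can contribute is a separate question that depends on Petals' backend support for that hardware.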

Change separators for BLOOMZ in the few-shot mode

Currently, in few-shot mode BLOOMZ behaves well only for the first output; it then emits </s> and forgets everything. This is visible in the English-to-Spanish translation example.

We need to use stop_sequence = "\n\n" and extra_stop_sequences = ["</s>"] to fix this.
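For illustration, this is roughly how those separators would appear in a generate request. The parameter names follow the suggestion above and the backend's generation API, but the exact payload shape may differ between versions:

# Hypothetical few-shot request for BLOOMZ with the suggested separators.
request = {
    "type": "generate",
    "inputs": "English: cat\nSpanish: gato\n\nEnglish: dog\nSpanish:",
    "max_new_tokens": 1,
    "stop_sequence": "\n\n",             # end each completion at the blank line between examples
    "extra_stop_sequences": ["</s>"],    # also stop if BLOOMZ emits its end-of-sequence token
}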

providing a system prompt

Is there any way to provide a system prompt instead of just the user prompt?

Or is that hardcoded somewhere along the way (and if so, where)?

Thanks!
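If the backend accepts raw text in an inputs field (as the HTTP example in an earlier issue suggests), one workaround is to prepend the system instructions to the conversation text client-side. A tiny sketch with an assumed prompt format:

system_prompt = "You are a concise assistant that always answers in formal English."
conversation = "Human: How do I reverse a list in Python?\nAssistant:"
# Prepend the system prompt to whatever the chat UI would otherwise send as 'inputs'.
inputs = system_prompt + "\n\n" + conversation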
