
easyllm's Introduction

EasyLLM

EasyLLM is an open source project that provides helpful tools and methods for working with large language models (LLMs), both open source and closed source. Get started immediately or check out the documentation.

EasyLLM implements clients that are compatible with OpenAI's Completion API. This means you can easily replace openai.ChatCompletion, openai.Completion, openai.Embedding with, for example, huggingface.ChatCompletion, huggingface.Completion or huggingface.Embedding by changing one line of code.

Supported Clients

  • huggingface - HuggingFace models
    • huggingface.ChatCompletion - Chat with LLMs
    • huggingface.Completion - Text completion with LLMs
    • huggingface.Embedding - Create embeddings with LLMs
  • sagemaker - Open LLMs deployed on Amazon SageMaker
    • sagemaker.ChatCompletion - Chat with LLMs
    • sagemaker.Completion - Text completion with LLMs
    • sagemaker.Embedding - Create embeddings with LLMs
  • bedrock - Amazon Bedrock LLMs

Check out the Examples to get started.
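
The clients share the same call shape. As a rough sketch, here is what the bedrock client looks like, adapted from the Bedrock issue reported further down this page (AWS credentials are assumed to already be configured in the environment):

import os

# as in the Bedrock issue below, configure the prompt builder and region before importing the client
os.environ["BEDROCK_PROMPT"] = "anthropic"
os.environ["AWS_REGION"] = "us-east-1"

from easyllm.clients import bedrock
from easyllm.schema.base import ChatMessage

response = bedrock.ChatCompletion.create(
    model="anthropic.claude-v2",
    messages=[ChatMessage(role="user", content="What is 2 + 2?")],
    max_tokens=256,
)
print(response)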

🚀 Getting Started

Install EasyLLM via pip:

pip install easyllm

Then import and start using the clients:

from easyllm.clients import huggingface

# helper to build llama2 prompt
huggingface.prompt_builder = "llama2"

response = huggingface.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[
        {"role": "system", "content": "\nYou are a helpful assistant speaking like a pirate. argh!"},
        {"role": "user", "content": "What is the sun?"},
    ],
    temperature=0.9,
    top_p=0.6,
    max_tokens=256,
)

print(response)

The result will look like:

{
  "id": "hf-lVC2iTMkFJ",
  "object": "chat.completion",
  "created": 1690661144,
  "model": "meta-llama/Llama-2-70b-chat-hf",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " Arrrr, the sun be a big ol' ball o' fire in the sky, me hearty! It be the source o' light and warmth for our fair planet, and it be a mighty powerful force, savvy? Without the sun, we'd be sailin' through the darkness, lost and cold, so let's give a hearty \"Yarrr!\" for the sun, me hearties! Arrrr!"
      },
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 111,
    "completion_tokens": 299,
    "total_tokens": 410
  }
}

Check out the other examples and see the documentation for more detailed usage.

๐Ÿ’ช๐Ÿป Migration from OpenAI to HuggingFace

Migrating from OpenAI to HuggingFace is easy. Just change the import statement and the client you want to use, and optionally set the prompt builder.

- import openai
+ from easyllm.clients import huggingface
+ huggingface.prompt_builder = "llama2"


- response = openai.ChatCompletion.create(
+ response = huggingface.ChatCompletion.create(
-    model="gpt-3.5-turbo",
+    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
    ],
)

When you switch clients, make sure your hyperparameters are still valid. For example, a temperature that works well for GPT-3 might behave differently with Llama 2.

โ˜‘๏ธ Key Features

๐Ÿค Compatible Clients

  • Implementation of clients compatible with the OpenAI API format of openai.ChatCompletion, openai.Completion, and openai.Embedding.
  • Easily switch between different LLMs like openai.ChatCompletion and huggingface.ChatCompletion by changing one line of code.
  • Support for streaming completions; check out the example How to stream completions.

โš™๏ธ Helper Modules โš™๏ธ

  • evol_instruct (work in progress) - Use evolutionary algorithms to create instructions for LLMs.

  • prompt_utils - Helper methods to easily convert between prompt formats, e.g. from OpenAI messages to prompts for open source models like Llama 2 (see the sketch below).
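
A minimal sketch of that conversion, assuming the build_llama2_prompt helper and the ChatMessage schema that appear in the issues further down this page:

from easyllm.prompt_utils import build_llama2_prompt
from easyllm.schema.base import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is the sun?"),
]

# converts OpenAI-style chat messages into a single Llama 2 prompt string,
# roughly "<s>[INST] <<SYS>>\n...\n<</SYS>>\n\nWhat is the sun? [/INST]"
prompt = build_llama2_prompt(messages)
print(prompt)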

๐Ÿ™ Contributing

EasyLLM is an open source project and welcomes contributions of all kinds.

The project uses hatch for development. To get started, fork the repository and clone it to your local machine.

  1. Confirm hatch is installed (pipx is great to make it available globally on your machine)
  2. Once in the project directory, run hatch env create to create a default virtual environment for development.
  3. Activate the virtual environment with hatch shell
  4. Start developing! 🤩

📔 Citation & Acknowledgements

If you use EasyLLM, please share it with me on social media or email. I would love to hear about it! You can also cite the project using the following BibTeX:

@software{Philipp_Schmid_EasyLLM_2023,
  author = {Philipp Schmid},
  license = {Apache-2.0},
  month = jul,
  title = {EasyLLM: Streamlined Tools for LLMs},
  url = {https://github.com/philschmid/easyllm},
  year = {2023}
}

easyllm's People

Contributors

chainyo, float-trip, lewtun, pacman100, philschmid, tomaarsen

easyllm's Issues

Issue setting huggingface.prompt_builder = 'llama2' when using sagemaker as client

So I'm building a class that can alternate between the huggingface and sagemaker clients, and I declare all my os.environ variables at the top of the class like so:

os.environ["AWS_ACCESS_KEY_ID"] = "<key_id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = '<access_key>'
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"
os.environ["HF_TOKEN"] = "<hf_token>"
os.environ["HUGGINGFACE_PROMPT"] = "llama2"

Even later on in the class, just to be sure, I declare huggingface.prompt_builder = 'llama2'. I also:

  • tried importing build_llama2_prompt directly and passing it as a callable; that also didn't work
  • tried setting sagemaker.prompt_builder = 'llama2' just for fun to see if that would do anything... nope

I still get the warning telling me I haven't set a prompt builder, which is kind of weird. It's also clear that the prompt is occasionally being formatted a bit oddly, because the same prompt, formatted as in the example below and passed directly to the SageMaker endpoint, yields a somewhat better response from that endpoint.

It's no big deal that this doesn't work well for me, and I might just be doing something wrong. Below is how I've worked around it by manually building the prompt and calling the endpoint with SageMaker's HuggingFacePredictor class:

import sagemaker

# "llama-party" is the name of the deployed endpoint; sess is an existing sagemaker.Session()
llm = sagemaker.huggingface.model.HuggingFacePredictor("llama-party", sess)

def build_llama2_prompt(messages):
    startPrompt = "<s>[INST] "
    endPrompt = " [/INST]"
    conversation = []
    for index, message in enumerate(messages):
        if message["role"] == "system" and index == 0:
            conversation.append(f"<<SYS>>\n{message['content']}\n<</SYS>>\n\n")
        elif message["role"] == "user":
            conversation.append(message["content"].strip())
        else:
            # assistant turns close the previous instruction and open the next one
            conversation.append(f" [/INST] {message['content']}</s><s>[INST] ")

    return startPrompt + "".join(conversation) + endPrompt

prompt = build_llama2_prompt(messages)

payload = {
  "inputs":  prompt,
  "parameters": {
    "do_sample": True,
    "top_p": 0.6,
    "temperature": 0.9,
    "top_k": 50,
    "max_new_tokens": 512,
    "repetition_penalty": 1.03,
    "stop": ["</s>"]
  }
}

chat = llm.predict(payload)

print(chat[0]["generated_text"][len(prompt):])

this code was pretty much fully taken from the sagemaker llama deployment blog post here: https://www.philschmid.de/sagemaker-llama-llm

It works fine; I just don't know why the same code doesn't work right inside the lib (easyllm).

Need to provide additional args to InferenceClient

I'm using huggingface.ChatCompletion and need to be able to provide some cookies to the InferenceClient. I don't see a way to pass them in via create(), which is where the client is instantiated. I also looked at the huggingface_hub implementation, but didn't see a way to set them via a module member or environment variable.

The approach I landed on was adding a module-level client_args dictionary, similar to huggingface.api_key and .prompt_builder, and passing anything defined in there when the client is instantiated. If that makes sense and you want to support this, I'm happy to open a PR. Or if you see issues or have suggestions to improve, let me know.
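
For illustration, a minimal sketch of the proposed interface; client_args does not exist yet and is exactly what the PR would add:

from easyllm.clients import huggingface

huggingface.prompt_builder = "llama2"

# hypothetical module-level setting, mirroring huggingface.api_key and
# huggingface.prompt_builder; everything in this dict would be forwarded
# to the InferenceClient constructor when create() instantiates it
huggingface.client_args = {"cookies": {"session": "<cookie-value>"}}

response = huggingface.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "Hello!"}],
)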

bedrock.ChatCompletion.create Raises ValidationError for Non-Integer Token Values in Python 3.9

Python 3.9

Code to reproduce:

import os
# set env for prompt builder
os.environ["BEDROCK_PROMPT"] = "anthropic"
os.environ["AWS_REGION"] = "us-east-1"
os.environ["AWS_ACCESS_KEY_ID"] = "xxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxx"

from easyllm.clients import bedrock
from easyllm.schema.base import ChatMessage

response = bedrock.ChatCompletion.create(
    model="anthropic.claude-v2",
    messages=[
        ChatMessage(role="user", content="What is 2 + 2?"),
    ],
    temperature=0.9,
    top_p=0.6,
    max_tokens=1024,
    debug=False,
)
print(response)

Result:

Traceback (most recent call last):
  File "<path_to>\main.py", line 27, in <module>
    response = bedrock.ChatCompletion.create(
  File "<path_to>\lib\site-packages\easyllm\clients\bedrock.py", line 208, in create
    usage=Usage(
  File "<path_to>\lib\site-packages\pydantic\main.py", line 159, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 2 validation errors for Usage
completion_tokens
  Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=2.25, input_type=float]
    For further information visit https://errors.pydantic.dev/2.1/v/int_from_float
total_tokens
  Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=11.25, input_type=float]
    For further information visit https://errors.pydantic.dev/2.1/v/int_from_float

Expected result:
{'id': 'hf-iE53Kvwnlr', 'object': 'chat.completion', 'created': 1698683234, 'model': 'anthropic.claude-v2', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '2 + 2 = 4'}, 'finish_reason': 'stop_sequence'}], 'usage': {'prompt_tokens': 9, 'completion_tokens': 2, 'total_tokens': 11}}
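
For context, the failure can be reproduced with plain pydantic v2 outside of easyllm (a minimal sketch; the real Usage model lives in easyllm's schema module):

from pydantic import BaseModel

class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

# pydantic v2 rejects floats with a fractional part for int fields,
# which is exactly the int_from_float error shown in the traceback above
Usage(prompt_tokens=9, completion_tokens=2.25, total_tokens=11.25)  # raises ValidationError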

(Chat)Completion objects cannot generate diverse outputs

Hello,

I have noticed that the interface returns the same generations regardless of the number of responses requested (n > 1). Easy reproduction:

from easyllm.clients import huggingface

# helper to build llama2 prompt
huggingface.prompt_builder = "llama2"

response = huggingface.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[
        {"role": "system", "content": "\nYou are a helpful assistant speaking like a pirate. argh!"},
        {"role": "user", "content": "What is the sun?"},
    ],
    temperature=0.9,
    top_p=0.6,
    max_tokens=256,
    n=10
)

print(response)

You will notice that the content of 'choices' will be exactly the same.

Looking at the code base, it seems that the issue comes from the fact that you are performing 'n' independent HTTP requests with the same generation parameters (but with a fixed seed).

# Normally this would not have been an issue since most of the time we are
# sampling from the model; however, gen_kwargs have the same seed, so the
# output will be the same for each request.
for _i in range(request.n):
    res = client.text_generation(
        prompt,
        details=True,
        **gen_kwargs,
    )

I believe a solution would be either to change gen_kwargs to directly return n outputs by setting num_return_sequences to n, or to artificially generate a different seed for each request (see the sketch below).
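
A sketch of the per-request-seed option, reusing the names from the snippet above (request, client, prompt and gen_kwargs are the ones easyllm already has in scope at that point):

import random

for _i in range(request.n):
    # draw a fresh seed per request so sampling actually differs between the n completions
    gen_kwargs["seed"] = random.randint(0, 2**32 - 1)
    res = client.text_generation(
        prompt,
        details=True,
        **gen_kwargs,
    )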

Boto dependency shouldn't be forced

I'm getting a botocore dependency even though I'm just installing the package normally. Either it should be an explicitly declared dependency, or it should be specified as an extra, e.g. easyllm[aws].

Rolling back to v0.5.0 solved the issue for now.

Multiple messages

Hi,

I am using llama2-chat-7b to run an experiment.

I want to do 1-shot prompting, which provides one turn of conversation between the user and the system as an example.

The messages argument for huggingface.ChatCompletion.create looks like this:

[
    {'role': 'system', 'content': 'you are a movie recommender'},
    {'role': 'user', 'content': 'I am from USA, recommend me some movies'},
    {'role': 'system', 'content': '1. Deadpool\n 2. The Princess\n 3. La La Land\n'},
    {'role': 'user', 'content': 'I am from China, recommend me some movies'},
]

As you can see, it contains a system message, followed by an example exchange.

However, huggingface.ChatCompletion.create throws me an error:


    153     prompt = buildBasePrompt(request.messages)
    154 else:
--> 155     prompt = build_prompt(request.messages, prompt_builder)
    157 # if the model is a url, use it directly
    158 if request.model:

     34         if prompt_builder is None:
     35             raise ValueError(
     36                 f"Prompt builder {builder} not found. Are you sure you spelled it correctly? \
     37 Available prompt builders are: {PROMPT_MAPPING.keys()}. \
     38 You can open an issue or PR to add more prompt builders at https://github.com/philschmid/easyllm"
     39             )
---> 40         prompt = prompt_builder(messages)
     41     else:
     42         prompt = builder(messages)


     30         conversation.append(f"<<SYS>>\n{message.content}\n<</SYS>>\n\n")
     31     else:
---> 32         raise ValueError(f"Invalid message role: {message.role}")
     34 return startPrompt + "".join(conversation) + endPrompt

ValueError: Invalid message role: system

I'm wondering how to make this kind of multi-message input valid?
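
For what it's worth, the llama2 prompt builder appears to only accept a system message at position 0 (both the traceback above and the reimplementation in the first issue handle system only at index 0), so a version of the one-shot example that should build correctly marks the example reply as assistant. A sketch, not an official answer:

from easyllm.clients import huggingface

huggingface.prompt_builder = "llama2"

messages = [
    {"role": "system", "content": "you are a movie recommender"},
    {"role": "user", "content": "I am from USA, recommend me some movies"},
    # the one-shot example answer uses the assistant role instead of a second system message
    {"role": "assistant", "content": "1. Deadpool\n2. The Princess\n3. La La Land"},
    {"role": "user", "content": "I am from China, recommend me some movies"},
]

response = huggingface.ChatCompletion.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=messages,
)
print(response)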

Local inference of TGI

Hello,

I made a local Docker container using TGI, with the TheBloke/Llama-2-13B-fp16 model.

I define:
from easyllm.clients import huggingface
huggingface.prompt_builder = "llama2"
huggingface.api_base = "http://localhost:8080"

and use simple completion (same for chat):

response = huggingface.Completion.create(
    prompt="What is the meaning of life?",
    temperature=0.9,
    top_p=0.6,
    max_tokens=1024,
)
But the response looks like:
{'id': 'hf-FJZ72HMlsF',
'object': 'text.completion',
'created': 1691331574,
'choices': [{'index': 0,
'text': '\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST] What is the meaning of life? [/INST]\n[INST]',
'finish_reason': 'length'}],
'usage': {'prompt_tokens': 11,
'completion_tokens': 1024,
'total_tokens': 1035}}

Do you have any idea, please?
Thanks

PS: http://localhost:8080/info gives:
{
  "model_id": "TheBloke/Llama-2-13B-fp16",
  "model_sha": "b2e65e8ad4bb35e5abaee0170ebd5fc2134a50bb",
  "model_dtype": "torch.float16",
  "model_device_type": "cuda",
  "model_pipeline_tag": "text-generation",
  "max_concurrent_requests": 128,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_length": 1024,
  "max_total_tokens": 2048,
  "waiting_served_ratio": 1.2,
  "max_batch_total_tokens": 7984,
  "max_waiting_tokens": 20,
  "validation_workers": 2,
  "version": "1.0.0",
  "sha": "3ef5ffbc6400370ff2e1546550a6bad3ac61b079",
  "docker_label": "sha-3ef5ffb"
}

Bad request: Model requires a Pro subscription


HTTPError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
260 try:
--> 261 response.raise_for_status()
262 except HTTPError as e:

7 frames
HTTPError: 400 Client Error: Bad Request for url: https://api-inference.huggingface.co/models/meta-llama/Llama-2-70b-chat-hf

The above exception was the direct cause of the following exception:

BadRequestError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
297 f"\n\nBad request for {endpoint_name} endpoint:" if endpoint_name is not None else "\n\nBad request:"
298 )
--> 299 raise BadRequestError(message, response=response) from e
300
301 # Convert HTTPError into a HfHubHTTPError to display request information

BadRequestError: (Request ID: dzqk3ZqfFF4wh7pIc4rC7)

Bad request:
Model requires a Pro subscription

Integration with self hosted models through TGI

Hello,

It would be great if EasyLLM could integrate with models that are self-hosted through Hugging Face text-generation-inference (TGI). This could be built the same way the OpenAI client deals with models hosted through Azure OpenAI (by providing a base URL, deployment name, API version, etc.).

Would be happy to contribute and build this feature out.

Thank You!
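
For reference, the huggingface client already accepts a custom base URL, as used in the "Local inference of TGI" issue above; a minimal sketch of pointing it at a self-hosted TGI container:

from easyllm.clients import huggingface

huggingface.prompt_builder = "llama2"
# point the client at the self-hosted TGI endpoint instead of the hosted Inference API
huggingface.api_base = "http://localhost:8080"

response = huggingface.Completion.create(
    prompt="What is the meaning of life?",
    temperature=0.9,
    top_p=0.6,
    max_tokens=256,
)
print(response)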

Chat completion format for empty system content.

Hi,

I was looking at the Meta implementation of chat completion, and it seems there is no way to get an empty system prompt.
However, with easyllm it is possible to use an empty system input.
Is this expected to work correctly? I mean, was LLaMA v2 trained with an empty system prompt?

OverloadedError: Model is overloaded

I am using the meta-llama/Llama-2-70b-chat-hf model on a data frame with 3000 rows, each including a 500-token text. But after 10 rows are processed, I get the following error:

in call_llama2_api(self, messages)
79 def call_llama2_api(self, messages):
80 huggingface.prompt_builder = "llama2"
---> 81 response = huggingface.ChatCompletion.create(
82 model="meta-llama/Llama-2-70b-chat-hf",
83 messages=messages,

/usr/local/lib/python3.10/dist-packages/easyllm/clients/huggingface.py in create(messages, model, temperature, top_p, top_k, n, max_tokens, stop, stream, frequency_penalty, debug)
205 generated_tokens = 0
206 for _i in range(request.n):
--> 207 res = client.text_generation(
208 prompt,
209 details=True,

/usr/local/lib/python3.10/dist-packages/huggingface_hub/inference/_client.py in text_generation(self, prompt, details, stream, model, do_sample, max_new_tokens, best_of, repetition_penalty, return_full_text, seed, stop_sequences, temperature, top_k, top_p, truncate, typical_p, watermark, decoder_input_details)
1063 decoder_input_details=decoder_input_details,
1064 )
-> 1065 raise_text_generation_error(e)
1066
1067 # Parse output

/usr/local/lib/python3.10/dist-packages/huggingface_hub/inference/_text_generation.py in raise_text_generation_error(http_error)
472 raise IncompleteGenerationError(message) from http_error
473 if error_type == "overloaded":
--> 474 raise OverloadedError(message) from http_error
475 if error_type == "validation":
476 raise ValidationError(message) from http_error

OverloadedError: Model is overloaded

Is there any solution to fix this problem, like increasing the rate limit?
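
The overload is reported by the hosted Inference API itself, so the rate limit cannot be raised from the client, but a simple retry-with-backoff wrapper works around transient overloads. A sketch; the OverloadedError import path is taken from the traceback above (it is a private huggingface_hub module and may move between versions):

import time

from easyllm.clients import huggingface
from huggingface_hub.inference._text_generation import OverloadedError

huggingface.prompt_builder = "llama2"

def chat_with_retry(messages, retries=5, wait=10):
    # retry the request with a growing pause whenever the hosted model reports it is overloaded
    for attempt in range(retries):
        try:
            return huggingface.ChatCompletion.create(
                model="meta-llama/Llama-2-70b-chat-hf",
                messages=messages,
            )
        except OverloadedError:
            time.sleep(wait * (attempt + 1))
    raise RuntimeError("model still overloaded after retries")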

Pydantic problem

Hello,

If we update the pydantic lib to the latest version (v2), we get a model_dump problem.

Thanks

Bug: `model` must be defined despite docs saying that if not provided, it defaults to base url.

Hello!

How to reproduce:

from easyllm.clients import huggingface
from easyllm.prompt_utils import build_llama2_prompt

huggingface.prompt_builder = build_llama2_prompt

# send a ChatCompletion request to count to 100
response = huggingface.ChatCompletion.create(
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
)

Output:

Traceback (most recent call last):
  File "...\demo.py", line 11, in <module>
    response = huggingface.ChatCompletion.create(
  File "...\easyllm\clients\huggingface.py", line 150, in create
    client = InferenceClient(url, token=api_key)
UnboundLocalError: local variable 'url' referenced before assignment

Given that the docs say that the model "defaults to the base url if not provided", I would expect this to work.

Should I include a fix for this in #7? And if so, how should I tackle it? Set a default or require the model to be given?

  • Tom Aarsen
