langserve's Introduction

🦜️🏓 LangServe


Overview

LangServe helps developers deploy LangChain runnables and chains as a REST API.

This library is integrated with FastAPI and uses pydantic for data validation.

In addition, it provides a client that can be used to call into runnables deployed on a server. A JavaScript client is available in LangChain.js.

Features

  • Input and Output schemas automatically inferred from your LangChain object, and enforced on every API call, with rich error messages
  • API docs page with JSONSchema and Swagger (insert example link)
  • Efficient /invoke, /batch and /stream endpoints with support for many concurrent requests on a single server
  • /stream_log endpoint for streaming all (or some) intermediate steps from your chain/agent
  • New as of 0.0.40: supports /stream_events to make it easier to stream without needing to parse the output of /stream_log.
  • Playground page at /playground/ with streaming output and intermediate steps
  • Built-in (optional) tracing to LangSmith, just add your API key (see Instructions)
  • All built with battle-tested open-source Python libraries like FastAPI, Pydantic, uvloop and asyncio.
  • Use the client SDK to call a LangServe server as if it was a Runnable running locally (or call the HTTP API directly)
  • LangServe Hub

Limitations

  • Client callbacks are not yet supported for events that originate on the server
  • OpenAPI docs will not be generated when using Pydantic V2. FastAPI does not support mixing pydantic v1 and v2 namespaces. See the section below for more details.

Security

  • Vulnerability in Versions 0.0.13 - 0.0.15 -- playground endpoint allows accessing arbitrary files on server. Resolved in 0.0.16.

Installation

For both client and server:

pip install "langserve[all]"

or pip install "langserve[client]" for client code, and pip install "langserve[server]" for server code.

LangChain CLI 🛠️

Use the LangChain CLI to bootstrap a LangServe project quickly.

To use the langchain CLI make sure that you have a recent version of langchain-cli installed. You can install it with pip install -U langchain-cli.

Setup

Note: We use poetry for dependency management. Please follow the Poetry documentation to learn more about it.

1. Create a new app using the langchain CLI command

langchain app new my-app

2. Define the runnable in add_routes. Go to server.py and edit (a concrete example follows step 5 below):

add_routes(app, NotImplemented)

3. Use poetry to add 3rd party packages (e.g., langchain-openai, langchain-anthropic, langchain-mistral, etc.).

poetry add [package-name]  # e.g., poetry add langchain-openai

4. Set up relevant env variables. For example,

export OPENAI_API_KEY="sk-..."

5. Serve your app

poetry run langchain serve --port=8100
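
As a minimal sketch of step 2, here is what server.py might look like once the NotImplemented placeholder is replaced with a real runnable (the /joke path and prompt are illustrative, and this assumes the model package from step 3 and the API key from step 4):

from fastapi import FastAPI
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

from langserve import add_routes

app = FastAPI()

# Replace the NotImplemented placeholder with an actual runnable.
chain = ChatPromptTemplate.from_template("tell me a joke about {topic}") | ChatOpenAI()

add_routes(app, chain, path="/joke")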

Examples

Get your LangServe instance started quickly with LangChain Templates.

For more examples, see the templates index or the examples directory.

Description Links
LLMs Minimal example that serves OpenAI and Anthropic chat models. Uses async, supports batching and streaming. server, client
Retriever Simple server that exposes a retriever as a runnable. server, client
Conversational Retriever A Conversational Retriever exposed via LangServe server, client
Agent without conversation history based on OpenAI tools server, client
Agent with conversation history based on OpenAI tools server, client
RunnableWithMessageHistory to implement chat persisted on backend, keyed off a session_id supplied by client. server, client
RunnableWithMessageHistory to implement chat persisted on backend, keyed off a conversation_id supplied by client, and user_id (see Auth for implementing user_id properly). server, client
Configurable Runnable to create a retriever that supports run time configuration of the index name. server, client
Configurable Runnable that shows configurable fields and configurable alternatives. server, client
APIHandler Shows how to use APIHandler instead of add_routes. This provides more flexibility for developers to define endpoints. Works well with all FastAPI patterns, but takes a bit more effort. server
LCEL Example Example that uses LCEL to manipulate a dictionary input. server, client
Auth with add_routes: Simple authentication that can be applied across all endpoints associated with app. (Not useful on its own for implementing per user logic.) server
Auth with add_routes: Simple authentication mechanism based on path dependencies. (Not useful on its own for implementing per user logic.) server
Auth with add_routes: Implement per user logic and auth for endpoints that use per request config modifier. (Note: At the moment, does not integrate with OpenAPI docs.) server, client
Auth with APIHandler: Implement per user logic and auth that shows how to search only within user owned documents. server, client
Widgets Different widgets that can be used with playground (file upload and chat) server
Widgets File upload widget used for LangServe playground. server, client

Sample Application

Server

Here's a server that deploys an OpenAI chat model, an Anthropic chat model, and a chain that uses the Anthropic model to tell a joke about a topic.

#!/usr/bin/env python
from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatAnthropic, ChatOpenAI
from langserve import add_routes

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple api server using Langchain's Runnable interfaces",
)

add_routes(
    app,
    ChatOpenAI(model="gpt-3.5-turbo-0125"),
    path="/openai",
)

add_routes(
    app,
    ChatAnthropic(model="claude-3-haiku-20240307"),
    path="/anthropic",
)

model = ChatAnthropic(model="claude-3-haiku-20240307")
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
add_routes(
    app,
    prompt | model,
    path="/joke",
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)

If you intend to call your endpoint from the browser, you will also need to set CORS headers. You can use FastAPI's built-in middleware for that:

from fastapi.middleware.cors import CORSMiddleware

# Set all CORS enabled origins
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
    expose_headers=["*"],
)

Docs

If you've deployed the server above, you can view the generated OpenAPI docs using:

โš ๏ธ If using pydantic v2, docs will not be generated for invoke, batch, stream, stream_log. See Pydantic section below for more details.

curl localhost:8000/docs

Make sure to add the /docs suffix.

โš ๏ธ Index page / is not defined by design, so curl localhost:8000 or visiting the URL will return a 404. If you want content at / define an endpoint @app.get("/").

Client

Python SDK

from langchain.schema import SystemMessage, HumanMessage
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableMap
from langserve import RemoteRunnable

openai = RemoteRunnable("http://localhost:8000/openai/")
anthropic = RemoteRunnable("http://localhost:8000/anthropic/")
joke_chain = RemoteRunnable("http://localhost:8000/joke/")

joke_chain.invoke({"topic": "parrots"})

# or async
await joke_chain.ainvoke({"topic": "parrots"})

prompt = [
    SystemMessage(content='Act like either a cat or a parrot.'),
    HumanMessage(content='Hello!')
]

# Supports astream
async for msg in anthropic.astream(prompt):
    print(msg, end="", flush=True)

prompt = ChatPromptTemplate.from_messages(
    [("system", "Tell me a long story about {topic}")]
)

# Can define custom chains
chain = prompt | RunnableMap({
    "openai": openai,
    "anthropic": anthropic,
})

chain.batch([{"topic": "parrots"}, {"topic": "cats"}])

In TypeScript (requires LangChain.js version 0.0.166 or later):

import { RemoteRunnable } from "@langchain/core/runnables/remote";

const chain = new RemoteRunnable({
  url: `http://localhost:8000/joke/`,
});
const result = await chain.invoke({
  topic: "cats",
});

Python using requests:

import requests

response = requests.post(
    "http://localhost:8000/joke/invoke",
    json={'input': {'topic': 'cats'}}
)
response.json()

You can also use curl:

curl --location --request POST 'http://localhost:8000/joke/invoke' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "input": {
            "topic": "cats"
        }
    }'

Endpoints

The following code:

...
add_routes(
    app,
    runnable,
    path="/my_runnable",
)

adds these endpoints to the server:

  • POST /my_runnable/invoke - invoke the runnable on a single input
  • POST /my_runnable/batch - invoke the runnable on a batch of inputs
  • POST /my_runnable/stream - invoke on a single input and stream the output
  • POST /my_runnable/stream_log - invoke on a single input and stream the output, including output of intermediate steps as it's generated
  • POST /my_runnable/stream_events - invoke on a single input and stream events as they are generated, including from intermediate steps.
  • GET /my_runnable/input_schema - json schema for input to the runnable
  • GET /my_runnable/output_schema - json schema for output of the runnable
  • GET /my_runnable/config_schema - json schema for config of the runnable

These endpoints match the LangChain Expression Language interface -- please reference this documentation for more details.
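
For example, the schema endpoints can be inspected with plain GET requests, and /invoke with a POST, mirroring the payload convention shown earlier (a sketch assuming a server running locally with a runnable mounted at /my_runnable):

import requests

# Fetch the JSON Schema describing the runnable's input.
input_schema = requests.get("http://localhost:8000/my_runnable/input_schema").json()
print(input_schema)

# Invoke the runnable on a single input; the payload goes under the "input" key.
response = requests.post(
    "http://localhost:8000/my_runnable/invoke",
    json={"input": "hello"},
)
print(response.json())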

Playground

You can find a playground page for your runnable at /my_runnable/playground/. This exposes a simple UI to configure and invoke your runnable with streaming output and intermediate steps.

Widgets

The playground supports widgets and can be used to test your runnable with different inputs. See the widgets section below for more details.

Sharing

In addition, for configurable runnables, the playground will allow you to configure the runnable and share a link with the configuration:

Chat playground

LangServe also supports a chat-focused playground that you can opt into and use under /my_runnable/playground/. Unlike the general playground, only certain types of runnables are supported - the runnable's input schema must be a dict with either:

  • a single key, and that key's value must be a list of chat messages.
  • two keys, one whose value is a list of messages, and the other representing the most recent message.

We recommend you use the first format.

The runnable must also return either an AIMessage or a string.

To enable it, you must set playground_type="chat", when adding your route. Here's an example:

# Declare a chain
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful, professional assistant named Cob."),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | ChatAnthropic(model="claude-2")


class InputChat(BaseModel):
    """Input for the chat endpoint."""

    messages: List[Union[HumanMessage, AIMessage, SystemMessage]] = Field(
        ...,
        description="The chat messages representing the current conversation.",
    )


add_routes(
    app,
    chain.with_types(input_type=InputChat),
    enable_feedback_endpoint=True,
    enable_public_trace_link_endpoint=True,
    playground_type="chat",
)

If you are using LangSmith, you can also set enable_feedback_endpoint=True on your route to enable thumbs-up/thumbs-down buttons after each message, and enable_public_trace_link_endpoint=True to add a button that creates a public trace for runs. Note that you will also need to set the following environment variables:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_PROJECT="YOUR_PROJECT_NAME"
export LANGCHAIN_API_KEY="YOUR_API_KEY"

Here's an example with the above two options turned on:

Note: If you enable public trace links, the internals of your chain will be exposed. We recommend only using this setting for demos or testing.

Legacy Chains

LangServe works with both Runnables (constructed via LangChain Expression Language) and legacy chains (inheriting from Chain). However, some of the input schemas for legacy chains may be incomplete/incorrect, leading to errors. This can be fixed by updating the input_schema property of those chains in LangChain. If you encounter any errors, please open an issue on THIS repo, and we will work to address it.
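
As a minimal sketch of wrapping a legacy chain (assuming the classic LLMChain and an OpenAI key; this is illustrative and not taken from the examples directory):

from fastapi import FastAPI
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

from langserve import add_routes

app = FastAPI()

# A legacy chain (inherits from Chain) rather than an LCEL runnable.
legacy_chain = LLMChain(
    llm=ChatOpenAI(),
    prompt=PromptTemplate.from_template("tell me a joke about {topic}"),
)

add_routes(app, legacy_chain, path="/legacy-joke")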

Deployment

Deploy to AWS

You can deploy to AWS using the AWS Copilot CLI:

copilot init --app [application-name] --name [service-name] --type 'Load Balanced Web Service' --dockerfile './Dockerfile' --deploy

Click here to learn more.

Deploy to Azure

You can deploy to Azure using Azure Container Apps (Serverless):

az containerapp up --name [container-app-name] --source . --resource-group [resource-group-name] --environment  [environment-name] --ingress external --target-port 8001 --env-vars=OPENAI_API_KEY=your_key

You can find more info here.

Deploy to GCP

You can deploy to GCP Cloud Run using the following command:

gcloud run deploy [your-service-name] --source . --port 8001 --allow-unauthenticated --region us-central1 --set-env-vars=OPENAI_API_KEY=your_key

Community Contributed

Deploy to Railway

Example Railway Repo

Deploy on Railway

Pydantic

LangServe provides support for Pydantic 2 with some limitations.

  1. OpenAPI docs will not be generated for invoke/batch/stream/stream_log when using Pydantic V2. FastAPI does not support mixing pydantic v1 and v2 namespaces.
  2. LangChain uses the v1 namespace in Pydantic v2. Please read the following guidelines to ensure compatibility with LangChain.

Except for these limitations, we expect the API endpoints, the playground and any other features to work as expected.

Advanced

Handling Authentication

If you need to add authentication to your server, please read FastAPI's documentation about dependencies and security.

The examples below show how to wire authentication logic into LangServe endpoints using FastAPI primitives.

You are responsible for providing the actual authentication logic, the users table, etc.

If you're not sure what you're doing, you could try using an existing solution such as Auth0.

Using add_routes

If you're using add_routes, see examples here.

Description Links
Auth with add_routes: Simple authentication that can be applied across all endpoints associated with app. (Not useful on its own for implementing per user logic.) server
Auth with add_routes: Simple authentication mechanism based on path dependencies. (Not useful on its own for implementing per user logic.) server
Auth with add_routes: Implement per user logic and auth for endpoints that use per request config modifier. (Note: At the moment, does not integrate with OpenAPI docs.) server, client

Alternatively, you can use FastAPI's middleware.

Using global dependencies and path dependencies has the advantage that auth will be properly supported in the OpenAPI docs page, but these are not sufficient for implementing per user logic (e.g., making an application that can search only within user owned documents).

If you need to implement per user logic, you can use the per_req_config_modifier or APIHandler (below) to implement this logic.

Per User

If you need authorization or logic that is user dependent, specify per_req_config_modifier when using add_routes. This is a callable that receives the raw Request object and can extract relevant information from it for authentication and authorization purposes.
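
A minimal sketch of what this can look like (the x-user-id header and the stand-in runnable are illustrative; the modifier receives the per-request config along with the raw Request and returns the config that will actually be used):

from typing import Any, Dict

from fastapi import FastAPI, HTTPException, Request
from langchain.schema.runnable import RunnableLambda

from langserve import add_routes

app = FastAPI()


def per_req_config_modifier(config: Dict[str, Any], request: Request) -> Dict[str, Any]:
    """Attach the caller's user id to the config passed to the runnable."""
    user_id = request.headers.get("x-user-id")  # hypothetical header; use real auth in practice
    if user_id is None:
        raise HTTPException(status_code=401, detail="Missing x-user-id header")
    return {**config, "configurable": {**config.get("configurable", {}), "user_id": user_id}}


# Stand-in runnable; a real chain would read user_id from its config
# (e.g., via a configurable field or a message-history factory).
chain = RunnableLambda(lambda x: x)

add_routes(app, chain, per_req_config_modifier=per_req_config_modifier, path="/echo")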

Using APIHandler

If you feel comfortable with FastAPI and Python, you can use LangServe's APIHandler.

Description Links
Auth with APIHandler: Implement per user logic and auth that shows how to search only within user owned documents. server, client
APIHandler Shows how to use APIHandler instead of add_routes. This provides more flexibility for developers to define endpoints. Works well with all FastAPI patterns, but takes a bit more effort. server, client

It's a bit more work, but gives you complete control over the endpoint definitions, so you can do whatever custom logic you need for auth.

Files

LLM applications often deal with files. There are several possible architectures for implementing file processing; at a high level:

  1. The file may be uploaded to the server via a dedicated endpoint and processed using a separate endpoint
  2. The file may be uploaded by either value (bytes of file) or reference (e.g., s3 url to file content)
  3. The processing endpoint may be blocking or non-blocking
  4. If significant processing is required, the processing may be offloaded to a dedicated process pool

You should determine the appropriate architecture for your application.

Currently, to upload files by value to a runnable, use base64 encoding for the file (multipart/form-data is not supported yet).

Here's an example that shows how to use base64 encoding to send a file to a remote runnable.

Remember, you can always upload files by reference (e.g., s3 url) or upload them as multipart/form-data to a dedicated endpoint.
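
A hedged client-side sketch of the by-value approach, which base64-encodes a local file and sends it as part of the input (the /process_file path and the file/num_chars fields mirror the FileProcessingRequest example in the widgets section below and are assumptions here):

import base64

from langserve import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/process_file/")

with open("example.pdf", "rb") as f:
    encoded_file = base64.b64encode(f.read()).decode("utf-8")

# The server-side runnable is expected to accept a base64-encoded `file` field.
result = runnable.invoke({"file": encoded_file, "num_chars": 100})
print(result)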

Custom Input and Output Types

Input and Output types are defined on all runnables.

You can access them via the input_schema and output_schema properties.

LangServe uses these types for validation and documentation.

If you want to override the default inferred types, you can use the with_types method.

Here's a toy example to illustrate the idea:

from typing import Any

from fastapi import FastAPI
from langchain.schema.runnable import RunnableLambda

from langserve import add_routes

app = FastAPI()


def func(x: Any) -> int:
    """Mistyped function that should accept an int but accepts anything."""
    return x + 1


runnable = RunnableLambda(func).with_types(
    input_type=int,
)

add_routes(app, runnable)

Custom User Types

Inherit from CustomUserType if you want the data to de-serialize into a pydantic model rather than the equivalent dict representation.

At the moment, this type only works server side and is used to specify desired decoding behavior. If you inherit from this type, the server will keep the decoded type as a pydantic model instead of converting it into a dict.

from fastapi import FastAPI
from langchain.schema.runnable import RunnableLambda

from langserve import add_routes
from langserve.schema import CustomUserType

app = FastAPI()


class Foo(CustomUserType):
    bar: int


def func(foo: Foo) -> int:
    """Sample function that expects a Foo type which is a pydantic model"""
    assert isinstance(foo, Foo)
    return foo.bar


# Note that the input and output type are automatically inferred!
# You do not need to specify them.
# runnable = RunnableLambda(func).with_types( # <-- Not needed in this case
#     input_type=Foo,
#     output_type=int,
# )
add_routes(app, RunnableLambda(func), path="/foo")
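
For reference, calling this endpoint over HTTP might look like the sketch below (the payload shape follows the /invoke convention shown earlier):

import requests

response = requests.post(
    "http://localhost:8000/foo/invoke",
    json={"input": {"bar": 2}},
)
# The input is decoded into a Foo instance server side, so the output should be 2.
print(response.json())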

Playground Widgets

The playground allows you to define custom widgets for your runnable from the backend.

Here are a few examples:

Description Links
Widgets Different widgets that can be used with playground (file upload and chat) server, client
Widgets File upload widget used for LangServe playground. server, client

Schema

  • A widget is specified at the field level and shipped as part of the JSON schema of the input type
  • A widget must contain a key called type with the value being one of a well known list of widgets
  • Other widget keys will be associated with values that describe paths in a JSON object
type JsonPath = number | string | (number | string)[];
type NameSpacedPath = { title: string; path: JsonPath }; // Using title to mimic JSON schema, but can use namespace
type OneOfPath = { oneOf: JsonPath[] };

type Widget = {
  type: string; // Some well known type (e.g., base64file, chat etc.)
  [key: string]: JsonPath | NameSpacedPath | OneOfPath;
};

Available Widgets

There are only two widgets that the user can specify manually right now:

  1. File Upload Widget
  2. Chat History Widget

See below for more information about these widgets.

All other widgets on the playground UI are created and managed automatically by the UI based on the config schema of the Runnable. When you create Configurable Runnables, the playground should create appropriate widgets for you to control the behavior.

File Upload Widget

Allows creation of a file upload input in the UI playground for files that are uploaded as base64 encoded strings. Here's the full example.

Snippet:

try:
    from pydantic.v1 import Field
except ImportError:
    from pydantic import Field

from langserve import CustomUserType


# ATTENTION: Inherit from CustomUserType instead of BaseModel otherwise
#            the server will decode it into a dict instead of a pydantic model.
class FileProcessingRequest(CustomUserType):
    """Request including a base64 encoded file."""

    # The extra field is used to specify a widget for the playground UI.
    file: str = Field(..., extra={"widget": {"type": "base64file"}})
    num_chars: int = 100

Example widget: (screenshot omitted)
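
For context, here is a hedged sketch of a server-side runnable that could back this request type, reusing the FileProcessingRequest class from the snippet above (the decoding logic and /process_file path are illustrative, not the repository's full example):

import base64

from fastapi import FastAPI
from langchain.schema.runnable import RunnableLambda

from langserve import add_routes

app = FastAPI()


def process_file(request: FileProcessingRequest) -> str:
    """Decode the base64 payload and return the first num_chars characters."""
    content = base64.b64decode(request.file.encode("utf-8"))
    return content.decode("utf-8", errors="replace")[: request.num_chars]


add_routes(
    app,
    RunnableLambda(process_file).with_types(input_type=FileProcessingRequest),
    path="/process_file",
)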

Chat Widget

Look at the widget example.

To define a chat widget, make sure that you pass "type": "chat".

  • "input" is JSONPath to the field in the Request that has the new input message.
  • "output" is JSONPath to the field in the Response that has new output message(s).
  • Don't specify these fields if the entire input or output should be used as is (e.g., if the output is a list of chat messages).

Here's a snippet:

class ChatHistory(CustomUserType):
    chat_history: List[Tuple[str, str]] = Field(
        ...,
        examples=[[("human input", "ai response")]],
        extra={"widget": {"type": "chat", "input": "question", "output": "answer"}},
    )
    question: str


def _format_to_messages(input: ChatHistory) -> List[BaseMessage]:
    """Format the input to a list of messages."""
    history = input.chat_history
    user_input = input.question

    messages = []

    for human, ai in history:
        messages.append(HumanMessage(content=human))
        messages.append(AIMessage(content=ai))
    messages.append(HumanMessage(content=user_input))
    return messages


model = ChatOpenAI()
chat_model = RunnableParallel({"answer": (RunnableLambda(_format_to_messages) | model)})
add_routes(
    app,
    chat_model.with_types(input_type=ChatHistory),
    config_keys=["configurable"],
    path="/chat",
)

Example widget: (screenshot omitted)

You can also specify a list of messages as a parameter directly, as shown in this snippet:

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assisstant named Cob."),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | ChatAnthropic(model="claude-2")


class MessageListInput(BaseModel):
    """Input for the chat endpoint."""
    messages: List[Union[HumanMessage, AIMessage]] = Field(
        ...,
        description="The chat messages representing the current conversation.",
        extra={"widget": {"type": "chat", "input": "messages"}},
    )


add_routes(
    app,
    chain.with_types(input_type=MessageListInput),
    path="/chat",
)

See this sample file for an example.

Enabling / Disabling Endpoints (LangServe >=0.0.33)

You can enable / disable which endpoints are exposed when adding routes for a given chain.

Use enabled_endpoints if you want to make sure to never get a new endpoint when upgrading langserve to a newer version.

Enable: The code below will only enable invoke, batch and the corresponding config_hash endpoint variants.

add_routes(app, chain, enabled_endpoints=["invoke", "batch", "config_hashes"], path="/mychain")

Disable: The code below will disable the playground for the chain

add_routes(app, chain, disabled_endpoints=["playground"], path="/mychain")

langserve's Issues

RemoteRunnable stream() doesn't work on url recommended by playground

TL;DR: I think we should update the urljoin logic to ensure a trailing slash in the stream call so it doesn't overwrite the last path segment.

Behavior:

Playground currently recommends a remote url with the /c/ suffix, which works with the invoke() command, but it fails for the stream() command.

from langserve import RemoteRunnable

chain = RemoteRunnable("http://127.0.0.1:8001/c/N4XyA")
res = chain.invoke({ "text": "What's good?" }) # this works
print(res)
for chunk in chain.stream({ "text": "What's good?" }): # this does not
    print("CHUNK", chunk)

This is when running the following langserve code

from fastapi import FastAPI
from langserve import add_routes

from langchain import hub
from langchain.chat_models import ChatAnthropic
from langchain.schema.output_parser import StrOutputParser

# This depends on the structure of the package you install
# from summarize_anthropic.chain import chain

app = FastAPI()

# Create chain
prompt = hub.pull("hwchase17/anthropic-paper-qa")
model = ChatAnthropic(model="claude-2", max_tokens=10000)
chain = prompt | model | StrOutputParser()
add_routes(app, chain)


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

Error raised client-side isn't super helpful:

Traceback (most recent call last):
  File "/Users/wfh/code/lc/scripts/hammertime.py", line 4, in <module>
    for chunk in chain.stream({ "text": "What's good?" }):
  File "/Users/wfh/code/lc/langchain/libs/langchain/.venv/lib/python3.11/site-packages/langserve/client.py", line 476, in stream
    for sse in event_source.iter_sse():
  File "/Users/wfh/code/lc/langchain/libs/langchain/.venv/lib/python3.11/site-packages/httpx_sse/_api.py", line 28, in iter_sse
    self._check_content_type()
  File "/Users/wfh/code/lc/langchain/libs/langchain/.venv/lib/python3.11/site-packages/httpx_sse/_api.py", line 18, in _check_content_type
    raise SSEError(
httpx_sse._exceptions.SSEError: Expected response Content-Type to be 'text/event-stream', got 'application/json'

Server side, you see

INFO:     127.0.0.1:62672 - "POST /c/N4XyA/invoke HTTP/1.1" 200 OK
INFO:     127.0.0.1:62672 - "POST /c/stream HTTP/1.1" 404 Not Found

The url join behavior is inconsistent between the httpx default and the urljoin lib. In httpx, they always ensure a trailing slash (see below)

def _merge_url(self, url: URLTypes) -> URL:
        """
        Merge a URL argument together with any 'base_url' on the client,
        to create the URL used for the outgoing request.
        """
        merge_url = URL(url)
        if merge_url.is_relative_url:
            # To merge URLs we always append to the base URL. To get this
            # behaviour correct we always ensure the base URL ends in a '/'
            # separator, and strip any leading '/' from the merge URL.
            #
            # So, eg...
            #
            # >>> client = Client(base_url="https://www.example.com/subpath")
            # >>> client.base_url
            # URL('https://www.example.com/subpath/')
            # >>> client.build_request("GET", "/path").url
            # URL('https://www.example.com/subpath/path')
            merge_raw_path = self.base_url.raw_path + merge_url.raw_path.lstrip(b"/")
            return self.base_url.copy_with(raw_path=merge_raw_path)
        return merge_url

Whereas urljoin replaces the final path segment, as in urljoin(self.url, "stream").

Seems like this is a couple of issues:

  • Playground link probably wrong (should just be the localhost one(?))
  • Should make the url join work as expected when streaming (add a trailing slash to the base url if not provided when joining with the stream path)
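
A small standard-library illustration of the trailing-slash behavior described above:

from urllib.parse import urljoin

base = "http://127.0.0.1:8001/c/N4XyA"

# Without a trailing slash, urljoin replaces the last path segment:
print(urljoin(base, "stream"))        # -> http://127.0.0.1:8001/c/stream

# Ensuring a trailing slash first keeps the segment, matching httpx's merge behavior:
print(urljoin(base + "/", "stream"))  # -> http://127.0.0.1:8001/c/N4XyA/stream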

Playground not working due to incorrect mimetype in index.js

I can't get the playground to load when using langserve because the index.js file is served with the mimetype "text/plain" and thus my browser (Chrome/Brave/Firefox) refuse to run it. Here are the response headers as reported by Chrome DevTools:
(screenshot of response headers omitted)

So, there are two problems here:

  1. The index.js file is being served with an incorrect mimetype, causing the playground to fail, and
  2. When the playground fails, a blank page is returned.

On 2, the HTML source code for the playground endpoint looks like this:

<body>
 <div id="root"></div>
  <script>
    try {
      window.CONFIG_SCHEMA = . . .
      window.INPUT_SCHEMA = . . . 
    } catch (error) {
      // pass
    }
  </script>
</body>

And thus, when the page fails to load, no informative error message is returned because the try/catch block swallows the error.
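
A commonly suggested workaround for the mimetype half of this issue, where the OS (often a Windows registry setting) maps .js to text/plain, is to register the correct type before the app serves any static assets; a hedged sketch, not an official fix:

import mimetypes

# Register the mapping early (before creating the FastAPI app) so guess_type
# resolves .js assets, including the playground's index.js, as JavaScript.
mimetypes.add_type("application/javascript", ".js")
mimetypes.add_type("text/css", ".css")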

RunnableLambda type signatures need more validation

Invalid type signatures on the function used for runnable lambda lead to errors that are difficult to understand:

    async def raise_error(input_: int) -> None: # <-- Issue due to None
        """Raise an error."""
        raise ValueError("This is a server side error.")

    add_routes(app, RunnableLambda(raise_error))

    # Invoke request
    async with get_async_client(app) as runnable:
        assert await runnable.ainvoke(1)

The same applies to an argumentless function:

    async def raise_error() -> None:
        """Raise an error."""
        raise ValueError("This is a server side error.")

Gray out intermediate steps?

Not super important, but maybe there's something better we can do if there are no intermediate steps? Maybe expand and say that the runnable has no intermediate steps?

(screenshot omitted)

chain.stream() hangs on end

The remote runnable goes through all the chunks then freezes. The server side run is marked as done. Haven't debugged yet.

for chunk in chain.stream(
    {"text": text, "chat_history": chat_history, "last_run_id": last_run_id},
):
    on_chunk(chunk)
    results.append(chunk)

Add error event for stream and astream_log

If a server side exception occurs while using the stream and astream_log endpoints, the server should emit an error event.

Double check that invalid requests result in an appropriate client side error.

LangServe crashes when multiple clients send requests.

When I send two or more requests to the server, it crashes, error logs below:

CUDA version: 11.7
NVDA Driver Version: 515.65.01

** On entry to SGEMM parameter number 13 had an illegal value

cuBLAS error 7 at /tmp/pip-install-_wvffp3m/llama-cpp-python_93b4c08269a545e2a4e8f946ea11d827/vendor/llama.cpp/ggml-cuda.cu:6140
current device: 0

CUDA error 4 at /tmp/pip-install-_wvffp3m/llama-cpp-python_93b4c08269a545e2a4e8f946ea11d827/vendor/llama.cpp/ggml-cuda.cu:455: driver shutting down
current device: 0
./bins/langchain_serve_test.sh: line 7: 311435 Segmentation fault (core dumped) python -u langchain_serve.py

Failure to import langserve because fastapi is missing

LangServe declares fastapi as an optional dependency, but fastapi seems to be required.
In a new virtual env:

pip install langserve

>>> import langserve
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/pythonProject/lib/python3.11/site-packages/langserve/__init__.py", line 9, in <module>
    from langserve.server import add_routes
  File "/private/tmp/pythonProject/lib/python3.11/site-packages/langserve/server.py", line 28, in <module>
    from fastapi import HTTPException, Request, Response
ModuleNotFoundError: No module named 'fastapi'

I can fix with a PR if needed.

langchain commands are not working

cmd >> langchain version
Traceback (most recent call last):
File "/var/github/workspace/langchain-serve/app/bin/langchain", line 5, in
from langchain_cli.cli import app
File "/var/github/workspace/langchain-serve/app/lib/python3.8/site-packages/langchain_cli/cli.py", line 6, in
from langchain_cli.namespaces import app as app_namespace
File "/var/github/workspace/langchain-serve/app/lib/python3.8/site-packages/langchain_cli/namespaces/app.py", line 26, in
from langchain_cli.utils.pyproject import (
File "/var/github/workspace/langchain-serve/app/lib/python3.8/site-packages/langchain_cli/utils/pyproject.py", line 15, in
pyproject_toml: Path, local_editable_dependencies: Iterable[tuple[str, Path]]
TypeError: 'type' object is not subscriptable

Playground can't load assets when using APIRouter

Hi,

In a modularized application with APIRouter instances that set their own prefix, loading the playground does not work, as the prefix path is not prepended for the assets.

This seems like a simple fix: replacing line 1032 of server.py with

f"{namespace}/playground" if isinstance(app, FastAPI) else f"{app.prefix}{namespace}/playground",

works for me. I've prepared a corresponding pull request.

How to check the LangServe REST API Swagger for ConversationalRetrievalChain with memory

memory = ConversationBufferWindowMemory(k=4, memory_key="chat_history", return_messages=True)
qa = ConversationalRetrievalChain.from_llm(
    llm,
    db.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
)

I created RAG QA with ConversationalRetrievalChain with memory, and did

app = FastAPI()
from langserve import add_routes
add_routes(app, qa)

It runs and the API is exposed, and the Swagger UI is also available. How do I test it, and to which endpoint do I need to pass the user question to get the response?

Need help to migrate my custom agent to LCEL to use with langserve

Hi, team-langchain,

I have an agent that uses memory, user-authentication as well as function calling. I'd like to migrate it to langserve in production but couldn't find anything as complex as my case in the examples in the docs. So I got stuck and need help. Could you please give me advice on how to convert this code to LCEL?

agent.py:

from typing import Type

from langchain.agents import AgentExecutor, OpenAIFunctionsAgent
from langchain.agents.openai_functions_agent.agent_token_buffer_memory import (
    AgentTokenBufferMemory,
)
from langchain.callbacks.base import Callbacks
from langchain.chat_models import AzureChatOpenAI
from langchain.prompts import MessagesPlaceholder
from langchain.prompts.chat import BaseMessagePromptTemplate
from langchain.schema import SystemMessage

from .src import CustomFirestoreChatMessageHistory, CustomOpenAIFunctionsTool

HUMAN_MESSAGE_TEMPLATE = "..."


class CRMAgent:
    tool_classes: list[Type[CustomOpenAIFunctionsTool]]
    system_message_template: str

    def __init__(self, api_wrapper, crm_user, internal_user):
        self.api_wrapper = api_wrapper
        self.crm_user = crm_user
        self.hints_user = internal_user

        self.llm = AzureChatOpenAI(...)

        chat_memory = CustomFirestoreChatMessageHistory(
            user_id=internal_user["user_id"], session_id=internal_user["integration_id"]
        )
        self.memory = AgentTokenBufferMemory(chat_memory=chat_memory, llm=self.llm)

        self.tools = [
            ToolClass.from_api_wrapper(self.api_wrapper, **self.crm_user)
            for ToolClass in self.tool_classes
        ]

        system_message = self.system_message_template.format(...)
        extra_prompt_messages: list[BaseMessagePromptTemplate] | None = [
            MessagesPlaceholder(variable_name=self.memory.memory_key)
        ]
        self.agent = OpenAIFunctionsAgent.from_llm_and_tools(
            llm=self.llm,
            tools=self.tools,
            extra_prompt_messages=extra_prompt_messages,
            system_message=SystemMessage(content=system_message),
        )

        self.executor = AgentExecutor.from_agent_and_tools(
            agent=self.agent,
            tools=self.tools,
            memory=self.memory,
            handle_parsing_errors=True,
            return_intermediate_steps=True,
            metadata=self.hints_user,
        )

    def run(self, message: str, callbacks: Callbacks) -> str:
        """Run the agent on a human message."""
        human_message = HUMAN_MESSAGE_TEMPLATE.format(
            timestamp=self.api_wrapper.get_current_timestamp(),
            message=message,
        )
        inputs = {"input": human_message}
        return self.executor(inputs, callbacks=callbacks)["output"]

app.py

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Credentials(BaseModel):
    access_token: str


class CRMInput(BaseModel):
    credentials: Credentials
    user_message: str
    pipeline_id: str
    user_id: str
    integration_id: str


@app.post("/crm")
def crm(payload: CRMInput):
    agent = CRMAgent(
        api_wrapper=APIWrapper(payload.credentials.access_token),
        crm_user={
            "default_owner_id": None,
            "default_pipeline_id": payload.pipeline_id,
        },
        internal_user={
            "user_id": payload.user_id,
            "integration_id": payload.integration_id,
        },
    )
    agent.run(payload.user_message, callbacks=None)

TLDR what this code does is:

  • On every request create an APIWrapper for a specific user
  • Fetch their data from the API to create tools (openai functions), so tools are different for each user
  • Fetch their chat history from Firestore
  • Run agent

Do you have any ideas how to turn this into a langserve project?

No need to specify word `invoke` at the end of the url

It seems that when adding the word invoke at the end of the url, it gets duplicated when hitting the server.

import { RemoteRunnable } from "langchain/runnables/remote";

const chain = new RemoteRunnable({
  url: `http://localhost:8000/chain/invoke/`,
});
const result = await chain.invoke({
  topic: "cats",
});

Here what I get when keeping the word invoke as per documentation:
INFO: 192.168.0.147:52826 - "POST /chain/invoke/invoke HTTP/1.1" 404 Not Found

And here what I get when I supress the word invoke, i.e., url is just http://localhost:8000/chain/:
INFO: 192.168.0.147:52848 - "POST /chain/invoke HTTP/1.1" 200 OK

It seems that when calling the method invoke, langserve already adds the word invoke at the end of the url under the hood.
In conclusion, I suggest confirming whether my observation makes sense and, if so, adjusting the documentation.
Best,
Roberto

Adding chunk to stream fails on dict

I'm building a rag chain, and have the rag portions on the server side and the chat history on the client side. When I stream, I'm getting dict chunks which breaks in client.py:547. If I change the block from:

if final_output:
  final_output += chunk
else:
  final_output = chunk

to

if final_output:
  if isinstance(final_output, dict):
    final_output.update(chunk)
  else:
    final_output += chunk
else:
  final_output = chunk

it works. Am I using this in a way that it wasn't intended, or can a solution like this be included?

Thanks,

Glenn

`stream` API returns entire answer after a while, instead of actually streaming the answer for `ConversationalRetrievalChain`

Hello,

I built a simple langchain app using ConversationalRetrievalChain and langserve.

It is working great for its invoke API. However, when it comes to the stream API, it returns the entire answer after a while, instead of actually streaming it.

Here is the langserve part:

    streamer = TextStreamer(tokenizer=tokenizer, skip_prompt=True)

    pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=1000,
        temperature=0.1,
        top_p=0.95,
        top_k=40,
        repetition_penalty=1.1,
        do_sample=True,
        streamer=streamer,
        # device=0,  # -1 for CPU
    )

    tokenizer.pad_token_id = model.config.eos_token_id

    llm = HuggingFacePipeline(
        pipeline=pipeline,
    )

    condenseQuestionPrompt = PromptTemplate.from_template(CONDENSE_QUESTION_TEMPLATE)

    prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        verbose=True,
        condense_question_prompt=condenseQuestionPrompt,
        combine_docs_chain_kwargs={"prompt": prompt},
        response_if_no_docs_found="I'm afraid that I don't have an answer to the question.",
    ).with_types(input_type=ChatHistory)

    app = FastAPI(
        title="app",
        version="0.1",
        description="The AI which knows everything about XYZ products.",
    )

    app.add_middleware(
        GZipMiddleware,
        minimum_size=1024,
    )

    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_methods=["*"],
        allow_headers=["*"],
    )

    add_routes(app, chain)

    uvicorn.run(app, host="0.0.0.0", port=8000)

Please help. Thanks!

Chat Widget Improvements

The chat widget doesn't quite feel like a chat experience yet; two improvements that could help:

  1. Focus the mouse on the next required input when loading the playground?
  2. Potentially swap roles of ctrl+enter <-> enter?

David's notes:

  1. Clear out the input field when submitting a message
  2. If a single chat widget is detected, consolidate input and output into a chat-like interface entirely.

localhost:8000 always reports 404

I was trying to run the server at https://github.com/langchain-ai/langserve/blob/main/examples/conversational_retrieval_chain/server.py via python server.py, it can start up successfully.

But when I open localhost:8000, the browser will report as follows:

(screenshot from 2023-10-19 at 11:42 AM omitted)

The console report the following:

% python server.py
INFO:     Started server process [55125]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO:     ::1:52140 - "GET / HTTP/1.1" 404 Not Found
INFO:     ::1:52142 - "GET / HTTP/1.1" 404 Not Found

Am I doing anything wrong? Thanks

Fail faster if I have two chains on the same route

if i do something like:

add_routes(app, chain)
add_routes(app, rag_chain)

It doesn't throw any errors until I go to docs. Even if I go to the playground it still works. IMO you should get an error as soon as you start this server.

[BUG]ModuleNotFoundError: No module named 'langserve.packages'

Traceback (most recent call last):
File "/root/miniconda3/envs/langserve/bin/langchain", line 5, in
from langchain_cli.cli import app
File "/root/miniconda3/envs/langserve/lib/python3.11/site-packages/langchain_cli/cli.py", line 6, in
from langchain_cli.namespaces import app as app_namespace
File "/root/miniconda3/envs/langserve/lib/python3.11/site-packages/langchain_cli/namespaces/app.py", line 11, in
from langserve.packages import get_langserve_export
ModuleNotFoundError: No module named 'langserve.packages'

Env: Python3.11 Conda, langchain langchain-cli 0.0.15 Langserve

IBM Watsonx and meta-llama/llama-2-70b-chat

I would like to test the code below with LangServe, but after the langchain serve command there is the following error:

TypeError: WatsonxLLM._call() takes from 2 to 3 positional arguments but 4 were given

When I run the code in Python with chain.invoke() everything works perfectly. Attached is a screenshot from LangSmith showing that there are (in my opinion) no issues with the code.

(LangSmith screenshot omitted)

Hope you can advise me.

The code:

from pathlib import Path

from langchain.prompts import ChatPromptTemplate
from langchain.pydantic_v1 import BaseModel, Extra, BaseModel
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.utilities import SQLDatabase

import os
from dotenv import load_dotenv
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods
from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM
from typing import Any, List, Mapping, Optional, Union, Dict
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
import requests
import logging

logger = logging.getLogger(__name__)
load_dotenv()

creds = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": os.getenv("API_KEY", None)
}

def getBearer(apikey):
    form = {'apikey': apikey, 'grant_type': "urn:ibm:params:oauth:grant-type:apikey"}
    print("About to create bearer")
#    print(form)
    response = requests.post("https://iam.cloud.ibm.com/oidc/token", data = form)
    if response.status_code != 200:
        print("Bad response code retrieving token")
        raise Exception("Failed to get token, invalid status")
    json = response.json()
    if not json:
        print("Invalid/no JSON retrieving token")
        raise Exception("Failed to get token, invalid response")
    print("Bearer retrieved")
    return json.get("access_token")

project_id = os.getenv("PROJECT_ID", None)
creds["token"] = getBearer(creds["apikey"])
parameters = {
    # GenParams.DECODING_METHOD: "greedy",
    # GenParams.MAX_NEW_TOKENS: 256
    GenParams.DECODING_METHOD: "sample",
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.RANDOM_SEED: 42,
    GenParams.TEMPERATURE: 0.8,
    GenParams.TOP_K: 10,
    GenParams.TOP_P:1,
    GenParams.STOP_SEQUENCES: [";"]
}

llama_model = Model(
    model_id="meta-llama/llama-2-70b-chat",
    params=parameters,
    credentials=creds,
    project_id=project_id,
    )

llama_llm = WatsonxLLM(model=llama_model)

db_path = Path(__file__).parent / "nba_roster.db"
rel = db_path.relative_to(Path.cwd())
db_string = f"sqlite:///{rel}"
db = SQLDatabase.from_uri(db_string, sample_rows_in_table_info=0)

def get_schema(_):
    return db.get_table_info()

def run_query(query):
    return db.run(query)

template_query = """Based on the table schema and example below, write a SQL query that would answer the user's question:
{schema}

Example:
Question: How many players are in the NBA?
SQLQuery: SELECT COUNT(*) FROM nba_roster;

Question: {question}
SQL Query:""" 
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Given an input question, convert it to a SQL query. No pre-amble."),
        ("human", template_query),
    ]
)

sql_response = (
    RunnablePassthrough.assign(schema=get_schema)
    | prompt
    | llama_llm.bind(stop=["\nSQLResult:"])
    | StrOutputParser()
)

template_response = """Based on the table schema below, question, sql query, and sql response, write a natural language response:
{schema}

Question: {question}
SQL Query: {query}
SQL Response: {response}""" 

prompt_response = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question and SQL response, convert it to a natural "
            "language answer. No pre-amble.",
        ),
        ("human", template_response),
    ]
)

class InputType(BaseModel):
    question: str   

chain = (
    RunnablePassthrough.assign(query=sql_response).with_types(input_type=InputType) 
    | RunnablePassthrough.assign(
        schema=get_schema,
        response=lambda x: db.run(x["query"]),
    )
    | prompt_response
    | llama_llm
)

chain.invoke({"question": "How many players are there?"}) # only for testing. No use with LangServe

Long Descriptions do not render well

(screenshot omitted)

Chain

model = ChatOpenAI(temperature=0.5).configurable_alternatives(
    ConfigurableField(
        id="llm",
        name="LLM",
        description=(
            "Configurabe whether to use a high or a low temperature "
            "parameter for the LLM. Example of a configurable alternative field."
        ),
    ),
    high_temp=ChatOpenAI(temperature=0.9),
    low_temp=ChatOpenAI(temperature=0.1),
    default_key="medium_temp",
)
prompt = PromptTemplate.from_template(
    "tell me a joke about {topic}."
).configurable_fields(
    template=ConfigurableField(
        id="prompt",
        name="Prompt",
        description=(
            "The prompt to use. Must contain {topic}. This is an example "
            "of a configurable field. It allows the user to "
            "specify change the prompt template text."
        ),
    )
)
chain = prompt | model | StrOutputParser()

Prioritized Features

Updated 2023-11-16:

  • Examples to chat history persisted on backend
  • Add more ingestion options for files
  • Potentially add storage for runnable configuration options
  • Support custom user pydantic objects; any custom pydantic objects that users define in runnables will likely fail to de-serialize on the client
  • Playground bugs and improvements
  • Add toggleable endpoints (in progress)

CI:

  • Automatically build FE assets prior to publishing, potentially publish to pytest first as a test?

Unsupported sse event type - metadata

I seem to be getting a sse metadata event that I wasn't getting before. It currently contains a run_id, which appears to be from upstream. The client code is logging the error with each request, since it is not expecting it.

Unable to pass APIRouter when registering routes using add_routes.

app = FastAPI()

router = APIRouter()

add_routes(
    router,
    ChatOpenAI(),
    path="/chat",
)

app.include_router(router)

produces the following error:

Traceback (most recent call last):
  File "...\app\main.py", line 21, in <module>
    add_routes(
  File "...\.venv\lib\site-packages\langserve\server.py", line 300, in add_routes
    _register_path_for_app(app, path)
  File "...\.venv\lib\site-packages\langserve\server.py", line 232, in _register_path_for_app
    if app in _APP_TO_PATHS:
  File "...\pyenv\pyenv-win\versions\3.10.6\lib\weakref.py", line 460, in __contains__
    return wr in self.data
TypeError: unhashable type: 'APIRouter'

Unable to bootstrap - `langchain` cli missing

According to README.md

To use the langchain CLI make sure that you have a recent version of langchain installed
and also typer. (pip install langchain typer or pip install "langchain[cli]")

langchain ../path/to/directory

And follow the instructions...

However after pip install langchain typer (also tried pip install "langchain[cli]"), langchain command is not found...

ERROR: Exception in ASGI application

I try to check pirate-speak/playground, but
I got a blank page in browser, Chrome Dev Tools shows 500 (Internal Server Error)

Langserve: 0.0.22
Langchain: 0.0.330

I got this errors in VSC terminal:
LANGSERVE: Playground for chain "/pirate-speak" is live at:
LANGSERVE: │
LANGSERVE: └──> /pirate-speak/playground
LANGSERVE:
LANGSERVE: See all available routes at /docs

INFO: Application startup complete.
INFO: 127.0.0.1:52857 - "GET / HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:52857 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:52856 - "GET /docs HTTP/1.1" 200 OK
INFO: 127.0.0.1:52856 - "GET /openapi.json HTTP/1.1" 200 OK
INFO: 127.0.0.1:52883 - "GET /pirate-speak HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:52909 - "GET /pirate-speak/playground HTTP/1.1" 307 Temporary Redirect
INFO: 127.0.0.1:52909 - "GET /pirate-speak/playground/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:52909 - "GET /pirate-speak/playground/assets/index-ea49ff70.js HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 426, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 84, in call
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\fastapi\applications.py", line 1115, in call
await super().call(scope, receive, send)
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\starlette\applications.py", line 122, in call
await self.middleware_stack(scope, receive, send)
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\starlette\middleware\errors.py", line 184, in call raise exc
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\starlette\middleware\errors.py", line 162, in call await self.app(scope, receive, _send)
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\starlette\middleware\exceptions.py", line 79, in call
raise exc
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\starlette\middleware\exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\fastapi\middleware\asyncexitstack.py", line 20, in call
raise e
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\fastapi\middleware\asyncexitstack.py", line 17, in call
await self.app(scope, receive, send)
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\starlette\routing.py", line 718, in call
await route.handle(scope, receive, send)
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\starlette\routing.py", line 276, in handle
await self.app(scope, receive, send)
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\starlette\routing.py", line 66, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\fastapi\routing.py", line 274, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\langserve\server.py", line 857, in playground
return await serve_playground(
^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Projects\AI_testing\LangChain_test\Python_231026\langchain-venv\Lib\site-packages\langserve\playground.py", line 47, in serve_playground response = PlaygroundTemplate(f.read()).substitute(
^^^^^^^^
File "C:\Users\Atti\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1250.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 163499: character maps to
INFO: 127.0.0.1:52920 - "GET /pirate-speak/playground/assets/index-244b2b9b.css HTTP/1.1" 200 OK
INFO: 127.0.0.1:53079 - "GET /pirate-speak/playground/assets/index-244b2b9b.css HTTP/1.1" 200 OK

Any idea what's wrong? :)

Cannot use POST requests

I'm currently in the process of familiarizing myself with Langserve and am actively exploring its functionalities. In my attempt to run the provided examples, I've encountered an issue specifically related to routes utilizing POST requests, such as /invoke, /stream, and /batch. Unfortunately, despite my efforts, I'm unable to interact with these routes successfully. Notably, the only routes accessible to me are of the GET type, as depicted in the attached image for reference. Your guidance on resolving this matter would be greatly appreciated.

P.S. I've experimented with various examples from diverse sources, yet the persistent issue remains consistent across all instances.

(screenshot omitted)

Expose configurables through `RemoteRunnable`

For local runnables, you can use runnable.config_specs to get a list of all configurable fields. The RemoteRunnable doesn't expose these, even though it technically has the correct information via the /config_schema endpoint. The same can be said for Input and Output with input_schema and output_schema, respectively.

Loading these endpoints (lazily) would enable better compatibility with applications that may rely on this. In my case, I automatically expose some of my chain's options through a UI when they're configurable. This now doesn't work when I use langserve.
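
In the meantime, the underlying information can be pulled over HTTP from the documented schema endpoints; a rough sketch (the mount path is an assumption):

import requests

base = "http://localhost:8000/my_runnable"

config_schema = requests.get(f"{base}/config_schema").json()
input_schema = requests.get(f"{base}/input_schema").json()
output_schema = requests.get(f"{base}/output_schema").json()

print(config_schema)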

Streaming with custom input?

I'm using a conversation_id as input to keep track of chat history, but the ChatOpenAI llm I'm using inside my function doesn't appear to be doing any streaming.

import json
from uuid import uuid4
from fastapi import FastAPI
from langserve import add_routes
from langchain.schema.runnable import RunnableLambda
from langserve.schema import CustomUserType

from chain import get_llm
from api import create_conversation, invoke_conversation
import uvicorn

app = FastAPI(
    title = 'Example LangServe API',
    version = '1.0',
    description = 'An example API for LangServe.',
)

llm = get_llm()
# sql_llm = get_llm()
# interpret_llm = get_llm()

class AskInput(CustomUserType):
    question: str
    conversation_id: str = None

def ask(AskInput) -> str:
    '''Ask the chatbot a question. If you have a conversation_id you can pick up where you left off.'''
    question = AskInput.question
    conversation_id = AskInput.conversation_id

    if conversation_id is None:
        conversation_id = str(uuid4())
    conversation = create_conversation(conversation_id, None, llm, verbose = True)
    ret_dict = invoke_conversation(conversation, question, conversation_id, llm)
    print(ret_dict)
    return json.dumps(ret_dict)

runnable = RunnableLambda(ask).with_types(
    input_type = AskInput
)

add_routes(
    app,
    runnable,
    path='/ask',
)

if __name__ == '__main__':
    uvicorn.run(app, host='localhost', port=8090)

Is there something I need to do to the llm so langserve will give it the correct callbacks? I put together a simple example using a single ChatOpenAI llm (no chains or anything) and the streaming endpoint worked in that simple case.

Set model parameters from request context, eg. Azure AD token

From the examples, I can see how to instantiate a model, eg. ChatOpenAI, using static parameters (eg. temperature, openai_api_X, etc). I can even see how to make them configurable w/ static alternatives.

However, I have a use-case where I need to dynamically set model parameters by introspecting the incoming request. For example -- to set openai_api_key to the user's token from incoming Authorization header token when using openai_api_type="azure_ad".

How can I accomplish this with LangServe's Runnable/add_routes pattern? Is there somehow a way to expose the request to Runnable.bind() ?

app = FastAPI(
    title="LangChain Server",
)

model = ChatOpenAI(
    temperature=0.5,
    openai_api_key=token, # This needs to be the user's incoming Authorization header
    openai_api_type="azure_ad",
)

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")

add_routes(
    app,
    prompt | model,
    path="/chain",
)

Renaming components in the `packages/` directory breaks component imports

Attempting to rename a package which has been added via the langchain cli breaks langchain serve. This issue appears to affect all LangChain apps currently.

Expected Behavior

If a package component is renamed properly in all locations, LangServe should respect the renamed package and import it correctly.

Current Behavior

Renaming a langchain / langserve app component results in broken imports.

Possible Solution

Some extra step (which appears to be undocumented afaik) must be performed to update the package component so LangChain has visibility of it.

Steps to Reproduce

  1. Create a LangChain app with a package, e.g.: langchain app new my-app --package rag-conversation.
  2. Rename the component in the packages/ directory, e.g.: rag-conversation -> rag (be sure to also update the pyproject.toml name and export_module, and the app/server.py chain import).
  3. Run langchain serve.
  4. Get following error:
from rag import chain as rag_conversation_chain
ModuleNotFoundError: No module named 'rag'

Streaming RemoteRunnable not working w/ rag-chroma-private

Testing rag-chroma-private.

for -

from langserve.client import RemoteRunnable
rag_app = RemoteRunnable('http://0.0.0.0:8001/rag_chroma_private/')

this returns a generator -

rag_app.stream("How does agent memory work?")

and this returns the answer as a single item -

for item in rag_app.stream("How does agent memory work?"):
    print(item)
