ajndkr / lanarky

The web framework for building LLM microservices

Home Page: https://lanarky.ajndkr.com/

License: MIT License

Python 98.82% Makefile 1.18%

fastapi llmops microservices python3 web

lanarky's Issues

feat: add streaming callback for Langchain agents

Langchain recently added a callback handler for agents: https://python.langchain.com/en/latest/modules/agents/streaming_stdout_final_only.html?highlight=agent%20streaming.

Acceptance Criteria

create new callback handlers based on FinalStreamingStdOutCallbackHandler

ERROR - Async generation not implemented for this LLM.

I used the example file "retrieval_qa_w_sources.py" but changed the LLM to GPT4All, and I got this error:

ERROR - Async generation not implemented for this LLM

Is this LLM not supported, or is there a way to solve it?

Support to Custom Chains

I have a custom chain for conversational retrieval with sources, because langchain does not currently support one:

class ConversationalRetrievalWithSourcesChain(ConversationalRetrievalChain):
    """Chain for chatting with sources over documents."""

    sources_output_key: str = "sources"  #: :meta private:

    @property
    def output_keys(self) -> List[str]:
        """Return the output keys.

        :meta private:
        """
        _output_keys = [self.output_key]
        _output_keys.append(self.sources_output_key)
        if self.return_source_documents:
            _output_keys = _output_keys + ["source_documents"]
        return _output_keys

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        result = super()._call(
            inputs=inputs,
            run_manager=run_manager,
        )
        answer = result[self.output_key]
        if re.search(r"SOURCES:\s", answer):
            answer, sources = re.split(r"SOURCES:\s", answer)
        else:
            sources = ""
        result[self.output_key] = answer
        result[self.sources_output_key] = sources
        return result

    async def _acall(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        result = await super()._acall(
            inputs=inputs,
            run_manager=run_manager,
        )
        answer = result[self.output_key]
        if re.search(r"SOURCES:\s", answer):
            answer, sources = re.split(r"SOURCES:\s", answer)
        else:
            sources = ""
        result[self.output_key] = answer
        result[self.sources_output_key] = sources
        return result

I love the new auto-detect callback feature.

But now I receive this kind of error.

Any idea how to include my custom chain in the auto-detect callback?
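
As a side note on the chain above: the answer/sources parsing is duplicated in `_call` and `_acall`. A minimal sketch of factoring it into one helper follows. `split_answer_and_sources` is a hypothetical name, not part of langchain or lanarky. It also passes `maxsplit=1`, because the two-value unpacking in the original raises a `ValueError` if the marker appears more than once in the answer.

```python
import re
from typing import Tuple

# Same marker as in the chain above.
_SOURCES_PATTERN = re.compile(r"SOURCES:\s")


def split_answer_and_sources(answer: str) -> Tuple[str, str]:
    """Split an LLM answer into (answer, sources) at the first SOURCES marker."""
    parts = _SOURCES_PATTERN.split(answer, maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    # No marker found: keep the answer untouched and report no sources.
    return answer, ""
```

Both `_call` and `_acall` could then call this helper on `result[self.output_key]` instead of repeating the regex logic.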

Langchain dependency errors when installing with other libraries

The langchain version in pyproject.toml is fixed via a caret requirement (langchain = "^0.0.164") .

As langchain is updated almost daily, this makes fastapi-async-langchain basically unusable with other langchain-dependent modules, e.g.:

 and fastapi-async-langchain (0.5.2) depends on langchain (>=0.0.164,<0.0.165), jpllm (0.0.4)  is incompatible with fastapi-async-langchain (0.5.2). 

It would be great to change the langchain requirement to e.g. ">0.0.164"  or "~0.0.164"  to make the use more flexible.
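
For context on why the caret requirement is so strict here: Poetry's caret rule allows updates that do not change the left-most non-zero component, so for a 0.0.x version it permits exactly one release. The sketch below illustrates that rule for full three-part versions only. `caret_range` is an illustrative helper written for this note, not a Poetry API.

```python
from typing import Tuple


def caret_range(version: str) -> Tuple[str, str]:
    """Return (inclusive lower bound, exclusive upper bound) for ^version."""
    major, minor, patch = (int(part) for part in version.split("."))
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        # 0.0.x: only the patch component is non-zero, so it gets bumped.
        upper = (0, 0, patch + 1)
    return version, ".".join(str(part) for part in upper)


lower, upper = caret_range("0.0.164")
print(f">={lower},<{upper}")
```

The printed range matches the constraint reported in the resolver message above.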

feat: add database integration with sqlalchemy

New milestone feature: db integration with sqlalchemy

Useful for storing user conversations, tracking token usage, and so on.

Acceptance Criteria

add lifespan event for sqlalchemy session
add basic schemas and models

Source Documents

I was testing with RetrievalQAWithSourcesChain and it works fine, but the source documents are not returned. Have you thought about how to approach this?

Perhaps I should include them in the last message, but I don't know the best way to present that information.

Sending first message by agent

Hi,

Is there any way for the agent to send the first message? I could customize the websocket connection to do this. However, I don't want the initial message that triggered the bot to be stored. Is there any other way of doing it with the websocket connection?

Broken compatibility with `BaseLangchainStreamingResponse` when using custom `input_variables`

I am trying the library with an LLMChain instead of ConversationChain.
My chain has custom inputs.
Say, for example, the prompt is

Tell me the difference between {number-one} and {number-two}

and thus

input_variables=["number-one", "number-two"]

This leads to the error:

AttributeError: 'dict' object has no attribute 'kwargs'

from BaseLangchainStreamingResponse(StreamingResponse).

LIB VERSION

langchain==0.0.157
fastapi_async_langchain==0.4.3

MY TRACEBACK

This initiates at line 43 of BaseLangchainStreamingResponse:

outputs = await self.chain_executor(send_token)

which gives the exception:

A single string input was passed in, but this chain expects multiple inputs

Basically, my input variables are being ignored/lost.

What happens next is that we end up in the except block below:

except Exception as e:
    if self.background is not None:
        self.background.kwargs["outputs"] = str(e)

However, self.background (I don't know what it represents) has no attribute kwargs.

SUMMARY

I do not know why, but since the recent changes, passing input variables seems to be broken when using LLMChain (I don't know about ConversationChain). It was working fine before.

Moreover, the code assumes that background has kwargs, but that might not always be the case.
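
A defensive version of that except branch might look like the sketch below. This is only an assumption about one possible fix, not the change lanarky actually made. `record_error` is a hypothetical helper, and the `SimpleNamespace` object merely stands in for anything that exposes a `kwargs` dict.

```python
from types import SimpleNamespace
from typing import Any


def record_error(background: Any, error: Exception) -> bool:
    """Store the error text in background.kwargs["outputs"] when that is possible.

    Returns True if the error was recorded, False if `background` has no
    usable `kwargs` dict (for example when it is None or a plain dict).
    """
    kwargs = getattr(background, "kwargs", None)
    if isinstance(kwargs, dict):
        kwargs["outputs"] = str(error)
        return True
    return False


# Stand-in for a background task object with a kwargs dict.
task_like = SimpleNamespace(kwargs={})
record_error(task_like, ValueError("boom"))
```

With a guard like this, an object without `kwargs` no longer raises a second exception that hides the original one.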

feat: Allow overriding registered callbacks

I have created a custom streaming callback and am trying to register it with @register_streaming_callback("AgentExecutor") but it raises KeyError: "<class 'app.callbacks.MyCustomAgentsStreamingCallback'> already registered as AgentExecutor".

It would be very convenient to be able to override the default lanarky AgentExecutor callback. Otherwise I have to create a custom AgentExecutor, register it, and then write more code that supports my custom agent (for example, the initialize_agent function in langchain does not support custom agents atm). The workaround is also not ideal, since I may not need a custom AgentExecutor, only a custom callback handler.

One simple solution would be to replace the raise with a warning here: https://github.com/ajndkr/lanarky/blob/main/lanarky/register/base.py#L26.

Acceptance Criteria

Being able to override registered callbacks

Unexpected ASGI message 'http.response.body' sent, after response already completed.

I'm getting this error message randomly when I launch the conversation_chain example:

Unexpected ASGI message 'http.response.body' sent, after response already completed.

I noticed the LangchainRouter in version v0.7.3. But how should I refactor my existing code?

app = FastAPI()
embeddings = OpenAIEmbeddings()

langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://127.0.0.1:7777",
    embedding=embeddings
)

v_db = Milvus(
    embeddings,
    connection_args={"host": "127.0.0.1", "port": "19530"},
    collection_name="LangChainCollection",
)

class Ask(BaseModel):
    query: Optional[str] = None

@app.post("/sseAsk")
async def sseask(ask: Ask):
    llm = ChatOpenAI(
        max_tokens=1000,
        temperature=0,
        model_name="gpt-3.5-turbo",
        streaming=True)

    system_template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES"). 
                        If you don't know the answer, just say that it was not found in the knowledge base. Do not attempt to fabricate an answer.
                        ALWAYS return a "SOURCES" part in your answer.
                        Respond in Chinese.

                        QUESTION: {question}
                        =========
                        {summaries}
                        =========
                        FINAL ANSWER IN Chinese:"""

    messages = [
        SystemMessagePromptTemplate.from_template(system_template),
        HumanMessagePromptTemplate.from_template('{question}')
    ]

    prompt = ChatPromptTemplate.from_messages(messages)

    qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
        llm=llm,
        chain_type_kwargs={"prompt": prompt},
        chain_type="stuff",
        retriever=v_db.as_retriever(),
        return_source_documents=True,
    )
    return StreamingResponse.from_chain(qa_chain, ask.query, as_json=True, media_type="text/event-stream")


langchain_router = LangchainRouter(
    langchain_url="/sseAsk",
    langchain_object=LLMChain.from_string(
        llm=OpenAI(temperature=0), template="Answer the query.\n{query}"
    ),
    streaming_mode=0,
)

app.include_router(langchain_router)

Question: Streaming not working with Conversational Agents

Hi,

I have a conversational agent designed with a custom prompt. If I don't define the type of the agent, it works to some extent: it can't give correct answers, but I do get a streaming response. If I set the agent type to Conversational Agent or Chat Agent, streaming stops working. I'm not sure what the problem is. Can you help me understand?

agent = ConversationalAgent.from_llm_and_tools(llm=llm, tools=tools, prefix=system_message)
#agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory, handle_parsing_errors="Check your output and make sure it conforms!")
agent_executor = initialize_agent(tools=tools, llm=llm, verbose=True, memory=memory, handle_parsing_errors="Check your output and make sure it conforms!")

langchain_router = LangchainRouter(
    langchain_url="/chat",
    langchain_object=agent_executor,
    streaming_mode=1
)

langchain_router.add_langchain_api_websocket_route("/ws", langchain_object=agent_executor)

This is my code. I had to change to initialize_agent for the same reason, but it doesn't work. If you could tell me what the fix would be, I can try fixing it and also become a contributor.

feat(tests): add unit tests

As a codeowner, I want to add some unit testing for this project as the library is growing and we want to avoid introducing breaking changes.

Acceptance Criteria

add unit tests with expected coverage >50%

It is difficult to determine the output result when I use requests to call the API.

import requests

params = {
    "query": "What is Google for?"
}
# Stream the response and print each chunk as it arrives.
for resp in requests.post(url="http://127.0.0.1:8000/sseAsk", json=params, stream=True).iter_content(
        chunk_size=None, decode_unicode=False):
    print(str(resp, 'utf8'))

The returned results are as follows:
{"token":""}
{"token":"Google"}
...
{"token":"。"}
{"token":""}
{"token":"","source_documents":...}

How can I tell when source_documents will appear in the stream?
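
Going only by the output shown above, the sources arrive in a message that carries a `source_documents` key, so a client could check for that key instead of relying on timing. The sketch below assumes each chunk holds exactly one JSON object, as in the sample output. A real stream may split or merge chunks differently, and `collect_stream` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def collect_stream(chunks: Iterable[bytes]) -> Tuple[str, Optional[Any]]:
    """Join streamed tokens and pick up source_documents when they appear."""
    answer_parts: List[str] = []
    source_documents: Optional[Any] = None
    for chunk in chunks:
        if not chunk:
            continue  # skip empty keep-alive chunks
        payload: Dict[str, Any] = json.loads(chunk.decode("utf8"))
        answer_parts.append(payload.get("token", ""))
        # The presence of the key, not the position in the stream, is the signal.
        if "source_documents" in payload:
            source_documents = payload["source_documents"]
    return "".join(answer_parts), source_documents
```

With `requests`, the `chunks` argument could be the iterator returned by `iter_content(chunk_size=None)`, under the same one-object-per-chunk assumption.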

Great project, hoping to support chain's formatted output.

Some chains can output the sources of their answers. It would be helpful to include more content in the streaming output, in JSON format, so that it is easy to extract and display on the web.

Remove openai module dependency from streaming.py

Currently, the openai module is required, even if I'm using another LLM. Why is it required, and can it be removed?

def openai_aiosession(func):
    """Decorator to set openai.aiosession for StreamingResponse."""

    @wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            import openai  # type: ignore
        except ImportError:
            raise ImportError(
                "openai is not installed. Install it with `pip install 'lanarky[openai]'`."
            )

        openai.aiosession.set(aiohttp.ClientSession())
        logger.debug(f"opeanai.aiosession set: {openai.aiosession.get()}")

feat: add websocket support

This project is currently limited to streaming responses. For building applications like chatbots, users might require a websockets-based solution.

The idea is already implemented in https://github.com/hwchase17/chat-langchain. We can integrate it here as well.

Add support for custom AgentExecutor

In more advanced agents use cases, you often want to have greater control of the loop that controls how agents act.

The langchain AgentExecutor class has a method _take_next_step which is encouraged to be overridden in order to "(...) take control of how the agent makes and acts on choices." (https://github.com/hwchase17/langchain/blob/master/langchain/agents/agent.py#L748)

Trying to do so in conjunction with the newly added agent streaming support raises:
"Error! Chain type 'CustomAgentExecutor' is not currently supported by 'AsyncStreamingResponseCallback'."

To reproduce the error you can run examples/app/zero_shot_agent.py and replace the zero_shot_agent_dependency with a modified version of the function. This function creates a dummy custom subclass of AgentExecutor and initializes it manually rather than using the initialize_agent helper function as it currently does not support custom agent executors.

def zero_shot_agent_dependency() -> Callable[[], AgentExecutor]:
    @lru_cache(maxsize=1)
    def dependency() -> AgentExecutor:
        llm = ChatOpenAI(
            temperature=0,
            streaming=True,
        )
        tools = load_tools(["llm-math"], llm=llm)
        # Additional imports
        from langchain.agents import AgentExecutor
        from langchain.agents.loading import AGENT_TO_CLASS
        # Create dummy custom agent executor
        class CustomAgentExecutor(AgentExecutor):
            pass
        # Initialize agent without initialize_agent helper function
        agent = AgentType.ZERO_SHOT_REACT_DESCRIPTION
        agent_cls = AGENT_TO_CLASS[agent]
        agent_obj = agent_cls.from_llm_and_tools(llm, tools)
        agent = CustomAgentExecutor.from_agent_and_tools(
            agent=agent_obj,
            tools=tools,
            )
        return agent

    return dependency

Acceptance Criteria

Running a subclass of AgentExecutor no longer raises an error

When will support for GPTCache or RedisSemanticCache be added?

When will support for GPTCache or RedisSemanticCache be added?
This is very important for local knowledge-base applications.

Add documentation for using Feat: JSON RESPONSE

Please add documentation for using the JSON response feature for streaming responses.

OpenAI Functions support and demo example

Hi,
I was wondering how we could plug the OpenAI Functions into Lanarky.

It takes the conversation chain as an input, and Functions seem to be called by Agents.
Do you think it's possible to add that and maybe some examples as well?
Thanks!

feat: add streaming and websocket support for all chain types

After #6, it is evident that the chain_wrapper_fn logic is not generic for all chain types. There needs to be a Response class for each chain type. This issue will track progress.

Todo:

Streaming

Websocket

feat: can we handle multiple conversations, like the `new chat` button in https://chat.openai.com/

Can we handle multiple conversations, like the new chat button in https://chat.openai.com/?

2/3 examples don't seem to work

Scenario

Downloading Repo
Installing dependencies + adding .env file
Starting demos according to examples/README.md

Actual result

✅ Conversation Chain
❌ Retrieval QA with Sources Chain
❌ Conversational Retrieval

Retrieval QA with Sources Chain:

thorge@Ts-MacBook-Air ~/w/f/examples (main)> uvicorn app.retrieval_qa_chain:app --reload
INFO:     Will watch for changes in these directories: ['/Users/thorge/workdir/fastapi-async-langchain/examples']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [4004] using WatchFiles
ERROR:    Error loading ASGI app. Could not import module "app.retrieval_qa_chain"

Conversational Retrieval starts, but only returns 'ConversationalRetrievalChain' in terminal and in the Gradio UI

thorge@Ts-MacBook-Air ~/w/fastapi-async-langchain (main)> curl -N -X POST \
    -H "Accept: text/event-stream" -H "Content-Type: application/json" \
    -d '{"query": "Give me list of text splitters available with code samples" }' \
    http://localhost:8000/chat
'ConversationalRetrievalChain'⏎

feat: add `BackgroundTasks` support for `WebsocketConnection`

Discussed in #65

Originally posted by votkon May 26, 2023
It would be great to have chat logging implemented somehow.
Currently, it is impossible(?) to save the chain.memory.chat_memory.messages on websocket disconnection.
Websocket disconnect doesn't seem to raise any exceptions when closed correctly.

Issue with second message

I was testing it and the first message worked correctly but if I make a second request it shows the following error.

bug(response): error handling in streaming response causes background task to fail

Scenario

chain executor inside the streaming response class throws an error

Actual result

background task execution fails due to missing kwargs

Expected result

background task should work normally

Acceptance criteria

fix error handling code block

JSON Dict Response

This is very useful for ConversationalRetrievalChain, because you need an array of sources to properly format responses on any interface.

feat: add Llama-index adapter

New milestone feature: support for llama-index framework

Acceptance Criteria

investigate streaming with llama-index
add adapter modules for llama-index

feat: add gradio support for testing endpoints

Add support for a simple gradio chatbot interface to test new API chat endpoint changes.

feat: How can I do more things in the router?

from dotenv import load_dotenv
from fastapi import FastAPI
from langchain import ConversationChain
from langchain.chat_models import ChatOpenAI

from lanarky import LangchainRouter

load_dotenv()
app = FastAPI()

langchain_router = LangchainRouter(
   langchain_url="/chat",
   langchain_object=ConversationChain(
      llm=ChatOpenAI(temperature=0), verbose=True
   ),
   streaming_mode=0
)
app.include_router(langchain_router)

Given this code, if I want to do more things in the /chat endpoint, how can I achieve it? I mean, I am not only chatting: before and after /chat I may need to access the db, query the user, and so on.

Currently, maybe we can create another endpoint chat-wrapper to wrap this endpoint and then do other things, but obviously it is not a good way.

Handling Closed Connections

Recently we merged some changes into reworkd/AgentGPT#631 which enabled streaming on AgentGPT using lanarky!

After merging, I noticed an increased error rate due to what I assume is lanarky not closing the connection to openai if the client (React in our case) closes the SSE stream early. This happens, for example, when the user closes the page in the middle of a chat completion being streamed.

I was wondering if there was a way / best practice to close the connection to openai in the case mentioned above? Error details below for context.

Sample Implementation

from typing import List

from fastapi.responses import StreamingResponse as FastAPIStreamingResponse
from lanarky.responses import StreamingResponse
from langchain import LLMChain

from reworkd_platform.web.api.agent.model_settings import ModelSettings, create_model

def summarize(
    model_settings: ModelSettings, goal: str, query: str, snippets: List[str]
) -> FastAPIStreamingResponse:

    chain = LLMChain(
        llm=create_model(model_settings, streaming=True), prompt=summarize_prompt
    )

    return StreamingResponse.from_chain(
        chain,
        {"goal": goal, "query": query, "snippets": snippets},
        media_type="text/event-stream",
    )

Error

Unclosed connection
client_connection: Connection<ConnectionKey(host='api.openai.com', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=None)>
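
For the early-disconnect case described above, one general asyncio pattern is to release the upstream resource in a `finally` block of the async generator, so cleanup also runs when the consumer stops iterating early. The sketch below only demonstrates that pattern. `FakeSession` is a stand-in, and whether lanarky or the openai client exposes a hook for this is an open question here, not something this example establishes.

```python
import asyncio
from typing import AsyncIterator, List


class FakeSession:
    """Stand-in for an HTTP session that must be closed explicitly."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def stream_tokens(session: FakeSession, tokens: List[str]) -> AsyncIterator[str]:
    try:
        for token in tokens:
            yield token
            await asyncio.sleep(0)  # give other tasks a chance to run
    finally:
        # Runs on normal completion and when the generator is closed early.
        await session.close()


async def consume_first(session: FakeSession) -> str:
    gen = stream_tokens(session, ["a", "b", "c"])
    first = await gen.__anext__()
    await gen.aclose()  # simulates the client going away mid-stream
    return first
```

Running `asyncio.run(consume_first(FakeSession()))` consumes a single token and then closes the generator, which triggers the `finally` block.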

Thanks for the amazing work! You should setup github sponsors 😉

feat: add test client example in docs for streaming responses

As a user, I want lanarky to provide a test client example in the docs for testing streaming responses.

Acceptance Criteria

add test client example in docs

ConversationalRetrievalChain

Would this work together with ConversationalRetrievalChain?

E.g.

qa = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    combine_docs_chain=doc_chain,
    question_generator=question_generator,
    return_source_documents=False,
    get_chat_history=get_chat_history,
)

Ref: https://langchain-fanyi.readthedocs.io/en/latest/modules/chains/index_examples/chat_vector_db.html?highlight=ConversationalRetrievalChain#conversationalretrievalchain-with-streaming-to-stdout

Error in on_llm callback

Scenario

I upgraded to version 0.6.0 and now I get the following errors

Error in on_llm callback: 'AsyncConversationalRetrievalChainStreamingCallback' object has no attribute 'on_llm'

and

Error in on_llm callback: 'AsyncRetrievalQAWithSourcesChainStreamingCallback' object has no attribute 'on_llm'

It seems to have no effect on the result.

Actual result

Expected result

This error message should not appear.

Acceptance criteria

Metadata returned in docs

How should one go about extracting and outputting more document metadata than just "source"?
In my Pinecone vector db, I have entered additional metadata such as "author".

How do you create compatibility with ConversationalRetrievalChain

The following code snippet is an example of how you can stream a response from Langchain's ConversationalRetrievalChain into the console but I don't understand how you can add compatibility to Lanarky. This documentation doesn't make a whole lot of sense to me: https://lanarky.readthedocs.io/en/latest/advanced/custom_callbacks.html

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
import pinecone
from langchain.vectorstores import Pinecone
from langchain.prompts.prompt import PromptTemplate

from langchain.chains.llm import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI

OPENAI_API_KEY = '...'
PINECONE_API_KEY = '...' # replace with your key
PINECONE_ENV = '...' # replace with your environment
PINECONE_INDEX = '...' # replace with your index name

# Construct a ConversationalRetrievalChain with a streaming llm for combine docs
# and a separate, non-streaming llm for question generation
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
# streaming_llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0, openai_api_key=OPENAI_API_KEY)
streaming_llm = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], openai_api_key=OPENAI_API_KEY, temperature=0, verbose=True)

QA_V2 = """You are a helpful AI assistant. Use the following pieces of context to answer the question at the end.
# If you don't know the answer, just say you don't know. DO NOT try to make up an answer.
# If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
# Use as much detail when as possible when responding.

# {context}

# Question: {question}
# All answers should be in MARKDOWN (.md) Format:"""

qap = PromptTemplate(
    template=QA_V2, input_variables=["context", "question"]
)

CD_V2 = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
All answers should be in MARKDOWN (.md) Format:
Standalone question:"""

cdp = PromptTemplate.from_template(CD_V2)



question_generator = LLMChain(llm=llm, prompt=cdp)
doc_chain = load_qa_chain(streaming_llm, chain_type="stuff", prompt=qap)


pinecone.init(api_key=PINECONE_API_KEY,environment=PINECONE_ENV)
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002', openai_api_key=OPENAI_API_KEY)
vectorstore = Pinecone.from_existing_index(index_name=PINECONE_INDEX, embedding=embeddings, text_key='text', namespace='testing_rtd1')

qa = ConversationalRetrievalChain(retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, return_source_documents=True, question_generator=question_generator)


chat_history = []
query = input('Enter Question: ')
result = qa({"question": query, "chat_history": chat_history})

Question: How to pass in search_kwargs dynamically to the chain based on the query input?

If we take the example that is in the docs:

def create_chain():
    db = FAISS.load_local(
        folder_path="vector_stores/",
        index_name="langchain-python",
        embeddings=OpenAIEmbeddings(),
    )

    return RetrievalQAWithSourcesChain.from_chain_type(
        llm=ChatOpenAI(
            temperature=0,
            streaming=True,
        ),
        chain_type="stuff",
        retriever=db.as_retriever(),
        return_source_documents=True,
        verbose=True,
    )

app = mount_gradio_app(FastAPI(title="RetrievalQAWithSourcesChainDemo"))
templates = Jinja2Templates(directory="templates")
chain = create_chain()


@app.get("/")
async def get(request: Request):
    return templates.TemplateResponse("index.html", {"request": request})


langchain_router = LangchainRouter(
    langchain_url="/chat", langchain_object=chain, streaming_mode=1
)

I want to be able to pass search_kwargs to the retriever in the chain so that it does some filtering.

The filtering should be based on the input: for example, the input might have another attribute such as a list of authorized_documents_codes, which we could pass to the retriever so that it filters the documents in the search.

How to do that with Lanarky?

Here is the original non-streaming code I used for filtering, to give an idea of what I mean. The goal is to do the same with streaming while keeping the filtering for the retriever:

def send_message(question: Question):
    system_template = """Use the following pieces of context to answer the user's question.
    If you don't know the answer, just say that "I don't know", don't try to make up an answer.
    If the context is empty, just say that "I don't know", but always reply in the same language of the user.
    ----------------
    context:
    {summaries}"""

    messages = [
        SystemMessagePromptTemplate.from_template(system_template),
        HumanMessagePromptTemplate.from_template(question.question),
    ]
    prompt = ChatPromptTemplate.from_messages(messages)

    llm = ChatOpenAI(
        model_name=question.model_type,
        temperature=question.temperature,
        max_tokens=question.max_tokens,
    )

    search_kwargs = {"k": question.k}

    if question.authorized_codes:
        if len(question.authorized_codes) > 1:
            or_filter = [
                {"source": {"$eq": code}} for code in question.authorized_codes
            ]
            search_kwargs["filter"] = {"$or": or_filter}
        elif len(question.authorized_codes) == 1:
            search_kwargs["filter"] = {
                "source": {"$eq": question.authorized_codes[0]}
            }

    chain = RetrievalQAWithSourcesChain.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=langchain_db_client.as_retriever(search_kwargs=search_kwargs),
        return_source_documents=True,
        chain_type_kwargs={"prompt": prompt},
    )

    results = chain(question.question)

    return results


@app.post("/query")
def query(body: Question):
    logger.info(f"Received query: {body}")
    response = send_message(body)
    logger.info("Query processed successfully.")
    return response

The problem with Lanarky is that the chain has to be passed when the router is created, so it is not dynamic and cannot be modified per request.
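
The filter construction in `send_message` above does not depend on the chain itself, so it can be isolated as a pure function and reused by whatever code ends up building the retriever per request. The sketch below shows only that step. `build_search_kwargs` is a hypothetical helper, and it does not address how to inject the result into `LangchainRouter`.

```python
from typing import Any, Dict, List, Optional


def build_search_kwargs(
    k: int, authorized_codes: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build retriever search_kwargs with an optional filter on `source`."""
    search_kwargs: Dict[str, Any] = {"k": k}
    if not authorized_codes:
        return search_kwargs  # no restriction requested
    if len(authorized_codes) == 1:
        search_kwargs["filter"] = {"source": {"$eq": authorized_codes[0]}}
    else:
        # Several codes: match documents whose source equals any of them.
        search_kwargs["filter"] = {
            "$or": [{"source": {"$eq": code}} for code in authorized_codes]
        }
    return search_kwargs
```

The result can be passed as `as_retriever(search_kwargs=...)`, as in the original code.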

Stablelm instead of openai?

Would it be possible to add open and free openai alternatives via langchain?

Thinking about https://huggingface.co/spaces/stabilityai/stablelm-tuned-alpha-chat

Would be very cool and open up some nice use cases. In our case we'd add it to a game, can hardly ask gamers to add a paid openai key (or use my own) at this point ✌️

Streaming response returns 'StuffDocumentsChain' not the stream

When calling StreamingResponse.from_chain, the stream is not returned; instead, the string 'StuffDocumentsChain' is returned.

This code works as intended when calling normally without streaming.
Motorhead memory was omitted to simplify code.

Scenario

@app.get("/chat", response_model=ChatResponse)
async def chat_endpoint(
    sessionID: str, question: str, model: str, aimodel: str, engine: str
):
    chain, inputs = await answerQuestion(
        sessionID, question, model, aimodel
    )
    
    return StreamingResponse.from_chain(chain, inputs, media_type="text/event-stream")

async def answerQuestion(sessionID: str, question: str, model: str, ai_model: str):
    key = (
        os.environ.get("OPEN_API_KEY_GPT4")
        if ai_model == "gpt-4"
        else os.environ.get("OPEN_API_KEY")
    )

    namespace = "05a77398-8e25-49de-9ba1-3338366f2386"
    vectorstore = load_pinecone_existing_index(namespace, model)
    llm = OpenAI(
        openai_api_key=key,
        temperature=0.125,
        model_name=ai_model, 
        verbose=True, 
        streaming=True,
    )

    retriever = SelfQueryRetriever.from_llm(
        llm,
        vectorstore,
        document_content_description,
        metadata_field_info,
        verbose=True,
    )


    template = """Some template...

    {context}

    {chat_history}
    Zákazník: {human_input}
    Chatbot:"""

    docs = retriever.get_relevant_documents(question)

    llm = ChatOpenAI(
        model_name=ai_model,
        streaming=True,
        verbose=True,
        temperature=0.125,
        max_tokens=600,
        openai_api_key=key,
    )

    prompt = PromptTemplate(
        input_variables=["chat_history", "human_input", "context"], template=template
    )

    chain = load_qa_chain(
        llm=llm,
        document_variable_name="context",
        chain_type="stuff",
        prompt=prompt,
        verbose=True,
    )

    inputs = {
        "input_documents": docs,
        "human_input": question,
    }

    return chain, inputs

Actual result

'StuffDocumentsChain'

Expected result

A stream is returned. The example from the README, for instance, works.

Acceptance criteria

Endpoint should return a stream.

How to utilize on frontend

Could you provide some information and examples on how we can handle/display the streaming response on the frontend?

Pass multiple Inputs to LLMChainStreamingResponse?

I came across this library and I love it so far. I got the conversation chain running locally and would now like to use it in my application, where I pass my own LLMChain to LLMChainStreamingResponse.
My problem is that I don't know how to pass my input keys to my prompt template when I use the from_chain method on LLMChainStreamingResponse.

@app.post("/chat")
async def chat(request: Prompt) -> LLMChainStreamingResponse:
    context_str = get_context(request.prompt)
    return LLMChainStreamingResponse.from_chain(chain,
                                                inputs={"question": request.prompt, "context": context_str,
                                                        "history": ""},
                                                media_type="text/event-stream")

This is how I thought I could use it,
I would be super thankful for advice!

feat: add Option to Accept or Not Accept WebSocket Connection

Description

As a developer, I want an option to control whether to accept a WebSocket connection or not so I can prevent unnecessary "websocket.accept" messages after the connection has already been accepted.

Currently, my application is encountering an issue where it expects a "websocket.send" or "websocket.close" message after the WebSocket connection has been accepted, but it is receiving another "websocket.accept" message. This is likely due to the connection.connect() method trying to accept the WebSocket connection again, which is not allowed.

To address this issue, this pull request introduces an option to control whether to accept a WebSocket connection or not. This will provide more flexibility in managing WebSocket connections and help prevent the aforementioned issue.

For instance, in the case where the first message from the client is used to validate a JWT token, we can use this new option to prevent the connection.connect() method from accepting the WebSocket connection again after it has already been accepted:

@router.websocket("/chat")
async def chat_websocket_endpoint(
    websocket: WebSocket,
    chain: ConversationChain = Depends(conversation_chain),
):
    # Accept the connection
    await websocket.accept()

    # Wait for the client to send the initial authentication message (jwt token)
    token = await websocket.receive_text()

    # Verify the token and get the user
    user = await get_current_user_from_token(token)

    if not user:
        await websocket.close(code=status.WS_1008_POLICY_VIOLATION)
        return

    # If the user is authenticated, proceed with the connection
    connection = WebsocketConnection.from_chain(chain=chain, websocket=websocket)
    await connection.connect(accept_connection=False)

Acceptance Criteria

Add a parameter or flag to the connection.connect() method to control whether to accept the WebSocket connection or not.
The flag should be set to true by default.
If the flag is set to true, the connection.connect() method should accept the WebSocket connection.
If the flag is set to false, the connection.connect() method should not attempt to accept the WebSocket connection.
Ensure that no "websocket.accept" messages are sent after the WebSocket connection has been accepted.
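The criteria above can be sketched as follows. This is only an illustration of the requested behavior: `FakeWebSocket` and `SketchConnection` are hypothetical stand-ins, not lanarky's actual `WebsocketConnection`, and the message-handling loop is omitted.

```python
import asyncio
from typing import Tuple


class FakeWebSocket:
    """Stand-in for a WebSocket that counts how many times accept() is called."""

    def __init__(self) -> None:
        self.accept_calls = 0

    async def accept(self) -> None:
        self.accept_calls += 1


class SketchConnection:
    """Hypothetical connection wrapper; not lanarky's WebsocketConnection."""

    def __init__(self, websocket: FakeWebSocket) -> None:
        self.websocket = websocket

    async def connect(self, accept_connection: bool = True) -> None:
        # Accept only when the caller has not already accepted the socket.
        if accept_connection:
            await self.websocket.accept()
        # The receive/respond loop would follow here.


async def demo() -> Tuple[int, int]:
    default_ws, preaccepted_ws = FakeWebSocket(), FakeWebSocket()
    # Default behavior: connect() accepts the socket itself.
    await SketchConnection(default_ws).connect()
    # Socket accepted earlier, for example to read a JWT token first.
    await preaccepted_ws.accept()
    await SketchConnection(preaccepted_ws).connect(accept_connection=False)
    return default_ws.accept_calls, preaccepted_ws.accept_calls
```

In both cases each socket is accepted exactly once, which is the property the last criterion asks for.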

Question: Can other languages not be used?

First of all, thank you for making this repo.
This is really convenient!

However, lanarky does not seem to work with Korean.
The OpenAI API (GPT API) can handle Korean, so why can't lanarky?

I tried to find something like a prompt_template, but I could not find one.
Can I get some advice?

fix: request and response model created by `LangchainRouter` breaks for complex chain types

Scenario

User creates an application using RetrievalQAWithSourcesChain with streaming_mode=0

Actual result

API call fails due to:

pydantic.error_wrappers.ValidationError: 1 validation error for QbutpLangchainResponse
response -> source_documents
  str type expected (type=type_error.str)

Expected result

API call returns chain output

Acceptance criteria

fix util functions which create pydantic models for API request and response

How do you return source_documents using ConversationalRetrievalChain?

Here's my code:

from enum import StrEnum
import os
from fastapi import FastAPI
from fastapi.responses import FileResponse
from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware
from dotenv import load_dotenv
from lanarky import LangchainRouter, StreamingResponse
from langchain.chains import LLMChain, ConversationalRetrievalChain
from langchain.embeddings import GPT4AllEmbeddings
from langchain.vectorstores import Chroma
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import UnstructuredHTMLLoader
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.memory import ConversationSummaryBufferMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    PromptTemplate)
from langchain.memory.chat_message_histories.in_memory import (
    ChatMessageHistory)
from pydantic import BaseModel, constr
from AGPT4All import AGPT4All

load_dotenv()  # load environment variables from .env file
model_path = os.environ.get("MODEL_PATH")
titleDownloads = os.environ.get("TITLE_DOWNLOADS")
print(f"TITLE_DOWNLOADS={titleDownloads}")
print(f"MODEL_PATH={model_path}")

app = FastAPI(title="AI", version="0.0.1",
              description="AI Service")

app.mount("/static", StaticFiles(directory="static"), name="static")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)

callbacks = [AsyncIteratorCallbackHandler(), StreamingStdOutCallbackHandler()]

llm = AGPT4All(model=model_path, callbacks=callbacks, verbose=True,
               max_tokens=4096, n_predict=4096, streaming=True)

embeddings = GPT4AllEmbeddings()

if os.path.exists("./chroma_db") and os.path.isdir("./chroma_db"):
    vectorstore = Chroma(persist_directory="./chroma_db",
                         embedding_function=embeddings)
else:
    loader = DirectoryLoader(
        f'{titleDownloads}',
        loader_cls=UnstructuredHTMLLoader,
        recursive=True, glob="**/*.html", show_progress=True)
    documents = loader.load()
    print(f"Total docs: {len(documents)}")
    splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=32)
    chunks = splitter.split_documents(documents)
    texts = [doc.page_content for doc in chunks]
    metadatas = [doc.metadata for doc in chunks]
    vectorstore = Chroma.from_texts(texts=texts,
                                    embedding=embeddings,
                                    metadatas=metadatas,
                                    persist_directory="./chroma_db")

retriever = vectorstore.as_retriever()

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


class Role(StrEnum):
    SYSTEM = "system"
    ASSISTANT = "assistant"
    USER = "user"


class Message(BaseModel):
    role: constr(
        regex=f"^({Role.ASSISTANT}|{Role.USER}|{Role.SYSTEM})$")  # NOQA
    content: str


class ChatRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int
    temperature: float


def create_chain(messages: list[Message]):

    condense_template_str = (
        "You are an expert at summarizing chat histories and questions. "
        "Given the following Chat History and a Follow Up Question, "
        "rephrase the follow up question to be a new Standalone Question. \n "
        "Chat History: \n"
        "{chat_history} \n"
        "Follow Up Question: {question} \n"
        "Standalone question:")
    condense_template = PromptTemplate.from_template(
        f"{B_INST} {B_SYS}{condense_template_str.strip()}{E_SYS} {E_INST}")

    prompt_template_str = (
        "You are a helpful AI. "
        "Use the following Context and Chat History to answer the "
        "question at the end with a helpful and detailed answer. "
        "If you don't know the answer, just say "
        "'I don't know'; don't try to make up an answer. \n"
        "Context: {context} \n"
        "Chat History: {chat_history} \n"
        "Question: {question} \n"
        "Helpful Answer:")
    qa_prompt_template = ChatPromptTemplate.from_template(
        f"{B_INST} {B_SYS}{prompt_template_str.strip()}{E_SYS} {E_INST}")

    chat_memory = ChatMessageHistory()
    for message in messages:
        if message.role == Role.USER:
            chat_memory.add_user_message(message.content)
        elif message.role == Role.ASSISTANT:
            chat_memory.add_ai_message(message.content)

    memory = ConversationSummaryBufferMemory(
        llm=llm,
        chat_memory=chat_memory,
        memory_key="chat_history",
        input_key="question",
        return_messages=True)

    question_generator = LLMChain(llm=llm, prompt=condense_template,
                                  memory=memory, verbose=True)

    doc_chain = load_qa_chain(llm=llm, chain_type="stuff",
                              prompt=qa_prompt_template,  verbose=True)

    return ConversationalRetrievalChain(
            combine_docs_chain=doc_chain,
            memory=memory,
            retriever=retriever,
            question_generator=question_generator,
            return_generated_question=True,
            return_source_documents=True,
            output_key="answer",
            verbose=True)


router = LangchainRouter(
    streaming_mode=1,
    # llm_cache_mode=3,  # GPTCache
)


@router.post(
    "/chat",
    summary="AI Chat",
    description="Chat with AI Service",
)
def chat(request: ChatRequest):
    chain = create_chain(
        # model=request.model,
        messages=request.messages[:-1],
        # max_tokens=request.max_tokens,
        # temperature=request.temperature
        )
    return StreamingResponse.from_chain(
        chain, request.messages[-1].content, as_json=False)


app.include_router(router, tags=["chat"])

Everything works, except that I also want the source documents and metadata returned. If I set as_json=True in StreamingResponse.from_chain, the response contains a large number of "token" entries along with the sources and metadata, but in an unexpected format.

Originally posted by @auxon in #108

feat: add Guidance adapter

New milestone feature: support for guidance framework

Acceptance Criteria

investigate streaming with guidance
add GuidanceRouter

feat: setup project documentation

As the project grows, there is a strong need for a docs page for new (and existing) users to find information which might be too verbose to add to the repo README.

For the first version, we can add:

getting started
api reference
tutorial 1: deploy basic llm chain
tutorial 2: deploy chains with vector store / retriever
tutorial 3: deploy custom chains

Acceptance Criteria

study https://readthedocs.org/
publish docs on https://readthedocs.org/

Can't get gradio testing to work

Scenario

I'm trying the provided example from "conversation_chain.py".
I use the following command: uvicorn app.conversation_chain:app --reload

Using the web template it works fine.
But if I use the gradio endpoint, the form is displayed but whenever I send a query I get the following error:

Actual result

HTTP error occurred: 422 Client Error: Unprocessable Entity for url: http://localhost:8000/chat

Expected result

The gradio UI should display the chatbot's response, as the web template UI does.

More complete explanation to get started

Thank you for this great initiative.

It would be nice to have more complete examples to get started.
The inclusion of a gradio testing UI is great news, but getting it to work requires more than what this quick-start code provides:

langchain_router = LangchainRouter(
    langchain_url="/chat",
    langchain_object=ConversationChain(
        llm=ChatOpenAI(temperature=0), verbose=True
    ),
    streaming_mode=1,
)
app.include_router(langchain_router)

I figured out that I could get the gradio interface to work using testing.gradio.mount_gradio_app(app=app), but that's not perfect because the gradio chat displays the answer as JSON instead of text.
Also, I'm still struggling to get streaming to work on gradio. Something else is missing: it fails and I get this error: "Error in on_chain_start callback: 'name'". I guess a callback is missing?

Thanks for your help.
