ajndkr / lanarky
The web framework for building LLM microservices
Home Page: https://lanarky.ajndkr.com/
License: MIT License
Langchain recently added a callback handler for agents: https://python.langchain.com/en/latest/modules/agents/streaming_stdout_final_only.html?highlight=agent%20streaming.
FinalStreamingStdOutCallbackHandler
I used the example file "retrieval_qa_w_sources.py" but changed the LLM to GPT4All, and it fails with this error:
ERROR - Async generation not implemented for this LLM
Is that LLM not supported, or how can I solve this?
I have a custom chain for conversational retrieval with sources, because LangChain does not currently support it:
import re
from typing import Any, Dict, List, Optional

from langchain.callbacks.manager import (
    AsyncCallbackManagerForChainRun,
    CallbackManagerForChainRun,
)
from langchain.chains import ConversationalRetrievalChain


class ConversationalRetrievalWithSourcesChain(ConversationalRetrievalChain):
    """Chain for chatting with sources over documents."""

    sources_output_key: str = "sources"  #: :meta private:

    @property
    def output_keys(self) -> List[str]:
        """Return the output keys.

        :meta private:
        """
        _output_keys = [self.output_key]
        _output_keys.append(self.sources_output_key)
        if self.return_source_documents:
            _output_keys = _output_keys + ["source_documents"]
        return _output_keys

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        result = super()._call(
            inputs=inputs,
            run_manager=run_manager,
        )
        answer = result[self.output_key]
        if re.search(r"SOURCES:\s", answer):
            # maxsplit=1 keeps the unpacking safe if the marker appears twice
            answer, sources = re.split(r"SOURCES:\s", answer, maxsplit=1)
        else:
            sources = ""
        result[self.output_key] = answer
        result[self.sources_output_key] = sources
        return result

    async def _acall(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        result = await super()._acall(
            inputs=inputs,
            run_manager=run_manager,
        )
        answer = result[self.output_key]
        if re.search(r"SOURCES:\s", answer):
            # maxsplit=1 keeps the unpacking safe if the marker appears twice
            answer, sources = re.split(r"SOURCES:\s", answer, maxsplit=1)
        else:
            sources = ""
        result[self.output_key] = answer
        result[self.sources_output_key] = sources
        return result
I love the new feature: auto-detect callback.
But now I receive this kind of error.
Is there any way to include my custom chain in the auto-detect callback?
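The SOURCES handling in the custom chain above can be looked at on its own. The following is a minimal sketch, not part of lanarky or LangChain: `split_answer_and_sources` is a hypothetical helper name, and it assumes the model marks its sources with a literal "SOURCES:" followed by whitespace, as the chain in this post does.

```python
import re
from typing import Tuple

# Same marker as in the custom chain: "SOURCES:" followed by one whitespace char.
_SOURCES_PATTERN = re.compile(r"SOURCES:\s")


def split_answer_and_sources(text: str) -> Tuple[str, str]:
    """Split an LLM answer into (answer, sources).

    maxsplit=1 guarantees at most two parts, so a second "SOURCES:" marker
    in the text cannot break the result shape.
    """
    parts = _SOURCES_PATTERN.split(text, maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    # No marker found: the whole text is the answer and there are no sources.
    return text, ""
```

Keeping the split in one function means `_call` and `_acall` could share it instead of repeating the regex logic.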
The langchain version in pyproject.toml is pinned via a caret requirement (langchain = "^0.0.164").
Since langchain is updated almost daily, this makes fastapi-async-langchain basically unusable alongside other langchain-dependent modules, e.g.:
and fastapi-async-langchain (0.5.2) depends on langchain (>=0.0.164,<0.0.165), jpllm (0.0.4) is incompatible with fastapi-async-langchain (0.5.2).
It would be great to change the langchain requirement to e.g. ">0.0.164" or "~0.0.164" to make it more flexible to use.
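To see why the caret requirement is so strict here, the sketch below mimics how Poetry-style caret and tilde constraints expand for plain three-part versions. It is an illustration only: the function names are made up for this example, and pre-release tags and shorter version strings are not handled.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain 'major.minor.patch' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_upper_bound(version: str) -> Version:
    """Exclusive upper bound of ^version: bump the leftmost non-zero part."""
    major, minor, patch = parse(version)
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def tilde_upper_bound(version: str) -> Version:
    """Exclusive upper bound of ~major.minor.patch: bump the minor part."""
    major, minor, _ = parse(version)
    return (major, minor + 1, 0)


def allows(lower: str, upper: Version, candidate: str) -> bool:
    """True if lower <= candidate < upper."""
    return parse(lower) <= parse(candidate) < upper
```

For `0.0.164` the caret bound is `(0, 0, 165)`, which matches the `>=0.0.164,<0.0.165` range in the solver message, while the tilde bound is `(0, 1, 0)` and would accept later `0.0.x` releases.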
New milestone feature: db integration with sqlalchemy
Useful for storing user conversations, tracking token usage, and so on.
I was testing with RetrievalQAWithSourcesChain and it is working fine but the document sources are not obtained, I wanted to ask you if you have thought about how to approach this issue.
Perhaps I should include them in the last message, but I don't know the best way to present that information.
Hi,
Is there any way for the agent to send the first message? I could customize the websocket connection to do this, but I don't want the initial message that triggered the bot to be stored. Is there any other way of doing this with a websocket connection?
I am trying the library with an LLMChain instead of ConversationChain.
My chain has custom inputs.
Say, for example, the prompt is
Tell me the difference between {number-one} and {number-two}
and thus
input_variables=["number-one", "number-two"]
This leads to the error:
AttributeError: 'dict' object has no attribute 'kwargs'
from BaseLangchainStreamingResponse(StreamingResponse).
langchain==0.0.157
fastapi_async_langchain==0.4.3
This initiates at line 43 of BaseLangchainStreamingResponse:
outputs = await self.chain_executor(send_token)
which gives the exception:
A single string input was passed in, but this chain expects multiple inputs
Basically, my input variables are being ignored/lost.
What happens next is that we end up in the except block below,
except Exception as e:
    if self.background is not None:
        self.background.kwargs["outputs"] = str(e)
however, self.background (whose purpose I don't understand) has no attribute kwargs.
I do not know why, but since the recent changes, passing input variables seems to be broken when using LLMChain
(I don't know about ConversationChain). It was working fine before.
Moreover, it is assumed that background has kwargs, but that might not always be the case.
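One defensive way to write that except branch is to check for a usable `kwargs` dict before touching it. This is only a sketch of the idea, not lanarky's actual fix: `FakeBackgroundTask` and `record_error` are hypothetical names used for illustration, and the root cause (inputs ending up where a background task was expected) would still need fixing separately.

```python
from typing import Any, Dict, Optional


class FakeBackgroundTask:
    """Hypothetical stand-in for a background task object that carries kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs: Dict[str, Any] = kwargs


def record_error(background: Optional[Any], error: Exception) -> bool:
    """Store the error text on the task if it exposes a kwargs dict.

    Returns True when the error was stored, False when `background` is None
    or is some other object (for example a plain dict) without `kwargs`.
    """
    kwargs = getattr(background, "kwargs", None)
    if isinstance(kwargs, dict):
        kwargs["outputs"] = str(error)
        return True
    return False
```

With this guard, passing a plain dict as `background` no longer raises AttributeError; the original exception can then be logged or re-raised instead of being masked.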
I have created a custom streaming callback and am trying to register it with @register_streaming_callback("AgentExecutor"), but it raises KeyError: "<class 'app.callbacks.MyCustomAgentsStreamingCallback'> already registered as AgentExecutor".
It would be very convenient to be able to override the default lanarky AgentExecutor callback. Otherwise I have to create a custom AgentExecutor, register it, and then write more code that supports my custom agent (for example, the initialize_agent function in langchain does not support custom agents at the moment). The workaround is also not ideal, since I may not need a custom AgentExecutor, only a custom callback handler.
One simple solution would be to replace the raise with a warning here instead: https://github.com/ajndkr/lanarky/blob/main/lanarky/register/base.py#L26.
I'm getting this error message randomly when I launch the conversation_chain example :
I noticed the LangchainRouter in version v0.7.3. But how should I refactor my existing code?
app = FastAPI()
embeddings = OpenAIEmbeddings()
langchain.llm_cache = RedisSemanticCache(
redis_url="redis://127.0.0.1:7777",
embedding=embeddings
)
v_db = Milvus(
embeddings,
connection_args={"host": "127.0.0.1", "port": "19530"},
collection_name="LangChainCollection",
)
class Ask(BaseModel):
query: Optional[str] = None
@app.post("/sseAsk")
async def sseask(ask: Ask):
llm = ChatOpenAI(
max_tokens=1000,
temperature=0,
model_name="gpt-3.5-turbo",
streaming=True)
system_template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that it was not found in the knowledge base. Do not attempt to fabricate an answer.
ALWAYS return a "SOURCES" part in your answer.
Respond in Chinese.
QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER IN Chinese:"""
messages = [
SystemMessagePromptTemplate.from_template(system_template),
HumanMessagePromptTemplate.from_template('{question}')
]
prompt = ChatPromptTemplate.from_messages(messages)
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
llm=llm,
chain_type_kwargs={"prompt": prompt},
chain_type="stuff",
retriever=v_db.as_retriever(),
return_source_documents=True,
)
return StreamingResponse.from_chain(qa_chain, ask.query, as_json=True, media_type="text/event-stream")
langchain_router = LangchainRouter(
langchain_url="/sseAsk",
langchain_object=LLMChain.from_string(
llm=OpenAI(temperature=0), template="Answer the query.\n{query}"
),
streaming_mode=0,
)
app.include_router(langchain_router)
Hi,
I have a conversational agent designed with a custom prompt. If I don't define the agent type, the agent works to some extent: I get a streaming response, but it is not able to give correct answers. If I set the agent type to Conversational Agent or Chat Agent, streaming stops working. I'm not sure what the problem is. Can you help me understand?
agent = ConversationalAgent.from_llm_and_tools(llm=llm, tools=tools, prefix=system_message)
#agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory, handle_parsing_errors="Check your output and make sure it conforms!")
agent_executor = initialize_agent(tools=tools, llm=llm, verbose=True, memory=memory, handle_parsing_errors="Check your output and make sure it conforms!")
langchain_router = LangchainRouter(
langchain_url="/chat",
langchain_object=agent_executor,
streaming_mode=1
)
langchain_router.add_langchain_api_websocket_route("/ws", langchain_object=agent_executor)
This is my code. I had to switch to initialize_agent for the same reason, but it doesn't work. If you could tell me what the fix would be, I can try fixing it and also become a contributor.
As a codeowner, I want to add some unit testing for this project as the library is growing and we want to avoid introducing breaking changes.
import requests

params = {"query": "What is Google for?"}
response = requests.post(url="http://127.0.0.1:8000/sseAsk", json=params, stream=True)
# Print each raw chunk as it arrives from the server
for resp in response.iter_content(chunk_size=None, decode_unicode=False):
    print(str(resp, "utf8"))
The returned results are as follows:
{"token":""}
{"token":"Google"}
...
{"token":"。"}
{"token":""}
{"token":"","source_documents":...}
How should I determine the timing of the appearance of source_documents?
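Based on the output shown above, one way to detect the sources is to check every JSON chunk for a `source_documents` key rather than relying on its position. The sketch below assumes each chunk is exactly one complete JSON object, as in the listing; over a real connection chunks may be split or merged, so a production client would need buffering. `collect_stream` is a name invented for this example.

```python
import json
from typing import Any, Iterable, List, Optional, Tuple


def collect_stream(chunks: Iterable[str]) -> Tuple[str, Optional[Any]]:
    """Join streamed tokens and pick up source_documents when it appears.

    Assumes every chunk is one complete JSON object with a "token" field.
    """
    tokens: List[str] = []
    sources: Optional[Any] = None
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue  # skip keep-alive or empty chunks
        payload = json.loads(chunk)
        tokens.append(payload.get("token", ""))
        if "source_documents" in payload:
            sources = payload["source_documents"]
    return "".join(tokens), sources
```

In the client loop, the decoded chunks from `iter_content` would be passed in place of the list used in a test.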
Some chains can output the sources of their answers. It would help to include that content in the streaming output in JSON format, so it is easy to extract and display on the web.
Currently, the openai module is required, even if I'm using another LLM. Why is it required, and can it be removed?
def openai_aiosession(func):
    """Decorator to set openai.aiosession for StreamingResponse."""

    @wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            import openai  # type: ignore
        except ImportError:
            raise ImportError(
                "openai is not installed. Install it with `pip install 'lanarky[openai]'`."
            )
        openai.aiosession.set(aiohttp.ClientSession())
        logger.debug(f"openai.aiosession set: {openai.aiosession.get()}")
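A common pattern for making a dependency truly optional is to attempt the import only when it is needed and to fall back gracefully when the package is absent. The sketch below shows that pattern in isolation; `optional_import` is a hypothetical helper, not something lanarky provides, and whether the openai session setup can be skipped for other LLMs is a question for the maintainers.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

A caller could then run the OpenAI-specific setup only when `optional_import("openai")` returns a module and skip it otherwise.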
This project is currently limited to streaming responses. For building applications like chatbots, users might require a websockets-based solution.
The idea is already implemented in https://github.com/hwchase17/chat-langchain. We can integrate it here as well.
In more advanced agents use cases, you often want to have greater control of the loop that controls how agents act.
The langchain AgentExecutor class has a method _take_next_step which is encouraged to be overridden in order to "(...) take control of how the agent makes and acts on choices." (https://github.com/hwchase17/langchain/blob/master/langchain/agents/agent.py#L748)
Trying to do so in conjunction with the newly added agent streaming support raises:
"Error! Chain type 'CustomAgentExecutor' is not currently supported by 'AsyncStreamingResponseCallback'."
To reproduce the error you can run examples/app/zero_shot_agent.py and replace the zero_shot_agent_dependency with a modified version of the function. This function creates a dummy custom subclass of AgentExecutor and initializes it manually rather than using the initialize_agent helper function, as that currently does not support custom agent executors.
def zero_shot_agent_dependency() -> Callable[[], AgentExecutor]:
    @lru_cache(maxsize=1)
    def dependency() -> AgentExecutor:
        llm = ChatOpenAI(
            temperature=0,
            streaming=True,
        )
        tools = load_tools(["llm-math"], llm=llm)

        # Additional imports
        from langchain.agents import AgentExecutor
        from langchain.agents.loading import AGENT_TO_CLASS

        # Create dummy custom agent executor
        class CustomAgentExecutor(AgentExecutor):
            pass

        # Initialize agent without initialize_agent helper function
        agent = AgentType.ZERO_SHOT_REACT_DESCRIPTION
        agent_cls = AGENT_TO_CLASS[agent]
        agent_obj = agent_cls.from_llm_and_tools(llm, tools)
        agent = CustomAgentExecutor.from_agent_and_tools(
            agent=agent_obj,
            tools=tools,
        )
        return agent

    return dependency
AgentExecutor no longer raises an error
When will support for GPTCache or RedisSemanticCache be added?
This is very important for local knowledge base applications.
Please add documentation for using the JSON response feature for streaming responses.
Hi,
I was wondering how we could plug the OpenAI Functions into Lanarky.
It takes the conversation chain as an input and Functions seem to be called by Agents
Do you think it's possible to add that and maybe some examples as well?
Thanks!
After #6, it is evident that the chain_wrapper_fn logic is not generic for all chain types. There needs to be a Response class for each chain type. This issue will track progress.
ConversationChain
LLMChain
LLMBashChain
LLMCheckerChain
LLMSummarizationCheckerChain
LLMMathChain
PALChain
QAWithSourcesChain
SQLDatabaseChain
SequentialChain
SimpleSequentialChain
VectorDBQA
VectorDBQAWithSourcesChain
APIChain
LLMRequestsChain
TransformChain
MapReduceChain
OpenAIModerationChain
SQLDatabaseSequentialChain
AnalyzeDocumentChain
HypotheticalDocumentEmbedder
ChatVectorDBChain
GraphQAChain
ConstitutionalChain
QAGenerationChain
RetrievalQA
RetrievalQAWithSourcesChain
ConversationalRetrievalChain
OpenAPIEndpointChain
Can we handle multiple conversations, like "New chat" in https://chat.openai.com/?
✅ Conversation Chain
❌ Retrieval QA with Sources Chain
❌ Conversational Retrieval
Retrieval QA with Sources Chain:
thorge@Ts-MacBook-Air ~/w/f/examples (main)> uvicorn app.retrieval_qa_chain:app --reload
INFO: Will watch for changes in these directories: ['/Users/thorge/workdir/fastapi-async-langchain/examples']
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [4004] using WatchFiles
ERROR: Error loading ASGI app. Could not import module "app.retrieval_qa_chain"
Conversational Retrieval starts, but only returns 'ConversationalRetrievalChain' in terminal and in the Gradio UI
thorge@Ts-MacBook-Air ~/w/fastapi-async-langchain (main)> curl -N -X POST \
-H "Accept: text/event-stream" -H "Content-Type: application/json" \
-d '{
"query": "Give me list of text splitters available with code samples" }' \
http://localhost:8000/chat
'ConversationalRetrievalChain'⏎
Originally posted by votkon May 26, 2023
That would be great to have chat logging implemented somehow.
Currently, it is impossible(?) to save the chain.memory.chat_memory.messages on websocket disconnection.
Websocket disconnect doesn't seem to raise any exceptions when closed correctly.
chain executor inside the streaming response class throws an error
background task execution fails due to missing kwargs
background task should work normally
This is very useful for ConversationalRetrievalChain, because you need an array of sources to format responses properly on any interface.
New milestone feature: support for the llama-index framework
llama-index
Add support for a simple gradio chatbot interface to test new API chat endpoint changes.
from dotenv import load_dotenv
from fastapi import FastAPI
from langchain import ConversationChain
from langchain.chat_models import ChatOpenAI
from lanarky import LangchainRouter
load_dotenv()
app = FastAPI()
langchain_router = LangchainRouter(
langchain_url="/chat",
langchain_object=ConversationChain(
llm=ChatOpenAI(temperature=0), verbose=True
),
streaming_mode=0
)
app.include_router(langchain_router)
With this code, if I want to do more in the /chat endpoint, how can I achieve it? I mean, I am not only chatting: before and after /chat I may need to access the db, query the user, and so on.
Currently, we could create another endpoint, chat-wrapper, to wrap this endpoint and do the other things there, but that is obviously not a good approach.
Recently we merged some changes into reworkd/AgentGPT#631 which enabled streaming on AgentGPT using lanarky!
After merging, I noticed an increased error rate due to what I assume is lanarky not closing the connection to openai if the client (React in our case) closes the SSE stream early. This happens, for example, when the user closes the page in the middle of a chat completion being streamed.
I was wondering if there was a way / best practice to close the connection to openai in the case mentioned above? Error details below for context.
from typing import List

from fastapi.responses import StreamingResponse as FastAPIStreamingResponse
from lanarky.responses import StreamingResponse
from langchain import LLMChain

from reworkd_platform.web.api.agent.model_settings import ModelSettings, create_model


def summarize(
    model_settings: ModelSettings, goal: str, query: str, snippets: List[str]
) -> FastAPIStreamingResponse:
    chain = LLMChain(
        llm=create_model(model_settings, streaming=True), prompt=summarize_prompt
    )
    return StreamingResponse.from_chain(
        chain,
        {"goal": goal, "query": query, "snippets": snippets},
        media_type="text/event-stream",
    )
Unclosed connection
client_connection: Connection<ConnectionKey(host='api.openai.com', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=None)>
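A general asyncio pattern for releasing resources when a consumer stops early is to put the cleanup in a `finally` block inside the async generator and close the generator explicitly. The sketch below shows only that pattern with stand-in objects: `FakeSession`, `stream_tokens`, and `consume_first` are hypothetical, and whether this maps onto how lanarky manages its aiohttp session has not been verified.

```python
import asyncio
from typing import AsyncIterator, List


class FakeSession:
    """Hypothetical stand-in for an HTTP client session with an async close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def stream_tokens(session: FakeSession, tokens: List[str]) -> AsyncIterator[str]:
    """Yield tokens one by one; always close the session when the stream ends."""
    try:
        for token in tokens:
            yield token
            await asyncio.sleep(0)  # give other tasks a chance to run
    finally:
        # Runs on normal completion, on error, and when the generator is closed
        # early via aclose().
        await session.close()


async def consume_first(session: FakeSession) -> str:
    """Simulate a client that disconnects after the first token."""
    gen = stream_tokens(session, ["a", "b", "c"])
    first = ""
    async for token in gen:
        first = token
        break
    await gen.aclose()  # explicit close triggers the finally block above
    return first
```

The explicit `aclose()` matters: breaking out of `async for` alone does not immediately finalize the generator.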
Thanks for the amazing work! You should setup github sponsors 😉
As a user, I want lanarky to provide a test client example in the docs for testing streaming responses.
Would this work together with ConversationalRetrievalChain?
E.g.
qa = ConversationalRetrievalChain(
retriever=vectorstore.as_retriever(),
combine_docs_chain=doc_chain,
question_generator=question_generator,
return_source_documents=False,
get_chat_history=get_chat_history,
)
I upgraded to version 0.6.0 and now I get the following errors
Error in on_llm callback: 'AsyncConversationalRetrievalChainStreamingCallback' object has no attribute 'on_llm'
and
Error in on_llm callback: 'AsyncRetrievalQAWithSourcesChainStreamingCallback' object has no attribute 'on_llm'
It seems to have no effect on the result.
The error message should not appear.
How should one go about extracting and outputting more doc meta data than just "source"?
In my Pinecone vector db, I have entered additional meta data like "author" etc.
The following code snippet is an example of how you can stream a response from Langchain's ConversationalRetrievalChain into the console, but I don't understand how to make it compatible with Lanarky. This documentation doesn't make a whole lot of sense to me: https://lanarky.readthedocs.io/en/latest/advanced/custom_callbacks.html
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
import pinecone
from langchain.vectorstores import Pinecone
from langchain.prompts.prompt import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
OPENAI_API_KEY = '...'
PINECONE_API_KEY = '...' # replace with your key
PINECONE_ENV = '...' # replace with your environment
PINECONE_INDEX = '...' # replace with your index name
# Construct a ConversationalRetrievalChain with a streaming llm for combine docs
# and a separate, non-streaming llm for question generation
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
# streaming_llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0, openai_api_key=OPENAI_API_KEY)
streaming_llm = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], openai_api_key=OPENAI_API_KEY, temperature=0, verbose=True)
QA_V2 = """You are a helpful AI assistant. Use the following pieces of context to answer the question at the end.
# If you don't know the answer, just say you don't know. DO NOT try to make up an answer.
# If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
# Use as much detail when as possible when responding.
# {context}
# Question: {question}
# All answers should be in MARKDOWN (.md) Format:"""
qap = PromptTemplate(
template=QA_V2, input_variables=["context", "question"]
)
CD_V2 = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
All answers should be in MARKDOWN (.md) Format:
Standalone question:"""
cdp = PromptTemplate.from_template(CD_V2)
question_generator = LLMChain(llm=llm, prompt=cdp)
doc_chain = load_qa_chain(streaming_llm, chain_type="stuff", prompt=qap)
pinecone.init(api_key=PINECONE_API_KEY,environment=PINECONE_ENV)
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002', openai_api_key=OPENAI_API_KEY)
vectorstore = Pinecone.from_existing_index(index_name=PINECONE_INDEX, embedding=embeddings, text_key='text', namespace='testing_rtd1')
qa = ConversationalRetrievalChain(retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, return_source_documents=True, question_generator=question_generator)
chat_history = []
query = input('Enter Question: ')
result = qa({"question": query, "chat_history": chat_history})
If we take the example that is in the docs:
def create_chain():
db = FAISS.load_local(
folder_path="vector_stores/",
index_name="langchain-python",
embeddings=OpenAIEmbeddings(),
)
return RetrievalQAWithSourcesChain.from_chain_type(
llm=ChatOpenAI(
temperature=0,
streaming=True,
),
chain_type="stuff",
retriever=db.as_retriever(),
return_source_documents=True,
verbose=True,
)
app = mount_gradio_app(FastAPI(title="RetrievalQAWithSourcesChainDemo"))
templates = Jinja2Templates(directory="templates")
chain = create_chain()
@app.get("/")
async def get(request: Request):
return templates.TemplateResponse("index.html", {"request": request})
langchain_router = LangchainRouter(
langchain_url="/chat", langchain_object=chain, streaming_mode=1
)
I want to be able to pass search_kwargs to the retriever in the chain so that it does some filtering.
The filter should be based on the input: for example, the input might have another attribute, such as a list of authorized_documents_codes, that we pass to the retriever so it can filter the documents in the search.
How can I do that with Lanarky?
Here is the original, non-streaming code I used for filtering, to give an idea of what I mean. The goal is to make it stream while keeping the retriever filtering:
def send_message(question: Question):
system_template = """Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that "I don't know", don't try to make up an answer.
If the context is empty, just say that "I don't know", but always reply in the same language of the user.
----------------
context:
{summaries}"""
messages = [
SystemMessagePromptTemplate.from_template(system_template),
HumanMessagePromptTemplate.from_template(question.question),
]
prompt = ChatPromptTemplate.from_messages(messages)
llm = ChatOpenAI(
model_name=question.model_type,
temperature=question.temperature,
max_tokens=question.max_tokens,
)
search_kwargs = {"k": question.k}
if question.authorized_codes:
if len(question.authorized_codes) > 1:
or_filter = [
{"source": {"$eq": code}} for code in question.authorized_codes
]
search_kwargs["filter"] = {"$or": or_filter}
elif len(question.authorized_codes) == 1:
search_kwargs["filter"] = {
"source": {"$eq": question.authorized_codes[0]}
}
chain = RetrievalQAWithSourcesChain.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=langchain_db_client.as_retriever(search_kwargs=search_kwargs),
return_source_documents=True,
chain_type_kwargs={"prompt": prompt},
)
results = chain(question.question)
return results
@app.post("/query")
def query(body: Question):
logger.info(f"Received query: {body}")
response = send_message(body)
logger.info("Query processed successfully.")
return response
The problem with Lanarky is that the chain has to be passed when the router is created, which makes it static and impossible to modify per request.
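The filter construction in `send_message` does not depend on the chain or on Lanarky, so it can be pulled out and tested on its own. The sketch below is one possible shape for that: `build_search_kwargs` is a name invented here, and the `$eq` / `$or` operators are taken from the code above. How to feed the result into a per-request chain in Lanarky remains the open question of this post.

```python
from typing import Any, Dict, List, Optional


def build_search_kwargs(
    k: int, authorized_codes: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build retriever search_kwargs with an optional source filter.

    One code yields a single equality clause; several codes are combined
    with "$or", mirroring the original send_message logic.
    """
    search_kwargs: Dict[str, Any] = {"k": k}
    if not authorized_codes:
        return search_kwargs
    clauses = [{"source": {"$eq": code}} for code in authorized_codes]
    search_kwargs["filter"] = clauses[0] if len(clauses) == 1 else {"$or": clauses}
    return search_kwargs
```

The result would be passed as `as_retriever(search_kwargs=...)` when the chain is built for a given request.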
Would it be possible to add open and free openai alternatives via langchain?
Thinking about https://huggingface.co/spaces/stabilityai/stablelm-tuned-alpha-chat
Would be very cool and open up some nice use cases. In our case we'd add it to a game, can hardly ask gamers to add a paid openai key (or use my own) at this point ✌️
When calling StreamingResponse.from_chain, stream is not returned, instead 'StuffDocumentsChain' str is returned.
This code works as intended when calling normally without streaming.
Motorhead memory was omitted to simplify code.
@app.get("/chat", response_model=ChatResponse)
async def chat_endpoint(
sessionID: str, question: str, model: str, aimodel: str, engine: str
):
chain, inputs = await answerQuestion(
sessionID, question, model, aimodel
)
return StreamingResponse.from_chain(chain, inputs, media_type="text/event-stream")
async def answerQuestion(sessionID: str, question: str, model: str, ai_model: str):
key = (
os.environ.get("OPEN_API_KEY_GPT4")
if ai_model == "gpt-4"
else os.environ.get("OPEN_API_KEY")
)
namespace = "05a77398-8e25-49de-9ba1-3338366f2386"
vectorstore = load_pinecone_existing_index(namespace, model)
llm = OpenAI(
openai_api_key=key,
temperature=0.125,
model_name=ai_model,
verbose=True,
streaming=True,
)
retriever = SelfQueryRetriever.from_llm(
llm,
vectorstore,
document_content_description,
metadata_field_info,
verbose=True,
)
template = """Some template...
{context}
{chat_history}
Zákazník: {human_input}
Chatbot:"""
docs = retriever.get_relevant_documents(question)
llm = ChatOpenAI(
model_name=ai_model,
streaming=True,
verbose=True,
temperature=0.125,
max_tokens=600,
openai_api_key=key,
)
prompt = PromptTemplate(
input_variables=["chat_history", "human_input", "context"], template=template
)
chain = load_qa_chain(
llm=llm,
document_variable_name="context",
chain_type="stuff",
prompt=prompt,
verbose=True,
)
inputs = {
"input_documents": docs,
"human_input": question,
}
return chain, inputs
'StuffDocumentsChain'
A stream is returned. The example from the README, for instance, works.
The endpoint should return a stream.
Could you provide some information and examples on how we can handle/display the streaming response on the frontend?
I came across this library and I love it so far. I got the conversation chain running locally and would now like to use it in my application, where I pass my own LLMChain to LLMChainStreamingResponse.
My problem is that I don't know how to pass my input keys to my prompt template when I use the from_chain method on LLMChainStreamingResponse.
@app.post("/chat")
async def chat(request: Prompt) -> LLMChainStreamingResponse:
    context_str = get_context(request.prompt)
    return LLMChainStreamingResponse.from_chain(
        chain,
        inputs={"question": request.prompt, "context": context_str, "history": ""},
        media_type="text/event-stream",
    )
This is how I thought I could use it,
I would be super thankful for advice!
As a developer, I want an option to control whether to accept a WebSocket connection or not so I can prevent unnecessary "websocket.accept" messages after the connection has already been accepted.
Currently, my application is encountering an issue where it expects a "websocket.send" or "websocket.close" message after the WebSocket connection has been accepted, but it is receiving another "websocket.accept" message. This is likely due to the connection.connect() method trying to accept the WebSocket connection again, which is not allowed.
To address this issue, this pull request introduces an option to control whether to accept a WebSocket connection or not. This will provide more flexibility in managing WebSocket connections and help prevent the aforementioned issue.
For instance, in the case where the first message from the client is used to validate a JWT token, we can use this new option to prevent the connection.connect() method from accepting the WebSocket connection again after it has already been accepted:
@router.websocket("/chat")
async def chat_websocket_endpoint(
    websocket: WebSocket,
    chain: ConversationChain = Depends(conversation_chain),
):
    # Accept the connection
    await websocket.accept()
    # Wait for the client to send the initial authentication message (jwt token)
    token = await websocket.receive_text()
    # Verify the token and get the user
    user = await get_current_user_from_token(token)
    if not user:
        await websocket.close(code=status.WS_1008_POLICY_VIOLATION)
        return
    # If the user is authenticated, proceed with the connection
    connection = WebsocketConnection.from_chain(chain=chain, websocket=websocket)
    await connection.connect(accept_connection=False)
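The effect of the `accept_connection` flag can be illustrated without FastAPI. The sketch below uses stand-in classes: `FakeWebSocket` and `Connection` are hypothetical and only model the rule that a socket may be accepted once. They are not lanarky's `WebsocketConnection`.

```python
import asyncio
from typing import List


class FakeWebSocket:
    """Hypothetical stand-in that rejects a second accept, like an ASGI server."""

    def __init__(self) -> None:
        self.events: List[str] = []

    async def accept(self) -> None:
        if "accept" in self.events:
            raise RuntimeError("websocket already accepted")
        self.events.append("accept")


class Connection:
    """Minimal connection wrapper with an opt-out for the accept step."""

    def __init__(self, websocket: FakeWebSocket) -> None:
        self.websocket = websocket

    async def connect(self, accept_connection: bool = True) -> None:
        if accept_connection:
            await self.websocket.accept()
        # The message loop would start here in a real implementation.
```

When the endpoint has already accepted the socket to read the token, calling `connect(accept_connection=False)` skips the second accept, while the default keeps the previous behaviour.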
First of all, thank you for making this repo.
This is really convenient!
But lanarky can't handle Korean...
The OpenAI API (GPT API) can handle Korean, so why can't lanarky?
I tried to find something like a prompt_template, but I couldn't find it.
Can I get some advice?
User creates an application using RetrievalQAWithSourcesChain with streaming_mode=0
API call fails due to:
pydantic.error_wrappers.ValidationError: 1 validation error for QbutpLangchainResponse
response -> source_documents
str type expected (type=type_error.str)
API call returns chain output
Here's my code:
from enum import StrEnum
import os
from fastapi import FastAPI
from fastapi.responses import FileResponse
from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware
from dotenv import load_dotenv
from lanarky import LangchainRouter, StreamingResponse
from langchain.chains import LLMChain, ConversationalRetrievalChain
from langchain.embeddings import GPT4AllEmbeddings
from langchain.vectorstores import Chroma
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import UnstructuredHTMLLoader
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.memory import ConversationSummaryBufferMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts.chat import (
ChatPromptTemplate,
PromptTemplate)
from langchain.memory.chat_message_histories.in_memory import (
ChatMessageHistory)
from pydantic import BaseModel, constr
from AGPT4All import AGPT4All
load_dotenv() # load environment variables from .env file
model_path = os.environ.get("MODEL_PATH")
titleDownloads = os.environ.get("TITLE_DOWNLOADS")
print(f"TITLE_DOWNLOADS={titleDownloads}")
print(f"MODEL_PATH={model_path}")
app = FastAPI(title="AI", version="0.0.1",
description="AI Service")
app.mount("/static", StaticFiles(directory="static"), name="static")
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["GET", "POST"],
allow_headers=["*"],
)
callbacks = [AsyncIteratorCallbackHandler(), StreamingStdOutCallbackHandler()]
llm = AGPT4All(model=model_path, callbacks=callbacks, verbose=True,
max_tokens=4096, n_predict=4096, streaming=True)
embeddings = GPT4AllEmbeddings()
if os.path.exists("./chroma_db") and os.path.isdir("./chroma_db"):
vectorstore = Chroma(persist_directory="./chroma_db",
embedding_function=embeddings)
else:
loader = DirectoryLoader(
f'{titleDownloads}',
loader_cls=UnstructuredHTMLLoader,
recursive=True, glob="**/*.html", show_progress=True)
documents = loader.load()
print(f"Total docs: {len(documents)}")
chunks = []
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=32)
for chunk in splitter.split_documents(documents):
chunks.append(chunk)
texts = [doc.page_content for doc in chunks]
metadatas = [doc.metadata for doc in chunks]
vectorstore = Chroma.from_texts(texts=texts,
embedding=embeddings,
metadatas=metadatas,
persist_directory="./chroma_db")
retriever = vectorstore.as_retriever()
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
class Role(StrEnum):
    SYSTEM = "system"
    ASSISTANT = "assistant"
    USER = "user"


class Message(BaseModel):
    role: constr(
        regex=f"^({Role.ASSISTANT}|{Role.USER}|{Role.SYSTEM})$")  # NOQA
    content: str


class ChatRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int
    temperature: float
def create_chain(messages: list[Message]):
    # Prompt that condenses the chat history and follow-up question
    # into a single standalone question.
    condense_template_str = (
        "You are an expert at summarizing chat histories and questions. "
        "Given the following Chat History and a Follow Up Question, "
        "rephrase the follow up question to be a new Standalone Question. \n "
        "Chat History: \n"
        "{chat_history} \n"
        "Follow Up Question: {question} \n"
        "Standalone question:")
    condense_template = PromptTemplate.from_template(
        f"{B_INST} {B_SYS}{condense_template_str.strip()}{E_SYS} {E_INST}")
    # Prompt that answers the question from the retrieved context.
    prompt_template_str = (
        "You are a helpful AI. "
        "Use the following Context and Chat History to answer the "
        "question at the end with a helpful and detailed answer. "
        "If you don't know the answer, just say "
        "'I don't know'; don't try to make up an answer. \n"
        "Context: {context} \n"
        "Chat History: {chat_history} \n"
        "Question: {question} \n"
        "Helpful Answer:")
    qa_prompt_template = ChatPromptTemplate.from_template(
        f"{B_INST} {B_SYS}{prompt_template_str.strip()}{E_SYS} {E_INST}")
    # Rebuild the chat history from the earlier messages of the request.
    chat_memory = ChatMessageHistory()
    for message in messages:
        if message.role == Role.USER:
            chat_memory.add_user_message(message.content)
        elif message.role == Role.ASSISTANT:
            chat_memory.add_ai_message(message.content)
    memory = ConversationSummaryBufferMemory(
        llm=llm,
        chat_memory=chat_memory,
        memory_key="chat_history",
        input_key="question",
        return_messages=True)
    question_generator = LLMChain(llm=llm, prompt=condense_template,
                                  memory=memory, verbose=True)
    doc_chain = load_qa_chain(llm=llm, chain_type="stuff",
                              prompt=qa_prompt_template, verbose=True)
    return ConversationalRetrievalChain(
        combine_docs_chain=doc_chain,
        memory=memory,
        retriever=retriever,
        question_generator=question_generator,
        return_generated_question=True,
        return_source_documents=True,
        output_key="answer",
        verbose=True)
router = LangchainRouter(
    streaming_mode=1,
    # llm_cache_mode=3,  # GPTCache
)


@router.post(
    "/chat",
    summary="AI Chat",
    description="Chat with AI Service",
)
def chat(request: ChatRequest):
    # The last message is the new question; earlier ones form the history.
    chain = create_chain(
        # model=request.model,
        messages=request.messages[:-1],
        # max_tokens=request.max_tokens,
        # temperature=request.temperature
    )
    return StreamingResponse.from_chain(
        chain, request.messages[-1].content, as_json=False)


app.include_router(router, tags=["chat"])
Everything works, except that I also want the source documents and metadata returned. If I set as_json=True in StreamingResponse.from_chain, the response does include the sources and metadata, but it also contains a large number of "token" entries, which is not the response format I expected.
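One way to cope with the as_json=True output on the client side is to fold the token events back into a single answer and keep the sources separately. The sketch below assumes a hypothetical event shape: one JSON object per line, carrying either a "token" key or a "source_documents" key. The shape actually emitted by StreamingResponse.from_chain may differ and should be checked against a real response before relying on this.

```python
import json


def split_stream(lines):
    """Fold streamed JSON events into (answer, sources).

    Assumed (hypothetical) event shapes, one JSON object per line:
      {"token": "..."}                    -> a piece of the answer
      {"source_documents": [{...}, ...]}  -> payload carrying the sources
    """
    tokens = []
    sources = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty separator lines
        event = json.loads(line)
        if "token" in event:
            tokens.append(event["token"])
        if "source_documents" in event:
            sources.extend(event["source_documents"])
    return "".join(tokens), sources


# Hand-written sample events, not captured from a real server.
sample = [
    '{"token": "Hello"}',
    '{"token": " world"}',
    '{"source_documents": [{"page_content": "...", '
    '"metadata": {"source": "a.html"}}]}',
]
answer, sources = split_stream(sample)
print(answer)
print([doc["metadata"]["source"] for doc in sources])
```

Keeping the two parts apart lets the UI render the answer as plain text while the sources are displayed or logged separately.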
New milestone feature: support for the guidance framework
GuidanceRouter
As the project grows, there is a strong need for a docs page for new (and existing) users to find information which might be too verbose to add to the repo README.
For the first version, we can add:
I'm trying the provided example from "conversation_chain.py".
I use the following command: uvicorn app.conversation_chain:app --reload
Using the web template it works fine.
But if I use the gradio endpoint, the form is displayed but whenever I send a query I get the following error:
HTTP error occurred: 422 Client Error: Unprocessable Entity for url: http://localhost:8000/chat
The Gradio chatbot should display a response, as it does in the web template UI.
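A 422 from a FastAPI app generally means the request body did not match the schema the endpoint expects, and the response body lists the offending fields under "detail". A minimal sketch for printing those entries, assuming the standard FastAPI validation error layout; the field name "query" in the sample body is hypothetical and not taken from the example app:

```python
import json


def summarize_validation_errors(body: str) -> list[str]:
    """Turn a FastAPI 422 response body into 'location: message' lines."""
    payload = json.loads(body)
    lines = []
    for err in payload.get("detail", []):
        # "loc" is a path such as ["body", "query"]; join it for display.
        location = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{location}: {err.get('msg', '')}")
    return lines


# Hypothetical 422 body for a request that omitted a required field.
sample_body = json.dumps({
    "detail": [
        {"loc": ["body", "query"], "msg": "field required",
         "type": "value_error.missing"}
    ]
})
for line in summarize_validation_errors(sample_body):
    print(line)
```

Comparing the reported locations with the payload the Gradio client sends should show which field is missing or misnamed.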
Thank you for this great initiative.
It would be nice to have more complete examples to get started.
The inclusion of a Gradio UI for testing is great news, but getting it to work takes more than what the quick-start code provides:
langchain_router = LangchainRouter(
    langchain_url="/chat",
    langchain_object=ConversationChain(
        llm=ChatOpenAI(temperature=0), verbose=True
    ),
    streaming_mode=1
)
app.include_router(langchain_router)
I figured I could get the gradio interface to work using testing.gradio.mount_gradio_app(app=app)
but that's not perfect because the gradio chat displays the answer as json code instead of text.
Also, I'm still struggling to get streaming to work in Gradio. Something else seems to be missing: it fails with the error "Error in on_chain_start callback: 'name'". I guess a callback is missing?
Thanks for your help.