Comments (6)
I believe so. We are working on streaming LLM, though it may take some time. Please stay tuned.
from qwen-agent.
Also, this error seems to pop up:
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/fastapi/applications.py", line 292, in __call__
await super().__call__(scope, receive, send)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
await self.app(scope, receive, send)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
raise e
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
await self.app(scope, receive, send)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/routing.py", line 69, in app
await response(scope, receive, send)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/sse_starlette/sse.py", line 233, in __call__
async with anyio.create_task_group() as task_group:
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
raise exceptions[0]
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/sse_starlette/sse.py", line 236, in wrap
await func()
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/sse_starlette/sse.py", line 221, in stream_response
async for data in self.body_iterator:
File "/home/batman/dev/test1/Qwen/openai_api.py", line 432, in predict
for new_response in response_generator:
File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 1273, in stream_generator
for token in self.generate_stream(
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/transformers_stream_generator/main.py", line 931, in sample_stream
outputs = self(
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 1108, in forward
transformer_outputs = self.transformer(
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 938, in forward
outputs = block(
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 639, in forward
attn_outputs = self.attn(
File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 564, in forward
attn_output, attn_weight = self._attn(
File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 326, in _attn
attn_weights = attn_weights / torch.full(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.11 GiB (GPU 0; 11.73 GiB total capacity; 9.42 GiB already allocated; 819.75 MiB free; 10.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
AttributeError: 'ChatCompletionResponse' object has no attribute 'model_dump_json'
Regarding the first error, please check whether running pip install "pydantic>=2.3.0" helps. Remember to include the double quotes.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.11 GiB (GPU 0; 11.73 GiB total capacity; 9.42 GiB already allocated; 819.75 MiB free; 10.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
As for the second issue, Qwen-7B-Chat can consume around 14 GB of VRAM when handling a sequence of length 8192. Try reducing the sequence length by specifying python run_server.py --max_ref_token 1000.
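Since the OOM message itself suggests tuning max_split_size_mb, the two mitigations can be combined when launching the server. This is a sketch; the 128 MiB split size is an assumed starting value, not a Qwen-documented setting:

```shell
# Reduce CUDA allocator fragmentation, as hinted by the OOM message
# (128 MiB is an assumed starting value; tune for your GPU).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Cap the reference-material tokens so prompts stay within ~12 GiB of VRAM.
python run_server.py --max_ref_token 1000
```

If OOM persists, lowering --max_ref_token further trades retrieval context for memory headroom.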
Thank you for the quick response again. Would StreamingLLM help with the memory issue? I understand the ~14 GB requirement, but would this framework benefit from implementing https://github.com/mit-han-lab/streaming-llm?
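For reference, the core idea of the linked StreamingLLM work is to bound KV-cache memory by keeping a few initial "attention sink" tokens plus a sliding window of recent tokens. A minimal sketch of that eviction policy (my illustration of the paper's idea, not Qwen-Agent's implementation) on a cache represented as a list:

```python
def evict_kv_cache(cache: list, n_sink: int = 4, window: int = 1024) -> list:
    """StreamingLLM-style KV-cache eviction (a sketch of the paper's idea,
    not Qwen-Agent's actual code).

    Keep the first `n_sink` entries ("attention sinks") plus the most
    recent `window` entries, so memory stays bounded however long the
    generation stream runs.
    """
    if len(cache) <= n_sink + window:
        return cache  # under budget: nothing to evict yet
    return cache[:n_sink] + cache[-window:]


# With a 2000-token cache, only the 4 sinks + the last 8 tokens survive.
print(evict_kv_cache(list(range(2000)), n_sink=4, window=8))
```

Note this bounds memory for long generation streams; it would not by itself shrink the one-shot cost of a single 8192-token prompt.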
Thank you. What about this: https://x.com/arankomatsuzaki/status/1711401381247242683?s=20
Related Issues (20)
- How can a conversation's session_id be passed to a tool? HOT 1
- Setting the language via the lang parameter has no effect HOT 6
- Most of the provided examples use the qwen-max API; could some examples for locally running LLMs be added?
- After adding a document opened in the local document browser, the server cannot read the file HOT 5
- How to remove the default README from the context? HOT 1
- Issue with printing the final result of an Agent call HOT 2
- Could support for GBNF Grammar be considered? HOT 1
- FnCallAgent's tool calling performs worse than ReActChat HOT 5
- Why do the examples raise errors on my Mac M1? HOT 1
- How to change the output path for images generated by the code interpreter tool HOT 1
- Tools should respect language as well HOT 4
- ReAct results differ between locally deployed Qwen1.5 and the DashScope API HOT 5
- FileNotFoundError: [Errno 2] No such file or directory: 'workspace/popup_url.jsonl' HOT 1
- How to launch a custom agent with run_server.py? HOT 5
- How to add a system prompt and few-shot examples for function call HOT 1
- Would you consider changing the function_call detection to the function_call format in the official docs? HOT 2
- [BUG] In evaluate_plugin (evaluating the model's tool-calling ability), what is the correspondence between the 6 metrics in the code and the 3 official metrics? HOT 1
- How to deploy Qwen-Agent with ollama? HOT 2
- Can embeddings only use DashScopeEmbeddings?
- Want to replace langchain's agent with qwen-agent; what needs to change? The two BaseTool classes look different.