
Comments (6)

JianxinMa commented on June 12, 2024

I believe so. We are working on streaming LLM support, though it may take some time. Please stay tuned.

from qwen-agent.

jmanhype commented on June 12, 2024

Also, this error seems to pop up:

ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/fastapi/applications.py", line 292, in __call__
    await super().__call__(scope, receive, send)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/starlette/routing.py", line 69, in app
    await response(scope, receive, send)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/sse_starlette/sse.py", line 233, in __call__
    async with anyio.create_task_group() as task_group:
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
    raise exceptions[0]
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/sse_starlette/sse.py", line 236, in wrap
    await func()
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/sse_starlette/sse.py", line 221, in stream_response
    async for data in self.body_iterator:
  File "/home/batman/dev/test1/Qwen/openai_api.py", line 432, in predict
    for new_response in response_generator:
  File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 1273, in stream_generator
    for token in self.generate_stream(
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/transformers_stream_generator/main.py", line 931, in sample_stream
    outputs = self(
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 1108, in forward
    transformer_outputs = self.transformer(
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 938, in forward
    outputs = block(
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 639, in forward
    attn_outputs = self.attn(
  File "/home/batman/dev/test1/qwen_agent_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 564, in forward
    attn_output, attn_weight = self._attn(
  File "/home/batman/.cache/huggingface/modules/transformers_modules/QWen/QWen-7B-Chat-Int4/b725fe596dce755fe717c5b15e5c8243d5474f66/modeling_qwen.py", line 326, in _attn
    attn_weights = attn_weights / torch.full(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.11 GiB (GPU 0; 11.73 GiB total capacity; 9.42 GiB already allocated; 819.75 MiB free; 10.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


JianxinMa commented on June 12, 2024

AttributeError: 'ChatCompletionResponse' object has no attribute 'model_dump_json'

Regarding the first error, please check if pip install "pydantic>=2.3.0" helps. Remember to include the double quotes.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.11 GiB (GPU 0; 11.73 GiB total capacity; 9.42 GiB already allocated; 819.75 MiB free; 10.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

As for the second issue, Qwen-7B-Chat can consume around 14 GB of VRAM when handling a sequence of length 8192. Try reducing the sequence length by specifying python run_server.py --max_ref_token 1000.
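
To see why long sequences blow up here: the allocation that fails in the traceback is the attention-score matrix, whose size grows quadratically with sequence length. A back-of-the-envelope estimate, assuming 32 attention heads and fp16 activations (illustrative numbers, not measured from this server):

```python
# Rough size of one attention-score tensor of shape
# (batch, n_heads, seq_len, seq_len), assuming 2-byte fp16 elements.
def attn_scores_bytes(seq_len, n_heads=32, bytes_per_elem=2, batch=1):
    return batch * n_heads * seq_len * seq_len * bytes_per_elem

for seq in (1000, 4096, 8192):
    print(f"seq_len={seq}: ~{attn_scores_bytes(seq) / 2**30:.2f} GiB per layer")
# seq_len=8192 works out to ~4 GiB per layer, versus well under 0.1 GiB at
# seq_len=1000 -- which is why --max_ref_token 1000 relieves the pressure.
```

The OOM message itself also suggests tuning the allocator against fragmentation, e.g. export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 (the value is a starting point to experiment with); that can be combined with the shorter context.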


jmanhype commented on June 12, 2024

Thank you for the quick response again. Would StreamingLLM help with the memory issue? I understand the ~14 GB figure, but would this framework benefit from implementing this: https://github.com/mit-han-lab/streaming-llm
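
For context on the linked repo: StreamingLLM bounds KV-cache memory by keeping a few initial "attention sink" tokens plus a sliding window of recent tokens, evicting everything in between. A minimal sketch of that eviction policy (sizes are illustrative, and this is not the actual mit-han-lab implementation):

```python
# Sketch of the StreamingLLM cache policy: keep the first n_sink "attention
# sink" entries plus the most recent `window` entries, drop the middle,
# so the cache size stays bounded no matter how long generation runs.
def evict(cache, n_sink=4, window=8):
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

tokens = list(range(20))  # stand-in for per-token KV-cache entries
print(evict(tokens))      # → [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Note this caps cache growth during long generations; it does not shrink the quadratic attention-score cost of a single long prompt, so it addresses a different part of the memory budget than --max_ref_token.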


jmanhype commented on June 12, 2024

Thank you. What about this: https://x.com/arankomatsuzaki/status/1711401381247242683?s=20


jmanhype commented on June 12, 2024

(screenshot attached: Screenshot_20231009-111810.png)

