Comments (4)
@Echozqn, the HBM memory efficiency is only needed for latency estimation. The memory usage calculation does not need this value.
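To illustrate why the efficiency only affects latency, here is a rough sketch of a memory-bound latency estimate. It is not llm-analysis's actual code: the function name `memory_bound_latency_s`, its parameters, and the example numbers are assumptions made for illustration only.

```python
def memory_bound_latency_s(bytes_moved, peak_bandwidth_bytes_per_s, hbm_memory_efficiency):
    """Estimate the time to move `bytes_moved` through HBM.

    Hypothetical helper: the efficiency scales the achievable bandwidth,
    so it changes the time estimate but never the number of bytes stored.
    """
    if not 0.0 < hbm_memory_efficiency <= 1.0:
        raise ValueError("hbm_memory_efficiency must be in (0, 1]")
    achievable_bandwidth = peak_bandwidth_bytes_per_s * hbm_memory_efficiency
    return bytes_moved / achievable_bandwidth


# Assumed example: 14 GB of weights read once, 2 TB/s peak bandwidth.
weights_bytes = 14e9
latency_full = memory_bound_latency_s(weights_bytes, 2e12, 1.0)
latency_80 = memory_bound_latency_s(weights_bytes, 2e12, 0.8)
print(f"efficiency 1.0: {latency_full * 1e3:.2f} ms")
print(f"efficiency 0.8: {latency_80 * 1e3:.2f} ms")
```

In this sketch, lowering the efficiency raises the latency estimate, while `weights_bytes` (the memory footprint) stays the same.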
from llm-analysis.
@Echozqn The HBM memory efficiency is defined as the fraction of the theoretical memory bandwidth one can achieve, so it is not relevant to the memory size.
Do you use flash attention in your code? The flash_attn flag defaults to True. This might cause the difference between the estimated memory size and the nvidia-smi output.
Also, the memory estimates aim to be close to torch.cuda.max_memory_allocated.
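For comparing against the estimate, a minimal sketch of reading PyTorch's own peak counters is shown below. The helper name `report_peak_gpu_memory` is hypothetical. nvidia-smi usually reports more than `max_memory_allocated`, because it also counts the CUDA context and memory the caching allocator has reserved but is not currently using.

```python
def report_peak_gpu_memory(device=0):
    """Return peak allocated/reserved GPU memory in GiB, or None without CUDA.

    Hypothetical helper for illustration; call it after the workload
    you want to measure has finished running.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    return {
        # Peak memory occupied by tensors: the number the estimate targets.
        "max_allocated_gib": torch.cuda.max_memory_allocated(device) / gib,
        # Peak memory held by the caching allocator: closer to nvidia-smi.
        "max_reserved_gib": torch.cuda.max_memory_reserved(device) / gib,
    }


stats = report_peak_gpu_memory()
if stats is None:
    print("PyTorch with CUDA is not available on this machine.")
else:
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

Calling `torch.cuda.reset_peak_memory_stats()` before the run keeps earlier allocations out of the peak values.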
Yes, I found that nvidia-smi measured inaccurately when flash_attn was enabled. Thanks to the author for the answer.
In my case, I used nvidia-smi to monitor and it showed about 30 GB of VRAM, but the prediction came out to only 20 GB. Is there such a thing as memory efficiency for VRAM?