Comments (4)
Yeah. I think doing more work in a single loop increases register pressure and causes the perf problem. I will need to verify this with some profiling, though.
It's very likely. With two separate loops, the intermediate result _tmp8 will be stored to SMEM via tl.sum to free some registers, which isn't too bad (much better than tl.store).
from pytorch.
This codegen might be something we should try to fix on the Inductor side. Since splitting the loops up resulted in speedups, maybe we should explore splitting some kernels into more loops.
Yeah. I think doing more work in a single loop increases register pressure and causes the perf problem. I will need to verify this with some profiling, though.
Maybe we should actually do 5 loops here:
- max_a -- evict last
- sum_a (in reverse order to maximize cache hits) -- evict first
- max_b -- evict last
- sum_b (in reverse order to maximize cache hits) -- evict first
- output
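To make the five-pass idea concrete, here is a minimal plain-Python sketch, not the actual Inductor or Triton codegen. It assumes the kernel computes two independent softmax-style reductions over rows `a` and `b`; the function name `five_pass`, the block size, and the way the two results are combined in the output pass are all hypothetical. Eviction hints have no equivalent in plain Python, so they appear only as comments.

```python
import math


def five_pass(a, b, block=4):
    """Five separate passes over two equal-length rows (illustrative only)."""
    starts = list(range(0, len(a), block))

    # Pass 1: max_a, forward block order (would be loaded with "evict last").
    max_a = -math.inf
    for s in starts:
        max_a = max(max_a, max(a[s:s + block]))

    # Pass 2: sum_a, reverse block order, so the blocks touched last in
    # pass 1 are read first (would be loaded with "evict first").
    sum_a = 0.0
    for s in reversed(starts):
        sum_a += sum(math.exp(x - max_a) for x in a[s:s + block])

    # Pass 3: max_b, forward block order.
    max_b = -math.inf
    for s in starts:
        max_b = max(max_b, max(b[s:s + block]))

    # Pass 4: sum_b, reverse block order.
    sum_b = 0.0
    for s in reversed(starts):
        sum_b += sum(math.exp(y - max_b) for y in b[s:s + block])

    # Pass 5: output. Adding the two normalized rows is an assumed
    # combination; the real kernel's output expression is not shown here.
    return [
        math.exp(x - max_a) / sum_a + math.exp(y - max_b) / sum_b
        for x, y in zip(a, b)
    ]
```

Each pass keeps only one accumulator live, which is the property the split is meant to buy; whether it actually lowers register pressure on the GPU would still need profiling.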
Maybe we should actually do 5 loops here:
- max_a
- sum_a (in reverse order to maximize cache hits)
- max_b
- sum_b (in reverse order to maximize cache hits)
- output
This sounds reasonable to me. In general, I would not fuse loops that have no data dependencies between them. When they do have data dependencies, fusion can help because the intermediate results from the first loop can be pipelined directly into the second within the same iteration. When there are no dependencies, loop fission (loop distribution) is usually the way to go.
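The distinction can be sketched in plain Python. This is only an illustration with hypothetical function names, not the kernel under discussion: `fused` and `fissioned` compute two independent reductions and return the same result, while `fused_dependent` shows the case where fusion helps because the second step consumes the first step's value in the same iteration.

```python
def fused(a, b):
    # One loop, two independent reductions: both accumulators are live
    # for the whole loop.
    total, biggest = 0.0, float("-inf")
    for x, y in zip(a, b):
        total += x * x
        biggest = max(biggest, y)
    return total, biggest


def fissioned(a, b):
    # Loop fission: each loop carries a single accumulator.
    total = 0.0
    for x in a:
        total += x * x
    biggest = float("-inf")
    for y in b:
        biggest = max(biggest, y)
    return total, biggest


def fused_dependent(a):
    # Data dependency: the intermediate t is consumed immediately, so
    # fusing avoids materializing it between two loops.
    out = []
    for x in a:
        t = x * 2.0
        out.append(t + 1.0)
    return out
```

In Python the two variants differ only in shape; the register-pressure argument applies to the generated GPU code, where fewer live values per loop can matter.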