Comments (13)
But I am not willing to do an application-specific optimization in LLVM. So I file this issue to see whether we can fix it in hashbrown.
So as argued above by others, this is not an application-specific pattern. It is a pretty natural pattern for Rust programs: one has a checked operation that can fail in a safe way, and then one turns that into the unchecked variant (with UB on failure) by doing unwrap_unchecked()
(which expands to the match
statement from the issue description).
I'm not sure how else you are suggesting that code should be written, but it'd clearly be bad if calculate_layout_for
had to be written twice.
from hashbrown.
Even aside from the pattern overall being visible in Rust code beyond hashbrown, most of the code in hashbrown is transitively included in every single Rust program that uses a hashmap, so "application-specific" is an amusing way to describe "probably 20% of all Rust programs ever? higher, even?"
from hashbrown.
Looks like the same as llvm/llvm-project#80637. As I noted there, this technically loses information, but I guess in practice it's a lot better to replace with mul nuw
here. I wouldn't be opposed to doing that in LLVM, as this kind of pattern is probably somewhat common. It arises whenever you have uses of checked_*
in some method, and then it gets used in multiple places, some of them checking the result, and the others doing something unwrap_unchecked style. It avoids the duplication of implementing it both with checked_* and unchecked_* arithmetic.
@dtcxzyw Do you have some example where the replacement enables follow-up optimization in practice? As far as I can tell your examples will be lowered to the same asm after assumes are dropped in CGP.
from hashbrown.
It arises whenever you have uses of
checked_*
in some method, and then it gets used in multiple places, some of them checking the result, and the others doing something unwrap_unchecked style. It avoids the duplication of implementing it both with checked_* and unchecked_* arithmetic.
As @nikic said, we want to reuse calculate_layout_for
.
Lines 1671 to 1674 in 274c7bb
Lines 1745 to 1748 in 274c7bb
from hashbrown.
Could you further explain why alive2 proves the correctness of the transformation, but we cannot transform it in LLVM?
BTW, we should probably find a reduce example to report at rust. But I can also do this later.
from hashbrown.
but we cannot transform it in LLVM?
It is just not implemented. But I am not willing to do this in LLVM. We should fix it in hashbrown
or rustc
.
from hashbrown.
An example from tokio-rs
: https://godbolt.org/z/Gr55dfMc8
from hashbrown.
@nikic See the IR diff in https://github.com/dtcxzyw/llvm-opt-benchmark/pull/289/files:
bench/diesel-rs/optimized/1hskzwx2vflsavf7.ll: https://godbolt.org/z/8e6qd9Gjo
bench/image-rs/optimized/2mngkegtim1o10y3.ll: enables factorization: (%2505 + 1) * 392 + 15 --> %2505 * 392 + 407
from hashbrown.
@dtcxzyw Most of the callers that eventually arrived in hashbrown::raw::RawTableInner::free_buckets
had std::collections::HashMap::some_fn
higher on the stack, right? (I don't know what your tooling is like as to whether it can make this kind of thing obvious)
from hashbrown.
@dtcxzyw Most of the callers that eventually arrived in
hashbrown::raw::RawTableInner::free_buckets
hadstd::collections::HashMap::some_fn
higher on the stack, right? (I don't know what your tooling is like as to whether it can make this kind of thing obvious)
Yeah.
Even aside from the pattern overall being visible in Rust code beyond hashbrown, most of the code in hashbrown is transitively included in every single Rust program that uses a hashmap, so "application-specific" is an amusing way to describe "probably 20% of all Rust programs ever? higher, even?"
I believe there is a better way to fix this issue by adding some MIR optimizations or modifying hashbrown itself. It is tricky to handle it in LLVM :(
See llvm/llvm-project#84016.
from hashbrown.
Even aside from the pattern overall being visible in Rust code beyond hashbrown, most of the code in hashbrown is transitively included in every single Rust program that uses a hashmap, so "application-specific" is an amusing way to describe "probably 20% of all Rust programs ever? higher, even?"
I think you're missing the point here. When it comes to optimization issues, in Rust we are in the very fortunate position that the issue can be addressed either by adding new optimizations, or by changing library code to be more amenable to optimization. The fact that we can change the hashbrown implementation and have ~all Rust code benefit is the point, not an analysis mistake.
<rant>
This is contrary to the stupidity prevalent in C/C++, where an incredible amount of compiler engineering resources go into implementing completely ridiculous optimizations that target specific patterns in SPEC benchmarks, where the proper thing to do would have been to just change the benchmark code. </rant>
(This is just for context -- as said above, I'm not convinced that modifying hashbrown is indeed the best solution here.)
from hashbrown.
It is tricky to handle it in LLVM :(
Are there any small tweaks that we could do to what we emit to make it easier?
For example, I see that the current u64::checked_mul
looks like this
define { i64, i64 } @demo_checked_mul(i64 noundef %a, i64 noundef %b) unnamed_addr #0 !dbg !7 {
start:
%0 = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b), !dbg !12
%_6.1 = extractvalue { i64, i1 } %0, 1, !dbg !12
%not._6.1 = xor i1 %_6.1, true, !dbg !25
%. = zext i1 %not._6.1 to i64, !dbg !25
%_6.0 = extractvalue { i64, i1 } %0, 0, !dbg !12
%1 = insertvalue { i64, i64 } poison, i64 %., 0, !dbg !28
%2 = insertvalue { i64, i64 } %1, i64 %_6.0, 1, !dbg !28
ret { i64, i64 } %2, !dbg !28
}
which definitely can't be further optimized because there's nothing telling LLVM that second part of the pair will be unused.
Inspired by rust-lang/rust#124114 I started imagining something like
%product_or_undef = select i1 %_6.1, i64 undef, i64 %_6.0
%1 = insertvalue { i64, i64 } poison, i64 %., 0
%2 = insertvalue { i64, i64 } %1, i64 %product_or_undef, 1
but it looks like that's not actually useful because it gets immediately optimized away again.
from hashbrown.
One thing you might try on the hashbrown side: thanks to llvm/llvm-project#56563, it seemed like using wider types for the isize::MAX
check actually codegen'd well in rust-lang/rust#100866, just was a bit slower to compile.
Rather than doing all this math as checked in
Lines 261 to 263 in 274c7bb
maybe you could try writing it in u128 where I think it can't overflow? With the later isize::MAX
check it might optimize better than one would first guess. Certainly the first mul
there is nuw nsw
in i128
.
Alternatively, maybe there'd be a way to shift by the log of the buckets, since it said it's always a power of two, instead of multiplying? It looks like the function doesn't do anything to tell LLVM that it's actually a power-of-two, so it probably has no way of being smart about understanding & !(ctrl_align - 1)
either. (cc llvm/llvm-project#58996)
I wonder if making buckets
be NonZeroUsize
could do anything...
(Aside: I should push on https://doc.rust-lang.org/std/primitive.usize.html#method.carrying_mul again...)
from hashbrown.
Related Issues (20)
- `hashbrown` fails to compile as a transitive dependency HOT 2
- allocator-api2 default-feature? HOT 2
- Compiling hashbrown 0.14.2 for aarch64-unknown-linux-gnu with "target-cpu=cortex-a53" generates illegal instructions HOT 2
- Switching to GxHash? HOT 9
- Feature: increase capacity according to the actual size returned by the allocator HOT 2
- Hashbrown crash due to bad malloc HOT 1
- 0.14.3 - no method named `clear` found for struct `HashMap` in the current scope HOT 5
- Benchmark biaised due to no fence around input
- assertion failed: buckets.is_power_of_two() HOT 8
- Build breaks on nightly due to use of `stdsimd` rust feature in ahash 0.8.6 HOT 2
- Was swap-remove behavior ever considered when removing entries? HOT 10
- Consider returning to 1.63.0 MSRV HOT 1
- How to calculate the size of the hashbrown::HashMap at runtime? HOT 1
- Library test `map::test_map::test_clone_from_memory_leaks` errors with using uninitialized data under valgrind and miri
- update to ahash 0.8.7 or after to use new stdsimd feature portable_simd HOT 1
- Insertion performance with arena allocators HOT 3
- Do not grow the raw table when lots of deletion and insertion is performed HOT 3
- Unsound usages of unsafe implementation from `usize` to `T` HOT 2
- Support lower maximum size
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from hashbrown.