Comments (8)
@Nugine re: the workaround: On current stable Rust, the decode_asm function here recovers exactly equivalent output to what you had before: https://rust.godbolt.org/z/fGEaYME1h
It seems the early exit somehow makes LLVM lose track of the equivalence to the vpavgb instruction. Another workaround thus seems to be to force LLVM to calculate both the Ok and Err versions:
use std::arch::x86_64::*;

#[target_feature(enable = "avx2")]
pub unsafe fn decode(
    x: __m256i,
    ch: __m256i,
    ct: __m256i,
    dh: __m256i,
    dt: __m256i,
) -> Result<__m256i, __m256i> {
    let shr3 = _mm256_srli_epi32::<3>(x);
    let h1 = _mm256_avg_epu8(shr3, _mm256_shuffle_epi8(ch, x));
    let h2 = _mm256_avg_epu8(shr3, _mm256_shuffle_epi8(dh, x));
    let o1 = _mm256_shuffle_epi8(ct, h1);
    let o2 = _mm256_shuffle_epi8(dt, h2);
    let c1 = _mm256_adds_epi8(x, o1);
    let c2 = _mm256_add_epi8(x, o2);
    // Return c2 on both paths so that h2 is computed before the early exit.
    if _mm256_movemask_epi8(c1) != 0 {
        return Err(c2);
    }
    Ok(c2)
}
But I guess this will break down as soon as the function gets inlined if the error value is not otherwise used.
Blaming rust-lang/stdarch#1477
Did you confirm that this is the responsible change or are you guessing?
@Nugine This is definitely more instructions and more bytes on each, so I'm marking it with I-heavy, but it appears this comes with a performance regression. Can you be precise about which of the ~19 benchmarks you appear to run have regressed, and on what architecture?
I would rather we not make the 2nd vpavgb instruction come back only for your algorithm to still be dog-slow because some of the other instructions are different.
Also, can you be more precise on what architectures and with what target features you're testing on? GitHub is allowed to change the CPU you run benchmarks on, and does, because their fleet is not perfectly uniform, so -Ctarget-cpu=native makes it more likely your benchmarks can be run-to-run and job-to-job inconsistent.
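One way to make such benchmark reports easier to compare is to log which SIMD features the host CPU reports at runtime. The following is only a sketch: the function name `detected_simd_features` and the choice of features are assumptions for illustration, not something taken from base64-simd or its benchmark setup.

```rust
/// Lists which of a few x86 SIMD features the current CPU reports at runtime.
/// Returns an empty list on non-x86 targets.
/// (Hypothetical helper; the name and feature selection are illustrative.)
fn detected_simd_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Runtime detection is independent of the flags the binary was built with.
        if is_x86_feature_detected!("sse4.2") {
            found.push("sse4.2");
        }
        if is_x86_feature_detected!("avx") {
            found.push("avx");
        }
        if is_x86_feature_detected!("avx2") {
            found.push("avx2");
        }
    }
    found
}
```

Printing the result of `detected_simd_features()` at the start of each benchmark job would show whether two runs actually executed on comparable hardware.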
Base64-decode in base64-simd has been slower than radix64 since Rust 1.75.0. By comparing the asm generated by 1.74.1 and 1.75.0, I found that one of the vpavgb instructions is missing. LLVM doesn't emit vpavgb for one of the _mm256_avg_epu8 calls, and instead emits a longer sequence of equivalent instructions.
rust-lang/stdarch#1477 made the change. However, the root cause may be elsewhere, possibly LLVM.
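For reference, vpavgb computes a rounded-up average of each pair of unsigned bytes. A scalar model of that operation is sketched below; the function names are made up for illustration. When LLVM does not recognize this pattern as vpavgb, it has to emit the widening, adding, shifting and narrowing steps separately.

```rust
/// Scalar model of one vpavgb lane: the rounded-up average of two unsigned bytes.
/// (Illustrative name, not part of stdarch or base64-simd.)
fn avg_round_up(a: u8, b: u8) -> u8 {
    // Widen to u16 so that a + b + 1 cannot overflow, then halve and narrow.
    ((a as u16 + b as u16 + 1) >> 1) as u8
}

/// Applies the scalar model to 32 lanes, the width of _mm256_avg_epu8.
fn avg_epu8_model(a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    std::array::from_fn(|i| avg_round_up(a[i], b[i]))
}
```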
To see the asm, you can use the following commands.
git clone https://github.com/Nugine/simd.git
cd simd
rustup override set 1.74.1 # or 1.75.0
RUSTFLAGS="--cfg vsimd_dump_symbols" cargo asm -p base64-simd --lib --simplify --target x86_64-unknown-linux-gnu --context 1 -- base64_simd::multiversion::decode::avx2 > base64-decode-avx2.asm
cat base64-decode-avx2.asm
Target: x86_64-unknown-linux-gnu
Instruction: AVX2
I have extracted the decode function and reproduced the regression. https://rust.godbolt.org/z/KG4cT6aPK
I'm looking for:
- a stable workaround method to generate vpavgb
- why the optimization is missing
@Nugine re: the workaround: On current Rust, stable, the decode_asm function here recovers exactly equivalent output to what you had before: https://rust.godbolt.org/z/fGEaYME1h
Cool! I'll try asm wrapper.
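A minimal sketch of what such an asm wrapper could look like follows. It is not the decode_asm code from the linked Godbolt page; the names `avx2_asm`, `avg_epu8_asm` and `avg_bytes` are hypothetical. The idea is to emit vpavgb directly through inline assembly so the instruction selection no longer depends on LLVM's pattern matching.

```rust
#[cfg(target_arch = "x86_64")]
mod avx2_asm {
    use std::arch::asm;
    use std::arch::x86_64::__m256i;
    use std::mem::transmute;

    /// Hypothetical replacement for _mm256_avg_epu8 that always emits vpavgb.
    #[target_feature(enable = "avx2")]
    pub unsafe fn avg_epu8_asm(a: __m256i, b: __m256i) -> __m256i {
        let out: __m256i;
        // Intel syntax (Rust's default on x86): destination first.
        unsafe {
            asm!(
                "vpavgb {out}, {a}, {b}",
                out = lateout(ymm_reg) out,
                a = in(ymm_reg) a,
                b = in(ymm_reg) b,
                // No memory access, no stack use, flags untouched, no side effects.
                options(pure, nomem, nostack, preserves_flags),
            );
        }
        out
    }

    /// Safe helper for experiments: returns None when AVX2 is not available.
    pub fn avg_bytes(a: [u8; 32], b: [u8; 32]) -> Option<[u8; 32]> {
        if !is_x86_feature_detected!("avx2") {
            return None;
        }
        // SAFETY: AVX2 was detected above, and [u8; 32] and __m256i
        // are both 32 bytes with no invalid bit patterns.
        unsafe {
            let r = avg_epu8_asm(transmute(a), transmute(b));
            Some(transmute(r))
        }
    }
}
```

One trade-off of this approach is that the asm block is opaque to the optimizer, so LLVM cannot constant-fold or combine the average with neighbouring operations.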
Based on jhorstmann's remark, it would be nicest to fix this in LLVM, since LLVM appears to have the information necessary to do this optimization and is just missing it in the early-return case. I don't think partially reverting a diff is unwarranted, however.
WG-prioritization assigning priority (Zulip discussion).
@rustbot label -I-prioritize +P-medium