Comments (6)

programmerjake commented on July 21, 2024

if we get generic-sized integers uint<N> (like C's unsigned _BitInt(N)) somewhat soon, I would just use those.

Without knowing those, here's a completely naive proposal: the array must be big enough to contain at least as many bits as the vector has elements (that's just a lower bound; arbitrarily larger arrays are allowed). Vector elements are then mapped to bits in the array as follows: vector element i is represented in array element i / 8, in bit i % 8, where bits are indexed from most significant to least significant.
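To make the naive MSB-first proposal concrete, here is a minimal Rust sketch. `pack_msb_first` is a hypothetical helper, not an existing intrinsic or portable-simd API; it only enforces the lower bound on the array size and accepts larger arrays, as the proposal allows.

```rust
/// Hypothetical helper for the naive MSB-first proposal:
/// element `i` goes to byte `i / 8`, bit `i % 8`, where bit 0 is the
/// most significant bit (0x80) and bit 7 the least significant (0x01).
fn pack_msb_first(mask: &[bool], out: &mut [u8]) {
    // Lower bound only: the array may be larger than strictly necessary.
    assert!(out.len() * 8 >= mask.len(), "output array too small");
    out.fill(0);
    for (i, &set) in mask.iter().enumerate() {
        if set {
            out[i / 8] |= 0x80u8 >> (i % 8);
        }
    }
}

fn main() {
    // 10 lanes: elements 0, 2 and 9 are set.
    let mask = [true, false, true, false, false, false, false, false, false, true];
    // 3 bytes is more than the minimum of 2, which the proposal allows.
    let mut bytes = [0u8; 3];
    pack_msb_first(&mask, &mut bytes);
    println!("{:08b} {:08b} {:08b}", bytes[0], bytes[1], bytes[2]);
}
```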

The standard format that LLVM uses on little-endian (and on x86 and a few other arches too) counts bits from the LSB end to the MSB end, not from MSB to LSB. The idea is that if you have some integer type uint<N>, then:
(uint::<N>::from_le_bytes(the_bytes) >> i) & 1 != 0 is true iff element i of the corresponding mask is true.
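A minimal sketch of that LSB-first format, assuming `u32` as a stand-in for `uint<32>` (Rust has no generic-sized integers today). `pack_lsb_first` is a hypothetical helper; the loop in `main` checks the `from_le_bytes` property for every element, which holds regardless of the host's endianness because `from_le_bytes` fixes the byte order.

```rust
/// Hypothetical helper for the LSB-first format: element `i` goes to
/// byte `i / 8`, bit `i % 8`, counting from the least significant bit (0x01).
fn pack_lsb_first(mask: &[bool], out: &mut [u8]) {
    assert!(out.len() * 8 >= mask.len(), "output array too small");
    out.fill(0);
    for (i, &set) in mask.iter().enumerate() {
        if set {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
}

fn main() {
    // 32 lanes; every third element is set.
    let mask: Vec<bool> = (0..32).map(|i| i % 3 == 0).collect();
    let mut bytes = [0u8; 4];
    pack_lsb_first(&mask, &mut bytes);

    // u32 stands in for uint<32>: reading the bytes as a little-endian
    // integer puts element i at bit i.
    let bits = u32::from_le_bytes(bytes);
    for i in 0..32 {
        assert_eq!((bits >> i) & 1 != 0, mask[i]);
    }
    println!("{bits:032b}");
}
```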

I strongly think that we should just use that format everywhere if we don't want an endian-dependent format and generic-sized integers aren't ready yet.

from rust.

RalfJung commented on July 21, 2024


programmerjake commented on July 21, 2024

if we get generic-sized integers uint<N> (like C's unsigned _BitInt(N)) somewhat soon, I would just use those.
I don't know of any initiative working on them. What is the current status?

There's a postponed RFC, rust-lang/rfcs#2581, and people have recently been asking whether it has been long enough to restart it. IIRC, 3-4 different people have brought it up in the last month or two (mostly on Zulip or in other random corners of the Rust project).


RalfJung commented on July 21, 2024

I strongly think that we should just use that format everywhere if we don't want an endian-dependent format and generic-sized integers aren't ready yet.

I don't have an opinion either way -- this sounds perfectly reasonable to me. We'd then say:
vector element i is represented in array element i / 8, in bit i % 8, where bits are indexed from least significant to most significant.

I think the LLVM IR for big-endian would then be something like:

  • do the bitcast from <N x i1> to iN
  • reverse the bits
  • zero-extend to match the size of the array
  • reverse the bytes (strangely, LLVM's bswap only works on types with an even number of bytes, so if the array has, e.g., length 3, we'd have to use some other encoding...)
  • transmute to array
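The steps above can be emulated on any host with ordinary integer operations, as a sanity check that they produce the LSB-first byte layout. This is a sketch under two stated assumptions: on a big-endian target the `<16 x i1>` to `i16` bitcast places element 0 in the most significant bit (an assumption, not something taken from LLVM's documentation), and `to_be_bytes` models the final transmute on a big-endian target. `big_endian_lowering` is a hypothetical name; the 16-lane mask and 4-byte array are just example sizes.

```rust
/// Emulates the big-endian lowering for a 16-lane mask stored into a
/// 4-byte array. ASSUMPTION: the `<16 x i1>` -> `i16` bitcast puts
/// element 0 in the most significant bit on big-endian targets.
fn big_endian_lowering(mask: &[bool; 16]) -> [u8; 4] {
    // Step 1: the bitcast (under the assumption above): element i at bit 15 - i.
    let mut bitcast: u16 = 0;
    for (i, &set) in mask.iter().enumerate() {
        if set {
            bitcast |= 1u16 << (15 - i);
        }
    }
    // Step 2: reverse the bits, so element i is now at bit i.
    let reversed = bitcast.reverse_bits();
    // Step 3: zero-extend to the array's size (4 bytes = 32 bits).
    let widened = u32::from(reversed);
    // Step 4: reverse the bytes (u32 has an even number of bytes, so bswap applies).
    let swapped = widened.swap_bytes();
    // Step 5: "transmute" to an array; to_be_bytes models a big-endian target.
    swapped.to_be_bytes()
}

fn main() {
    // Elements 0 and 9 are set.
    let mut mask = [false; 16];
    mask[0] = true;
    mask[9] = true;
    let bytes = big_endian_lowering(&mask);
    println!("{:?}", bytes); // [1, 2, 0, 0]
}
```

Under these assumptions, byte 0 holds elements 0-7 and byte 1 holds elements 8-15, each at bit i % 8 counted from the LSB, which matches the format described earlier.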

Is there some particular instruction sequence we want to generate here or would something like that work?


RalfJung commented on July 21, 2024

FWIW I am also entirely open to the idea that the current behavior is already what we want (including on big-endian). But the fact that portable-simd stopped using the array-based variant entirely is an indication that something is not optimal. I have no idea what, as I don't really know the design space here. I see my role as that of an advisor with a t-opsem viewpoint.

The reason the current semantics seem odd is that Miri currently has exactly 4 places where endianness matters:

  • loading integers
  • storing integers
  • simd_bitmask
  • simd_select_bitmask
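For intuition on why the bitmask intrinsics sit on that list next to integer loads and stores: once a mask is an integer with bit i for element i, turning it into bytes is an ordinary integer store, so the resulting byte array depends on endianness. The sketch below only illustrates that with the standard `to_le_bytes`/`to_be_bytes`/`to_ne_bytes` methods; it is not how Miri implements `simd_bitmask`, and `store_bitmask` is a hypothetical name.

```rust
/// Hypothetical helper: lay out an integer bitmask (bit i = element i)
/// the way a little-endian and a big-endian store would.
fn store_bitmask(bitmask: u16) -> ([u8; 2], [u8; 2]) {
    (bitmask.to_le_bytes(), bitmask.to_be_bytes())
}

fn main() {
    // Elements 0 and 9 of a 16-lane mask are set.
    let bitmask: u16 = (1 << 0) | (1 << 9);
    let (le, be) = store_bitmask(bitmask);
    println!("little-endian store: {le:?}"); // [1, 2]
    println!("big-endian store:    {be:?}"); // [2, 1]
    // What this host actually writes to memory:
    println!("native store:        {:?}", bitmask.to_ne_bytes());
}
```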

So, the intrinsics are certainly somewhat striking. But maybe that's expected for converting between arrays of bits and a more compact representation; I don't have any intuition for what to expect here.


programmerjake commented on July 21, 2024

FWIW I am also entirely open to the idea that the current behavior is already what we want (including on big-endian). But the fact that portable-simd stopped using the array-based variant entirely is an indication that something is not optimal.

The reasons we stopped: because generic const exprs don't work that well, we have to make the output byte array the same length as the input mask, even though that's 8x overkill. uint<N> would solve this, since its N specifies a bit count rather than a byte count. Also, those intrinsics are currently just plain broken for non-power-of-two lengths on at least aarch64, probably due to a combination of rustc and LLVM bugs.
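A sketch of the workaround described above, assuming a hypothetical `to_bitmask_array` function (not portable-simd's actual API): on stable Rust the return type can't be written as `[u8; (N + 7) / 8]`, so it is `[u8; N]`, and only the first ceil(N / 8) bytes can ever be nonzero.

```rust
/// Hypothetical array-returning bitmask on stable Rust. Without
/// generic_const_exprs the return type cannot be `[u8; (N + 7) / 8]`,
/// so it is `[u8; N]`: one byte per lane, most of it unused.
fn to_bitmask_array<const N: usize>(mask: [bool; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for (i, &set) in mask.iter().enumerate() {
        if set {
            // LSB-first: element i goes to byte i / 8, bit i % 8.
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

/// Number of bytes that actually carry information: ceil(lanes / 8).
const fn bytes_needed(lanes: usize) -> usize {
    (lanes + 7) / 8
}

fn main() {
    let out = to_bitmask_array([true; 16]);
    let used = bytes_needed(16);
    // Only the first `used` bytes can ever be nonzero; the rest is padding.
    assert!(out[used..].iter().all(|&b| b == 0));
    println!("{} of {} bytes used", used, out.len());
}
```

With `uint<N>`, the bit count would be the type parameter itself, so no padding bytes would be needed.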

