cranelift's Issues

Prettier verifier errors

The verifier (#3) is beginning to find real bugs now. Its method for displaying verifier errors could be improved. Here's an example of the current output:

$ target/debug/cton-util test filetests/isa/riscv/expand-i32.cton 
FAIL filetests/isa/riscv/expand-i32.cton: inst1: arg 0 (v0) has type i16, must match function signature of i32
inst1: return v0

function large_imm(i16) -> i32 {
ebb0(vx0: i16):
    v0 = iadd_imm vx0, 0x3b9a_ca00
    return v0
}

The reference to inst1 is cryptic because instruction references don't appear in the textual IL, so we print the instruction itself in the first line. This is not a great solution.

It would be much better to underline the offending instruction, like the filecheck tests do:

FAIL filetests/isa/riscv/expand-i32.cton: inst1: arg 0 (v0) has type i16, must match function signature of i32
function large_imm(i16) -> i32 {
ebb0(vx0: i16):
    v0 = iadd_imm vx0, 0x3b9a_ca00
    return v0
    ^~~~~~~~~
verifier: arg 0 (v0) has type i16, must match function signature of i32
}

Implementation suggestion

One possible implementation is to move the guts of write_function into a more general

pub fn decorate_function<WL: FnMut(AnyEntity, usize, &str) -> fmt::Result>(w: WL, func: &Function, isa: Option<&TargetIsa>) -> fmt::Result;

The WL closure would be called to write out each line, with these arguments: the entity being defined on the line, the indentation to insert before the line, and the contents of the line itself.
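
A rough sketch of how the verifier's error printer might use this (hypothetical usage of the proposed API; bad_inst and message stand for the verifier error being reported):

use std::fmt::Write;

let mut out = String::new();
decorate_function(
    |entity: AnyEntity, indent: usize, line: &str| {
        writeln!(out, "{:1$}{2}", "", indent, line)?;
        if entity == AnyEntity::Inst(bad_inst) {
            // Underline the offending instruction and attach the message.
            writeln!(out, "{:1$}{2}", "", indent, "^~~~~~~~~")?;
            writeln!(out, "verifier: {}", message)?;
        }
        Ok(())
    },
    &func,
    None,
)?;
// `out` now holds the decorated function listing.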

Binary function names

Cretonne compiles functions independently, so function names are used differently than in LLVM. They serve two purposes:

  1. In .cton test cases, the function names are all ASCII, and are used to identify individual test cases in the same file.
  2. When Cretonne is embedded as a JIT compiler, function names are used to identify other functions that may be called. This identifier can be any sequence of bytes; it doesn't have to be ASCII or UTF-8. Cretonne doesn't interpret these function names; they are opaque identifiers.

The binary function names are not well supported. They get printed out as quoted ASCII with a bunch of backslash escapes.

  • The parser doesn't support the quoted format of function names.
  • If the name is a binary encoding of some identifier, ASCII with escapes is not a good representation.
  • Right now, function names like v7 or ebb4 get printed without quotes, and the lexer recognizes them as value and EBB tokens.

Alternative function name syntax

Over in #24, I proposed two new identifier/data tokens: %nnnn and #xxxx. We should use these to represent function names everywhere:

  1. If the function name consists entirely of ASCII alphanumeric characters and _, use the %nnnn notation. Note that this also allows for names like %0. There is no need to give special treatment to the first character.
  2. If the function name contains any other characters, use the #xxxx hexadecimal representation of the function name bytes.
  3. If the name is empty, use a special syntax, maybe noname (no %).

With these changes, the parser should stop accepting unquoted identifiers as function names.
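
A minimal sketch of the three rendering rules above (a free function for illustration only; the names here are not part of the actual codebase):

fn display_name(bytes: &[u8]) -> String {
    if bytes.is_empty() {
        // Rule 3: a special spelling for the empty name.
        return "noname".to_string();
    }
    if bytes.iter().all(|&b| b.is_ascii_alphanumeric() || b == b'_') {
        // Rule 1: %nnnn for plain ASCII names; %0 works too, since the
        // first character needs no special treatment.
        format!("%{}", String::from_utf8_lossy(bytes))
    } else {
        // Rule 2: #xxxx hexadecimal dump of the raw name bytes.
        let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
        format!("#{}", hex)
    }
}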

Binary name representation

Currently, the FunctionName struct contains a String:

pub struct FunctionName(String);

This restricts us to names in UTF-8 form. We should accept any sequence of bytes, so change this to:

pub struct FunctionName(Vec<u8>);

Allocation-free representation

For extra credit: Cretonne tries to minimize heap allocations everywhere, so an internal short-string optimization would make a lot of sense:

enum NameRepr {
    Short {
        length: u8,
        bytes: [u8; 12],
    },
    Long(Vec<u8>),
}

Remove null value from Opcode

After the Rust language designers carefully removed nulls from the language, I went and added my own. It turns out that Option<T> is often larger than T, and tables with 32-bit or smaller entries can double in size when using Option<T>.

C-like Opcode enum

The Opcode enum in the generated opcodes.rs file contains an explicit null value:

pub enum Opcode {
    /// An invalid opcode.
    NotAnOpcode,
    /// `jump EBB, args`. (Jump)
    Jump,
    /// `brz c, EBB, args`. (Branch)
    /// Type inferred from `c`.
    Brz,
    ...
}

With the current opcodes, Opcode is a single byte, but Option<Opcode> is two bytes. We use the explicit null to indicate the empty slots in a precomputed hash table:

const OPCODE_HASH_TABLE: [Opcode; 128] = [
    Opcode::Imul,
    Opcode::Fsub,
    Opcode::IconcatLohi,
    Opcode::Sshr,
    Opcode::Fmaxnum,
    Opcode::NotAnOpcode,
    Opcode::Nearest,
    ...
];

This table would double in size if we used Option<Opcode>. Since this table is only used by the parser when mapping strings to opcodes, perhaps we should just bite the bullet and use an Option<Opcode> here. Hopefully, the Rust compiler will soon learn how to represent this sum type efficiently. It's one of the easier nested enum optimizations.

Encoding tables also use NotAnOpcode currently. These tables are more critical to keep small, but there may be an alternative representation of the empty slots.

cretonne needs a recent rustc

Cretonne doesn't build with rustc 1.13, but the documentation says everything more recent than 1.12 should work fine. I think suggesting 1.15.1 is more reasonable, but it's up to you.

Example error while running the unit tests:

======  Rust cretonne unit tests  ======
   Compiling cretonne v0.0.0 (file:///Users/davide/ng/cretonne/lib/cretonne)
error[E0433]: failed to resolve. Use of undeclared type or module `ArgumentLoc`
   --> lib/cretonne/src/ir/extfunc.rs:198:42
    |
198 |         sig.argument_types[1].location = ArgumentLoc::Stack(8);
    |                                          ^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `ArgumentLoc`

error[E0433]: failed to resolve. Use of undeclared type or module `ArgumentLoc`
   --> lib/cretonne/src/ir/extfunc.rs:203:42
    |
203 |         sig.argument_types[0].location = ArgumentLoc::Stack(24);
    |                                          ^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `ArgumentLoc`

error: aborting due to 2 previous errors

error: Could not compile `cretonne`.

Document NaN propagation semantics

The language reference is not so clear on what happens when floating point arithmetic instructions produce a NaN. We should specify that.

  • Can instructions return a signaling NaN or only quiet NaNs?
  • When propagating a NaN input, can the payload bits change?

The specification should be tight enough that WebAssembly semantics are simple to represent, and loose enough that we don't need lots of NaN adjustment code emitted for fadd on common targets.

Copying the WebAssembly semantics is probably a good choice.

Only assume local type inference when writing text format IL

Cretonne's IL parser supports a limited form of type inference for some polymorphic instructions where the controlling type variable can be inferred from one of the input operands:

    v0 = iconst.i32 2
    v1 = iconst.i8 6
    v2 = ishl v0, v1

The ishl instruction is polymorphic, but its i32 result type can be inferred from the type of the v0 operand.

The parser's type inference is best effort only, and in particular, it only knows the types of values that have been defined earlier in the input text. If v0 were defined in an EBB later in the text format, we would have to use:

    v2 = ishl.i32 v0, v1

It would be technically possible for the parser to defer type resolution until everything has been parsed so that more missing types could be inferred. I don't think this is desirable, though. Cretonne IL is not a programming language, it is a language for writing unit tests, so obvious typing is better. It is also possible to write test cases in Cretonne IL that are somewhat malformed. For example, defines don't have to dominate uses, and it is useful to be able to write verifier test cases like that.

Local type inference

The limited type inference is useful for reducing clutter, though. If a value is defined previously in the same EBB, its type is usually obvious, so there is no need to repeat its type.

Currently, the instruction writer in write.rs assumes a very smart type inference mechanism, even smarter than what the parser can do. It should be fixed to depend only on local type inference:

Only omit the type variable when printing an instruction with a type variable operand that is defined in the same EBB.

There is a TODO in the type_suffix() function.

Relax whitespace sensitivity in Filecheck

Filecheck is currently very stringent about matching the whitespace of tests exactly, which leads to the undesirable situation of having to add a regex match just to pick up excess spaces between identifiers.

For example, return v0,v1 isn't valid and must be return v0, v1, which is subtle and goes against intuition.

Another issue is that the code that stringifies IR can insert many spaces in unexpected places, so that when round-tripped through a parse and restringify, the following:

[-,-] v2 = iadd v0, v1

becomes

[-]                     v2 = iadd v0, v1

meaning that a test to match it must look like the following:

; regex: WS=[ \t]*
; nextln: [-]$WS $v2 = iadd $v0, $v1

We should examine where (if anywhere) whitespace sensitivity is appropriate, and remove it everywhere else so that writing these common test cases has less friction.

Setting presets

The cretonne::settings module defines a Configurable trait which can be used to manipulate settings one at a time. In most realistic use cases, a client wants to choose between a few fixed configurations, e.g. first-tier JIT vs last-tier JIT or debug build vs release build.

Add named 'presets' which can apply a whole collection of settings at once:

trait Configurable {
    /// Apply all the settings in the named preset.
    fn preset(&mut self, name: &str);
}
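
A client would then pick a whole configuration in one call; a minimal usage sketch, assuming the existing settings builder implements Configurable:

let mut b = settings::builder();
b.preset("release_mode");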

The presets can be defined in settings.py as a simple dictionary mapping settings to their preset values:

release_mode = Preset(
        """Settings for release builds""",
        {
            opt_level: 'best',
            enable_simd: True
        })

File tests

Cretonne has unit tests that are run with cargo test. These tests are important, but it is tedious to write Rust code that generates significant IR functions to test certain features.

We need a file-level test mechanism that loads IR from a .cton file and runs a test on the functions contained. This mechanism should be controlled by a cton-util test sub-command. Each test case is a single .cton file containing a test command and a number of functions. The test command determines how the functions should be tested.

This assumes that the .cton file can be parsed correctly. Tests for parser errors should be written in a different way, probably as cargo test unit tests.

The cton-util test sub-command can be used to run a single test case or a whole directory tree full of test files. When running multiple tests, they should run in parallel to exploit multi-core machines.

Parser roundtrip tests

The test parse-roundtrip command is used to verify the IL parser and serializer. For each function in the file this tests that:

  • The function can be parsed without errors.
  • Serializing the parsed IR produces the exact same text.

If the file contains two consecutive functions with the same name, the text produced by serializing the first function is compared against the text of the second function. This is used to test certain canonicalizing features of the parser: Comments are removed and whitespace is canonicalized, for example.

This type of test is also used to check that all IR features and attributes are preserved in the serialization roundtrip. The textual compare is rough, but it doesn't miss anything. If we compared the in-memory IR, the comparison function would need constant updating to avoid missing new features.

An IR-text-IR roundtrip test would also be useful. This should focus on preserving things like CFG predecessor list orders and other derived properties not stored directly in the IR. This kind of testing is more difficult because IR just read from a file has less variance than IR coming out of some optimization pass. For example, if a test case runs text-IR-text-IR and compares the CFG predecessor lists in the two IRs, it is unlikely to find any differences since those CFGs were just computed from the text.

Verifier tests

The test verify-ok command is used to test the IR verifier. For each function in the file this tests that:

  • The function can be parsed without errors.
  • The function can be verified without errors.

The test verify-fail command is used to test verifier failures. For each function in the file this tests that:

  • The function can be parsed without errors.
  • Verifying the function fails.

The file may also contain FileCheck-style comments for checking for specific diagnostics.

Compiler pass tests

Syntax TBD, but we want to be able to inject an IR function into a specific pass and run FileCheck-style checks against the output.

`verify_context()` should also verify CFG integrity

The verifier::verify_context() function checks that the dominator tree in the context matches a newly computed one. We should do the same thing for the control flow graph.

There is an existing cfg_integrity method, but it only checks a freshly computed CFG against the function; it doesn't look at the CFG in the context.

cc @angusholder

Define boolean conversion instructions

We are missing conversion instructions for the boolean types:

  • Bool-to-bool. Perhaps breduce and bextend to match the integer equivalents.
  • Bool-to-integer where true maps to 1.
  • Bool-to-integer where true maps to -1.

Integer-to-bool conversion is best handled by the icmp_imm instruction.

Suggested to-int conversions:

v1 = bint.i32 v0        ; 0x00000001
v2 = bmask.i16 v0       ; 0xffff

Memory pool and B-tree representation for live ranges

The LiveRange struct currently uses a Vec<Interval> to store its list of live-in intervals. This has some problems:

  • The vector is often empty, but it still has a 24-byte footprint.
  • The vector causes fine-grained heap allocation which is slow.
  • In degenerate cases, computing a live range can take quadratic time because new live-in intervals are being inserted near the front of a large vector. (Note that LLVM's LiveInterval has the same weakness, and it hasn't yet caused known compile time issues.)

These problems can be fixed in a way similar to #35, but instead of allocating lists from a memory pool, we allocate B+-tree nodes. This is similar to representing the live-in intervals in a BTreeMap<Ebb, Inst>. Unfortunately, Rust's standard B-tree is not useful to us since it allocates nodes from the heap, and it doesn't provide a rich enough interface to allow things like coalescing intervals; see #37.

Implementation

  • The node pool is implemented with a vector of tree nodes.
  • Nodes are identified by 32-bit references that are simply indexes into the vector.
  • Nodes are cache line sized (64 bytes).
  • The LiveRange::liveins vector becomes a 32-bit node reference with a sentinel value to indicate 'empty'.

The B-tree node is defined as an enum:

enum Node {
    Inner {
        size: u8,
        keys: [K; 7],
        nodes: [NodeRef; 8],
    },
    Leaf {
        size: u8,
        keys: [K; 7],
        values: [V; 7],
    },
}
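
To sanity-check the cache-line claim, here is a self-contained size check for a concrete instantiation (K = V = u32, matching the 32-bit entity references):

use std::mem::size_of;

type K = u32;       // e.g. Ebb as a 32-bit entity reference
type V = u32;       // e.g. Inst
type NodeRef = u32; // index into the node pool vector

#[allow(dead_code)]
enum Node {
    Inner { size: u8, keys: [K; 7], nodes: [NodeRef; 8] },
    Leaf { size: u8, keys: [K; 7], values: [V; 7] },
}

fn main() {
    // One tag byte plus at most 61 bytes of fields rounds up to 64.
    assert!(size_of::<Node>() <= 64);
}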

Long function names don't render correctly

Functions with names whose character count is higher than funcname::NAME_LENGTH_THRESHOLD (22) are not displayed properly: #6933322e6e6f5f666f6c645f636d705f735f6f6666736574 instead of i32.no_fold_cmp_s_offset (this string has 23 characters).

The bug should be in funcname::try_as_name.

Enumerate all legal encodings of an instruction

The TargetIsa::encode() function returns the first encoding of an instruction where both ISA and instruction predicates are satisfied. We also need a way of enumerating all of the possible encodings of an instruction, probably by creating an iterator that returns encodings.

Instruction encodings can have multiple constraints that are checked at different points in the compiler pipeline:

  1. ISA predicates depend only on static settings like available CPU and instruction set features. For example, the imul.i32 instruction can only be encoded if the RISC-V "M" extension is supported by the CPU.
  2. Instruction predicates check if the immediate operands can be encoded. For example, iadd_imm.i32 can only be encoded as a RISC-V addi instruction if the immediate is a 12-bit signed integer.
  3. Register constraints. These are described by the RecipeConstraints available from EncInfo and satisfied by the register allocator.
  4. Branch range constraints. These are described by the BranchRange constraints also available from EncInfo.

Only the first two types of constraints are checked by the iterator, just as TargetIsa::encode() does. This iterator would be used by:

  • The encoding verifier will permit any of the valid encodings instead of requiring a unique one as it does now.
  • The branch relaxation pass (#73) will use it to search for a different branch encoding with more relaxed range constraints. For example, x86 branches can be encoded with an 8-bit or a 32-bit displacement.
  • An instruction compression pass (which doesn't exist yet) can look for a smaller encoding of the same instruction that still satisfies register constraints. For example, 32-bit ARM instructions can be represented as 16-bit Thumb2 instructions if the registers are right.

The tables that are generated in files like encoding-riscv.rs should already be usable for this. The implementation of TargetIsa::encode() stops at the first match. The iterator would keep going until reaching a stop code.
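
As a self-contained model of that behavior (the names and table layout here are illustrative, not Cretonne's actual generated tables):

#[derive(Clone, Copy, Debug, PartialEq)]
struct Encoding {
    recipe: u16,
    bits: u16,
}

/// Sentinel recipe value marking the end of an encoding list.
const STOP: u16 = 0xffff;

/// Yield every entry whose instruction predicate accepts `imm`, stopping at
/// the sentinel. `encode()` corresponds to taking this iterator's first item.
fn legal_encodings<'a>(
    table: &'a [(u16, u16, fn(i64) -> bool)],
    imm: i64,
) -> impl Iterator<Item = Encoding> + 'a {
    table.iter()
        .take_while(|&&(recipe, _, _)| recipe != STOP)
        .filter(move |&&(_, _, pred)| pred(imm))
        .map(|&(recipe, bits, _)| Encoding { recipe, bits })
}

fn main() {
    fn imm12(x: i64) -> bool { -2048 <= x && x < 2048 }
    fn always(_: i64) -> bool { true }
    let table: [(u16, u16, fn(i64) -> bool); 3] =
        [(0, 0x0c, imm12), (1, 0x20, always), (STOP, 0, always)];
    assert_eq!(legal_encodings(&table, 7).count(), 2);    // both encodings fit
    assert_eq!(legal_encodings(&table, 5000).count(), 1); // imm12 rejects it
}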

cc @anholt

Include bound type variables in type inference

When defining AST nodes, it is possible to use instructions with bound type variables:

    Rtl(
        a << udiv.i32(x, y)
    )

The ValueTypes for the bound type variables appear in ast.Apply.typevars:

    def __init__(self, inst, args):
        # type: (instructions.MaybeBoundInst, Tuple[Expr, ...]) -> None  # noqa
        if isinstance(inst, instructions.BoundInstruction):
            self.inst = inst.inst
            self.typevars = inst.typevars
        else:
            assert isinstance(inst, instructions.Instruction)
            self.inst = inst
            self.typevars = ()

These bound type variables should be used to constrain type inference further. It effectively converts the involved type variables to singletons.

The typevars field is a Tuple[ValueType, ...]. A None entry means that the type variable is not constrained.

cc @d1m0

Add WiderThan constraint

Instructions such as sextend, uextend and ireduce require a type constraint between the widths of the argument and the result value. One way to implement this is to:

  • Add a ConstrainWiderThan TypeConstraint in ti.py
  • Extend Instruction constructors to allow for constraints amongst formal vars
  • Add the instruction specific constraints to the constraints accumulated by typeinference.

~Dimo

Parse encodings

The write_instruction() function will print out the encoding of an instruction if it has been set:

[R#0c]        v5 = iadd v1, v2

The parser should decode these annotations and fill out the Function::encodings table.

Since the encoding recipe (R) is ISA-dependent, we probably can't support encodings in a test file with multiple ISAs.

See the DisplayEncoding struct for details of the format.

Add instruction labels to function printing

When debugging using the verifier, you often get errors of the form inst589: use of non-dominating arg value. However, it is impossible to locate inst589 when printing the function later, because the instruction labels are not printed.

It would be very valuable to print these labels when printing the function, so the verifier error messages can actually be used.

br_table jump arguments

@sunfishcode and I ran into a problem when translating wasm into Cretonne IL. In wasm, the br_table instruction actually pops some values from the stack and passes them to the block it is jumping to. However, in Cretonne IL, contrary to the other branch instructions, br_table does not support jump arguments. This means that the Ebbs in the jump table used by br_table must not have arguments. We don't know if this is done on purpose or not.

Here is the space of alternatives for resolving this inconsistency:

  • either add a jump arguments field to the Cretonne IL br_table and require that all Ebbs inside a jump table have the same argument count and types;

  • or leave br_table as is, but then we would need to use stack_store and stack_load whenever we want to pass arguments to an Ebb that is the target of a br_table instruction.

native code output

Question

Hello!

Been following this crate off and on, great work!

So, I've also been poking around every now and then at your native code generators. Now, from various issues (#30) and an internals thread, I gathered that this isn't just a wasm-targeting compiler, but is being considered, in some distant future, as a different rustc backend, which would entail native code generation and executable generation for, say, x86 at the very least.

Now, hopefully I'm not barking up the wrong tree, but it does appear that there are, for example, arm and x86 asm emitters; if this is the case:

  1. What are your plans for outputting object files?

I ask, because it just so happens that I finished a working prototype for a pure rust object file emitter (emphasis mine). The repo is here (but be warned, it is really, really rough, and certain things are hardcoded at the moment)

Wow cool! But what do I mean exactly?

Intro

Here is a working snippet for x86_64 (working as in it is code that compiles, runs, and outputs a correct object file):

// try this for dynamically linked file
// ld -e _start -I/usr/lib/ld-linux-x86-64.so.2 -L/usr/lib/ /usr/lib/crti.o /usr/lib/Scrt1.o /usr/lib/crtn.o test.o -lc -o test
fn run () -> error::Result<()> {
    for (i, arg) in env::args().enumerate() {
        if i == 1 {
            let mut obj: Elf = Artifact::new(Target::X86_64, Some(arg.to_owned()));
            // 55	push   %rbp
            // 48 89 e5	mov    %rsp,%rbp
            // b8 ef be ad de	mov    $0xdeadbeef,%eax
            // 5d	pop    %rbp
            // c3	retq   
            obj.add_code("deadbeef".to_owned(), vec![0x55, 0x48, 0x89, 0xe5, 0xb8, 0xef, 0xbe, 0xad, 0xde, 0x5d, 0xc3]);
            // main:
            // 55	push   %rbp
            // 48 89 e5	mov    %rsp,%rbp
            // b8 00 00 00 00	mov    $0x0,%eax
            // e8 d4 ff ff ff	callq  0x0 <deadbeef>
            // 89 c6	mov    %eax,%esi
            // 48 8d 3d 00 00 00 00	lea    0x0(%rip),%rdi # will be: deadbeef: 0x%x\n
            // b8 00 00 00 00	mov    $0x0,%eax
            // e8 00 00 00 00	callq  0x3f <main+33>  # printf
            // b8 00 00 00 00	mov    $0x0,%eax
            // 5d	pop    %rbp
            // c3	retq
            obj.add_code("main".to_owned(), vec![0x55, 0x48, 0x89, 0xe5, 0xb8, 0x00, 0x00, 0x00, 0x00, 0xe8, 0xe2, 0xff, 0xff, 0xff, 0x89, 0xc6, 0x48, 0x8d, 0x3d, 0x00, 0x00, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00, 0xe8, 0x00, 0x00, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x5d, 0xc3]);
            obj.add_data("str.1".to_owned(), b"deadbeef: 0x%x\n\0".to_vec());
            obj.link("main", "str.1", 19); // offset we need to relocate some data in main
            obj.import("printf".to_owned());
            obj.link_import("main", "printf", 29); // offset we need to relocate an import in main
            println!("res: {:#?}", obj);
            obj.write(::std::fs::File::create(Path::new(&arg))?)?;
        }
    }
    Ok(())
}

So, for the current form, the client basically specifies a Target, and a name, and the type determines the container format, in this case, ELF:

let mut obj: Elf = Artifact::new(Target::X86_64, Some(arg.to_owned()));

Raw intel x86_64 asm code can then be appended to this object, via:

obj.add_code("deadbeef".to_owned(), vec![0x55, 0x48, 0x89, 0xe5, 0xb8, 0xef, 0xbe, 0xad, 0xde, 0x5d, 0xc3])

which adds the function deadbeef with its associated routine to the object file.

As you can see above, imports can be added, and relocations can be added (more on this later).

The call to write then emits, in this case, a correctly formed, and eminently linkable ELF64 object file.

If you ran this code and gave it the name test.o, and then ran (on my linux system):

ld -e _start -I/usr/lib/ld-linux-x86-64.so.2 -L/usr/lib/ /usr/lib/crti.o /usr/lib/Scrt1.o /usr/lib/crtn.o test.o -lc -o test

a dynamically linked executable that calls printf called test would be produced, which prints (in this case): "deadbeef: 0xdeadbeef"

API

If this interests you (you could test your x86 code gen pretty much right now, with some caveats, which I won't get into right now), let me know, as:

  1. I'd like to work out what, in particular, a simple but nice generic compiler API would look like for you. The above is experimental, and probably won't work for certain cases, or perhaps not; but this is part of the discussion I'd like to have! API Tracking issue
  2. More difficult: I'd like to work out how best to do relocations in a, perhaps brazenly, platform-agnostic fashion, for use in a number of other projects. Relocation Tracking issue

Right now 2. may be too much to bite off at once (and the plan is always to give the API an escape hatch and provide compiler writers a direct relocation insertion API), but I do think something like it would be beneficial, and very cool.

So let me know what you think, or if this isn't appropriate for you at all right now, I'll be sad, but that's ok too! 💃

B-tree sets

The B+-trees in #38 should also be implemented in a set-like variant. At least our control-flow graph representation could benefit from that. The lists of successors and predecessors would be well implemented as B-tree sets.

The node enum for a set is:

enum Node {
    Inner {
        size: u8,
        keys: [K; 7],
        nodes: [NodeRef; 8],
    },
    Leaf {
        size: u8,
        keys: [K; 15],
    },
}

Loop basic blocks splitting

Right now, the loop analysis describes a loop by a header Ebb and a list of Ebbs pertaining to the loop. However, because Ebbs are extended basic blocks, this description includes instructions that are not part of the loop (they come after the back edge branch instruction in the Ebb).

This is a problem because the LICM pass will hoist instructions that are not actually in the loop. It doesn't break the code, since moving SSA definitions upwards (in the dominator tree) is always allowed, but it hurts code locality.

The way to solve this problem is to represent loops in the loop analysis as a list of pairs (Ebb, Inst), where Ebb is part of the loop and Inst is the last back edge to appear in Ebb. The LICM pass would then stop moving instructions around when it reaches Inst.

I will implement this later; the issue is just a reminder.

Verify instruction encodings

The verifier #3 can be extended to also verify ISA-dependent properties. We can start with instruction encodings.

ISA-dependent verification

The verify_function() function should take an additional isa: Option<&TargetIsa> argument, like write_function() already does. Passing some ISA reference should cause additional ISA-dependent properties to be verified. The ISA-independent properties should be verified no matter what the isa argument is.

Encoding verification

For all instructions that have a legal encoding in the func.encodings table, verify that it is consistent with the encoding chosen by isa.encode(dfg[inst]). Use TargetIsa::display_enc() to generate pretty error messages.

Instructions that don't have a legal encoding (see Encoding::is_legal()) should be ignored. Start by checking func.encodings.is_empty().
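
A sketch of the whole check, using only names mentioned in this issue (the iteration calls and the exact return type of encode() are assumptions):

fn verify_encodings(func: &Function, isa: &TargetIsa) -> Result<(), String> {
    if func.encodings.is_empty() {
        // Nothing has been encoded yet; skip the ISA-dependent check.
        return Ok(());
    }
    for ebb in func.layout.ebbs() {
        for inst in func.layout.ebb_insts(ebb) {
            let enc = func.encodings[inst];
            if !enc.is_legal() {
                // Instructions without a legal encoding are ignored.
                continue;
            }
            // Assuming encode() returns Option<Encoding> here.
            if isa.encode(&func.dfg[inst]) != Some(enc) {
                return Err(format!("{}: inconsistent encoding {}",
                                   inst, isa.display_enc(enc)));
            }
        }
    }
    Ok(())
}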

Handle bound instructions in pattern type inference

The pattern transformation syntax supports the use of polymorphic instructions with one or more type variables bound to a specific type:

widen32.legalize(
    a << add(x, y),
    Rtl(
        wx << uextend.i32(x),
        wy << uextend.i32(y),
        wa << iadd(wx, wy),
        a << ireduce(wa)
    ))

The uextend instruction is polymorphic with two type variables: The result type and the input type. The syntax uextend.i32 binds the first type variable (the result type) to i32, fixing the types of wx and wy in the example above.

The type inference system for patterns should be extended to understand this. Currently, it can't infer the types of the temporaries wx and wy.

The uextend.i32 syntax becomes a BoundInstruction instance which the cdsl.ast.Apply constructor deconstructs into the uextend instruction and a self.typevars = (i32,) tuple. Note that this is only a partial binding; the second type variable in uextend remains free.

Alternative example with the same effect:

widen32.legalize(
    a << add(x, y),
    Rtl(
        wx << uextend(x),
        wy << uextend(y),
        wa << iadd.i32(wx, wy),
        a << ireduce(wa)
    ))

This fully binds the iadd instruction which only has a single typevar.

Replace `try!` with `?`

The stable Rust compiler now supports using ? in place of the try! macro. We should switch.

This mostly affects the parser and write.rs, but there are scattered uses elsewhere.
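
The rewrite is mechanical. Shown here on a generic fallible function (illustrative only, in the edition-2015 spelling where try! is still a valid macro call):

use std::fs::File;
use std::io::{Read, Result};

// Before: nested try! macros.
fn slurp_old(path: &str) -> Result<String> {
    let mut s = String::new();
    try!(try!(File::open(path)).read_to_string(&mut s));
    Ok(s)
}

// After: the ? operator, with identical semantics.
fn slurp_new(path: &str) -> Result<String> {
    let mut s = String::new();
    File::open(path)?.read_to_string(&mut s)?;
    Ok(s)
}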

Remove null values from entity reference newtypes

After the Rust language designers carefully removed nulls from the language, I went and added my own. It turns out that Option<T> is often larger than T, and tables with 32-bit or smaller entries can double in size when using Option<T>.

Integer index newtypes

The EntityRef types are used everywhere, and often in tables that we can't afford to double in size.

pub struct Ebb(u32);

impl EntityRef for Ebb {
    fn new(index: usize) -> Self {
        assert!(index < (u32::MAX as usize));
        Ebb(index as u32)
    }

    fn index(self) -> usize {
        self.0 as usize
    }
}

/// A guaranteed invalid EBB reference.
pub const NO_EBB: Ebb = Ebb(u32::MAX);

Again, Option<Ebb> is twice as big as Ebb, so we can't use it in tables, and we use NO_EBB as a sentinel instead. See for example the tables in layout.rs. I've tried to use Option<Ebb> where space optimization is not important, but the lack of consistency makes it tricky.

We can't expect the compiler to optimize this Option since it has no way of knowing that the u32::MAX value is reserved as a null. Rust has an unstable feature for annotating types as non-zero, but a) it is unstable and b) it seems clumsy.

The void type

There's an ir::types::VOID constant which doesn't represent a valid value type. It would be nice if we could use Option<Type> instead, but currently Type is only one byte, which is rather important.

This is a similar problem to the entity references.

Packed options

One possibility would be a PackedOption data type that works for types with a reserved value:

/// Types that have a reserved value which can't be created any other way.
trait ReservedValue {
    /// Create an instance of the reserved value.
    fn reserved() -> Self;
}

/// Packed representation of `Option<T>`.
struct PackedOption<T: ReservedValue + Eq>(T);

impl<T: ReservedValue + Eq> PackedOption<T> {
    pub fn is_none(&self) -> bool {
        self.0 == T::reserved()
    }

    pub fn is_some(&self) -> bool {
        !self.is_none()
    }

    pub fn pack(opt: Option<T>) -> PackedOption<T> {
        match opt {
            None => PackedOption(T::reserved()),
            Some(t) => PackedOption(t),
        }
    }

    pub fn unpack(self) -> Option<T> {
        if self.is_none() {
            None
        } else {
            Some(self.0)
        }
    }
}

Add various Default, From, and Into impls.
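
Wiring the existing sentinel into the trait is then a one-liner per entity type; for example (sketch):

impl ReservedValue for Ebb {
    fn reserved() -> Self {
        NO_EBB
    }
}

With that, the tables in layout.rs could store PackedOption<Ebb>, which occupies the same 4 bytes as Ebb itself while still offering an Option-like interface.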

Configuring Cretonne for a specific WebAssembly runtime

Cretonne is designed as a code generation library that can be used with multiple instruction set architectures and with multiple runtimes. When compiling WebAssembly, the generated code may need to behave differently for different runtimes on the same ISA. For example:

  1. WebAssembly instructions can cause traps, and we need to define what that means.
  2. Functions need to detect stack overflow and cause a trap. This can be implemented with stack guard pages or an explicit stack limit or a combination.
  3. Function calls to another WebAssembly module instance may need special handling, and we may want to support native function calls too.
  4. WebAssembly heap accesses need to be bounds checked. Runtimes have a range of options for implementing that. Generated code also needs to know how to compute the effective address of a heap access.
  5. WebAssembly global variables are tied to a module instance. Cretonne needs to know how to find them.

Prior art

Some related work has been done for Java virtual machines wanting a cleaner interface between the runtime and the JIT compiler.

Since WebAssembly's operations are generally lower level than some JVM operations, the snippet approach is probably overkill.

Design Constraints

Codegen as a Function

The code generated for a given Cretonne IL function depends only on three things:

  1. The value of ISA-independent settings as defined in meta/base/settings.py.
  2. The target ISA.
  3. ISA-dependent settings as defined in meta/isa/*/settings.py.

This functional approach is important for debugging. If a runtime using Cretonne is able to implement hooks and callbacks that change the generated code, it becomes very difficult to reproduce bugs without replicating the user's environment.

The translation from WebAssembly to Cretonne IL can be modulated by the embedder. This is less of a problem because they can submit both the WebAssembly and the initial Cretonne IL in a bug report. It would still be preferable to minimize the variation in Cretonne IL generated for a given WebAssembly function.

Enable reverse translation back to WebAssembly

There has been some talk of using Cretonne IL as an intermediate form when generating WebAssembly. It would be preferable if the IL generated when translating WASM to Cretonne were also capable of being translated back to WebAssembly.

We haven't yet defined the required invariants, but it would include an ability to identify heap and global accesses, so these operations should not be expanded to low-level code that can't be recognized.

Enable optimization

In the future, Cretonne will be able to optimize WebAssembly sandbox operations, for example by eliminating heap bounds checks. This will only work if heap accesses can be recognized by the optimizer, so they can't be expanded into low-level operations during the WASM to Cretonne translation.

Incremental CFG updates

We need a mechanism for updating the control flow graph when the code is changed. An example is an architecture that doesn't have a cmov instruction:

ebb0:
    v10 = select v1, v2, v3
    v11 = iadd_imm v10, 5

Legalizes into:

ebb0:
    brnz v1, ebb1(v2)
    jump ebb1(v3)

ebb1(vx0: i32):
    v11 = iadd_imm vx0, 5

This would require updating the CFG since ebb1 is a whole new EBB which inherits some of ebb0's successors.

There are three ways of updating the CFG after a program change:

  1. Recompute the CFG from scratch. We can do that now, but it is quite expensive in large functions.
  2. Invalidate and recompute the part of the CFG that is affected by a single EBB.
  3. Invalidate and recompute the part of the CFG that is affected by a single instruction.

Instruction-level CFG repair is potentially difficult. In the example above, ebb0 can have arbitrarily many CFG edges after the select instruction that is converted. These edges would all need to be moved to ebb1 after the change. It would not be much faster than recomputing all of the edges connecting to ebb0 and ebb1.

We should provide the second option as a CFG function for recomputing CFG edges affecting a single EBB. After performing the program change above, it should be possible to repair the CFG with code like this:

func.cfg.recompute_ebb(ebb0);
func.cfg.recompute_ebb(ebb1);

The call recompute_ebb(ebb0) should assume that ebb0 has changed, so repair the CFG in these steps:

  1. Go through the cached successor list and remove ebb0 as a predecessor of each successor block. Don't depend on the instructions currently in ebb0 to find edges. The instructions may have changed.
  2. Clear the successor list for ebb0.
  3. Scan the instructions now in ebb0 and add new CFG edges as they are discovered.

This algorithm recomputes all CFG edges from ebb0 while leaving edges to ebb0 intact.
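
A sketch of those three steps, assuming the CFG stores per-EBB predecessor and successor lists and has a private compute_ebb() helper for the rescan (all names assumed):

impl ControlFlowGraph {
    pub fn recompute_ebb(&mut self, func: &Function, ebb: Ebb) {
        // 1. Detach `ebb` from each cached successor's predecessor list.
        //    Don't trust the instructions currently in `ebb`; they changed.
        for succ in self.successors[ebb].clone() {
            self.predecessors[succ].retain(|&(pred, _)| pred != ebb);
        }
        // 2. Clear the stale successor list.
        self.successors[ebb].clear();
        // 3. Rescan the instructions now in `ebb` and add the new edges.
        self.compute_ebb(func, ebb);
    }
}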

Memory pool for value lists

The InstructionData struct is 16 bytes, and it is important to keep it small. Some instruction formats don't fit in 16 bytes, and so they use a Box<FooData> pointing to additional data allocated from the heap. Most notably, calls and branches keep their argument lists on the heap. This design has some disadvantages:

  • Creating and destroying instructions causes malloc/free traffic which can be slow.
  • Clearing the DataFlowGraph instruction list will fault in the whole array in order to call Drop code for the InstructionData members. This is slow.
  • A Vec<Value> has 24 bytes of overhead before the first value is even added.
  • A Box<Vec<Value>> leaves two pointer indirections between the instruction and the values.

These problems can be overcome by changing the large instruction representation to use references to value lists in a memory pool.

Value list pool

Value lists are similar to Vec<Value>, but with a different layout:

  • A ValueList is a 32-bit offset into the pool where the list is stored. Compare to the 24-byte footprint of a Vec.
  • The length of the value list is stored next to the list in the pool array.
  • The capacity is computed from the length, so it is not stored anywhere.

The pool is implemented as a single Vec<Value> shared by all the value lists used by a function. All manipulation of value lists goes through the pool:

pub struct ValueList(u32);
pub struct ValueListPool {...}
impl ValueListPool {
    fn alloc(&mut self, elements: &[Value]) -> ValueList;
    fn free(&mut self, vlist: ValueList);
    fn len(&self, vlist: ValueList) -> usize;
    fn get(&self, vlist: ValueList) -> &[Value];
    fn get_mut(&mut self, vlist: ValueList) -> &mut [Value];
    fn push(&mut self, vlist: &mut ValueList, element: Value);
    fn extend(&mut self, vlist: &mut ValueList, elements: &[Value]);
    fn insert(&mut self, vlist: &mut ValueList, index: usize, element: Value);
    fn remove(&mut self, vlist: &mut ValueList, index: usize);
}

The methods that change the length of a list are allowed to move it around, so they take a mutable reference to the list handle.

Implementation

In order to get amortized linear time push, lists are reallocated exponentially. Capacities are arranged into power-of-two bins, computed from the length of the list:

fn bin_for_length(length: u32) -> u32 {
    30 - (length | 3).leading_zeros()
}

fn bin_size(bin: u32) -> u32 {
    4 << bin
}

The bin sizes are always powers of two, starting from 4. The capacity of a bin is its size minus one. The first word is used to store the list length.

When lists are freed, memory can be recycled by keeping a free list for each bin.

Note that removing elements from a list may cause its capacity to shrink, and the list may have to be reallocated. This is not a problem; value lists are not used the same way as Vec is elsewhere in Cretonne.
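
A few spot checks of the binning scheme (recall that the first word of each bin stores the length, so capacity is bin size minus one):

fn main() {
    assert_eq!(bin_for_length(0), 0); // 4-word bin: 1 length word + 3 values
    assert_eq!(bin_for_length(3), 0); // still fits in bin 0
    assert_eq!(bin_for_length(4), 1); // spills into the 8-word bin
    assert_eq!(bin_size(1), 8);       // capacity 7 after the length word
    assert_eq!(bin_for_length(8), 2); // 16-word bin, capacity 15
}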

Instruction formats

The large instruction formats can now be reimplemented in terms of value lists:

  • If an instruction format has too many fixed value arguments, or if it has variable_args arguments, move all value arguments to a ValueList.
  • If an instruction format has both fixed and variable_args arguments, move both to a single value_list. This applies to IndirectCall only.
  • All non-value arguments must fit in the InstructionData variant directly.

Changing the instruction formats like this has wide ranging effects:

  • Remove the boxed_storage flag from the Python InstructionFormat class. Replace it with a value_list flag.
  • Change the automatic instruction format mapping such that (value, variable_args) operands map to simply (variable_args,), and do the same for instructions with too many value operands like iadd_carry.
  • Change the generated InstBuilder methods to take ValueList arguments where needed.
  • Change the IL parser.
  • InstructionData::args() can be changed to return a single slice, &[Value], which is a nice simplification.

The TernaryOverflow instruction format becomes a catch-all for instructions with many inputs and many outputs and no immediate operands. Maybe rename it to MultiAry or something better.

Write and parse value locations for EBB arguments

Value locations can be assigned to all SSA values, both instruction results and EBB arguments. Thanks to @angusholder, the parser can now understand value location annotations for instruction result values. We need the same for EBB arguments.

  • write_ebb_header() should include value locations for the EBB arguments if !func.locations.is_empty().
  • parse_ebb_arg() should parse the value location for the argument too.

There is no syntax established yet, but I suggest something that looks like the argument ABI annotations in #49:

ebb3(v4: i32 [%x10], v5: f64 [%f3], v6: i32x4 [ss4]):

When a value is not assigned to a location (ValueLoc::Unassigned), the whole annotation for that argument should be omitted instead of printing [-].

Compressed instruction encodings

It is common for instruction set architectures to provide alternative encodings for common instructions that use less code space. The compressed encodings have additional constraints on the range of immediate operands and sometimes tighter register operand constraints too.

  • RISC-V has the RVC extension which provides a 16-bit encoding for some instructions.
  • ARM's 32-bit Thumb-2 instruction set has 16-bit versions of some instructions.
  • Intel instructions often have 8-bit and 32-bit immediate operand variants.
  • 64-bit Intel instructions can sometimes be encoded without a REX prefix if the register operands are right.
  • Intel iadd can be encoded as an lea or as a smaller add instruction which has tied register operands.

Representation

Cretonne represents encoding variants of the same instruction using separate encoding recipes and bits. This also means that each recipe maps to an exact instruction size in bytes. All of the valid encodings for an instruction appear in the encoding tables used by TargetIsa::encode(). The legalizer will always pick the first encoding with a matching predicate, but there can be more encodings after that.

We should add a TargetIsa method that makes it possible to enumerate all of the valid encodings for an instruction. Probably by returning an iterator.

Selecting the best encoding

Sometimes, the legalizer is able to select a compact encoding up front. This is the case with Intel 8-bit immediates when the immediate operand is known, for example. Often, the legalizer does not have enough information to determine if a compact encoding would be valid:

  • The immediate operand may encode a stack frame offset, and the location of stack slots has not yet been determined.
  • The immediate operand may be a branch displacement, and we don't know the exact position of EBB headers yet.
  • The compact encoding would constrain the register allocator too much. This goes for both RVC and Thumb-2 encodings as well as REX prefix omission for x86-64. It is better to start with the unconstrained encoding and then decay to the compact encoding after register allocation.

Instruction compression pass

The legalizer selects a compact encoding when it has enough information to do so, and it won't constrain the register allocator.

The instruction compression pass is a pure optimization pass; it can be skipped without violating correctness. It runs late, when we have more information:

  • Registers have been assigned so the tighter register operand constraints of compact encodings can be checked.
  • The stack frame layout is complete, so load/store offsets are known for stack accesses.

Compressing instructions obviously changes code size, so when this pass runs, the final offsets of EBB headers, and hence branch displacements, are not yet known.

Verifier complaining about variable uses in unreachable code

As of now, the verifier yields an error when, while checking unreachable code, it detects the use of a value defined in reachable code. This error is not relevant, and the corresponding check should simply be skipped for unreachable code.

Live range compression by coalescing intervals

Note: This is not the classical copy coalescing compiler optimization.

The LiveRange struct represents the live range of an SSA value as a list of intervals, one for each EBB where the value is live-in. In many cases, a value will be live through consecutive EBBs, and we can represent the live range more compactly in that case by coalescing the intervals for adjacent EBBs.

Example

ebb0:
   v1 = foo    ; inst1
   jump ebb1   ; inst9

ebb1:
   jump ebb2  ; inst19

ebb2:
   jump ebb3  ; inst29

ebb3:
   use v1     ; inst31
   return     ; inst39

Here, v1 has the def-interval inst1-inst9 and the live-in intervals ebb1-inst19, ebb2-inst29, ebb3-inst31, and that is how the intervals are currently stored in LiveRange. Since there is nothing interesting going on between adjacent EBBs, this could be represented with a single live-in interval ebb1-inst31. The def-interval stays the same.

Implementation

Coalescing two live-in intervals is only valid when they represent adjacent EBBs and the earlier interval extends all the way to the terminator of its EBB. We need a mechanism for recognizing this inter-EBB gap. The ProgramOrder trait is the right place for that functionality:

trait ProgramOrder {
    ...
    /// Is the range from `inst` to `ebb` just the gap between consecutive EBBs?
    ///
    /// This returns true if `inst` is the terminator in the EBB immediately before `ebb`.
    fn is_ebb_gap(&self, inst: Inst, ebb: Ebb) -> bool;
}

The actual coalescing happens in LiveRange::extend_in_ebb() when a new live-in interval is added. There are four possibilities:

  1. The new interval can be coalesced with both the previous and the next intervals, causing the liveins vector to shrink by 1.
  2. The new interval can be coalesced with the previous interval only.
  3. The new interval can be coalesced with the next interval only.
  4. The new interval can't be coalesced. The liveins vector grows by one.

Note that in cases 2. and 3., the elements in the liveins vector don't need to be shifted. The existing interval can be modified in place.
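
A sketch of the four cases inside extend_in_ebb(), assuming liveins is kept sorted, the insertion index i has already been found, and each interval is a (begin: Ebb, end: Inst) pair as described above:

let prev_adjacent = i > 0 && order.is_ebb_gap(liveins[i - 1].end, new.begin);
let next_adjacent = i < liveins.len() && order.is_ebb_gap(new.end, liveins[i].begin);

match (prev_adjacent, next_adjacent) {
    // Case 1: bridge the gap; the vector shrinks by one.
    (true, true) => {
        liveins[i - 1].end = liveins[i].end;
        liveins.remove(i);
    }
    // Cases 2 and 3: extend an existing interval in place.
    (true, false) => liveins[i - 1].end = new.end,
    (false, true) => liveins[i].begin = new.begin,
    // Case 4: no coalescing is possible; the vector grows by one.
    (false, false) => liveins.insert(i, new),
}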

Verifier

Cretonne needs an IR verifier.

The verifier can be run before and after different passes. It checks those IR invariants that are not enforced by the type system.

Instruction integrity:

  • The instruction format must match the opcode.
  • All result values must be created for multi-valued instructions.
  • Instructions with no results must have a VOID first_type().
  • All referenced entities must exist. (Values, EBBs, stack slots, ...)

EBB integrity:

  • All instructions reached from the ebb_insts iterator must belong to the EBB as reported by inst_ebb().
  • Should we detect loops in the ebb_insts iterator? The current linked list representation could theoretically do it.
  • Every EBB must end in a terminator instruction, and no other instruction can be a terminator.
  • Every value in the ebb_args iterator belongs to the EBB as reported by value_ebb.

SSA form. All values used by an instruction must:

  • Be defined by an instruction that exists and that is inserted in an EBB, or be an argument of an existing EBB.
  • Dominate the instruction. This check should probably be skipped for EBBs that can't be reached from the entry.

Control flow graph and dominator tree integrity.

  • All predecessors in the CFG must be branches to the EBB.
  • All branches to an EBB must be present in the CFG.
  • A recomputed dominator tree is identical to the existing one.

Type checking:

  • Verify all input and output values against the opcode's type constraints. For polymorphic opcodes, determine the controlling type variable first.
  • Branches and jumps must pass arguments to destination EBBs that match the expected types exactly. The number of arguments must match.
  • All EBBs in a jump_table must take no arguments.
  • Function calls are type checked against their signature.
  • The entry block must take arguments that match the signature of the current function.
  • All return instructions must have return value operands matching the current function signature.

Ad hoc checking:

  • Stack slot loads and stores must be in-bounds.
  • Immediate constraints for certain opcodes, like udiv_imm v3, 0.
  • Extend / truncate instructions have more type constraints: Source type can't be larger / smaller than result type.

Parse value aliases

The write_value_aliases() function will print out any value aliases used by an instruction on lines before the instruction itself:

vx3 -> v4
v1 = iadd_imm vx3, 17

The parser in lib/reader should be taught to handle these aliases.

Since the parser is used mostly for test cases, value aliases in the input should be translated to value aliases in the in-memory IR. We don't want to resolve aliases in the parser, even though that would be easy.

  • Create value aliases as they are encountered in the source.
  • The same alias may appear multiple times. Verify that the instances are consistent.
  • A value alias may be created before the source value has been defined, so all of the aliases need to be rewritten, just like other value and EBB references.

Control flow graph

Cretonne's in-memory representation needs a control flow graph analysis. This would record for each EBB:

  • A set of predecessors, where a predecessor is an (EBB, Inst) pair identifying a basic block in an EBB.
  • A set of successor EBBs. This may not be worth caching, since it is easily derived from the branch instructions in the EBB.

The control flow graph is easily computed from the code, so we don't want to represent it in the textual IR. It would be useful if the iteration order of the predecessor set were stable so it doesn't change when saving out the IR and reloading it.

Write and parse ABI annotations on function signatures

In c8be39f I added support for ABI annotations on function signatures. These annotations indicate the location (stack or register) of individual function arguments and return values. We need:

  • Support for writing the ABI annotations using the correct ISA names for registers.
  • Support for parsing the ABI annotations in test files with a single unique ISA. Related to #24.

Writing ABI annotations

The new syntax for function signatures is partially implemented:

function average(i32 [%x10], i32 inreg [%x11]) -> f32 [%f10] {

If a signature has ABI annotations, they appear in brackets after each argument or return value. This is used everywhere function signatures appear:

  • In the function definition.
  • In a signature declaration in the preamble.
  • In a function declaration in the preamble.

Each argument is annotated as follows:

  • Registers: [%x10] using the display_regunit() function provided by the TargetIsa. Note that the currently committed implementation is incorrect; it just prints the register unit number [%10]. This is a stopgap until we can provide a &TargetIsa reference. It is not a syntax we want to support going forward.
  • Stack arguments: [24], where the number in brackets is the argument's byte offset in the argument array.
  • Unassigned arguments don't get an annotation in the text format.

The Signature struct also has an argument_bytes field now which indicates the needed size of the argument array. This field does not need to appear in the text format since it can be computed from the arguments.

Parsing ABI annotations

The parser should accept the annotations above and fill in the location field of the ArgumentType structs when present.

When parsing a file that doesn't specify exactly one ISA, the ABI annotations should be ignored.

After parsing a signature containing ABI annotations, call the Signature::compute_argument_bytes() method.

test-all.sh exits early because of minor errors from flake8

For me, running test-all.sh after running "cargo clean" currently halts after running the flake8 tests:

pgarland@rimbaud:~/Hacking/cretonne$ ./test-all.sh
====== Rust formatting ======
====== Python 2.7.13, Python 3.5.4rc1 ======
=== flake8 ===
./gen_legalizer.py:22:62: E261 at least two spaces before inline comment
./gen_legalizer.py:26:37: E261 at least two spaces before inline comment
./gen_legalizer.py:27:39: E261 at least two spaces before inline comment
./test_gen_legalizer.py:7:73: E261 at least two spaces before inline comment
./test_gen_legalizer.py:8:41: E261 at least two spaces before inline comment
./test_gen_legalizer.py:9:34: E261 at least two spaces before inline comment
./test_gen_legalizer.py:11:30: E261 at least two spaces before inline comment
./test_gen_legalizer.py:12:34: E261 at least two spaces before inline comment
./test_gen_legalizer.py:13:57: E261 at least two spaces before inline comment
./test_gen_legalizer.py:18:62: E261 at least two spaces before inline comment
./semantics/__init__.py:5:41: E261 at least two spaces before inline comment
./semantics/__init__.py:6:29: E261 at least two spaces before inline comment
./semantics/__init__.py:7:38: E261 at least two spaces before inline comment
./semantics/__init__.py:8:34: E261 at least two spaces before inline comment
./semantics/__init__.py:9:68: E261 at least two spaces before inline comment
./semantics/elaborate.py:11:68: E261 at least two spaces before inline comment
./semantics/elaborate.py:12:33: E261 at least two spaces before inline comment
./semantics/elaborate.py:13:37: E261 at least two spaces before inline comment
./semantics/elaborate.py:14:34: E261 at least two spaces before inline comment
./semantics/primitives.py:13:20: E261 at least two spaces before inline comment
./semantics/smtlib.py:15:56: E261 at least two spaces before inline comment
./semantics/smtlib.py:16:38: E261 at least two spaces before inline comment
./semantics/smtlib.py:17:32: E261 at least two spaces before inline comment
./semantics/smtlib.py:18:34: E261 at least two spaces before inline comment
./semantics/smtlib.py:20:42: E261 at least two spaces before inline comment
./cdsl/ast.py:14:37: E261 at least two spaces before inline comment
./cdsl/instructions.py:10:28: E261 at least two spaces before inline comment
./cdsl/isa.py:352:25: W503 line break before binary operator
./cdsl/isa.py:353:25: W503 line break before binary operator
./cdsl/test_ti.py:16:70: E261 at least two spaces before inline comment
./cdsl/test_ti.py:17:62: E261 at least two spaces before inline comment
./cdsl/ti.py:10:72: E261 at least two spaces before inline comment
./cdsl/ti.py:11:64: E261 at least two spaces before inline comment
./cdsl/ti.py:13:34: E261 at least two spaces before inline comment
./cdsl/ti.py:14:26: E261 at least two spaces before inline comment
./cdsl/ti.py:15:33: E261 at least two spaces before inline comment
./cdsl/types.py:6:55: E261 at least two spaces before inline comment
./cdsl/typevar.py:13:71: E261 at least two spaces before inline comment
./cdsl/xform.py:11:37: E261 at least two spaces before inline comment

While these should possibly be fixed, they don't seem like a good reason to halt the test suite.

The possible fixes that jump to my mind are:

  • don't use set -e in either test-all.sh or check.sh
  • pass the --exit-zero flag to flake8 so that it always exits with a code of 0
  • use the --ignore= or --select= flags to flake8 to explicitly specify which errors to check

I'm not a bash expert, so I may be missing other or better fixes. Of the three fixes above, passing --exit-zero to flake8 seems best: it still shows the problems that flake8 discovers, but allows the rest of the tests to continue running.

Conditional compilation of ISA targets

By default, Cretonne is a cross compiler with all supported target ISAs compiled in. That is overkill for many use cases, such as JITs that only need to generate native machine code, and it bloats the code size of the Cretonne library.

We should add Cargo configuration settings to control which target ISAs are compiled into the Cretonne library. Two settings make sense:

  1. Include all target ISAs to get a full cross compiler.
  2. Include only the native ISA that Cretonne is being built for.

The build.rs build script should interpret the setting and select the set of ISAs to build using the TARGET environment variable set by Cargo, then emit cargo:rustc-cfg=... lines for the compiler.
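
A minimal build.rs sketch under these assumptions; the cfg names (isa_riscv, isa_intel) and the "native" Cargo feature are illustrative, not the crate's actual configuration:

// build.rs
use std::env;

fn main() {
    // TARGET is the triple Cargo is compiling *for*.
    let target = env::var("TARGET").unwrap();

    // CARGO_FEATURE_NATIVE is set by Cargo when the (hypothetical)
    // "native" feature is enabled: build only the host's ISA.
    if env::var_os("CARGO_FEATURE_NATIVE").is_some() {
        if target.starts_with("riscv") {
            println!("cargo:rustc-cfg=isa_riscv");
        } else if target.starts_with("x86_64") || target.starts_with("i686") {
            println!("cargo:rustc-cfg=isa_intel");
        }
    } else {
        // Full cross compiler: enable all target ISAs.
        println!("cargo:rustc-cfg=isa_riscv");
        println!("cargo:rustc-cfg=isa_intel");
    }
}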

The isa::lookup() function needs to return two different errors: "I've never heard of that ISA" and "I know that ISA, but it's not been compiled in". This distinction is important for those file tests that request an unsupported ISA. These tests should be skipped silently. On the other hand, tests that request an unknown ISA should always fail.
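
The distinction could be a simple two-variant error type; a sketch with illustrative names:

/// A sketch of the two-way distinction; variant names are illustrative.
pub enum LookupError {
    /// The ISA name is not recognized at all: such tests should fail.
    Unknown,
    /// The ISA is known but was not compiled in: skip these tests silently.
    Unsupported,
}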

Test compressed encodings

The test binemit file tests currently check against the first valid encoding of each instruction. These are the encodings that the register allocator sees.

It would also be good to test the compressed instruction encodings that will be generated when #72 is implemented.

Since there are other ways of testing the early instruction selection by the legalizer, it would make sense to change the test binemit driver to pick the smallest valid instruction encoding which:

  1. Satisfies all instruction and ISA predicates. (This is already checked by TargetIsa::legal_encodings().)
  2. Satisfies the register constraints, given the register assignments in the test file.
  3. Satisfies the branch range constraints. (See the binemit/relaxation.rs pass.)
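
A sketch of that selection; only legal_encodings() is named by the existing API (and its exact signature may differ), while regs_satisfied, branch_in_range, and encoding_bytes are hypothetical helpers standing in for the three checks:

fn smallest_encoding(isa: &TargetIsa, func: &Function, inst: Inst) -> Option<Encoding> {
    isa.legal_encodings(func, inst)                      // constraint 1
        .filter(|&enc| regs_satisfied(func, inst, enc))  // constraint 2
        .filter(|&enc| branch_in_range(func, inst, enc)) // constraint 3
        .min_by_key(|&enc| encoding_bytes(isa, enc))     // smallest wins
}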

Some of the Intel 64-bit encoding tests would need to be changed since they contain redundant REX prefixes. See comments on #127.

Opcode flags

Cretonne instruction opcodes should have flags that provide useful information for the verifier and optimizer. The flags would indicate constant properties of an opcode, such as:

  • is_terminator - This is a terminator instruction where control can never pass through, such as an unconditional branch, return, or trap instruction. Every EBB must end with a terminator, and terminators can't appear inside an EBB.
  • is_branch - This is a branch instruction that can transfer control to another EBB.
  • can_trap - This instruction can trap, causing control flow to diverge. Besides the obvious trap instructions, division by zero and memory access instructions can also trap. This does not make the instruction a terminator since the trapping can be conditional.

In the Python meta instruction descriptions, flags are passed as extra keyword arguments to the Instruction constructor:

brz = Instruction(
        'brz', r"""
        Branch when zero.

        If ``c`` is a :type:`b1` value, take the branch when ``c`` is false. If
        ``c`` is an integer value, take the branch when ``c = 0``.
        """,
        ins=(c, EBB, args), is_branch=True)

The gen_instr.py script should be modified to emit additional opcode methods, one for each flag:

impl Opcode {
    pub fn is_branch(self) -> bool {
        match self {
            Opcode::Brz => true,
            // ... other branch opcodes ...
            _ => false,
        }
    }
}

These flags should be generated from instruction definitions so that a target ISA can define custom terminator instructions. For example, the x86-64 ISA would define a ud2 instruction that is marked as a terminator because it always traps.
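
As an example of how the verifier could consume the flags, here is a sketch of an EBB layout check; the layout and dfg accessors used are assumptions about the API, not verified:

// A sketch: every EBB must end in a terminator, and terminators
// must not appear anywhere else in the EBB.
fn check_ebb_layout(func: &Function, ebb: Ebb) -> Result<(), String> {
    let mut insts = func.layout.ebb_insts(ebb).peekable();
    while let Some(inst) = insts.next() {
        let op = func.dfg[inst].opcode();
        let is_last = insts.peek().is_none();
        if is_last && !op.is_terminator() {
            return Err(format!("{} does not end in a terminator", ebb));
        }
        if !is_last && op.is_terminator() {
            return Err(format!("terminator in the middle of {}", ebb));
        }
    }
    Ok(())
}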

Dominator tree

Cretonne's in-memory IR needs a dominator tree analysis for verification, and likely for register allocation.

We expect most input programs to have structured control flow, so the Cooper / Harvey / Kennedy algorithm will work well.

  • Should the dominator tree be a static analysis, or should it be dynamically updated when the CFG (#1) changes?
  • How does a dominator tree interact with EBBs? The immediate dominator of every inner basic block in an EBB is obvious (it's the previous basic block), so it doesn't need to be stored. Only the immediate dominator of an EBB is interesting.
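
For reference, the core of the Cooper / Harvey / Kennedy scheme is a two-finger walk over postorder numbers; a minimal self-contained sketch, independent of Cretonne's data structures:

// idom[b] holds the current immediate-dominator guess for block b,
// both expressed as postorder numbers; the entry block has the
// highest number, and following idom moves towards the entry.
fn intersect(idom: &[usize], mut a: usize, mut b: usize) -> usize {
    while a != b {
        while a < b {
            a = idom[a];
        }
        while b < a {
            b = idom[b];
        }
    }
    a
}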

Branch resolution and relaxation

We need to know the exact location of the EBB headers before we can emit binary machine code for branches to those EBB headers. Branches have limited ranges, and extending the range changes the size of the branch, so a non-trivial algorithm is needed to compute the final offsets of all the EBBs.

Branch relaxation

The range of branches varies a lot by ISA, and there are sometimes multiple branch encodings with different ranges:

ISA       Cond. branch     Uncond. branch
Intel     256 B - 4 GB     256 B - 4 GB
RISC-V    512 B - 8 KB     4 KB - 2 MB
ARM 32    512 B - 64 MB    4 KB - 64 MB
ARM 64    64 KB - 2 MB     256 MB

The range is the span of code reachable by a branch that sits approximately in the middle of the span.

Branch relaxation is the process of selecting the smallest possible branch encoding, and making sure that all branch targets are in range. Since some conditional branches have shorter ranges than unconditional branches on RISC architectures, it is sometimes necessary to replace a conditional branch with a negated branch that skips over a jump instruction to the original branch destination:

    brz v1, ebb17

=>

    brnz v1, ebb23
    jump ebb17
ebb23:
    ...

The result of the branch relaxation pass should be a table of EBB offsets and chosen encodings for all branch instructions such that the displacements can be encoded. The instruction compression in #72 does something similar for non-branch instructions. The encoding tables can probably be shared.

Optimistic algorithm

A simple iterative algorithm goes like this:

  1. Start by selecting the smallest available encoding for all branch instructions. Compute the resulting EBB offsets using the optimistic branch encodings.
  2. Visit all branch instructions and check that their target is in range. If not, repair by using a more relaxed encoding or by replacing with the branch-over-jump pattern.
  3. Repeat until convergence.

This should arrive at an optimal result (given the fixed EBB order), but requires iteration. This is the algorithm LLVM uses, except for the branch-over-jump transformation which happens earlier in ARM-specific passes.

Pessimistic algorithm

  1. Start by selecting branch encodings with the largest available range. Compute the resulting EBB offsets.
  2. Visit all branch instructions and switch to a smaller encoding where possible. Also insert branch-over-jump patterns where necessary.
  3. Repeat if any branches grew, i.e. branch-over-jump patterns were created.

This only needs iteration when branch-over-jump patterns are needed, which should not be very often. It is not guaranteed to find an optimal solution, though.

Cretonne algorithm

When implementing branch relaxation in Cretonne, we have a few special concerns:

  • We don't have a handy list of branches or basic blocks. Cretonne uses extended basic blocks which can contain multiple internal conditional branches. The data flow graph can provide lists of branch instructions in the form of EBB predecessor sets. An alternative is a linear scan of all instructions.
  • Even though we haven't emitted any binary machine code yet, we do have exact per-instruction code size information. The encoding recipes attached to all instructions each correspond to a unique instruction byte size.
  • We need to generate a list of EBB header offsets for use by the following binary emission pass.

This suggests the following variation of the optimistic algorithm:

  1. Select the smallest-range encoding for branches during legalization. This relaxation algorithm will fix any out-of-range branches.
  2. Assign a zero offset to all EBB headers. The offset is the distance in bytes from the function entry to the EBB header.
  3. Scan through all instructions in the function in layout order. Keep track of the instruction offset by using the byte size of the attached encoding recipes.
  4. When visiting a branch instruction where the target EBB is not in range, relax the encoding. Don't change the encoding if the target EBB offset hasn't been computed yet.
  5. When visiting an EBB header, update its offset with the current instruction offset. If the EBB offset changed, make a note to iterate again.
  6. Iterate from step 3 until convergence.
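
Put together, the pass is a fixpoint loop over the layout. A sketch of the skeleton, where layout, is_branch, target_in_range, relax, and encoding_size are hypothetical helpers standing in for the real APIs:

fn relax_branches(func: &mut Function) -> Vec<u32> {
    // Step 2: every EBB header offset starts at zero.
    let mut offsets = vec![0u32; num_ebbs(func)];
    loop {
        let mut changed = false;
        let mut offset = 0u32;
        // Step 3: scan all instructions in layout order, tracking the
        // byte offset from the encoding recipe sizes.
        for (ebb, insts) in layout(func) {
            // Step 5: update the header offset when we reach it.
            if offsets[ebb] != offset {
                offsets[ebb] = offset;
                changed = true;
            }
            for inst in insts {
                // Step 4: relax branches whose target is out of range.
                if is_branch(inst) && !target_in_range(inst, offset, &offsets) {
                    relax(inst);
                }
                offset += encoding_size(inst);
            }
        }
        // Step 6: iterate until no header offset moves.
        if !changed {
            return offsets;
        }
    }
}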

This algorithm will complete in one pass for functions with a single EBB (and so only backwards branches). It will usually complete in three passes for the general case:

  1. Compute approximate offsets for all EBBs and relax backward branches.
  2. Relax forward branches.
  3. Check that all branch targets are in range.

If no forward branches need relaxing, it will complete in two passes.

It would be possible to build an auxiliary table that summarizes the non-branch instructions as just their code size, but given typical branch density and how fast it is to look up code size from encoding recipes, it is probably not worth it.

Canonical, split-invariant CFG post-order

In e0fd525 we landed the function DominatorTree::recompute_split_ebb() which updates the dominator tree and its cached CFG post-order after an EBB split. It looks like the way we repair the post order is not correct. Consider this example:

ebb0:
    brnz v1, ebb1
    brnz v2, ebb2
    brnz v3, ebb3
    brnz v4, ebb4
    jump ebb5

If we assume that none of ebb0's successors are visited before ebb0 in the DFS, we get the following post order: ebb5, ebb4, ebb3, ebb2, ebb1, ebb0.

Now assume we split the EBB by inserting the ebb99 header:

ebb0:
    brnz v1, ebb1
    brnz v2, ebb2
    jump ebb99
ebb99:
    brnz v3, ebb3
    brnz v4, ebb4
    jump ebb5

The post order is now: ebb5, ebb4, ebb3, ebb99, ebb2, ebb1, ebb0. Two claims:

  1. The current implementation of recompute_split_ebb() is not correct because it inserts the new EBB adjacent to ebb0. As the example shows, that is not always the right insertion spot.
  2. The post order currently computed by the dominator tree is split invariant. By this I mean that the effect on the post order of splitting an EBB is just to insert the new EBB header somewhere. The relative order of existing EBBs does not change. It would be good to prove this property more strictly.

Some implementation notes for fixing this:

  • Assuming the post order really is split invariant, the strategy in recompute_split_ebb() of inserting the new EBB somewhere in the existing post order is correct. It is only the insertion position that needs to be fixed.
  • The new EBB will always appear before the old EBB header in the post order.
  • The successors to the old EBB (ebb0 in the example) do not necessarily come before the old EBB in the post order. It is possible for any of them to be visited before ebb0 along another CFG path, which would cause them to appear after ebb0 in the post order.
  • If one of the successors after the split appears before ebb0 in the post order, the correct insertion point is after the first such successor.
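
Under the reading that matches the example (the relevant successor is the one closest to the old header), the search scans the existing post order backwards from the old header. A self-contained sketch with illustrative types; on the example above it places ebb99 right after ebb3:

// A sketch of locating the insertion index for the new EBB header.
fn split_insertion_index(postorder: &[Ebb], old_ebb: Ebb, successors: &[Ebb]) -> usize {
    let old_pos = postorder.iter().position(|&e| e == old_ebb).unwrap();
    // Walk towards the front; the first successor found is the one
    // closest to the old header, and the new header goes right after it.
    for i in (0..old_pos).rev() {
        if successors.contains(&postorder[i]) {
            return i + 1;
        }
    }
    // No successor precedes the old header: insert adjacent to it.
    old_pos
}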

General type inference and type checking for transformation patterns

In legalize.py and other places, we define transformation patterns like this:

# Expansions for immediates that are too large.
expand.legalize(
        a << iadd_imm(x, y),
        Rtl(
            a1 << iconst(y),
            a << iadd(x, a1)
        ))

Each transformation pattern is represented by the XForm class defined in meta/cdsl/xform.py.

We need a general type inference and type checking system for these patterns that uses the type constraints specified through type variables in the instruction definitions. The type inference is used to:

  • Infer the types of temporary symbols such as a1 in the output pattern above. The generated transformation code can't always depend on the simple forward type inference that many InstBuilder methods support.
  • Detect cases where the types of temporary symbols in the output can't be inferred from the types available in the input pattern and issue an error.
  • Verify ahead of time that the type constraints make sense, and we won't generate code that doesn't verify.
  • Inform the generation of test cases for an SMT solver. At some point we want to use z3 or another SMT solver to verify each transformation pattern. Since SMT solvers don't accept polymorphic problems, we need to select valid types for the symbols used in a pattern, or possibly enumerate all valid type combinations to generate test cases.
  • Detect cases where a transformation only applies to a subset of the types that the input pattern can represent. This leads to additional preconditions to be checked before applying a pattern.

We currently have a type inference implementation that:

  • Assigns a type variable to every symbol used in the pattern. These variables can be free or derived from other type variables.
  • Collects a list of free type variables.
  • Complains if any of the free type variables are associated with a temporary symbol which only appears in the output.

The current implementation does not compute strict type sets for each type variable, and it is not clear that it can always determine a way of computing the type of a temporary symbol.

Free type variables

A general implementation of the type inference should first compute a set of free type variables, such that the type of every symbol can be computed as a function of these variables.

The type constraints imposed by the instructions in the pattern can be represented as a set of triples:

(Var, Func, Var)

Here, Var is a symbol in the pattern, and Func is one of the type variable functions listed in the TypeVar class. The meaning of a tuple (a, f, b) is the constraint that typeof(a).f() == typeof(b). The function can be TypeVar.SAMEAS to require a and b to be of the same type.

The set of tuples can also be seen as a labeled, directional graph where the nodes are symbols and the labeled edges are constraints. We need to prove that every temporary symbol (i.e., one that appears only in the output pattern) can be reached from a symbol that appears in the input pattern.
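
The reachability requirement can be checked with a plain breadth-first search over that graph. A language-neutral sketch (written in Rust here, though the meta layer itself is Python):

use std::collections::{HashMap, HashSet, VecDeque};

// Returns the temporaries that cannot be reached from any input symbol,
// i.e. the ones whose types the pattern cannot infer.
fn unreachable_temps<'a>(
    edges: &HashMap<&'a str, Vec<&'a str>>,
    inputs: &[&'a str],
    temps: &[&'a str],
) -> Vec<&'a str> {
    let mut seen: HashSet<&str> = inputs.iter().copied().collect();
    let mut queue: VecDeque<&str> = inputs.iter().copied().collect();
    while let Some(sym) = queue.pop_front() {
        for &next in edges.get(sym).into_iter().flatten() {
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    temps.iter().copied().filter(|t| !seen.contains(t)).collect()
}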

Determine a root set of free variables such that the entire graph can be reached from that root set. The set doesn't have to be minimal, but smaller is better.

If the root set can consist of mutually independent type variables, even better. This would make it much easier to enumerate all the possible type combinations for an SMT solver.

Type sets

In the instruction definitions, every type variable has an associated set of allowed types. We should compute the corresponding type sets for the free pattern type variables.

The current implementation is incomplete. It doesn't do anything with type variable functions.

Cleanup for PR #123 - Semantics

Comments that need to be addressed:

  • T = MTypeVar('T') in cdsl/ti.py is unused
  • The semantics of iadd_cin and iadd_carry should use bvsignext instead of bvzeroext on c_in.
  • Use explicit z3 bindings for smtlib.py (z3 becomes a proof dependency)
