bytecodealliance / cranelift
Cranelift code generator
Home Page: https://cranelift.readthedocs.io/
The cranelift source code now lives in the wasmtime repository.
Functions with names whose character count is higher than funcname::NAME_LENGTH_THRESHOLD (22) are not displayed properly: #6933322e6e6f5f666f6c645f636d705f735f6f6666736574 is printed instead of i32.no_fold_cmp_s_offset (this string has 23 characters). The bug should be in funcname::try_as_name.
Client code (Rust) -> MIR -> wasm?
Will the generated binary code be able to call FFI functions (from the host process, no linking needed)?
See rust-lang/rust#43554 for more details. We of course have to publish that crate on crates.io first.
Would be glad to hear what specific needs Cretonne has from software floating-point.
Cretonne's IL parser supports a limited form of type inference for some polymorphic instructions where the controlling type variable can be inferred from one of the input operands:
v0 = iconst.i32 2
v1 = iconst.i8 6
v2 = ishl v0, v1
The ishl instruction is polymorphic, but its i32 result type can be inferred from the type of the v0 operand.
The parser's type inference is best effort only, and in particular, it only knows the types of values that have been defined earlier in the input text. If v0 were defined in an EBB later in the text format, we would have to use:
v2 = ishl.i32 v0, v1
It would be technically possible for the parser to defer type resolution until everything has been parsed so that more missing types could be inferred. I don't think this is desirable, though. Cretonne IL is not a programming language, it is a language for writing unit tests, so obvious typing is better. It is also possible to write test cases in Cretonne IL that are somewhat malformed. For example, defines don't have to dominate uses, and it is useful to be able to write verifier test cases like that.
The limited type inference is useful for reducing clutter, though. If a value is defined previously in the same EBB, its type is usually obvious, so there is no need to repeat it.
Currently, the instruction writer in write.rs assumes a very smart type inference mechanism, even smarter than what the parser can do. It should be fixed to only depend on local type inference.
Only omit the type variable when printing an instruction with a type variable operand that is defined in the same EBB.
There is a TODO in the type_suffix() function.
Cretonne has unit tests that are run with cargo test. These tests are important, but it is tedious to write Rust code that generates significant IR functions to test certain features.
We need a file-level test mechanism that loads IR from a .cton file and runs a test on the functions contained. This mechanism should be controlled by a cton-util test sub-command. Each test case is a single .cton file containing a test command and a number of functions. The test command determines how the functions should be tested.
This assumes that the .cton file can be parsed correctly. Tests for parser errors should be written in a different way, probably as cargo test unit tests.
The cton-util test sub-command can be used to run a single test case or a whole directory tree full of test files. When running multiple tests, multi-core machines should be exploited.
The test parse-roundtrip command is used to verify the IL parser and serializer. For each function in the file this tests that:
If the file contains two consecutive functions with the same name, the text produced by serializing the first function is compared against the text of the second function. This is used to test certain canonicalizing features of the parser: Comments are removed and whitespace is canonicalized, for example.
This type of test is also used to check that all IR features and attributes are preserved in the serialization roundtrip. The textual compare is rough, but it doesn't miss anything. If we compared the in-memory IR, the comparison function would need constant updating to avoid missing new features.
An IR-text-IR roundtrip test would also be useful. This should focus on preserving things like CFG predecessor list orders and other derived properties not stored directly in the IR. This kind of testing is more difficult because IR just read from a file has less variance than IR coming out of some optimization pass. For example, if a test case runs text-IR-text-IR and compares the CFG predecessor lists in the two IRs, it is unlikely to find any differences since those CFGs were just computed from the text.
The test verify-ok command is used to test the IR verifier. For each function in the file this tests that:
The test verify-fail command is used to test verifier failures. For each function in the file this tests that:
Syntax TBD, but we want to be able to inject an IR function into a specific pass and run FileCheck-style checks against the output.
Instructions such as sextend, uextend and ireduce require a type constraint between the widths of the argument and the result value. One way to implement this is to:
~Dimo
After the Rust language designers carefully removed nulls from the language, I went and added my own. It turns out that Option<T> is often larger than T, and tables with 32-bit or smaller entries can double in size when using Option<T>.
The Opcode enum in the generated opcodes.rs file contains an explicit null value:
pub enum Opcode {
    /// An invalid opcode.
    NotAnOpcode,
    /// `jump EBB, args`. (Jump)
    Jump,
    /// `brz c, EBB, args`. (Branch)
    /// Type inferred from `c`.
    Brz,
    ...
}
With the current opcodes, Opcode is a single byte, but Option<Opcode> is two bytes. We use the explicit null to indicate the empty slots in a precomputed hash table:
const OPCODE_HASH_TABLE: [Opcode; 128] = [
    Opcode::Imul,
    Opcode::Fsub,
    Opcode::IconcatLohi,
    Opcode::Sshr,
    Opcode::Fmaxnum,
    Opcode::NotAnOpcode,
    Opcode::Nearest,
    ...
];
This table would double in size if we used Option<Opcode>. Since this table is only used by the parser when mapping strings to opcodes, perhaps we should just bite the bullet and use an Option<Opcode> here. Hopefully, the Rust compiler will soon learn how to represent this sum type efficiently. It's one of the easier nested enum optimizations.
Encoding tables also use NotAnOpcode currently. These tables are more critical to keep small, but there may be an alternative representation of the empty slots.
In c8be39f I added support for ABI annotations on function signatures. These annotations indicate the location (stack or register) of individual function arguments and return values. We need:
The new syntax for function signatures is partially implemented:
function average(i32 [%x10], i32 inreg [%x11]) -> f32 [%f10] {
If a signature has ABI annotations, they appear in brackets after each argument or return value. This is used everywhere function signatures appear:
- A signature declaration in the preamble.
- A function declaration in the preamble.

Each argument is annotated as follows:
- Register locations are printed as [%x10] using the display_regunit() function provided by the TargetIsa. Note that the currently committed implementation is incorrect: it just prints the register unit number [%10]. This is a stopgap until we can provide a &TargetIsa reference. It is not a syntax we want to support going forward.
- Stack locations are printed as [24], where the number in brackets is the argument's byte offset in the argument array.

The Signature struct also has an argument_bytes field now which indicates the needed size of the argument array. This field does not need to appear in the text format since it can be computed from the arguments.
The parser should accept the annotations above and fill in the location field of the ArgumentType structs when present.
When parsing a file that doesn't specify exactly one ISA, the ABI annotations should be ignored.
After parsing a signature with ABI annotations, call the Signature::compute_argument_bytes() method.
The verifier (#3) can be extended to also verify ISA-dependent properties. We can start with instruction encodings.
The verify_function() function should take an additional isa: Option<&TargetIsa> argument, like write_function() already does. Passing some ISA reference should cause additional ISA-dependent properties to be verified. The ISA-independent properties should be verified no matter what the isa argument is.
For all instructions that have a legal encoding in the func.encodings table, verify that it is consistent with the encoding chosen by isa.encode(dfg[inst]). Use TargetIsa::display_enc() to generate pretty error messages.
Instructions that don't have a legal encoding (see Encoding::is_legal()) should be ignored. Start by checking func.encodings.is_empty().
Right now, the loop analysis describes a loop by a header Ebb and a list of Ebbs pertaining to the loop. However, because Ebbs are extended basic blocks, this description includes instructions that are not part of the loop (they are after the back edge branch instruction in the Ebb).
This is a problem because the LICM pass will move upstream instructions that are not in the loop. It doesn't break the code, because moving SSA definitions upwards (in the dominator tree) is always allowed, but it will break code locality.
The way to solve this problem is to represent loops in the loop analysis as a list of couples (Ebb, Inst), where Inst is the last back edge to appear in Ebb, and Ebb is part of the loop. Then the LICM pass would stop moving instructions around when it reaches Inst.
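A minimal sketch of that representation, with hypothetical stand-in types (the real Ebb and Inst are Cretonne entity references):

#[derive(Clone, Copy)]
struct Ebb(u32);
#[derive(Clone, Copy)]
struct Inst(u32);

/// A loop is its header plus, for each member EBB, the last back edge in
/// that EBB. LICM would stop hoisting once it reaches that instruction.
#[allow(dead_code)]
struct Loop {
    header: Ebb,
    members: Vec<(Ebb, Inst)>,
}

fn main() {
    let lp = Loop {
        header: Ebb(0),
        members: vec![(Ebb(0), Inst(9)), (Ebb(1), Inst(19))],
    };
    assert_eq!(lp.members.len(), 2);
}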
I will implement that later; this issue is just a reminder.
The language reference is not so clear on what happens when floating point arithmetic instructions produce a NaN. We should specify that.
The specification should be tight enough that WebAssembly semantics are simple to represent, and loose enough that we don't need lots of NaN adjustment code emitted for fadd on common targets.
Copying the WebAssembly semantics is probably a good choice.
The write_instruction() function will print out the encoding of an instruction if it has been set:
[R#0c] v5 = iadd v1, v2
The parser should decode these annotations and fill out the Function::encodings table.
Since the encoding recipe (R) is ISA-dependent, we probably can't support encodings in a test file with multiple ISAs.
See the DisplayEncoding struct for details of the format.
The LiveRange struct currently uses a Vec<Interval> to store its list of live-in intervals. This has some problems (LLVM's LiveInterval has the same weakness, and it hasn't yet caused known compile time issues).
These problems can be fixed in a way similar to #35, but instead of allocating lists from a memory pool, we allocate B+-tree nodes. This is similar to representing the live-in intervals in a BTreeMap<Ebb, Inst>. Unfortunately, Rust's standard B-tree is not useful to us since it allocates nodes from the heap, and it doesn't provide a rich enough interface to allow things like coalescing intervals; see #37.
The LiveRange::liveins vector becomes a 32-bit node reference with a sentinel value to indicate 'empty'. The B-tree node is defined as an enum:
enum Node {
    Inner {
        size: u8,
        keys: [K; 7],
        nodes: [NodeRef; 8],
    },
    Leaf {
        size: u8,
        keys: [K; 7],
        values: [V; 7],
    },
}
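As a spot check of the footprint, here is a standalone version with K and V fixed to 4-byte entity references (u32 here); both variants then fit in a single 64-byte cache line on typical layouts:

use std::mem::size_of;

#[derive(Clone, Copy)]
struct NodeRef(u32);

#[allow(dead_code)]
enum Node {
    Inner { size: u8, keys: [u32; 7], nodes: [NodeRef; 8] },
    Leaf { size: u8, keys: [u32; 7], values: [u32; 7] },
}

fn main() {
    // Tag + size byte, 28 bytes of keys, and 32 bytes of node refs pad
    // out to 64 bytes in practice.
    println!("Node: {} bytes", size_of::<Node>());
}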
Filecheck is currently very stringent in matching the whitespace of tests exactly, which leads to undesirable situations of having to add a regex match to pick up excess spaces between identifiers.
For example, return v0,v1 isn't valid and must be return v0, v1, which is subtle and goes against intuition.
Another situation is that the code that stringifies IR can insert many spaces in unexpected places, such that when round-tripped through a parse and re-stringify, the following:
[-,-] v2 = iadd v0, v1
becomes
[-] v2 = iadd v0, v1
meaning that a test to match it must look like the following:
; regex: WS=[ \t]*
; nextln: [-]$WS $v2 = iadd $v0, $v1
We should examine where (if anywhere) whitespace sensitivity is suitable, and remove it everywhere else so that writing these common test cases has less friction.
Hello!
Been following this crate off and on, great work!
So, I've also been poking around every now and then at your native code generators. Now, from various issues (#30) and the internals thread, I thought that this wasn't just a wasm-targeting compiler, but was being considered, in some distant future, for a different rustc backend, which would entail native code generation and executable generation for, say, x86 at the very least.
Now, hopefully I'm not barking up the wrong tree, but it does appear that there are, for example, arm and x86 asm emitters; if this is the case:
I ask, because it just so happens that I finished a working prototype for a pure rust object file emitter (emphasis mine). The repo is here (but be warned, it is really, really rough, and certain things are hardcoded at the moment)
Wow cool! But what do I mean exactly?
Here is a working snippet for x86_64 (working as in it is code that compiles, runs, and outputs a correct object file):
// try this for dynamically linked file
// ld -e _start -I/usr/lib/ld-linux-x86-64.so.2 -L/usr/lib/ /usr/lib/crti.o /usr/lib/Scrt1.o /usr/lib/crtn.o test.o -lc -o test
// try this for dynamically linked file
// ld -e _start -I/usr/lib/ld-linux-x86-64.so.2 -L/usr/lib/ /usr/lib/crti.o /usr/lib/Scrt1.o /usr/lib/crtn.o test.o -lc -o test
fn run() -> error::Result<()> {
    for (i, arg) in env::args().enumerate() {
        if i == 1 {
            let mut obj: Elf = Artifact::new(Target::X86_64, Some(arg.to_owned()));
            // 55                   push %rbp
            // 48 89 e5             mov %rsp,%rbp
            // b8 ef be ad de       mov $0xdeadbeef,%eax
            // 5d                   pop %rbp
            // c3                   retq
            obj.add_code("deadbeef".to_owned(), vec![0x55, 0x48, 0x89, 0xe5, 0xb8, 0xef, 0xbe, 0xad, 0xde, 0x5d, 0xc3]);
            // main:
            // 55                   push %rbp
            // 48 89 e5             mov %rsp,%rbp
            // b8 00 00 00 00       mov $0x0,%eax
            // e8 d4 ff ff ff       callq 0x0 <deadbeef>
            // 89 c6                mov %eax,%esi
            // 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi   # will be: deadbeef: 0x%x\n
            // b8 00 00 00 00       mov $0x0,%eax
            // e8 00 00 00 00       callq 0x3f <main+33>  # printf
            // b8 00 00 00 00       mov $0x0,%eax
            // 5d                   pop %rbp
            // c3                   retq
            obj.add_code("main".to_owned(), vec![0x55, 0x48, 0x89, 0xe5, 0xb8, 0x00, 0x00, 0x00, 0x00, 0xe8, 0xe2, 0xff, 0xff, 0xff, 0x89, 0xc6, 0x48, 0x8d, 0x3d, 0x00, 0x00, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00, 0xe8, 0x00, 0x00, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x5d, 0xc3]);
            obj.add_data("str.1".to_owned(), b"deadbeef: 0x%x\n\0".to_vec());
            obj.link("main", "str.1", 19); // offset we need to relocate some data in main
            obj.import("printf".to_owned());
            obj.link_import("main", "printf", 29); // offset we need to relocate an import in main
            println!("res: {:#?}", obj);
            obj.write(::std::fs::File::create(Path::new(&arg))?)?;
        }
    }
    Ok(())
}
So, for the current form, the client basically specifies a Target and a name, and the type determines the container format, in this case ELF:
let mut obj: Elf = Artifact::new(Target::X86_64, Some(arg.to_owned()));
Raw Intel x86_64 machine code can then be appended to this object via:
obj.add_code("deadbeef".to_owned(), vec![0x55, 0x48, 0x89, 0xe5, 0xb8, 0xef, 0xbe, 0xad, 0xde, 0x5d, 0xc3])
which adds the function deadbeef with the associated routine to the object file.
As you can see above, imports can be added, and relocations can be added (more on this later).
The call to write then emits, in this case, a correctly formed, and eminently linkable ELF64 object file.
If you ran this code and gave it the name test.o, and then ran (on my linux system):
ld -e _start -I/usr/lib/ld-linux-x86-64.so.2 -L/usr/lib/ /usr/lib/crti.o /usr/lib/Scrt1.o /usr/lib/crtn.o test.o -lc -o test
a dynamically linked executable called test that calls printf would be produced, which prints (in this case): "deadbeef: 0xdeadbeef".
If this interests you (you could test your x86 codegen pretty much right now, with some caveats, which I won't get into right now), let me know, as:
Right now, 2. may be too much to bite off at once (and the plan is always to give the API an escape hatch and provide compiler writers a direct relocation insertion API), but I do think something like it would be beneficial, and very cool.
So let me know what you think! Or if this isn't appropriate for you at all right now, I'll be sad, but that's ok too!
The verifier (#3) is beginning to find real bugs now. Its method for displaying verifier errors could be improved. Here's an example of the current output:
$ target/debug/cton-util test filetests/isa/riscv/expand-i32.cton
FAIL filetests/isa/riscv/expand-i32.cton: inst1: arg 0 (v0) has type i16, must match function signature of i32
inst1: return v0
function large_imm(i16) -> i32 {
ebb0(vx0: i16):
v0 = iadd_imm vx0, 0x3b9a_ca00
return v0
}
The reference to inst1 is cryptic because instruction references don't appear in the textual IL, so we print the instruction itself in the first line. This is not a great solution.
It would be much better to underline the offending instruction, like the filecheck tests do:
FAIL filetests/isa/riscv/expand-i32.cton: inst1: arg 0 (v0) has type i16, must match function signature of i32
function large_imm(i16) -> i32 {
ebb0(vx0: i16):
v0 = iadd_imm vx0, 0x3b9a_ca00
return v0
^~~~~~~~~
verifier: arg 0 (v0) has type i16, must match function signature of i32
}
One possible implementation is to move the guts of write_function into a more general:
pub fn decorate_function<WL: FnMut(AnyEntity, usize, &str) -> fmt::Result>(w: WL, func: &Function, isa: Option<&TargetIsa>) -> fmt::Result;
The WL closure would be called to write out each line with these arguments: the entity being defined on the line, the indentation to insert before the line, and the contents of the line itself.
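As a self-contained toy of that interface (Entity stands in for AnyEntity, and the two hard-coded lines stand in for the function writer), a caller could produce the underline from the example above like this:

use std::fmt::{self, Write};

// The writer calls the closure once per line with (entity, indent, text);
// the caller decides how to render each line.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq)]
enum Entity {
    Ebb(u32),
    Inst(u32),
}

fn decorate_function<W, F>(w: &mut W, mut line: F) -> fmt::Result
where
    W: Write,
    F: FnMut(&mut W, Entity, usize, &str) -> fmt::Result,
{
    line(w, Entity::Ebb(0), 0, "ebb0(vx0: i16):")?;
    line(w, Entity::Inst(1), 4, "return v0")
}

fn main() -> fmt::Result {
    let offending = Entity::Inst(1);
    let mut out = String::new();
    decorate_function(&mut out, |w, ent, indent, text| {
        writeln!(w, "{:indent$}{}", "", text, indent = indent)?;
        if ent == offending {
            writeln!(w, "{:indent$}^{}", "", "~".repeat(text.len() - 1), indent = indent)?;
        }
        Ok(())
    })?;
    print!("{}", out);
    Ok(())
}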
@sunfishcode and I ran into a problem when translating wasm into Cretonne IL. In wasm, the br_table instruction actually pops some return values from the stack and passes them to the block it is jumping to. However, in Cretonne IL, contrary to the other branch instructions, br_table does not support jump arguments. This means that the Ebbs that are in the jump table used by br_table must not have arguments. We don't know if this is done on purpose or not.
Here is the alternative space for solving this inconsistency:
- either add a jump arguments field to the Cretonne IL br_table and require that all Ebbs inside a jump table have the same argument count and types;
- or leave br_table as is, but then we would need to use stack_store and stack_load when we want to pass arguments to an Ebb that is the target of a br_table instruction.
The InstructionData struct is 16 bytes, and it is important to keep it small. Some instruction formats don't fit in 16 bytes, and so they use a Box<FooData> pointing to additional data allocated from the heap. Most notably, calls and branches keep their argument lists on the heap. This design has some disadvantages:
- Dropping the DataFlowGraph instruction list will fault in the whole array in order to call Drop code for the InstructionData members. This is slow.
- Vec<Value> has 24 bytes of overhead before the first value is even added.
- Box<Vec<Value>> leaves two pointer indirections between the instruction and the values.

These problems can be overcome by changing the large instruction representation to use references to value lists in a memory pool.
Value lists are similar to Vec<Value>, but with a different layout:
- A ValueList is a 32-bit offset into the pool where the list is stored. Compare to the 24-byte footprint of a Vec.

The pool is implemented as a single Vec<Value> shared by all the value lists used by a function. All manipulation of value lists goes through the pool:
pub struct ValueList(u32);

pub struct ValueListPool {...}

impl ValueListPool {
    fn alloc(&mut self, elements: &[Value]) -> ValueList;
    fn free(&mut self, vlist: ValueList);
    fn len(&self, vlist: ValueList) -> usize;
    fn get(&self, vlist: ValueList) -> &[Value];
    fn get_mut(&mut self, vlist: ValueList) -> &mut [Value];
    fn push(&mut self, vlist: &mut ValueList, element: Value);
    fn extend(&mut self, vlist: &mut ValueList, elements: &[Value]);
    fn insert(&mut self, vlist: &mut ValueList, index: usize, element: Value);
    fn remove(&mut self, vlist: &mut ValueList, index: usize);
}
The methods that change the length of a list are allowed to move it around, so they take a mutable reference to the list handle.
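Here is a minimal working model of the pool idea, under simplifying assumptions (no free lists, no in-place growth, and get copies out instead of returning a slice):

#[derive(Clone, Copy, PartialEq, Debug)]
struct Value(u32);

#[derive(Clone, Copy)]
struct ValueList(u32); // offset of the list's length word in the pool

struct ValueListPool {
    data: Vec<u32>,
}

impl ValueListPool {
    fn new() -> Self {
        ValueListPool { data: Vec::new() }
    }

    // Append a new list to the pool: a length word followed by elements.
    fn alloc(&mut self, elements: &[Value]) -> ValueList {
        let off = self.data.len() as u32;
        self.data.push(elements.len() as u32);
        self.data.extend(elements.iter().map(|v| v.0));
        ValueList(off)
    }

    fn len(&self, vlist: ValueList) -> usize {
        self.data[vlist.0 as usize] as usize
    }

    fn get(&self, vlist: ValueList) -> Vec<Value> {
        let off = vlist.0 as usize;
        let len = self.data[off] as usize;
        self.data[off + 1..=off + len].iter().map(|&v| Value(v)).collect()
    }

    // Growing may move the list, which is why the handle is `&mut`. The
    // real design would reuse a free bin instead of leaking the old copy.
    fn push(&mut self, vlist: &mut ValueList, element: Value) {
        let mut elems = self.get(*vlist);
        elems.push(element);
        *vlist = self.alloc(&elems);
    }
}

fn main() {
    let mut pool = ValueListPool::new();
    let mut args = pool.alloc(&[Value(0), Value(1)]);
    pool.push(&mut args, Value(2));
    assert_eq!(pool.len(args), 3);
    assert_eq!(pool.get(args), vec![Value(0), Value(1), Value(2)]);
}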
In order to get amortized linear time push, lists are reallocated exponentially. Capacities are arranged into power-of-two bins, computed from the length of the list:
fn bin_for_length(length: u32) -> u32 {
    30 - (length | 3).leading_zeros()
}

fn bin_size(bin: u32) -> u32 {
    4 << bin
}
The bin sizes are always powers of two, starting from 4. The capacity of a bin is its size minus one. The first word is used to store the list length.
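A few worked values, verified against the two functions above (paste them together for a runnable check):

fn main() {
    // Lengths 0-3 share the smallest bin: 4 words, 3 usable after the
    // length word.
    assert_eq!((bin_for_length(3), bin_size(0)), (0, 4));
    // Length 4 moves to the next power-of-two bin: 8 words, 7 usable.
    assert_eq!((bin_for_length(4), bin_size(1)), (1, 8));
    // Growth stays within a factor of two of the list length.
    assert_eq!((bin_for_length(100), bin_size(5)), (5, 128));
}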
When lists are freed, memory can be recycled by keeping a free list for each bin.
Note that removing elements from a list may cause its capacity to shrink, and the list may have to be reallocated. This is not a problem; value lists are not used the same way as Vec in Cretonne.
The large instruction formats can now be reimplemented in terms of value lists:
- For formats with variable_args arguments, move all value arguments to a ValueList.
- For formats with both fixed value and variable_args arguments, move both to a single value_list. This applies to IndirectCall only.
- Store the list reference in the InstructionData variant directly.

Changing the instruction formats like this has wide ranging effects:
- Remove the boxed_storage flag from the Python InstructionFormat class. Replace it with a value_list flag.
- Formats with (value, variable_args) operands map to simply (variable_args,); do the same for instructions with too many value operands, like iadd_carry.
- Change the InstBuilder methods to take ValueList arguments where needed.
- InstructionData::args() can be changed to return a single slice, &[Value], which is a nice simplification.

The TernaryOverflow instruction format becomes a catch-all for instructions with many inputs and many outputs and no immediate operands. Maybe rename it to MultiAry or something better.
I didn't know that we have generated documentation: http://cretonne.readthedocs.io/en/latest/
It probably makes sense to mention it in the readme.
Cretonne doesn't build with rustc 1.13, but the documentation says everything more recent than 1.12 should work fine. I think suggesting 1.15.1 is more reasonable, but it's up to you.
Example error while running the unit tests:
====== Rust cretonne unit tests ======
Compiling cretonne v0.0.0 (file:///Users/davide/ng/cretonne/lib/cretonne)
error[E0433]: failed to resolve. Use of undeclared type or module `ArgumentLoc`
--> lib/cretonne/src/ir/extfunc.rs:198:42
|
198 | sig.argument_types[1].location = ArgumentLoc::Stack(8);
| ^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `ArgumentLoc`
error[E0433]: failed to resolve. Use of undeclared type or module `ArgumentLoc`
--> lib/cretonne/src/ir/extfunc.rs:203:42
|
203 | sig.argument_types[0].location = ArgumentLoc::Stack(24);
| ^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `ArgumentLoc`
error: aborting due to 2 previous errors
error: Could not compile `cretonne`.
The verifier::verify_context() function checks that the dominator tree in the context matches a newly computed one. We should do the same thing for the control flow graph.
There is an existing cfg_integrity method, but it just checks the just-computed CFG against the function. It doesn't look at the CFG in the context.
cc @angusholder
Cretonne needs an IR verifier.
The verifier can be run before and after different passes. It checks those IR invariants that are not enforced by the type system.
Instruction integrity:
- Instructions with no results must have a VOID first_type().

EBB integrity:
- Every instruction reached through the ebb_insts iterator must belong to the EBB as reported by inst_ebb().
- Can an instruction appear more than once in the ebb_insts iterator? The current linked list representation could theoretically do it.
- Every value in the ebb_args iterator belongs to the EBB as reported by value_ebb.

SSA form. All values used by an instruction must:
Control flow graph and dominator tree integrity.
Type checking:
- EBBs referenced by a jump_table must take no arguments.
- return instructions must have return value operands matching the current function signature.

Ad hoc checking:
- Look for divisions by a zero immediate, such as udiv_imm v3, 0.

Cretonne is designed as a code generation library that can be used with multiple instruction set architectures and with multiple runtimes. When compiling WebAssembly, the generated code may need to behave differently for different runtimes on the same ISA. For example:
Some related work has been done for Java virtual machines wanting a cleaner interface between the runtime and the JIT compiler.
Since WebAssembly's operations are generally lower level than some JVM operations, the snippet approach is probably overkill.
The code generated for a given Cretonne IL function depends only on three things:
- The Cretonne IL function itself.
- The shared settings defined in meta/base/settings.py.
- The ISA-specific settings defined in meta/isa/*/settings.py.

This functional approach is important for debugging. If a runtime using Cretonne is able to implement hooks and callbacks that change the generated code, it becomes very difficult to reproduce bugs without replicating the user's environment.
The translation from WebAssembly to Cretonne IL can be modulated by the embedder. This is less of a problem because they can submit both the WebAssembly and the initial Cretonne IL in a bug report. It would still be preferable to minimize the variation in Cretonne IL generated for a given WebAssembly function.
There has been some talk of using Cretonne IL as an intermediate form when generating WebAssembly. It would be preferable if the IL generated when translating WASM to Cretonne were also capable of being translated back to WebAssembly.
We haven't yet defined the required invariants, but it would include an ability to identify heap and global accesses, so these operations should not be expanded to low-level code that can't be recognized.
In the future, Cretonne will be able to optimize WebAssembly sandbox operations, for example by eliminating heap bounds checks. This will only work if heap accesses can be recognized by the optimizer, so they can't be expanded into low-level operations during the WASM to Cretonne translation.
It is common for instruction set architectures to provide alternative encodings for common instructions that use less code space. The compressed encodings have additional constraints on the range of immediate operands and sometimes tighter register operand constraints too.
- On Intel, iadd can be encoded as an lea or as a smaller add instruction which has tied register operands.

Cretonne represents encoding variants of the same instruction using separate encoding recipes and bits. This also means that each recipe maps to an exact instruction size in bytes. All of the valid encodings for an instruction appear in the encoding tables used by TargetIsa::encode(). The legalizer will always pick the first encoding with a matching predicate, but there can be more encodings after that.
We should add a TargetIsa method that makes it possible to enumerate all of the valid encodings for an instruction, probably by returning an iterator.
Sometimes, the legalizer is able to select a compact encoding up front. This is the case with Intel 8-bit immediates when the immediate operand is known, for example. Often, the legalizer does not have enough information to determine if a compact encoding would be valid:
The legalizer selects a compact encoding when it has enough information to do so, and it won't constrain the register allocator.
The instruction compression pass is a pure optimization pass: it can be skipped without violating correctness. It runs late when we have more information:
Compressing instructions obviously changes code size, so the final offsets of EBB headers are not known, and so branch displacements are not yet known.
The test binemit file tests currently check against the first valid encoding of each instruction. These are the encodings that the register allocator sees.
It would also be good to test the compressed instruction encodings that will be generated when #72 is implemented.
Since there are other ways of testing the early instruction selection by the legalizer, it would make sense to change the test binemit driver to pick the smallest valid instruction encoding which:
- is among the legal encodings for the instruction (see TargetIsa::legal_encodings()), and
- satisfies the branch range constraints (see the binemit/relaxation.rs pass).

Some of the Intel 64-bit encoding tests would need to be changed since they contain redundant REX prefixes. See comments on #127.
In legalize.py and other places, we define transformation patterns like this:
# Expansions for immediates that are too large.
expand.legalize(
    a << iadd_imm(x, y),
    Rtl(
        a1 << iconst(y),
        a << iadd(x, a1)
    ))
Each transformation pattern is represented by the XForm class defined in meta/cdsl/xform.py.
We need a general type inference and type checking system for these patterns that uses the type constraints specified through type variables in the instruction definitions. The type inference is used to:
- Compute the type of temporaries like a1 in the output pattern above. The generated transformation code can't always depend on the simple forward type inference that many InstBuilder methods support.

We currently have a type inference implementation that:
The current implementation does not compute strict type sets for each type variable, and it is not clear that it can always determine a way of computing the type of a temporary symbol.
A general implementation of the type inference should first compute a set of free type variables, such that the type of every symbol can be computed as a function of these variables.
The type constraints imposed by the instructions in the pattern can be represented as a set of triples:
(Var, Func, Var)
Here, Var is a symbol in the pattern, and Func is one of the type variable functions listed in the TypeVar class. The meaning of a tuple (a, f, b) is the constraint that typeof(a).f() == typeof(b). The function can be TypeVar.SAMEAS to require a and b to be of the same type.
The set of tuples can also be seen as a labeled, directional graph where the nodes are symbols and the labeled edges are constraints. We need to prove that every temporary symbol (i.e., one that appears only in the output pattern) can be reached from a symbol that appears in the input pattern.
Determine a root set of free variables such that the entire graph can be reached for that root set. The set doesn't have to be minimal, but smaller is better.
If the root set can consist of mutually independent type variables, even better. This would make it much easier to enumerate all the possible type combinations for an SMT solver.
In the instruction definitions, every type variable has an associated set of allowed types. We should compute the corresponding type sets for the free pattern type variables.
The current implementation is incomplete. It doesn't do anything with type variable functions.
When defining AST nodes, it is possible to use instructions with bound type variables:
Rtl(
    a << udiv.i32(x, y)
)
The ValueTypes for the bound type variables appear in ast.Apply.typevars:
def __init__(self, inst, args):
    # type: (instructions.MaybeBoundInst, Tuple[Expr, ...]) -> None  # noqa
    if isinstance(inst, instructions.BoundInstruction):
        self.inst = inst.inst
        self.typevars = inst.typevars
    else:
        assert isinstance(inst, instructions.Instruction)
        self.inst = inst
        self.typevars = ()
These bound type variables should be used to constrain type inference further. It effectively converts the involved type variables to singletons.
The typevars field is a Tuple[ValueType, ...]. A None entry means that the type variable is not constrained.
cc @d1m0
As of now, the verifier yields an error when, while checking unreachable code, it detects the use of a variable defined in reachable code. This error is not relevant and the corresponding check should just be removed in the case of unreachable code.
The cretonne::settings module defines a Configurable trait which can be used to manipulate settings one at a time. In most realistic use cases, a client wants to choose between a few fixed configurations, e.g. first-tier JIT vs last-tier JIT, or debug build vs release build.
Add named 'presets' which can apply a whole collection of settings at once:
trait Configurable {
    /// Apply all the settings in the named preset.
    fn preset(&mut self, name: &str);
}
The presets can be defined in settings.py as a simple dictionary mapping settings to their preset values:
release_mode = Preset(
    """Settings for release builds""",
    {
        opt_level: 'best',
        enable_simd: True,
    })
Recently, the stable Rust compiler supports using ? in place of the try! macro. We should switch.
This mostly affects the parser and write.rs, but there are scattered uses elsewhere.
Comments that need to be addressed:
Cretonne's in-memory representation needs a control flow graph analysis. This would record for each EBB:
The control flow graph is easily computed from the code, so we don't want to represent it in the textual IR. It would be useful if the iteration order of the predecessor set were stable so it doesn't change when saving out the IR and reloading it.
We are missing conversion instructions for the boolean types:
- breduce and bextend, to match the integer equivalents.

Integer-to-bool conversion is best handled by the icmp_imm instruction.
Suggested to-int conversions:
v1 = bint.i32 v0 ; 0x00000001
v2 = bmask.i16 v0 ; 0xffff
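In plain Rust terms, my reading of the suggested semantics is: bint produces 0 or 1, and bmask produces 0 or all ones:

// bint: bool to integer, producing 0 or 1.
fn bint(b: bool) -> u32 {
    b as u32
}

// bmask: bool to integer, producing 0 or all ones.
fn bmask(b: bool) -> u16 {
    if b { u16::MAX } else { 0 }
}

fn main() {
    assert_eq!(bint(true), 0x0000_0001);
    assert_eq!(bmask(true), 0xffff);
    assert_eq!(bint(false), 0);
    assert_eq!(bmask(false), 0);
}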
The TargetIsa::encode() function returns the first encoding of an instruction where both ISA and instruction predicates are satisfied. We also need a way of enumerating all of the possible encodings of an instruction, probably by creating an iterator that returns encodings.
Instruction encodings can have multiple constraints that are checked at different points in the compiler pipeline:
- ISA predicates: an imul.i32 instruction can only be encoded if the RISC-V "M" extension is supported by the CPU.
- Instruction predicates: iadd_imm.i32 can only be encoded as a RISC-V addi instruction if the immediate is a 12-bit signed integer.
- Register constraints: RecipeConstraints, available from EncInfo and satisfied by the register allocator.
- Branch range constraints: BranchRange constraints, also available from EncInfo.

Only the first two types of constraints are checked by the iterator, just like TargetIsa::encode() does it. This iterator would be used by:
The tables that are generated in files like encoding-riscv.rs should already be usable for this. The implementation of TargetIsa::encode() stops at the first match. The iterator would keep going until reaching a stop code.
cc @anholt
By default, Cretonne is a cross compiler with all supported target ISAs compiled in. That is overkill in many use cases (JITs) that only need to generate native machine code, and it causes some bloat in the code size of the Cretonne library.
We should add Cargo configuration settings to control which target ISAs are compiled into the Cretonne library. Two settings make sense:
The build.rs build script should interpret the setting and select the set of ISAs to build using the TARGET environment variable set by Cargo. Then emit cargo:rustc-cfg=... lines for the compiler.
The isa::lookup() function needs to return two different errors: "I've never heard of that ISA" and "I know that ISA, but it's not been compiled in". This distinction is important for those file tests that request an unsupported ISA. These tests should be skipped silently. On the other hand, tests that request an unknown ISA should always fail.
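A sketch of the two-way split with hypothetical names (the issue only requires that the two cases be distinguishable):

#[derive(Debug, PartialEq, Eq)]
enum LookupError {
    /// "I've never heard of that ISA" - file tests should always fail.
    Unknown,
    /// Known ISA that wasn't compiled in - file tests skip silently.
    Unsupported,
}

// Pretend only RISC-V was compiled in for this build.
fn lookup(name: &str) -> Result<(), LookupError> {
    match name {
        "riscv" => Ok(()),
        "intel" | "arm32" | "arm64" => Err(LookupError::Unsupported),
        _ => Err(LookupError::Unknown),
    }
}

fn main() {
    assert_eq!(lookup("riscv"), Ok(()));
    assert_eq!(lookup("arm64"), Err(LookupError::Unsupported));
    assert_eq!(lookup("mips"), Err(LookupError::Unknown));
}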
Value locations can be assigned to all SSA values, both instruction results and EBB arguments. Thanks to @angusholder, the parser can now understand value location annotations for instruction result values. We need the same for EBB arguments.
- write_ebb_header() should include value locations for the EBB arguments if !func.locations.is_empty().
- parse_ebb_arg() should parse the value location for the argument too.

There is no syntax established yet, but I suggest something that looks like the argument ABI annotations in #49:
ebb3(v4: i32 [%x10], v5: f64 [%f3], v6: i32x4 [ss4]):
When a value is not assigned to a location (ValueLoc::Unassigned), the whole annotation for that argument should be omitted instead of printing [-].
The pattern transformation syntax supports the use of polymorphic instructions with one or more type variables bound to a specific type:
widen32.legalize(
    a << add(x, y),
    Rtl(
        wx << uextend.i32(x),
        wy << uextend.i32(y),
        wa << iadd(wx, wy),
        a << ireduce(wa)
    ))
The uextend instruction is polymorphic with two type variables: the result type and the input type. The syntax uextend.i32 binds the first type variable (the result type) to i32, fixing the types of wx and wy in the example above.
The type inference system for patterns should be extended to understand this. Currently, it can't infer the types of the temporaries wx and wy.
The uextend.i32 syntax becomes a BoundInstruction instance which the cdsl.ast.Apply constructor deconstructs into the uextend instruction and a self.typevars = (i32,) tuple. Note that this is only a partial binding; the second type variable in uextend remains free.
Alternative example with the same effect:
widen32.legalize(
    a << add(x, y),
    Rtl(
        wx << uextend(x),
        wy << uextend(y),
        wa << iadd.i32(wx, wy),
        a << ireduce(wa)
    ))
This fully binds the iadd instruction, which only has a single typevar.
We need a mechanism for updating the control flow graph when the code is changed. An example is an architecture that doesn't have a cmov instruction:
ebb0:
v10 = select v1, v2, v3
v11 = iadd_imm v10, 5
Legalizes into:
ebb0:
brnz v1, ebb1(v2)
jump ebb1(v3)
ebb1(vx0: i32):
v11 = iadd_imm vx0, 5
This would require updating the CFG since ebb1 is a whole new EBB which inherits some of ebb0's successors.
There are three ways of updating the CFG after a program change:
Instruction-level CFG repair is potentially difficult. In the example above, ebb0 can have arbitrarily many CFG edges after the select instruction that is converted. These edges would all need to be moved to ebb1 after the change. It would not be much faster than recomputing all of the edges connecting to ebb0 and ebb1.
We should provide the second option as a CFG function for recomputing CFG edges affecting a single EBB. After performing the program change above, it should be possible to repair the CFG with code like this:
func.cfg.recompute_ebb(ebb0);
func.cfg.recompute_ebb(ebb1);
The call recompute_ebb(ebb0) should assume that ebb0 has changed, so repair the CFG in these steps:
1. Remove ebb0 as a predecessor of each successor block. Don't depend on the instructions currently in ebb0 to find edges; the instructions may have changed.
2. Clear the recorded successors of ebb0.
3. Rescan the branch instructions in ebb0 and add new CFG edges as they are discovered.

This algorithm recomputes all CFG edges from ebb0 while leaving edges to ebb0 intact.
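A toy model of those repair steps, with the simplifying assumption that the current branch targets are passed in directly instead of being rescanned from the instructions:

use std::collections::{HashMap, HashSet};

type Ebb = u32;

#[derive(Default)]
struct Cfg {
    succs: HashMap<Ebb, HashSet<Ebb>>,
    preds: HashMap<Ebb, HashSet<Ebb>>,
}

impl Cfg {
    /// Recompute all edges out of `ebb`, leaving edges into `ebb` intact.
    fn recompute_ebb(&mut self, ebb: Ebb, branch_targets: &[Ebb]) {
        // 1. Remove `ebb` as a predecessor of each old successor, without
        //    looking at the (possibly changed) instructions.
        if let Some(old) = self.succs.remove(&ebb) {
            for s in old {
                if let Some(p) = self.preds.get_mut(&s) {
                    p.remove(&ebb);
                }
            }
        }
        // 2-3. Rescan the branches and add the new edges back.
        let succs = self.succs.entry(ebb).or_default();
        for &t in branch_targets {
            succs.insert(t);
            self.preds.entry(t).or_default().insert(ebb);
        }
    }
}

fn main() {
    let mut cfg = Cfg::default();
    cfg.recompute_ebb(0, &[1, 2]); // ebb0 branches to ebb1 and ebb2
    cfg.recompute_ebb(0, &[1]);    // after a change, only ebb1 remains
    assert!(cfg.preds[&1].contains(&0));
    assert!(!cfg.preds[&2].contains(&0));
}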
After the Rust language designers carefully removed nulls from the language, I went and added my own. It turns out that Option<T> is often larger than T, and tables with 32-bit or smaller entries can double in size when using Option<T>.
The EntityRef types are used everywhere, and often in tables that we can't afford to double in size.
pub struct Ebb(u32);

impl EntityRef for Ebb {
    fn new(index: usize) -> Self {
        assert!(index < (u32::MAX as usize));
        Ebb(index as u32)
    }
    fn index(self) -> usize {
        self.0 as usize
    }
}
/// A guaranteed invalid EBB reference.
pub const NO_EBB: Ebb = Ebb(u32::MAX);
Again, Option<Ebb> is twice as big as Ebb, so we can't use it in tables, and we use NO_EBB as a sentinel instead. See for example the tables in layout.rs. I've tried to use Option<Ebb> where space optimization is not important, but the lack of consistency makes it tricky.
We can't expect the compiler to optimize this Option since it has no way of knowing that the u32::MAX value is reserved as a null. Rust has an unstable feature for annotating types as non-zero, but a) it is unstable, and b) it seems clumsy.
There's an ir::types::VOID constant which doesn't represent a valid value type. It would be nice if we could use Option<Type> instead, but currently Type is only one byte, which is rather important. This is a similar problem to the entity references.
One possibility would be a PackedOption data type that works for types with a reserved value:
/// Types that have a reserved value which can't be created any other way.
trait ReservedValue {
    /// Create an instance of the reserved value.
    fn reserved() -> Self;
}

/// Packed representation of `Option<T>`.
struct PackedOption<T: ReservedValue + Eq>(T);

impl<T: ReservedValue + Eq> PackedOption<T> {
    pub fn is_none(&self) -> bool {
        self.0 == T::reserved()
    }

    pub fn is_some(&self) -> bool {
        !self.is_none()
    }

    pub fn pack(opt: Option<T>) -> PackedOption<T> {
        match opt {
            None => PackedOption(T::reserved()),
            Some(t) => PackedOption(t),
        }
    }

    pub fn unpack(self) -> Option<T> {
        if self.is_none() {
            None
        } else {
            Some(self.0)
        }
    }
}
Add various Default, From, and Into impls.
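Putting the pieces together, a self-contained check that the packed representation stays at 4 bytes (derives added for the comparisons; a sketch, not the final API):

use std::mem::size_of;

trait ReservedValue {
    fn reserved() -> Self;
}

#[derive(Clone, Copy, PartialEq, Eq)]
struct Ebb(u32);

const NO_EBB: Ebb = Ebb(u32::MAX);

impl ReservedValue for Ebb {
    fn reserved() -> Self {
        NO_EBB
    }
}

struct PackedOption<T: ReservedValue + Eq>(T);

impl<T: ReservedValue + Eq> PackedOption<T> {
    fn pack(opt: Option<T>) -> Self {
        match opt {
            None => PackedOption(T::reserved()),
            Some(t) => PackedOption(t),
        }
    }
    fn is_some(&self) -> bool {
        self.0 != T::reserved()
    }
}

fn main() {
    assert!(PackedOption::pack(Some(Ebb(5))).is_some());
    assert!(!PackedOption::pack(None::<Ebb>).is_some());
    // No extra discriminant byte: same size as the bare entity reference.
    assert_eq!(size_of::<PackedOption<Ebb>>(), size_of::<Ebb>());
}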
Cretonne instruction opcodes should have flags that provide useful information for the verifier and optimizer. The flags would indicate constant properties of an opcode, such as:
- is_terminator - This is a terminator instruction where control can never pass through, such as an unconditional branch, return, or trap instruction. Every EBB must end with a terminator, and terminators can't appear inside an EBB.
- is_branch - This is a branch instruction that can transfer control to another EBB.
- can_trap - This instruction can trap, causing control flow to diverge. Besides the obvious trap instructions, division by zero and memory access instructions can also trap. This does not make the instruction a terminator since the trapping can be conditional.

In the Python meta instruction descriptions, flags are passed as extra keyword arguments to the Instruction constructor:
brz = Instruction(
    'brz', r"""
    Branch when zero.

    If ``c`` is a :type:`b1` value, take the branch when ``c`` is false. If
    ``c`` is an integer value, take the branch when ``c = 0``.
    """,
    ins=(c, EBB, args), is_branch=True)
The gen_instr.py script should be modified to emit additional opcode methods, one for each flag:
impl Opcode {
    pub fn is_branch(self) -> bool {
        match self {
            Opcode::Brz => true,
            ...
            _ => false,
        }
    }
}
These flags should be generated from the instruction definitions so that a target ISA can define custom terminator instructions. For example, the x86-64 ISA would define a ud2 instruction that is marked as a terminator because it always traps.
In e0fd525 we landed the function DominatorTree::recompute_split_ebb() which updates the dominator tree and its cached CFG post-order after an EBB split. It looks like the way we repair the post order is not correct. Consider this example:
ebb0:
brnz v1, ebb1
brnz v2, ebb2
brnz v3, ebb3
brnz v4, ebb4
jump ebb5
If we assume that all of ebb0's successors are not visited before ebb0 in the DFS, we get the following post order: ebb5, ebb4, ebb3, ebb2, ebb1, ebb0.
Now assume we split the EBB by inserting the ebb99 header:
ebb0:
brnz v1, ebb1
brnz v2, ebb2
jump ebb99
ebb99:
brnz v3, ebb3
brnz v4, ebb4
jump ebb5
The post order is now: ebb5, ebb4, ebb3, ebb99, ebb2, ebb1, ebb0. Two claims:
- The current recompute_split_ebb() is not correct because it inserts the new EBB adjacent to ebb0. As the example shows, that is not always the right insertion spot.

Some implementation notes for fixing this:
- The general approach in recompute_split_ebb() of inserting the new EBB somewhere in the existing post order is correct. It is only the insertion position that needs to be fixed.

Currently in lib/filecheck/Cargo.toml we have regex = "0.1.71". The plan is:
- Bump the dependency to regex = "0.2.2".
- Run cargo build and fix the resulting compiler errors.
- Run cargo test to make sure the tests pass.

This is a tedious task but suitable for starters as it doesn't require deep knowledge about the project (I think). Also https://github.com/rust-lang/regex/blob/master/CHANGELOG.md can help you.
When debugging using the verifier, you often get errors of the type inst589: use of non-dominating arg value. However, it is impossible to locate inst589 when printing the function later because the instruction labels are not printed.
It would be very valuable to print these when printing the function, to be able to actually use the verifier error messages.
For me, running test-all.sh after running "cargo clean" currently halts after running the flake8 tests:
pgarland@rimbaud:~/Hacking/cretonne$ ./test-all.sh
====== Rust formatting ======
====== Python 2.7.13, Python 3.5.4rc1 ======
=== flake8 ===
./gen_legalizer.py:22:62: E261 at least two spaces before inline comment
./gen_legalizer.py:26:37: E261 at least two spaces before inline comment
./gen_legalizer.py:27:39: E261 at least two spaces before inline comment
./test_gen_legalizer.py:7:73: E261 at least two spaces before inline comment
./test_gen_legalizer.py:8:41: E261 at least two spaces before inline comment
./test_gen_legalizer.py:9:34: E261 at least two spaces before inline comment
./test_gen_legalizer.py:11:30: E261 at least two spaces before inline comment
./test_gen_legalizer.py:12:34: E261 at least two spaces before inline comment
./test_gen_legalizer.py:13:57: E261 at least two spaces before inline comment
./test_gen_legalizer.py:18:62: E261 at least two spaces before inline comment
./semantics/__init__.py:5:41: E261 at least two spaces before inline comment
./semantics/__init__.py:6:29: E261 at least two spaces before inline comment
./semantics/__init__.py:7:38: E261 at least two spaces before inline comment
./semantics/__init__.py:8:34: E261 at least two spaces before inline comment
./semantics/__init__.py:9:68: E261 at least two spaces before inline comment
./semantics/elaborate.py:11:68: E261 at least two spaces before inline comment
./semantics/elaborate.py:12:33: E261 at least two spaces before inline comment
./semantics/elaborate.py:13:37: E261 at least two spaces before inline comment
./semantics/elaborate.py:14:34: E261 at least two spaces before inline comment
./semantics/primitives.py:13:20: E261 at least two spaces before inline comment
./semantics/smtlib.py:15:56: E261 at least two spaces before inline comment
./semantics/smtlib.py:16:38: E261 at least two spaces before inline comment
./semantics/smtlib.py:17:32: E261 at least two spaces before inline comment
./semantics/smtlib.py:18:34: E261 at least two spaces before inline comment
./semantics/smtlib.py:20:42: E261 at least two spaces before inline comment
./cdsl/ast.py:14:37: E261 at least two spaces before inline comment
./cdsl/instructions.py:10:28: E261 at least two spaces before inline comment
./cdsl/isa.py:352:25: W503 line break before binary operator
./cdsl/isa.py:353:25: W503 line break before binary operator
./cdsl/test_ti.py:16:70: E261 at least two spaces before inline comment
./cdsl/test_ti.py:17:62: E261 at least two spaces before inline comment
./cdsl/ti.py:10:72: E261 at least two spaces before inline comment
./cdsl/ti.py:11:64: E261 at least two spaces before inline comment
./cdsl/ti.py:13:34: E261 at least two spaces before inline comment
./cdsl/ti.py:14:26: E261 at least two spaces before inline comment
./cdsl/ti.py:15:33: E261 at least two spaces before inline comment
./cdsl/types.py:6:55: E261 at least two spaces before inline comment
./cdsl/typevar.py:13:71: E261 at least two spaces before inline comment
./cdsl/xform.py:11:37: E261 at least two spaces before inline comment
While these should possibly be fixed, they don't seem a good reason to halt the test suite.
The possible fixes that jump to my mind are:
I'm not a bash expert, so I may be missing other/better fixes. Of the 3 above possible fixes, passing --exit-zero to flake8 seems best as it shows the problems that flake8 discovers, but allows the rest of the tests to continue to run.
Cretonne's in-memory IR needs a dominator tree analysis for verification, and likely for register allocation.
We expect most input programs to have structured control flow, so the Cooper / Harvey / Kennedy algorithm will work well.
Cretonne compiles functions independently, so function names are used differently than in LLVM. They serve two purposes:
- In .cton test cases, the function names are all ASCII, and are used to identify individual test cases in the same file.

The binary function names are not well supported. They get printed out as quoted ASCII with a bunch of backslash escapes.
Function names that look like v7 or ebb4 get printed without quotes, and the lexer recognizes them as value and EBB tokens.
Over in #24, I proposed two new identifier/data tokens: %nnnn and #xxxx. We should use these to represent function names everywhere:
- When the name is ASCII alphanumerical (including _), use the %nnnn notation. Note that this also allows for names like %0. There is no need to give special treatment to the first character.
- Otherwise, use the #xxxx hexadecimal representation of the function name bytes.
- An empty name is printed as noname (no %).

With these changes, the parser should stop accepting unquoted identifiers as function names.
Currently, the FuncName struct contains a String:
pub struct FunctionName(String);
This restricts us to names in UTF-8 form. We should accept any sequence of bytes, so change this to:
pub struct FunctionName(Vec<u8>);
For extra credit: Cretonne tries to minimize heap allocations everywhere, so an internal short-string optimization would make a lot of sense:
enum NameRepr {
    Short {
        length: u8,
        bytes: [u8; 12],
    },
    Long(Vec<u8>),
}
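A quick footprint check of the sketch (sizes printed rather than asserted, since enum layout can vary by compiler and target):

use std::mem::size_of;

#[allow(dead_code)]
enum NameRepr {
    Short { length: u8, bytes: [u8; 12] },
    Long(Vec<u8>),
}

fn main() {
    // On a typical 64-bit target both print 24: the inline Short variant
    // costs nothing beyond what the Vec-based representation needs anyway.
    println!("Vec<u8>:  {} bytes", size_of::<Vec<u8>>());
    println!("NameRepr: {} bytes", size_of::<NameRepr>());
}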
We need to know the exact location of the EBB headers before we can emit binary machine code for branches to those EBB headers. Branches have limited ranges, and extending the range changes the size of the branch, so a non-trivial algorithm is needed to compute the final offsets of all the EBBs.
The range of branches varies a lot by ISA, and there are sometimes multiple branch encodings with different ranges:
ISA | Cond. branch | Uncond. branch |
---|---|---|
Intel | 256 B - 4 GB | 256 B - 4 GB |
RISC-V | 512 B - 8 KB | 4 KB - 2 MB |
ARM 32 | 512 B - 64 MB | 4 KB - 64 MB |
ARM 64 | 64 KB - 2 MB | 256 MB |
The range is the span of code reachable by a branch that sits approximately in the middle of the span.
Branch relaxation is the process of selecting the smallest possible branch encoding while making sure that all branch targets are in range. Since some conditional branches have shorter ranges than unconditional branches on RISC architectures, it is sometimes necessary to replace a conditional branch with a negated branch that skips over a jump instruction to the original branch destination:
brz v1, ebb17
=>
brnz v1, ebb23
jump ebb17
ebb23:
...
The result of the branch relaxation pass should be a table of EBB offsets and chosen encodings for all branch instructions such that the displacements can be encoded. The instruction compression in #72 does something similar for non-branch instructions. The encoding tables can probably be shared.
A simple iterative algorithm goes like this:
This should arrive at an optimal result (given the fixed EBB order), but requires iteration. This is the algorithm LLVM uses, except for the branch-over-jump transformation which happens earlier in ARM-specific passes.
This only needs iteration when branch-over-jump patterns are needed, which should not be very often. It is not guaranteed to find an optimal solution, though.
When implementing branch relaxation in Cretonne, we have a few special concerns:
This suggests the following variation of the optimistic algorithm:
This algorithm will complete in one pass for functions with a single EBB (and so only backwards branches). It will usually complete in three passes for the general case:
If no forward branches need relaxing, it will complete in two passes.
It would be possible to build an auxiliary table that summarizes the non-branch instructions as just their code size, but given typical branch density and how fast it is to look up code size from encoding recipes, it is probably not worth it.
The write_value_aliases() function will print out any value aliases used by an instruction on lines before the instruction itself:
vx3 -> v4
v1 = iadd_imm vx3, 17
The parser in lib/reader should be taught to handle these aliases.
Since the parser is used mostly for test cases, value aliases in the input should be translated to value aliases in the in-memory IR. We don't want to resolve aliases in the parser, even though that would be easy.
The B+-trees in #38 should also be implemented in a set-like variant. At least our control-flow graph representation could benefit from that. The lists of successors and predecessors would be well implemented as B-tree sets.
The node enum for a set is:
enum Node {
    Inner {
        size: u8,
        keys: [K; 7],
        nodes: [NodeRef; 8],
    },
    Leaf {
        size: u8,
        keys: [K; 15],
    },
}
Note: This is not the classical copy coalescing compiler optimization.
The LiveRange struct represents the live range of an SSA value as a list of intervals, one for each EBB where the value is live-in. In many cases, a value will be live through consecutive EBBs, and we can represent the live range more compactly in that case by coalescing the intervals for adjacent EBBs.
ebb0:
v1 = foo ; inst1
jump ebb1 ; inst9
ebb1:
jump ebb2 ; inst19
ebb2:
jump ebb3 ; inst29
ebb3:
use v1 ; inst31
return ; inst39
Here, v1 has the def-interval inst1-inst9 and the live-in intervals ebb1-inst19, ebb2-inst29, ebb3-inst31, and that is how the intervals are currently stored in LiveRange. Since there is nothing interesting going on between adjacent EBBs, this could be represented with a single live-in interval ebb1-inst31. The def-interval stays the same.
Coalescing two live-in intervals is only valid when they represent adjacent EBBs and the earlier interval extends all the way to the terminator of its EBB. We need a mechanism for recognizing this inter-EBB gap. The ProgramOrder trait is the right place for that functionality:
trait ProgramOrder {
    ...
    /// Is the range from `inst` to `ebb` just the gap between consecutive EBBs?
    ///
    /// This returns true if `inst` is the terminator in the EBB immediately before `ebb`.
    fn is_ebb_gap(&self, inst: Inst, ebb: Ebb) -> bool;
}
The actual coalescing happens in LiveRange::extend_in_ebb() when a new live-in interval is added. There are four possibilities:
1. The new interval coalesces with both the interval before it and the one after it, causing the liveins vector to shrink by 1.
2. The new interval coalesces with the interval before it.
3. The new interval coalesces with the interval after it.
4. The new interval can't be coalesced at all, and the liveins vector grows by one.

Note that in cases 2. and 3., the elements in the liveins vector don't need to be shifted. The existing interval can be modified in place.