vnmakarov / mir

A lightweight JIT compiler based on MIR (Medium Internal Representation) and C11 JIT compiler and interpreter based on MIR

License: MIT License

Makefile 0.78% C 98.69% Shell 0.33% CMake 0.20%

jit-compiler compiler interpreter intermediate-representation c x86-64 aarch64 ppc64 s390x m1

mir's Introduction

MIR Project

MIR means Medium Internal Representation
The goal of the MIR project is to provide a basis for implementing fast and lightweight interpreters and JITs
The plan is to try the MIR lightweight JIT first for a CRuby and/or MRuby implementation
Motivations for the project can be found in this blog post
C2MIR compiler description can be found in this blog post
Future of code specialization in MIR for dynamic language JITs can be found in this blog post

Disclaimer

This code is in the initial stages of development. It is present only for familiarization with the project. There is absolutely no warranty that MIR will not change in the future, or that the code will work for any tests other than the ones given here, or on platforms other than x86_64 Linux/OSX, aarch64 Linux/OSX (Apple M1), and ppc64be/ppc64le/s390x/riscv64 Linux.

MIR

MIR is strongly typed IR
MIR can represent machine 32-bit and 64-bit insns of different architectures
MIR.md contains a detailed description of MIR and its API. Here is a brief MIR description:
MIR consists of modules
- Each module can contain functions and some declarations and data
- Each function has a signature (parameters and return types), local variables (including function arguments), and instructions
  - Each local variable has a type, which can only be 64-bit integer, float, double, or long double
  - Each instruction has an opcode and operands
    - An operand can be a local variable (or a function argument), immediate, memory, label, or reference
      - An immediate operand can be a 64-bit integer, float, double, or long double value
      - A memory operand has a type, a displacement, base and index integer local variables, and an integer constant as a scale for the index
        - The memory type can be an 8-, 16-, 32-, or 64-bit signed or unsigned integer type, float, double, or long double
          - When an integer memory value is used, it is first expanded to a 64-bit integer value by sign or zero extension
      - A label operand has a name and is used for control flow instructions
      - A reference operand is used to refer to functions and declarations in the current module, in other MIR modules, or to C external functions or declarations
    - The opcode describes what the instruction does
    - There are conversion instructions for conversion between different 32- and 64-bit signed and unsigned values, float, double, and long double values
    - There are arithmetic instructions (addition, subtraction, multiplication, division, modulo) working on 32- and 64-bit signed and unsigned values, float, double, and long double values
    - There are logical instructions (and, or, xor, different shifts) working on 32- and 64-bit signed and unsigned values
    - There are comparison instructions working on 32- and 64-bit signed and unsigned values, float, double, and long double values
    - There are branch insns (unconditional jump, and jump on zero or non-zero value) which take a label as one of their operands
    - There are combined comparison and branch instructions taking a label as one operand and two 32- or 64-bit signed or unsigned values, or float, double, or long double values, as the others
    - There is a switch instruction to jump to one of the labels given as operands, depending on the index given as the first operand
    - There are function and procedure call instructions
    - There are return instructions working on 32- and 64-bit integer values, float, double, and long double values

MIR Example

You can create MIR through API consisting of functions for creation of modules, functions, instructions, operands etc
You can also create MIR from MIR binary or text file
The best way to get a feel for MIR is to use the textual MIR representation
Example of the sieve of Eratosthenes in C:

#define Size 819000
int sieve (int N) {
  int64_t i, k, prime, count, n; char flags[Size];

  for (n = 0; n < N; n++) {
    count = 0;
    for (i = 0; i < Size; i++)
      flags[i] = 1;
    for (i = 0; i < Size; i++)
      if (flags[i]) {
        prime = i + i + 3;
        for (k = i + prime; k < Size; k += prime)
          flags[k] = 0;
        count++;
      }
  }
  return count;
}
void ex100 (void) {
  printf ("sieve (100) = %d\n", sieve (100));
}

Example of MIR textual file for the same function:

m_sieve:  module
          export sieve
sieve:    func i32, i32:N
          local i64:iter, i64:count, i64:i, i64:k, i64:prime, i64:temp, i64:flags
          alloca flags, 819000
          mov iter, 0
loop:     bge fin, iter, N
          mov count, 0;  mov i, 0
loop2:    bge fin2, i, 819000
          mov u8:(flags, i), 1;  add i, i, 1
          jmp loop2
fin2:     mov i, 0
loop3:    bge fin3, i, 819000
          beq cont3, u8:(flags,i), 0
          add temp, i, i;  add prime, temp, 3;  add k, i, prime
loop4:    bge fin4, k, 819000
          mov u8:(flags, k), 0;  add k, k, prime
          jmp loop4
fin4:     add count, count, 1
cont3:    add i, i, 1
          jmp loop3
fin3:     add iter, iter, 1
          jmp loop
fin:      ret count
          endfunc
          endmodule
m_ex100:  module
format:   string "sieve (100) = %d\n"
p_printf: proto p:fmt, i32:result
p_sieve:  proto i32, i32:iter
          export ex100
          import sieve, printf
ex100:    func v, 0
          local i64:r
          call p_sieve, sieve, r, 100
          call p_printf, printf, format, r
          endfunc
          endmodule

func describes the signature of the function (taking a 32-bit signed integer argument and returning a 32-bit signed integer value) and the function argument N, which will be a local variable of 64-bit signed integer type
- Function results are described first by their types and have no names. Parameters always have names and go after the result description
- A function may have more than one result, but the possible number and combination of result types are currently machine defined
You can write several instructions on one line if you separate them by ;
The instruction result, if any, is always the first operand
We use 64-bit instructions in calculations
We could use 32-bit instructions in calculations, which would make sense on a 32-bit CPU
- When we use 32-bit instructions, we take only the significant 32-bit part of each 64-bit operand, and the high 32-bit part of the result is machine defined (so if you write portable MIR code, consider the high 32-bit part to be undefined)
string describes data in form of C string
- C string can be used directly as an insn operand. In this case the data will be added to the module and the data address will be used as an operand
export describes the module functions or data which are visible outside the current module
import describes the module functions or data which should be defined in other MIR modules
proto describes function prototypes. Its syntax is the same as func syntax
call is the MIR instruction for calling functions
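
The note above about 32-bit instructions can be pictured in plain C. This is only an illustrative sketch, not MIR code: the helper names add32_low, low32_equal, and check_add32 are made up for this example, and they model the stated rule that only the low 32 bits of each 64-bit operand are significant while the high half of the result should be treated as undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model (not MIR API) of a 32-bit add applied to 64-bit
   variables: only the low 32 bits of each operand take part. */
uint32_t add32_low (int64_t a, int64_t b) {
  return (uint32_t) a + (uint32_t) b; /* wraps modulo 2^32 */
}

/* Portable code should look only at the low 32 bits of such a result,
   because the high half of the 64-bit variable is machine defined. */
int low32_equal (int64_t x, int64_t y) {
  return (uint32_t) x == (uint32_t) y;
}

/* Small usage check: garbage in the high halves does not affect the sum. */
void check_add32 (void) {
  assert (add32_low (0x700000002LL, 0x300000003LL) == 5);
  assert (low32_equal (0x100000005LL, 5));
}
```

Comparing through low32_equal rather than with a plain 64-bit comparison is the C analogue of not relying on the high 32 bits in portable MIR code.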

Running MIR code

After creating MIR modules (through MIR API or reading MIR binary or textual files), you should load the modules
- Loading modules makes visible exported module functions and data
- You can load external C function with MIR_load_external
After loading modules, you should link the loaded modules
- Linking modules resolves imported module references, initializes data, and sets up call interfaces
After linking, you can interpret functions from the modules or call machine code for the functions generated with the MIR JIT compiler (generator). How a function is executed is usually determined by the interface that was set up. How the generated code is produced (lazily on the first call or ahead of time) can also depend on the interface
Running code from the above example could look like the following (here m1 and m2 are modules m_sieve and m_ex100, func is function ex100, sieve is function sieve):

    /* ctx is a context created by MIR_init */
    MIR_load_module (ctx, m1); MIR_load_module (ctx, m2);
    MIR_load_external (ctx, "printf", printf);
    MIR_link (ctx, MIR_set_interp_interface, import_resolver);
    /* or use MIR_set_gen_interface to generate and use the machine code */
    /* or use MIR_set_lazy_gen_interface to generate function code on its 1st call */
    /* use MIR_gen (ctx, func) to explicitly generate the function machine code */
    MIR_interp (ctx, func, &result, 0); /* zero here is arguments number  */
    /* or ((void (*) (void)) func->addr) (); to call interpr. or gen. code through the interface */

Running binary MIR files on Linux through `binfmt_misc`

The mir-bin-run binary is prepared to be used from binfmt_misc with the following line (example):

line=:mir:M::MIR::/usr/local/bin/mir-bin-run:P
echo $line > /proc/sys/fs/binfmt_misc/register

Adapt the mir-bin-run binary path to your system; the one shown is the default

And run with

c2m your-file.c -o your-file
chmod +x your-file
./your-file your args

The executable is "configurable" with environment variables:

MIR_TYPE sets the interface for code execution: interp (for interpretation), jit (for generation) and lazy (for lazy generation, default);
MIR_LIBS (colon separated list) defines a list of extra libraries to load;
MIR_LIB_DIRS or LD_LIBRARY_PATH (colon separated list) defines an extra list of directories to search the libraries on.

Due to the tied nature of mir-bin-run with binfmt_misc, it may be a bit weird to call mir-bin-run directly. The P flag on the binfmt_misc passes an extra argument with the full path to the MIR binary.

The current state of MIR project

You can use C setjmp/longjmp functions to implement longjump in MIR
Binary MIR code is usually up to 10 times more compact and up to 10 times faster to read than analogous MIR textual code
MIR interpreter is about 6-10 times slower than code generated by MIR JIT compiler
MIR to C compiler is currently about 90% implemented

The possible future state of MIR project

WASM to MIR translation should be pretty straightforward
- Only a small WASM runtime for WASM floating point round insns needs to be provided for MIR
Implementation of Java byte code to/from MIR and LLVM IR to/from MIR compilers will be a challenge:
- big runtime and possibly MIR extensions will be required
Porting GCC to MIR is possible too. An experienced GCC developer can implement this in 6 to 12 months
By my estimate, porting the MIR JIT compiler to mips64 or sparc64 will take 1-2 months of work for each target
A performance-minded port of the MIR JIT compiler to 32-bit targets will need an additional small analysis pass to find which 64-bit variables are used only in 32-bit instructions

MIR JIT compiler

Compiler Performance Goals relative to GCC -O2:
- 70% of generated code speed
- 100 times faster compilation speed
- 100 times faster start-up
- 100 times smaller code size
- less than 15K C LOC
Very short optimization pipeline for speed and light weight
Use of only the most valuable optimizations:
- function inlining
- global common sub-expression elimination
- variable renaming
- register pressure sensitive loop invariant code motion
- sparse conditional constant propagation
- dead code elimination
- code selection
- fast register allocator with implicit coalescing hard registers and stack slots for copy elimination
Different optimization levels to tune compilation speed vs generated code performance
SSA form of MIR is used before register allocation
- We use a form of Braun's algorithm to build SSA (M. Braun et al. "Simple and Efficient Construction of Static Single Assignment Form")
- We keep SSA in conventional form all the time to make out-of-SSA pass trivial
Simplicity of optimizations implementation over extreme generated code performance
More details about full JIT compiler pipeline:
Simplify: lowering MIR
Inline: inlining MIR calls
Build CFG: building Control Flow Graph (basic blocks and CFG edges)
Build SSA: building Static Single Assignment form by adding phi nodes and SSA edges to operands
Copy Propagation: SSA copy propagation keeping conventional SSA form and removing redundant extension insns
Global Value Numbering: Removing redundant insns through GVN
Dead Code Elimination: removing insns with unused outputs
Sparse Conditional Constant Propagation: constant propagation and removing dead paths of CFG
Out of SSA: Removing phi nodes and SSA edges (we keep conventional SSA all the time)
Machinize: run machine-dependent code transforming MIR for calls ABI, 2-op insns, etc
Find Loops: finding natural loops and building loop tree
Build Live Info: calculating live in and live out for the basic blocks
Build Live Ranges: calculating program point ranges for registers
Assign: fast RA for -O0 or priority-based linear scan RA for -O1 and above
Rewrite: transform MIR according to the assign using reserved hard regs
Combine (code selection): merging data-dependent insns into one
Dead Code Elimination: removing insns with unused outputs
Generate Machine Insns: run machine-dependent code creating machine insns

C to MIR translation

Work on 2 different ways of the translation is currently ongoing
- Implementation of a small C11 (2011 ANSI C standard) to MIR compiler. See README.md
- Implementation of LLVM Bitcode to MIR translator. See README.md

Structure of the project code

Files mir.h and mir.c contain major API code including input/output of MIR binary and MIR text representation
Files mir-dlist.h, mir-mp.h, mir-varr.h, mir-bitmap.h, mir-htab.h contain generic code correspondingly for double-linked lists, memory pools, variable length arrays, bitmaps, hash tables. File mir-hash.h is a general, simple, high quality hash function used by hashtables
File mir-interp.c contains code for interpretation of MIR code. It is included in mir.c and never compiled separately
Files mir-gen.h, mir-gen.c, mir-gen-x86_64.c, mir-gen-aarch64.c, mir-gen-ppc64.c, mir-gen-s390x.c, and mir-gen-riscv.c contain code for MIR JIT compiler
- Files mir-gen-x86_64.c, mir-gen-aarch64.c, mir-gen-ppc64.c, mir-gen-s390x.c, and mir-gen-riscv.c contain the machine-dependent code of the JIT compiler
Files mir-<target>.c contain simple machine dependent code common for interpreter and JIT compiler
Files mir2c/mir2c.h and mir2c/mir2c.c contain code for MIR to C compiler
Files c2mir/c2mir.h, c2mir/c2mir.c, c2mir/c2mir-driver.c, and c2mir/mirc.h contain code for C to MIR compiler. Files in directories c2mir/x86_64 and c2mir/aarch64, c2mir/ppc64, c2mir/s390x, and c2mir/riscv contain correspondingly x86_64, aarch64, ppc64, s390x, and riscv machine-dependent code for C to MIR compiler

Playing with current MIR project code

MIR project is far away from any serious usage
The current code can be used only to familiarize future users with the project and approaches it uses
You can run some benchmarks and tests by make bench and make test

Current MIR Performance Data

Intel i7-9700K with 16GB memory under FC29 with GCC-8.2.1

                 MIR-gen       MIR-interp    gcc -O2         gcc -O0
compilation [1]  1.0 (69us)    0.17 (12us)   193 (13.35ms)   186 (12.8ms)
compilation [2]  1.0 (116us)   0.10 (12us)   115 (13.35ms)   110 (12.8ms)
execution [3]    1.0 (3.05s)   6.0 (18.3s)   0.95 (2.9s)     2.08 (6.34s)
code size [4]    1.0 (408KB)   0.51 (209KB)  62 (25.2MB)     62 (25.2MB)
startup [5]      1.0 (1.3us)   1.0 (1.3us)   9310 (12.1ms)   9850 (12.8ms)
LOC [6]          1.0 (19.1K)   0.53 (10.1K)  77 (1480K)      77 (1480K)

[1] is based on wall time of compilation of sieve code (w/o any include file and with using memory file system for GCC) 100 times. The used optimization level is 1

[2] is analogous to [1] but with MIR-optimization level 2

[3] is based on the best wall time of 10 runs with used MIR-generator optimization level 1

[4] is based on stripped sizes of cc1 for GCC and MIR core and interpreter or generator for MIR

[5] is based on wall time of generation of object code for empty C file or generation of empty MIR module through API

[6] is based only on files required for x86-64 C compiler and files for minimal program to create and run MIR code

MIR project competitors

I only see three projects which could be considered or adapted as real universal light-weight JIT competitors
QBE:
- It is small (10K C lines)
- It uses SSA based IR (kind of simplified LLVM IR)
- It has the same optimizations as MIR-generator plus aliasing but QBE has no inlining
- It generates slower code
- It generates assembler code, which makes QBE 30 times slower in machine code generation than MIR-generator
- It generates code for more targets
LIBJIT started as a part of DotGNU Project:
- LIBJIT is bigger:
  - 80K C lines (for LIBJIT w/o dynamic Pascal compiler) vs 10K C lines for MIR (excluding C to MIR compiler)
  - 420KB object file vs 170KB
- LIBJIT has fewer optimizations: only copy propagation and register allocation
RyuJIT is a part of runtime for .NET Core:
- RyuJIT is even bigger: 360K SLOC
- RyuJIT optimizations are basically MIR-generator optimizations minus SCCP
- RyuJIT uses SSA
Other candidates:
- LIBFirm: less standalone-, big- (140K LOC), SSA, ASM generation-, LGPL2
- CraneLift: less standalone-, big- (70K LOC of Rust-), SSA, Apache License
- NanoJIT, standalone+, medium (40K C++ LOC), only simple RA-, Mozilla Public License

Porting MIR

Currently MIR works on x86_64, aarch64, ppc64be, ppc64le, s390x, riscv-64 Linux and x86_64/aarch64 (Apple M1) MacOS
HOW-TO-PORT-MIR.md outlines process of porting MIR

mir's Issues

Enabling -Wall generates lots of warnings when compiling

I noticed that in my builds there were lots of warning messages - but if I used the supplied Makefile then none. I suppose this is because the supplied Makefile does not enable warnings, i.e. leaves them at default, am I right?

Is it worth increasing the warning level when building?

Faust MIR backend: failure in JIT when working in interpreter mode

We have automatic tests for Faust DSP.

This MIR module https://gist.github.com/sletz/75bdd717a67d9836cba01a934728d371 works correctly in interpreter mode, but fails (that is, does not compute the correct sequence of samples) in JIT mode on x86_64. Testing it with m2b gives a strange ln 549: undeclared name error.

The module uses mir_min and mir_max which simply branch in std::min and std::max functions.

To be more precise: the symptom is that in JIT mode the first generated samples are correct, but then gradually diverge from the correct values, as if some error were accumulating. Note that the "double" type is used.

Issues in MIR documentation

MIR_T_V is documented in https://github.com/vnmakarov/mir/blob/master/MIR.md but does not appear in the code base anymore. Is it replaced by something else ?
MIR_I2F/MIR_I2D/MIR_I2LD and MIR_UI2F/MIR_UI2D/MIR_UI2LD documentation seems to be reversed

Memory operation

I'm trying to implement a "load in array" operation, so giving an array filled with some values to the "compute" function, which reads and returns the content of array[4]. The generated code seems correct, but the returned value is not:

static void test2()
{
    MIR_context_t fContext = MIR_init();
    MIR_module_t fModule = MIR_new_module(fContext, "Faust");
    
    // Create 'compute' function
    MIR_type_t res_type = MIR_T_D;
    MIR_item_t fCompute = MIR_new_func(fContext, "compute", 1, &res_type, 1, MIR_T_P, "real_heap");
    
    // Get 'heap' argument
    MIR_reg_t HEAP = MIR_reg(fContext, "real_heap", fCompute->u.func);
    
    // Create a local
    MIR_reg_t VAR1 = MIR_new_func_reg(fContext, fCompute->u.func, MIR_T_D, "VAR1");
    
    // Create and set 'index'
    MIR_reg_t INDEX = MIR_new_func_reg(fContext, fCompute->u.func, MIR_T_I64, "INDEX");
    
    MIR_append_insn(fContext, fCompute,
                     MIR_new_insn(fContext,
                                  MIR_MOV,
                                  MIR_new_reg_op(fContext, INDEX),
                                  MIR_new_int_op(fContext, 4)));
    
    // Get the 'heap' content at 'index'
    MIR_append_insn(fContext, fCompute,
                     MIR_new_insn(fContext, MIR_DMOV,
                                   MIR_new_reg_op(fContext, VAR1),
                                   MIR_new_mem_op(fContext, MIR_T_D, 0, HEAP, INDEX, 1)));
    
    MIR_append_insn(fContext, fCompute, MIR_new_ret_insn(fContext, 1, MIR_new_reg_op(fContext, VAR1)));
    
    // Finish function
    MIR_finish_func(fContext);
    
    // Finish module
    MIR_finish_module(fContext);
    
    MIR_load_module (fContext, fModule);
    MIR_link(fContext, MIR_set_interp_interface, import_resolver);
    
    // Dump module
    MIR_output(fContext, stderr);
    
    // Preparing test heap with some values
    double heap[8];
    for (int i = 0; i < 8; i++) {
        heap[i] = (double)i;
        printf("Heap: %f\n", heap[i]);
    }
    
    MIR_val_t val;
    MIR_interp(fContext, fCompute, &val, 1, (MIR_val_t){.a = (void*)heap});
    printf("Result: %f\n", val.d);
    
    MIR_finish(fContext);
}

Faust:	module
compute:	func	d, p:real_heap
	local	d:VAR1, i64:INDEX, i64:t1
# 1 arg, 3 locals
	mov	INDEX, 4
	add	t1, real_heap, INDEX
	dmov	VAR1, d:(t1)
	ret	VAR1
	endfunc
	endmodule
Heap: 0.000000
Heap: 1.000000
Heap: 2.000000
Heap: 3.000000
Heap: 4.000000
Heap: 5.000000
Heap: 6.000000
Heap: 7.000000
Result: 0.000000
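
One possible reading of this result, assuming a memory operand addresses displacement + base + index * scale in bytes (the dumped MIR above, which adds INDEX directly to real_heap, is consistent with that): with scale 1 and index 4, the load starts 4 bytes into the array rather than at element 4. The standalone C sketch below reproduces that arithmetic without MIR; load_double_at is a name made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not MIR API): read a double located at
   base + index * scale + disp, with everything measured in bytes. */
double load_double_at (const void *base, int64_t index, int64_t scale, int64_t disp) {
  double d;
  assert (base != NULL);
  /* memcpy avoids an unaligned access when the address is not a multiple of 8. */
  memcpy (&d, (const unsigned char *) base + index * scale + disp, sizeof d);
  return d;
}
```

With heap[i] = i as in the report, load_double_at (heap, 4, 8, 0) returns 4.0, while load_double_at (heap, 4, 1, 0) reads bytes 4 to 11, which on little-endian x86_64 are all zero and so read back as 0.0, matching the output shown. If the assumption holds, a scale of 8 (or an index of 32 with scale 1) would be needed to reach element 4.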

Compiling MIR code with a C++ compiler

The C code is not ready for that and produces a lot of errors. Is it a planned feature? Would a PR help?

WASM backend with WASI support?

Would it be hard to add a WASM backend compilation target to MIR, so MIR could compile from C to WASM, with support for the WASI interface? This would provide a way to use MIR as an intermediate language for compiled languages, which want to target WASM.

It seems to me that MIR and WASM are pretty close in opcodes.

Another use case is to compile the MIR compiler itself to WASM (which can be done with emscripten), where the goal is to have an optimizing compiler run in a browser, producing WASM for execution at runtime from the MIR text language.

fatal failure in matching insn: dmov hr16, 7.0

So MIR_DMOV does not work with immediates on x86_64, which makes sense. What would be the proper way to load them using MIR api?

E2K (Elbrus VLIW) support

I hope that in the future I will have time to implement E2K support, or at least participate as much as possible.
In the meantime, I would like to use this issue to find out whether anyone else is interested or will take up this task.

Regards.

Segmentation fault when compiling C program

One of my JIT backends in Ravi generates C code which is then JIT compiled. I tried running the sieve generated code through c2mir but got a crash.

Attached is the input file.
sieve.txt

Can MIR use interpretation phase to generate better code?

Since MIR can interpret code I was wondering if it is possible to have 2 phase compilation: interpret first, gather some profiling data and then generate machine code in second phase.

Various failures

This fails even in interpreter mode:

MIR_context_t fContext = MIR_init();
MIR_module_t fModule = MIR_new_module(fContext, "Faust");

// Create 'compute' function
MIR_type_t res_type = MIR_T_D;
MIR_item_t fCompute = MIR_new_func(fContext, "compute", 1, &res_type, 0);
MIR_append_insn(fContext, fCompute, MIR_new_ret_insn(fContext, 1, MIR_new_double_op(fContext, 15.123)));
// Finish function
MIR_finish_func(fContext);

// Finish module
MIR_finish_module(fContext);

// Dump module
MIR_output(fContext, stderr);

MIR_val_t val;
MIR_interp(fContext, fCompute, &val, 0);
printf("Result: %f\n", val.d);

Faust:	module
compute:	func	d

# 0 args, 0 locals
	ret	1.51229999999999993320898283855058252811431884765625000e+01
	endfunc
	endmodule
Assertion failed: (ops[i].mode == MIR_OP_REG), function generate_icode, file /Documents/JIT-compilation/mir/mir-interp.c, line 392.
Abort trap: 6

This works in interpreter mode:

MIR_context_t fContext = MIR_init();
MIR_module_t fModule = MIR_new_module(fContext, "Faust");

// Create 'compute' function
MIR_type_t res_type = MIR_T_D;
MIR_item_t fCompute = MIR_new_func(fContext, "compute", 1, &res_type, 0);

MIR_reg_t VAR1 = MIR_new_func_reg(fContext, fCompute->u.func, MIR_T_D, "VAR1");
MIR_append_insn(fContext, fCompute, MIR_new_insn(fContext, MIR_DMOV, MIR_new_reg_op(fContext, VAR1),
                                                 MIR_new_double_op (fContext, 15.123)));
MIR_reg_t VAR2 = MIR_new_func_reg(fContext, fCompute->u.func, MIR_T_D, "VAR2");
MIR_append_insn(fContext, fCompute, MIR_new_insn(fContext, MIR_DMOV, MIR_new_reg_op(fContext, VAR2),
                                                 MIR_new_double_op (fContext, 30.456)));

MIR_append_insn(fContext, fCompute, MIR_new_insn(fContext, MIR_DADD, MIR_new_reg_op(fContext, VAR1),
                                                 MIR_new_reg_op(fContext, VAR1),
                                                 MIR_new_reg_op(fContext, VAR2)));

MIR_append_insn(fContext, fCompute, MIR_new_ret_insn(fContext, 1, MIR_new_reg_op(fContext, VAR1)));

// Finish function
MIR_finish_func(fContext);

// Finish module
MIR_finish_module(fContext);

But fails in JIT mode adding:

 // Code generation and link
MIR_gen_init(fContext);
MIR_gen_set_debug_file(fContext, stderr);
compiledFun fCompiledFun = (compiledFun)MIR_gen(fContext, fCompute);
MIR_gen_finish(fContext);

produces:

fatal failure in matching insn: dmov hr16, 1.51229999999999993320898283855058252811431884765625000e+01

[C2M] Parse statement expressions

Statement expressions are not standard C, but an extension supported by GCC/Clang/TCC that allows running statements within an expression; see https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

For my specific use case I find this extension very useful to inline arbitrary C code, where defining functions to evaluate these expressions would be more boilerplate and add complexity.

[C2M] c2mir.h dependency on mir.h should be avoided

It seems that c2mir.h currently depends on mir.h, which pulls in a few other header files. IMO it would be preferable if c2mir.h avoided a dependency on mir.h.

Use of reserved identifiers

I noticed that MIR uses reserved identifiers throughout the codebase:

Names beginning with an underscore and capital letter are reserved for the implementation for any use (C11 7.1.3p1). MIR uses many symbols named this way (_MIR_*).
POSIX reserves identifiers ending in _t for use in any header (POSIX.1-2017 name space). MIR names many of its types this way (MIR_*_t).

Is it possible to choose different naming schemes for the identifiers used in MIR so that they don't conflict with those reserved for ISO C and POSIX?

[macOS] make test failed with 'Unsupported compiler detected'

I know this is not a priority of this project.
Just for the information.

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/Availability.h:577:19: wrong preprocessor expression
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/sys/cdefs.h:81:2: warning -- #warning "Unsupported compiler detected"
FAIL (code = 1)

Faust compiling for MIR

The Faust compiler now generates MIR code. Here is a benchmark on a set of DSPs (on macOS 10.13, 2.2 Ghz Core I7):

The backend code is here.

Now waiting to have more CPUs supported by MIR !

typo in MIR blog post

In the fine MIR: A lightweight JIT compiler project blog post, you write "The MIR-to-C compiler is about 12 thousand lines of C code." I think you mean "C-to-MIR"?

(Sorry, couldn't figure out how to contact you.)

Suggestion for api design

One of the good ideas in LLVM, and also NanoJIT is that all api calls return pointers; and take arguments that are pointers. So that a front-end can use a simple scalar type to hold instructions, operands, types etc.

I see the MIR has several different types ... op, item, insn, module etc. I would suggest not exposing these at the api level; instead the api can use opaque pointer types. This will also give MIR more flexibility in future design as users will not need to know about the internal structure of these various values.

Certainly from a front-end point of view - in dmr_C I have multiple backends. The front-end does not know about the backends, and having concrete types to deal with is a problem.

It does appear that mostly MIR is also using pointer types, except for a few cases?

Redundant ext32 after inlining

Hi. Toying around with c2m I've run into a case that might be worth optimizing?

Minimal example:
int test () { return 4.0f; }
int main () { return test(); }

mov hr0, .lc1
f2i hr0, f:(hr0) # dead: hr0
ext32 hr0, hr0 # dead: hr0
L1
ext32 hr0, hr0 # dead: hr0
ret hr0 # dead: hr0

movabs $0x10cd930,%rax
cvttss2si (%rax),%rax
movslq %eax,%rax
movslq %eax,%rax
retq
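
The redundancy is easy to see in plain C: sign-extending the low 32 bits of a value twice gives the same result as doing it once, so the second ext32 (and the second movslq) adds nothing. A minimal sketch follows; ext32 here is a made-up C helper that imitates the MIR instruction and is not taken from the MIR sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of MIR's ext32: keep the low 32 bits of x and
   sign-extend them to 64 bits. The uint32_t to int32_t conversion is
   implementation-defined in C11 but wraps on mainstream compilers. */
int64_t ext32 (int64_t x) {
  return (int64_t) (int32_t) (uint32_t) x;
}

/* Asserts that applying ext32 a second time changes nothing for x. */
void check_ext32_idempotent (int64_t x) {
  assert (ext32 (ext32 (x)) == ext32 (x));
}
```

Because the property holds for every input, an optimizer could in principle drop an ext32 whose operand is already the result of an ext32 of the same register; whether and where MIR should do that is the question this issue raises.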

target_translate in mir-gen-x86_64.c continues even if replacement fails, is this correct?

Not sure if this is an error or not - I got a failure where it seems to have hit this branch.

[C2M] ungetc_func should take a user supplied void pointer argument

The c2mir_compile() takes ungetc_func function as parameter. However ungetc_func() implementation needs to be able to refer to some state in order to work. The standard way to allow the ungetc_func() to access its state is to pass a void * argument which is then passed to ungetc_func().

At the moment the only way for ungetc_func() to work is to access through static lifetime data.
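
The pattern being requested can be sketched as follows. All names here (getc_fn, ungetc_fn, string_reader, and so on) are hypothetical and are not the c2mir API; the point is only that each callback receives a void * supplied by the caller, so the reader state need not live in static storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h> /* for EOF */

/* Hypothetical callback types: both receive a user-supplied state pointer. */
typedef int (*getc_fn) (void *data);
typedef void (*ungetc_fn) (int c, void *data);

/* Example state: reading characters from an in-memory string. */
struct string_reader {
  const char *text;
  size_t pos;
};

int string_getc (void *data) {
  struct string_reader *r = data;
  assert (r != NULL);
  return r->text[r->pos] == '\0' ? EOF : (unsigned char) r->text[r->pos++];
}

void string_ungetc (int c, void *data) {
  struct string_reader *r = data;
  assert (r != NULL);
  /* Only step back over a character that was actually read. */
  if (c != EOF && r->pos > 0) r->pos--;
}

/* A consumer that peeks at the next character using only the callbacks. */
int peek_char (getc_fn get, ungetc_fn unget, void *data) {
  int c = get (data);
  unget (c, data);
  return c;
}
```

Two independent string_reader objects can be passed to peek_char at the same time, which is exactly what static-lifetime state makes awkward.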

Memory model.

The project looks very interesting.

I was wondering whether it would be a good fit for my hobby purely functional language. But I could not find much information about the MIR memory model in readme.

Do I have to manage memory manually, or is there a GC?
Can I do anything with pointers, I could do in C?
Are there structures (C struct) and unions (or ADTs)?

A #define such as MIRC would be useful during compilation of C code

It would allow programs to detect that MIR is used; purpose of this macro being similar to __GNUC__ .
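
For illustration, code could then branch on the macro the way it already does for other compilers. MIRC below is simply the name proposed in this issue and is assumed, not confirmed, to be defined by c2mir; compiler_name and built_with are made-up helpers.

```c
#include <assert.h>
#include <string.h>

/* Returns a label for the compiler building this file. MIRC is the macro
   proposed in the issue (an assumption, not a documented c2mir macro). */
const char *compiler_name (void) {
#if defined(MIRC)
  return "c2mir";
#elif defined(__clang__)
  return "clang";
#elif defined(__GNUC__)
  return "gcc";
#else
  return "unknown";
#endif
}

/* Returns 1 if this file was built by the compiler with the given label. */
int built_with (const char *name) {
  assert (name != NULL);
  return strcmp (compiler_name (), name) == 0;
}
```

The check for the proposed macro comes first so that it would still win if the compiler also defined one of the other macros for compatibility.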

Question: compiling multiple modules incrementally

Hi @vnmakarov

In a JIT environment many modules need to be compiled incrementally, i.e. not at the same time. I am planning to use the C2MIR component in Ravi - and my question is: is this currently supported with c2mir?

I noticed that there is a compile / link process in c2mir test code; presumably this needs to be done for all modules as they are compiled. But compiled functions need to be held in memory.

The model I use with LLVM or OMR is this:

Create Context - this holds all compiled objects
Repeat as required:
Compile a new module - compiled functions from new module are saved in the context.
When shutting down, destroy the context, thereby destroying all the compiled objects.

target-dependent function return values

hello,

i'm reading MIR.md and found this paragraph:

* MIR functions can have more one result but possible number of results
   and combination of their types are machine-defined.  For example, for x86-64
   the function can have upto six results and return two integer
   values, two float or double values, and two long double values
   in any combination

this decision seems to make mir programs dependent on ARCH it runs on.
i'm wondering whether it would be preferable to modify API such that any number of args can be returned.
the target-specific codegen could then decide how these are returned (optimally the same way C does it when returning a struct).
so a scripting language can freely use MIR without being restricted in this aspect.

of course i may be missing some important aspects you already considered.

interestingly, c2mir compiles the following snippet

struct foo { long long l1, l2, l3, l4; };
struct foo bar(int a1, int a2, int a3) {
        return (struct foo) {.l1 = a1, .l2 = a2, .l3 = a3, .l4 = a3+a2};
}

but using m2c[0] on mir code generated by -S causes assert failure.

[0]:

diff --git a/mir2c/mir2c.c b/mir2c/mir2c.c
index bf9d92f..20454cd 100644
--- a/mir2c/mir2c.c
+++ b/mir2c/mir2c.c
@@ -462,7 +462,7 @@ void MIR_module2c (MIR_context_t ctx, FILE *f, MIR_module_t m) {
 }
 
 /* ------------------------- Small test example ------------------------- */
-#ifdef TEST_MIR2C
+#if defined(TEST_MIR2C)
 
 #include "mir-tests/scan-sieve.h"
 #include "mir-tests/scan-hi.h"
@@ -478,4 +478,27 @@ int main (int argc, const char *argv[]) {
   MIR_finish (ctx);
   return 0;
 }
+
+#elif defined(M2C)
+
+int main (int argc, const char *argv[]) {
+  MIR_module_t m;
+  MIR_context_t ctx = MIR_init ();
+
+  FILE *f = fopen(argv[1], "r");
+  fseeko(f, 0, SEEK_END);
+  off_t l = ftello(f);
+  fseeko(f, 0, SEEK_SET);
+  char *buf = malloc(l+1);
+  assert(fread(buf, 1, l, f) == l);
+  fclose(f);
+  buf[l] = 0;
+  MIR_scan_string (ctx, buf);
+  m = DLIST_TAIL (MIR_module_t, *MIR_get_module_list (ctx));
+  MIR_module2c (ctx, stdout, m);
+  MIR_finish (ctx);
+  return 0;
+}
+
+
 #endif

Adding a Lightning backend to the MIR project?

While discussing Lightning (https://www.gnu.org/software/lightning/), I got this answer from Paul Cercueil:

=======

One thing I will add - MIR and Lightning have different scopes; MIR is a JIT engine tailored for programming languages, while Lightning can be best described as a code generator. MIR would be unsuitable for some tasks Lightning is good for, e.g. writing dynamic recompilers. On the other hand, MIR provides much more support for implementing programming languages.

What would be interesting is a Lightning backend for MIR; then MIR would take care of IR optimization and register allocation, and it would run on the wide range of archs supported by Lightning.

=======

Would it make sense? Any comments?

Question: how to output readable machine code (i.e. asm) output?

Hi,

In my project I would like to be able to display the output from MIR at various levels: the MIR textual code and the final machine code as assembly. I would like to capture this output rather than send it to a file. I believe that the MIR text output can be sent to a file. Is there a way to get a readable disassembly of the final output?

Thanks and regards
Dibyendu

Automated Meta Code Generation of MIR Code and Infrastructure

This might not be immediately actionable, but could we have a DSL that describes the parsing of a language, its mapping to MIR, and the optimisations applied, from which the needed code is generated automatically? The DSL should also be able to describe itself, so that a future version of the DSL can be described in an older version. The code base would then not be C but a purpose-built language in its own right, which would make MIR easier to understand, modify, and improve.

[Feature request] Use third party libraries

I've noticed in the code that only symbols from libm and libc are available. It would be nice to be able to use third-party libraries through some command line option.

For now I've hard coded another library in the sources, added libSDL2.so in my case, and was able to run a game coded in C with mir!

Question regarding tests

Hi, I would like to help with the testing of the C to MIR translator. One of the approaches I used for dmrC was to source tests from various places and adapt them. The issue of course is sometimes the tests have their own licences. Do you care about that?

Anyway please let me know if this is of interest.

how to make recursion?

Greetings,

If I understood the documentation on the call instruction correctly:

The first operand is a prototype reference operand

The second operand is a called function address

But I could not find how to obtain these values...
Or can the two fields "MIR_item_t::proto" and "MIR_item_t::addr" be passed to "MIR_new_insn" for this? But is addr even valid while the function is not yet finished?
For example:
MIR_item_t foo = MIR_new_func( ctx, "foo", 0, NULL, 1, MIR_T_I32, "bar" );

MIR_reg_t bar = MIR_reg( ctx, "bar", foo->u.func );

MIR_label_t finish = MIR_new_label( ctx );

// if( bar <= 0 ) goto finish;
MIR_append_insn( ctx, foo, MIR_new_insn( ctx, MIR_BLES
  , MIR_new_label_op( ctx, finish )
  , MIR_new_reg_op( ctx, bar )
  , MIR_new_int_op( ctx, 0 )
) );

// bar -= 1;
MIR_append_insn( ctx, foo, MIR_new_insn( ctx, MIR_SUBS
  , MIR_new_reg_op( ctx, bar )
  , MIR_new_reg_op( ctx, bar )
  , MIR_new_int_op( ctx, 1 )
) );

// foo( bar );
MIR_append_insn( ctx, foo, MIR_new_call_insn( ctx, 3
  , foo->proto// ???
  , foo->addr// ???
  , MIR_new_reg_op( ctx, bar )
) );

// finish:
MIR_append_insn( ctx, foo, finish );

MIR_finish_func( ctx );

P.S.
Great library, I really like the API; it resembles libjit but is simpler and clearer :-)

I think that it would be useful if the c2mir was also treated as a library / API, which means exposing some api for clients to call.

I think that it would be useful if the c2mir was also treated as a library / API, which means exposing some api for clients to call. My suggestion is something like:

MIR_module_t 
MIR_compile_C_module(MIR_context_t ctx, int argc, const char *argv[], const char *inputbuffer);

That is, given an optional buffer plus command line arguments, generate a module and return it.

Raising this as a separate issue.

Originally posted by @dibyendumajumdar in #10 (comment)

[C2M] Float constants from math.h give errors and warnings

While doing some more tests with C2M, I found that the following fails to compile:

#include <stdio.h>
#include <math.h>
void main() {
  printf("%lf\n", NAN);
  printf("%lf\n", INFINITY);
  printf("%lf\n", HUGE_VAL);
  printf("%f\n", HUGE_VALF);
  printf("%Lf\n", HUGE_VALL);
}

When running the example with c2m the output is:

/usr/include/math.h:62:21: warning -- number 1e10000f is out of range
/usr/include/math.h:55:19: warning -- number 1e10000 is out of range
/usr/include/math.h:62:21: warning -- number 1e10000f is out of range
/usr/include/math.h:63:21: warning -- number 1e10000L is out of range
/usr/include/math.h:103:16: Division by zero

What was expected is no warnings or errors.

The culprits are these lines from GLIBC 2.30:
https://github.com/bminor/glibc/blob/glibc-2.30/math/math.h#L62
https://github.com/bminor/glibc/blob/glibc-2.30/math/math.h#L103

hardcoded GLIBC library names

currently i use the following hack so mir works on my musl-libc based linux distro:

diff --git a/c2mir/c2mir.c b/c2mir/c2mir.c
index 6659bf6..483b4d2 100644
--- a/c2mir/c2mir.c
+++ b/c2mir/c2mir.c
@@ -11877,7 +11877,13 @@ static int fancy_printf (const char *fmt, ...) { abort (); }
 static struct lib {
   char *name;
   void *handler;
-} libs[] = {{"/lib64/libc.so.6", NULL}, {"/lib64/libm.so.6", NULL}};
+} libs[] = {
+#ifdef __GLIBC__
+{"/lib64/libc.so.6", NULL}, {"/lib64/libm.so.6", NULL}
+#else
+{"/lib/libc.so", NULL},
+#endif
+};
 
 static void close_libs (void) {
   for (int i = 0; i < sizeof (libs) / sizeof (struct lib); i++)
diff --git a/mir-bin-driver.c b/mir-bin-driver.c
index 9f86915..6abb3f8 100644
--- a/mir-bin-driver.c
+++ b/mir-bin-driver.c
@@ -20,7 +20,13 @@ static int read_byte (MIR_context_t ctx) {
 static struct lib {
   char *name;
   void *handler;
-} libs[] = {{"/lib64/libc.so.6", NULL}, {"/lib64/libm.so.6", NULL}};
+} libs[] = {
+#ifdef __GLIBC__
+{"/lib64/libc.so.6", NULL}, {"/lib64/libm.so.6", NULL}
+#else
+{"/lib/libc.so", NULL},
+#endif
+};
 
 static void close_libs (void) {
   for (int i = 0; i < sizeof (libs) / sizeof (struct lib); i++)

not trying to push for anything, just raising this for consideration purposes.

unrelated, the following hunk removes dependency on LLVM

diff --git a/Makefile b/Makefile
index 0dd847b..da9bb5d 100644
--- a/Makefile
+++ b/Makefile
@@ -7,7 +7,7 @@ CFLAGS=-O3 -g -DNDEBUG
 TARGET=x86_64
 MIR_DEPS=mir.h mir-varr.h mir-dlist.h mir-htab.h mir-hash.h mir-interp.c mir-x86_64.c
 MIR_GEN_DEPS=$(MIR_DEPS) mir-bitmap.h mir-gen-$(TARGET).c
-OBJS=mir.o mir-gen.o c2m l2m m2b b2m b2ctab
+OBJS=mir.o mir-gen.o c2m m2b b2m b2ctab
 Q=@
 
 all: $(OBJS)

[C2M] C comments parsing error

The parser gets confused when a comment contains several consecutive * characters. The following:

/**** TEXT ****/
int main(int argc, char **argv) {return 0;}

Outputs:

unfinished comment
cannot link program w/o main function

But it should parse and compile with no errors.
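For reference, a minimal sketch of block-comment skipping that handles runs of stars. This is not c2mir's scanner; skip_block_comment is a made-up helper that only shows the expected behaviour.

```c
#include <stddef.h>

/* Returns the index just past the block comment that starts at s[0] and s[1]
   (a slash followed by a star), or -1 if the comment is never closed. */
static long skip_block_comment (const char *s) {
  size_t i = 2; /* position right after the opening delimiter */

  if (s[0] != '/' || s[1] != '*') return -1;
  while (s[i] != '\0') {
    if (s[i] == '*' && s[i + 1] == '/') return (long) (i + 2);
    i++; /* advance one character at a time so a run of stars still closes the comment */
  }
  return -1;
}
```

The point of the sketch is that the scanner must not consume two characters after seeing a star that is not followed by a slash; otherwise the last star of a run can be skipped and the closing delimiter is missed.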

c2mir - while loop hangs

int printf(const char *s, ...);

int main(void) {
	int i = 10;

	do {
		if (i % 3 == 0)
			continue;
		if (i == 2)
			break;
		printf("%d\n", i);
	} while (i--);
	return 0;
}

The above program hangs, i.e. the loop doesn't terminate.
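For reference, in standard C a continue inside a do statement jumps to the end of the loop body, so the controlling expression (here i--) is still evaluated. The sketch below runs the same loop but records the values instead of printing them; run_loop is a made-up helper. With a conforming compiler it records 10, 8, 7, 5, 4 and then stops at the break.

```c
#include <stddef.h>

/* Runs the loop from the report, storing each "printed" value in out.
   Returns how many values were stored (at most cap). */
static size_t run_loop (int *out, size_t cap) {
  size_t n = 0;
  int i = 10;

  do {
    if (i % 3 == 0)
      continue; /* jumps to the `while (i--)` test, so i is still decremented */
    if (i == 2)
      break;
    if (n < cap) out[n++] = i;
  } while (i--);
  return n;
}
```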

Will there be support for Windows x64 abi?

I had assumed the plan was to support only Unix-like systems, but I saw the use of WIN32 conditional logic. Is there any plan to support the Win32 x64 calling convention?

[C2M] Fail to parse GLIBC bits/wchar.h

With glibc 2.30 (under ArchLinux), the following fails to parse:

#include <wchar.h>
int main(int argc, char *argv[]){return 0;}

Outputs:

/usr/include/bits/wchar.h:35:8: wrong preprocessor expression
/usr/include/bits/wchar.h:43:8: wrong preprocessor expression

The culprit is this odd line:
https://github.com/bminor/glibc/blob/glibc-2.30/bits/wchar.h#L35

PS: I ran into this issue while trying to run a game made with SDL2 under c2m. After commenting out those lines in wchar.h, I was able to run it, so this was the only parsing issue.

MIR should avoid global state

I would like to use MIR in my project but the use of global state is a blocker for me.

If you are willing to take patches I can work on eliminating global state at least from the core MIR module.

Calling an external vararg function

How many args does MIR_new_vararg_proto have to declare, just the fixed ones, or all of them?

In mir examples I see
p_printf: proto p:fmt, i32:r

The API docs say
nargs and arg_vars define only fixed arguments
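For comparison, plain C follows the same convention: a variadic function declares only its fixed parameters, and the extra arguments appear only at the call site. The sketch below is a C analogy, not MIR API code; sum_ints is a made-up helper.

```c
#include <stdarg.h>

/* Only the fixed parameter `count` appears in the declaration;
   the variadic arguments are supplied at each call site. */
static long sum_ints (int count, ...) {
  va_list ap;
  long total = 0;

  va_start (ap, count);
  for (int i = 0; i < count; i++) total += va_arg (ap, int);
  va_end (ap);
  return total;
}
```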

32 bits and 64 bits integer status

I am a bit confused by the status of 32-bit and 64-bit integers:

integer math operations exist in 32-bit and 64-bit versions (like MIR_ADDS and MIR_ADD)
but the memory move only exists in a 64-bit version, MIR_MOV. Why is that?

In my application I'm using 32-bit integers. How am I supposed to generate MIR code then?
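One way to think about it, given that the README describes integer local variables as 64-bit and memory operands as carrying their own type, is the C model below. It is only a mental model under that assumption, not MIR API code: load_i32, add_s, and store_i32 are made-up helpers standing for a typed 32-bit load, a 32-bit add, and a typed 32-bit store.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit signed value from memory into a 64-bit "register" (sign-extended). */
static int64_t load_i32 (const void *addr) {
  int32_t v;
  memcpy (&v, addr, sizeof v);
  return (int64_t) v;
}

/* 32-bit add on 64-bit "registers": only the low 32 bits are significant. */
static int64_t add_s (int64_t a, int64_t b) {
  uint32_t r = (uint32_t) a + (uint32_t) b; /* wraps modulo 2^32 */
  /* Conversion to int32_t assumes the usual two's complement behaviour. */
  return (int64_t) (int32_t) r;
}

/* Store the low 32 bits of a 64-bit "register" back to memory. */
static void store_i32 (void *addr, int64_t reg) {
  int32_t v = (int32_t) (uint32_t) reg;
  memcpy (addr, &v, sizeof v);
}
```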

Code formatting is inconsistent: please consider adding .clang-format file

I noticed that when I open the sources in an editor I get inconsistent formatting. I think this is due to a mix of TABs and spaces being used. Adding a .clang-format file will enable consistent formatting through many editors.

More, more competitors (lightweightedness is questionable of course)

From README:

I only see two real universal light-weight JIT competitors

I know @dibyendumajumdar maintains (or whatever he does to it, dibyendumajumdar/nanojit#15) https://github.com/dibyendumajumdar/nanojit because he likes that it's lightweight. He's also dissatisfied with the bloatedness of Eclipse OMR up to a level of forking it: https://github.com/dibyendumajumdar/nj .

Oh, he also has a C compiler for those JITs: https://github.com/dibyendumajumdar/dmr_c ;-)

Adding support for SIMD datatypes and basic vector operations

I am interested in implementing a custom language specialized in real-time audio processing, and I have found that MIR could be a very good backend for it.

To achieve high-throughput DSP on current CPU architectures, SIMD instructions should be used, so I would like to know whether adding support for simple 128-bit vectors (4 packed floats or 2 packed doubles) to MIR would be too hard.

I am talking only about supporting basic arithmetic operations like add, sub, mul and div of new MIR datatypes f32x4 and f64x2, and vector masking to enable branchless select (equivalent to C's ternary operator cond ? a : b). All of these features are currently common in x86_64 (SSE) and aarch64 (NEON).
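To make the request concrete, here is a scalar C model of the semantics described above. The types f32x4 and mask4 and the helper functions are hypothetical; they are not MIR types, and a real backend would map them to SSE or NEON registers rather than loops.

```c
/* Hypothetical 4-lane float vector, modelled as a plain struct. */
typedef struct { float lane[4]; } f32x4;
/* Lane mask: each entry is 0 (false) or 1 (true) in this scalar model. */
typedef struct { int lane[4]; } mask4;

static f32x4 f32x4_add (f32x4 a, f32x4 b) {
  f32x4 r;
  for (int i = 0; i < 4; i++) r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

static mask4 f32x4_lt (f32x4 a, f32x4 b) {
  mask4 m;
  for (int i = 0; i < 4; i++) m.lane[i] = a.lane[i] < b.lane[i];
  return m;
}

/* Per-lane equivalent of cond ? a : b. */
static f32x4 f32x4_select (mask4 cond, f32x4 a, f32x4 b) {
  f32x4 r;
  for (int i = 0; i < 4; i++) r.lane[i] = cond.lane[i] ? a.lane[i] : b.lane[i];
  return r;
}
```

For example, f32x4_select (f32x4_lt (a, b), a, b) computes a per-lane minimum. The model only fixes the semantics; whether the generated code is actually branchless depends on the backend.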

WebAssembly is currently implementing something similar:
https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md

Regards.

Any way to use the interpreter on systems that don't allow executable memory?

The interpreter appears to use executable mmap memory. Is there an easy way around that, so that it can work on systems that don't allow it (iOS)?

Add Qemu TCG to JIT comparison in README

Qemu includes its own JIT and interpreter subsystems.

It might be useful to describe how MIR's design differs from it in the JIT comparison section.

https://wiki.qemu.org/Documentation/TCG
https://wiki.qemu.org/Features/TCI

First use of MIR JIT in Ravi

Hi @vnmakarov

I implemented a PoC using MIR JIT as the backend for Ravi.
You can see this here:
dibyendumajumdar/ravi@4e445f4

So far I have only tested basics ... looks promising.

Regards

Some warnings when compiling MIR

In file included from /home/dylan/github/ravi/mir/c2mir/c2mir.h:1,
                 from /home/dylan/github/ravi/include/ravi_mirjit.h:29,
                 from /home/dylan/github/ravi/src/ravi_mirjit.c:24:
/home/dylan/github/ravi/mir/mir.h:495:34: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
                                  const int (*writer_func) (MIR_context_t, uint8_t));
                                  ^~~~~
/home/dylan/github/ravi/mir/mir.h:497:41: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
                                         const int (*writer_func) (MIR_context_t, uint8_t),
                                         ^~~~~
/home/dylan/github/ravi/mir/mir.h:499:52: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
 extern void MIR_read_with_func (MIR_context_t ctx, const int (*reader_func) (MIR_context_t));
                                                    ^~~~~

Assertion failure in MIR

Hi,

I am hitting an assertion failure in MIR:

wrong destroy for uint8_travi: /home/dylan/github/mir/mir-varr.h:22: mir_var_assert_fail: Assertion `0' failed.

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffffe840801 in __GI_abort () at abort.c:79
#2  0x00007ffffe83039a in __assert_fail_base (fmt=0x7ffffe9b77d8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x81527f7 "0", file=file@entry=0x8153188 "/home/dylan/github/mir/mir-varr.h",
    line=line@entry=22, function=function@entry=0x8154f80 <__PRETTY_FUNCTION__.2808> "mir_var_assert_fail")
    at assert.c:92
#3  0x00007ffffe830412 in __GI___assert_fail (assertion=0x81527f7 "0",
    file=0x8153188 "/home/dylan/github/mir/mir-varr.h", line=22,
    function=0x8154f80 <__PRETTY_FUNCTION__.2808> "mir_var_assert_fail") at assert.c:101
#4  0x00000000080a83a7 in mir_var_assert_fail ()
#5  0x00000000080ab27c in VARR_uint8_tdestroy ()
#6  0x00000000080c838d in machine_finish ()
#7  0x00000000080bd288 in code_finish ()
#8  0x00000000080afcf0 in MIR_finish ()
#9  0x0000000008087fe6 in raviV_close (L=0x858a280) at /home/dylan/github/ravi/src/ravi_mirjit.c:194
#10 0x0000000008042d64 in close_state (L=0x858a280) at /home/dylan/github/ravi/src/lstate.c:280
#11 0x000000000804360c in lua_close (L=0x858a280) at /home/dylan/github/ravi/src/lstate.c:426
#12 0x00000000080195b3 in main (argc=3, argv=0x7ffffffee098) at /home/dylan/github/ravi/src/lua.c:637

I will add any more information I can obtain.

[C2M] Wrong initialization of struct's string fields

The following:

#include <stdio.h>
struct Boo { char data[5]; };
struct Boo boo = {"test"};
int main(int argc, char **argv) {
  puts(boo.data);
  return 0;
}

Outputs:

tests/bug1.c:3:19: warning -- assigning pointer without cast to integer
����U

As with GCC/Clang, it is expected to print test, but it prints random data, apparently because it wrongly assigns a pointer instead of initializing the struct.
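For clarity, the expected meaning of the initializer in standard C is that the characters of the string literal, including the terminating null, are copied into the array member. The sketch below spells that out; make_boo_by_copy is a made-up helper used only for comparison.

```c
#include <string.h>

struct Boo { char data[5]; };

/* The string literal initializes the array member: 't', 'e', 's', 't', '\0'. */
static struct Boo boo = {"test"};

/* What the initializer is expected to mean, written out by hand. */
static struct Boo make_boo_by_copy (void) {
  struct Boo b;
  memcpy (b.data, "test", sizeof b.data); /* 4 characters plus the terminator */
  return b;
}
```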

PS: I was trying this project mainly because I'm working on another language that already compiles to C, and I found this project promising as it's minimal and can compile C very fast. During my tests I found this issue.
