Git Product home page Git Product logo

cle's People

Contributors

azure-pipelines[bot] avatar polsys avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar

cle's Issues

Imported functions

Allow importing functions from DLLs such as the C Runtime via a special attribute.

  • Design and implement the Import attribute, and allow not having a method body in annotated methods
  • Special-case the method compilation for these methods
  • Emit the import table, etc.

Reference parameters

  • Allow passing a reference to a value type
  • Make sure SSA plays nice with this (value numbers must be updated after call)
  • Consider whether to allow this (at least initially) for reference types

Especially useful with #19, and a prerequisite for doing instance methods with this references.

Global constants

  • Parse global constants
  • Compile and store them
    • How should the compilation happen? Constants can reference each other
    • Cyclical dependencies must be disallowed
  • Compile references to them

Should global constants live in the same namespace as functions, or can they be distinguished by the usage? Local variables should certainly shadow global constants.

Flesh out the module system

  • Decide the file system layout for modules
  • Read module dependencies and build a dependency graph
  • Compile modules in parallel
  • Make sure that internal visibility is respected

Take C# 8.0 into use

I'll be updating to VS2019 soon after it is released, and since there is no reason to maintain VS2017 compatibility once the CI servers have the latest compilers, we might as well have a bit safer compiler.

  • Replace ReSharper annotations with the new (non-)nullable reference types everywhere:
    • Parser
    • Semantic analysis
    • Code generation
    • Frontend, compiler, common
    • Tests
  • Make method bodies completely non-nullable (associated with #42)

Implement SSA transformation pass

The intention is that the IR would always be in SSA form. MethodCompiler should probably invoke a transformation pass directly after completing the non-SSA IR. This pass would also clean up the basic block graph of dropped blocks.

I think this is worth doing even in non-optimizing builds. The perf impact shouldn't be that bad, and codegen may then take advantage of the SSA guarantees.

  • SSA construction working locally
  • SSA construction working with calls
  • SSA construction working for complete CFGs
  • SSA construction working for incomplete CFGs
  • Test improvement: implement an IR assembler
  • Update Cle.SemanticAnalysis README
  • Extend debug logging to dump post-transformation method
  • Refactor how locals are initialized (turns out this makes lifetime tracking and phis simpler)
  • Write a benchmark and do first performance pass

SSA optimizations and IR cleanup should happen in a separate pass for debuggability.

Implement a proper SSA register allocator

The current allocator is a quick prototype for the initial codegen bringup. It does not support

  • stack (moved over to #61),
  • interval splitting (this requires rearchitecting how local locations are represented),
  • register preferences (hints) (to be done later, once needed),
  • correctness.

Implement a proper SSA LSRA algorithm, as described by Wimmer et al.

String type

Related to #15. String would be an immutable UTF-8 array, a special case of the array type. Strings can be created at run time or compile time.

  • Parsing string literals
    • Escape characters
  • Extending the type system
  • String expressions
    • Concatenation
    • Equality comparison
  • Reference counting special-cased to allow constant strings in binary
  • Byte array to string conversion method
  • Integer to string conversions (e.g. int32::ToString())

While loops

Might be initially simpler to implement without break and continue keywords? Also implement do-while loops.

Needed before #5 and #6 to provide tricky cases for the SSA.

  • Parsing
  • Compilation
  • Return guarantee
  • Follow-up issue for do...while loops (#29)
  • Follow-up issue for break and continue (#30)

Runtime support for console output

Investigate the best way to interface with the console API: directly consuming C Runtime, a wrapper library, or something else?

  • Runtime startup, with relevant changes to Clé application entry point
  • Basic console API in the core standard library module

Function inlining

  • Develop a simple heuristic for whether to inline a function
  • Copy the basic block graph, renaming locals
  • Tune the ordering of optimizations to make this effective (need to run CSE and constant folding after inlining, but maybe before it as well)

Array types

Arrays would be a reference type with runtime sizing.

  • Parse array expressions
  • Extend the type system with array types
  • Compile array expressions
  • Allocate/deallocate memory for arrays (special methods emitted into each compilation)
  • Negative sizes cause termination
  • Reference semantics in parameter passing
  • Range checking
  • Negative size in new causes termination
  • Consider: compile-time constant arrays?
  • Update the language specification
  • Ability to get a raw pointer for interop (to skip reference count updating) - maybe with #70?

Probably can leave multidimensional arrays out for now.

User-defined struct types

  • Write down the design and behavior for these: how should value-typed structs and reference-typed fields play together?
  • Implement parsing of struct declarations
  • Implement parsing of struct instance methods
  • Implement parsing of field references
  • Add a semantic compilation pass for user-defined types
  • Compile field assignments and references
  • Compile instance methods (big design question: should this be a reference or value?)
  • Emit zero initialization if necessary
  • Emit the other new opcodes

Assess ExpressionCompiler performance

Tracking issue for this TODO comment:

// TODO: Benchmark whether it would be beneficial to pass the unchanging parameters to the private methods

Once #17 is implemented, it should be quite simple to benchmark ExpressionCompiler without depending on MethodCompiler and friends. Investigate whether parameter passing (lots of parameters, deep call trees) impacts perf, and whether collecting parameters to an readonly struct speeds things up (and potentially simplifies the code). There might be other low-hanging fruits as well.

Need to benchmark several scenarios:

  • simple integer literal
  • a "common" expression
  • a very complex expression

Warn for constant condition in if/while statements

For example:

if (true) { } // Warning: condition is always true
else { } // Warning: unreachable code
  • Warn if an if statement has constant condition
  • Warn if a while statement has constant condition
  • Consider a while statement returning if the condition is always true and the block is returning (no self-contradictory nagging)

Improve compiler concurrency

  • File level parallelism in parsing and/or other phases.
  • Modules can concurrently progress to different phases as long as their dependencies are OK.
  • Optimization phases should run in parallel where sensible.
  • Parallelism should be restricted (but not disabled if possible) when determistic compilation or debug dumping is enabled.

This should be done mostly after modules and some basic optimization phases are in place, to have a realistic benchmark.

Consider a code coverage badge

The Azure Pipelines CI build collects coverage data but does not merge it. It would be nice to see the coverage in the repo. Not very high priority, of course.

Support locals on stack

Support method call parameters and spilled locals on stack. This is also related to #19. This requires small changes to the register allocator and an additional code path for each instruction in the emitter.

Consider the short-circuiting Boolean operators

As part of #36, the operators && and || are parsed but not enabled in semantic compilation. I left them disabled because they would introduce control flow to expression compilation. I'm not convinced that the added complexity is worth it, even though the operators are useful. This is not a practical language, after all.

The question therefore is:

  • should these operators be left out,
  • or should they be implemented?

Either answer will close this issue.

Function calls

  • Parse parameter lists in function declarations
  • Compile parameter lists
  • Parse function calls
  • Compile function calls

I do think that the language can do (at least for now) without overloading. This makes parameter type checking simple.

Should be done before #6 because this complicates register allocation.

Length property on arrays

Related to #15. Parse property (member field access) syntax and compile/emit a Length field on arrays. The biggest task is refactoring the parser to support member access.

SSA-based common subexpression elimination

  • Replace assignments where the right-hand side has already been seen with rewritten local references
  • Make sure that this is not too aggressive (only reference variables on executed paths...)
  • Consider whether repeated method calls with same arguments may be elided too (for some very simple definition of pure method)

Implement Windows x64 code generation

The last chunk of the initial vertical slice.

  • Lower the IR to an architecture-specific format
  • Implement a trivial register allocator for initial bringup
  • Implement a Portable Executable emitter
  • Implement an emitter that supports fixups and optionally produces a disassembly stream
  • Emit "good enough" assembly code for the IR
  • Implement a proper register allocator based on SSA guarantees (followup issue: #54)
  • Look into producing PDB symbols (issue: #55)
  • Implement a simple integration test suite
  • Implement the IR ops not covered by PR #53
  • Implement more Functional test cases
  • Benchmark the backend (extend the end-to-end test, add one for register allocator)
  • Fix up the hexadecimal style

Merge SSA conversion into semantic compilation

The algorithm is designed for on-the-fly SSA conversion, but I did not want to do so in the initial implementation for debuggability. Once the compilation process has stabilized more, SsaConverter should be removed and the algorithm implemented in SemanticCompiler. This removes unnecessary allocations from building the IR twice and should give some nice time savings.

Streaming peephole optimization

Currently, the lowering phase converts high IR to low IR and then the peephole optimizer is run as a separate pass. The optimizer could be converted to a streaming LIR writer: each time the lowering phase would add an instruction, the writer would check for pattern matches.

This could remove unncecessary array shuffling without making the code much more complicated.

Error recovery in parser

Currently the parser stops parsing a file at first error. If in need of a research project, consider implementing enough error recovery to e.g. continue at the next statement/block/function. There are some TODO comments in the parser related to this.

Add LIR operations with immediates

Several common x64 instructions support an immediate, but we emit temporary loads instead:

  • Arithmetic ops (add, sub, imul)
  • cmp
  • Shifts

Either add new LowOp versions or update the existing ones to contain the immediate in their Data field (with the right local set to -1).

Related to #52.

Multiple integer types

  • Add signed and unsigned integer types of various sizes to parser and type system
  • Extend ExpressionCompiler to handle the types; needs some rule design
  • Design and implement the syntax for casts
  • Extend codegen to do proper truncation/extension/whatever

Floating point data type(s)

Initially, the language could do with just double. Conversions are non-trivial.

  • Extend the parser to handle floating-point literals
  • Extend ExpressionCompiler to evaluate floating-point expressions
  • Extend codegen to allocate SIMD registers / emit SIMD code (non-vectorized) / pass floating-point parameters

Implement break and continue keywords

Would be quite useful additions to while loops (and #29 too). They were initially left out as they slightly complicate the compilation.

  • continue is the easy one, as we know the basic block to jump to,
  • break needs to jump forward to a block not yet emitted.

Implement a proper front-end

Replace the quick and dirty front-end with a real, tested command line parser. It should be able to

  • compile a specified module (default: current directory),
  • display context of diagnostic messages,
  • produce debug output for specified methods.

SSA-based constant folding

  • Fold expressions where values are constants stored in variables (ExpressionCompiler can not do this)
  • For branch conditions that become constants, perform simple dead code elimination

Investigate ImmutableList performance

The codebase uses System.Collections.ImmutableList<T> quite a lot, but almost always just builds the list with a builder and then leaves it alone. Consider wrapping List<T> in a struct and surfacing only the enumerator instead. (Can this be a near-zero cost abstraction, or at least better than passing IReadOnlyLists around? I'd like to have devirtualized calls if feasible.)

Note that the usage patterns may be slightly different once optimization passes are implemented.

Improve user-facing diagnostics

  • Consider displaying the context (the line of code) of diagnostics
  • Consider displaying diagnostics in a streaming fashion as they are emitted
  • Write Compiling [ModuleName]... to indicate progress

Add [EntryPoint] attribute

  • Implement parsing of attributes (initially parameterless only)
  • Implement attributes in declaration compilation
  • Assert that the main module includes a single entry point

Add a tutorial doc

Add a short introduction to using Clé to the documentation site. Also fix up the download link mess.

Emit debug symbols

Emit PDB symbols containing method names. More debug info can be considered later, but the compiler pipeline does not really surface e.g. variable names or sequence points.

More expressions

  • Extend lexer to accept two-character symbols like &&
  • Parse relational and Boolean / bit manipulation operators
  • Extend ExpressionCompiler to compile these expressions

This includes introducing types more properly to expression compilation and opens room for some nice refactoring.

Disable package cache on CI builds

No need to spend over a minute caching packages on a VM that will be destroyed immediately afterwards.

I accidentally committed the change on master and due to not enforcing branch protections on administrators (now fixed), it got through without a PR (6f4aa93). It successfully disabled the cache but the build step spends more time restoring packages now. Keeping this issue open to track the overall build time across more runs.

Add a performance test suite

As the compiler supports more complex programs, add some performance tests. They should have exactly equivalent implementations in other languages too (C++, C#, Python?). Ideally the tests would be run on every PR but reliability might be an issue.

Some test ideas:

Implement more peephole optimizations

Some optimizations that could be applied even in non-optimizing builds, since the peephole optimizer covers up a lot of inefficient lowering:

  • Replace "load immediate to %2, do arithmetic with %1 and %2" with "do arithmetic with %1 and imm"
  • Replace multiplication and unsigned division with a power of 2 with shifts
  • Replace "move %1 to %3, do arithmetic with %2 and %3" with "do arithmetic with %2 and %1" where possible
  • Replace "jcc next_block, jmp somewhere" with "jncc somewhere, jmp next_block" so that the latter jump can be elided in codegen

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.