etaoins / llambda

Scheme R7RS frontend for LLVM

License: Other

Scala 54.46% Scheme 14.00% Shell 0.07% CMake 0.20% C++ 30.59% C 0.02% Python 0.41% LLVM 0.22% Vim Script 0.03%

llambda's People

8l abduld iuliandumitru bef0 esovm alxbnct

llambda's Issues

Fix debug information on LLVM 3.5

llc 3.5 crashes when attempting to compile programs produced with llambda -g. Running under Valgrind reveals invalid reads in the DWARF generation code.

Fix typegen to deal with World&

Currently typegen is generating a World* argument for the procedure typedef because it doesn't understand references. It either needs to learn about references or special-case the World reference.

Replace test-util.scm with partial SRFI-78 implementation

It should be possible to implement SRFI-78 without comprehension support. This should be sufficient to replace the functionality of test-util.scm in a more standardized way.

Don't allow redefinitions in llambda dialect

Redefining top-level bindings in R7RS is allowed to silently succeed. In the case of storage locations the second (define) will actually implicitly create a mutable. This needs to be maintained for R7RS compatibility but should be disabled in the "llambda" dialect as it's error-prone and confusing.

Support vector types

(Vector) and (Vectorof) type constructors from Typed Racket should be supported. These are unstable types due to vectors being mutable; however, they are useful for typing arguments and can be used to eliminate bounds checking in some instances.

Retype procedures to use native arguments

If a procedure does nothing with a boxed argument except unboxing it then that argument should be rewritten to be of the unboxed type. This allows for more efficient calls and reduces unnecessary boxing and unboxing.

Runtime support for circular lists

Certain R7RS procedures such as (write) and (list?) are required to handle lists of infinite length. Even without parser support for datum labels these can be easily created at runtime by mutating an existing list.
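For the (list?) half of this, a common technique is Floyd's tortoise-and-hare cycle detection. The sketch below assumes a hypothetical Datum cell layout rather than llambda's real runtime types. It returns false for a circular list instead of looping forever. (write) would additionally need to emit datum labels for the cycle, which this does not cover.

```cpp
// Hypothetical minimal cell layout; llambda's real runtime types differ.
struct Datum {
    bool isPair;       // true for a pair (cons cell)
    bool isEmptyList;  // true only for '()
    Datum *car;        // meaningful only for pairs
    Datum *cdr;        // meaningful only for pairs
};

// Returns true if `d` is a proper list: a finite chain of pairs ending in '().
// Floyd's tortoise-and-hare cycle detection runs in O(n) time and O(1) space,
// so a circular list yields false instead of looping forever.
bool isProperList(const Datum *d) {
    const Datum *slow = d;
    const Datum *fast = d;
    while (true) {
        // Advance the hare two cells, stopping at the end of the list
        for (int step = 0; step < 2; step++) {
            if (fast->isEmptyList) return true;
            if (!fast->isPair) return false;  // improper tail such as (1 . 2)
            fast = fast->cdr;
        }
        // Advance the tortoise one cell; it only visits cells the hare has passed
        slow = slow->cdr;
        if (slow == fast) return false;  // the hare lapped the tortoise: a cycle
    }
}
```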

Add function types

First-class function types should be added as subtypes of the existing type. They do not need to be testable at runtime; we can require all function types to be statically satisfied at compile time.

If possible, a type-specific trampoline should be generated for procedures passed as a typed function. This will make higher-order functional code more efficient as it can reduce the overhead of calling through the default trampoline.

Remove shadow stack entry before tail calls

Currently tail recursive Scheme calls that use GC will have unbounded stack usage due to the shadow stack. It should be possible to remove our shadow stack entry before the tail call. This should allow tail recursion to work in bounded space in more situations.

Implement value cloning

To support cross-world communication we first need to support cloning of arbitrary values. We should introduce a new (llambda clone) library which exports a (clone) procedure for cloning values in the current world. This can be used to build functional tests that ensure cloning works as expected before building communication and concurrency primitives on top of it.
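At its core, (clone) is a deep copy that must preserve sharing and terminate on cycles. A minimal sketch, assuming a toy Cell type in place of llambda's real value representation and a hypothetical Cloner class: a map from original to copied cells ensures each cell is copied exactly once. The owned vector only stands in for allocating into the destination world's heap.

```cpp
#include <memory>
#include <unordered_map>
#include <vector>

// Toy heap cell standing in for a Scheme pair; atoms have null car/cdr.
struct Cell {
    int value;
    Cell *car;
    Cell *cdr;
};

// Deep-copies everything reachable from a root. One Cloner should be used per
// (clone) call. The original-to-copy map preserves shared structure and makes
// cycles terminate, because each cell is copied exactly once.
// Recursion depth grows with list length; a production version would iterate
// along cdr chains instead.
class Cloner {
public:
    Cell *clone(const Cell *original) {
        if (!original) return nullptr;
        auto found = copies.find(original);
        if (found != copies.end()) return found->second;

        owned.push_back(std::make_unique<Cell>(Cell{original->value, nullptr, nullptr}));
        Cell *copy = owned.back().get();
        copies[original] = copy;  // register before recursing so cycles find it
        copy->car = clone(original->car);
        copy->cdr = clone(original->cdr);
        return copy;
    }

private:
    std::unordered_map<const Cell *, Cell *> copies;
    // Stands in for allocation in the destination world's heap
    std::vector<std::unique_ptr<Cell>> owned;
};
```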

Support recursive record types

Record type fields should be allowed to reference the parent record type. This is difficult because the record type itself contains the record field types. Using the recursive type infrastructure would be helpful but this might involve supporting type references in union types.

Add JUnit style asserts to Scheme tests

Now that (error) and (exit) are supported it's possible to terminate execution early. It should be possible to write a simple JUnit-style test system with (assert-true), (assert-equal) etc. inside test-util.

Support (guard)

The example implementation of (guard) makes extensive use of (call/cc) that could probably be replaced by either (call/ec) or explicit dynamic state switching for better performance. It also uses multiple return values which would require #37 but it's unclear how essential that is to the implementation.

Fix intersection of indeterminate types

If two types don't have a strict super/subtype relationship they will currently intersect to the empty type. For example, the intersection of (Pairof <any> (Listof <any>)) and (Listof <number>) results in (U) where it should ideally be (Pairof <number> (Listof <number>)).

This may have to be resolved on a per-type basis. The list example could be resolved by intersecting the car and cdr of a pair separately.
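A sketch of the per-type approach, assuming a deliberately simplified type model (Kind, Type, mk, intersect and show are hypothetical, not llambda's classes). It treats (Listof T) as (U <empty-list> (Pairof T (Listof T))) and intersects a pair's car and cdr separately:

```cpp
#include <memory>
#include <string>

// Toy type model for illustration only; it is not llambda's type representation.
enum class Kind { Any, Number, EmptyList, Pair, Listof, Bottom };

struct Type;
using TypeRef = std::shared_ptr<const Type>;

struct Type {
    Kind kind;
    TypeRef car;  // Pair: car type; Listof: member type
    TypeRef cdr;  // Pair: cdr type
};

TypeRef mk(Kind kind, TypeRef car = nullptr, TypeRef cdr = nullptr) {
    return std::make_shared<Type>(Type{kind, car, cdr});
}

TypeRef intersect(const TypeRef &a, const TypeRef &b);

// Intersect a pair component-wise; if either side is empty the pair is empty
TypeRef intersectPair(const TypeRef &car1, const TypeRef &cdr1,
                      const TypeRef &car2, const TypeRef &cdr2) {
    TypeRef car = intersect(car1, car2);
    TypeRef cdr = intersect(cdr1, cdr2);
    if (car->kind == Kind::Bottom || cdr->kind == Kind::Bottom) return mk(Kind::Bottom);
    return mk(Kind::Pair, car, cdr);
}

TypeRef intersect(const TypeRef &a, const TypeRef &b) {
    if (a->kind == Kind::Any) return b;
    if (b->kind == Kind::Any) return a;
    if (a->kind == Kind::Bottom || b->kind == Kind::Bottom) return mk(Kind::Bottom);

    // (Listof T) behaves like (U <empty-list> (Pairof T (Listof T)))
    if (a->kind == Kind::Listof && b->kind == Kind::Listof) {
        TypeRef member = intersect(a->car, b->car);
        // A list whose members must be of the empty type can only be '()
        return member->kind == Kind::Bottom ? mk(Kind::EmptyList) : mk(Kind::Listof, member);
    }
    if (a->kind == Kind::Listof && b->kind == Kind::Pair) return intersectPair(a->car, a, b->car, b->cdr);
    if (a->kind == Kind::Pair && b->kind == Kind::Listof) return intersectPair(a->car, a->cdr, b->car, b);
    if (a->kind == Kind::Listof && b->kind == Kind::EmptyList) return b;
    if (a->kind == Kind::EmptyList && b->kind == Kind::Listof) return a;
    if (a->kind == Kind::Pair && b->kind == Kind::Pair)
        return intersectPair(a->car, a->cdr, b->car, b->cdr);

    // Remaining atomic kinds only intersect with themselves
    return a->kind == b->kind ? a : mk(Kind::Bottom);
}

std::string show(const TypeRef &t) {
    switch (t->kind) {
    case Kind::Any: return "<any>";
    case Kind::Number: return "<number>";
    case Kind::EmptyList: return "<empty-list>";
    case Kind::Pair: return "(Pairof " + show(t->car) + " " + show(t->cdr) + ")";
    case Kind::Listof: return "(Listof " + show(t->car) + ")";
    case Kind::Bottom: return "(U)";
    }
    return "";
}
```

With this model, intersecting (Pairof <any> (Listof <any>)) with (Listof <number>) yields (Pairof <number> (Listof <number>)) in either argument order, while <number> with any pair type still yields (U).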

Support for typed definitions

Add support for Typed Racket style definitions. At least (lambda:), (define:) and (let:) should be supported.

Generate stable IR for union type checks

Union types have their member types checked in an undefined order. This should be made consistent between compiler runs, at least for common cases such as proper lists.
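The general fix is to avoid emitting checks directly from a collection with unspecified iteration order. A minimal sketch, assuming members are identified by their type names and that sorting by name is an acceptable stable key (a frequency-based ranking could be better for proper lists). stableCheckOrder is a hypothetical helper:

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// std::unordered_set iteration order is unspecified, so emitting one type check
// per member straight from it can reorder the checks between builds. Copying the
// members out and sorting by a stable key (here the type's name) gives the same
// order on every compiler run.
std::vector<std::string> stableCheckOrder(const std::unordered_set<std::string> &members) {
    std::vector<std::string> ordered(members.begin(), members.end());
    std::sort(ordered.begin(), ordered.end());
    return ordered;
}
```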

Support case insensitive parsing

This is specified by R7RS mainly for R5RS compatibility.
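R7RS controls this with the #!fold-case and #!no-fold-case directives, which switch case folding of subsequent identifiers on and off. A sketch of the reader-side state, assuming hypothetical ReaderState, applyDirective and normalizeIdentifier names. Only ASCII is folded here, whereas R7RS specifies Unicode folding as done by (string-foldcase).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical reader state; R7RS lets "#!fold-case" and "#!no-fold-case"
// switch case folding on and off for the rest of the source.
struct ReaderState {
    bool foldCase = false;
};

// Returns true if `token` is one of the folding directives (which produce no datum)
bool applyDirective(ReaderState &state, const std::string &token) {
    if (token == "#!fold-case") { state.foldCase = true; return true; }
    if (token == "#!no-fold-case") { state.foldCase = false; return true; }
    return false;
}

// Folds an identifier while folding is active. ASCII only, unlike full R7RS.
std::string normalizeIdentifier(const ReaderState &state, std::string identifier) {
    if (state.foldCase) {
        std::transform(identifier.begin(), identifier.end(), identifier.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    }
    return identifier;
}
```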

Support datum labels for shared data

Datum labels for circular data are problematic until infinite lists are supported. However, simple shared data can be supported by only making the datum label visible after its inner data has been parsed.
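A toy reader sketch of that rule, assuming a hypothetical Node datum and Parser that only understand non-negative integers, lists, #n= and #n#. The label from #n= is registered only after its inner datum is parsed, so a reference from inside the datum (a cycle) is rejected, while a later reference yields the same shared node.

```cpp
#include <cctype>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy datum: either an integer or a list of children. Shared structure is
// represented by two parents pointing at the same Node.
struct Node {
    bool isList = false;
    long value = 0;
    std::vector<std::shared_ptr<Node>> children;
};
using NodeRef = std::shared_ptr<Node>;

class Parser {
public:
    explicit Parser(std::string source) : text(std::move(source)) {}

    NodeRef parseDatum() {
        skipSpace();
        if (peek() == '#') {
            pos++;
            long label = parseNumber();
            char marker = text.at(pos++);
            if (marker == '=') {
                // Parse the inner datum *before* registering the label
                NodeRef inner = parseDatum();
                labels[label] = inner;
                return inner;
            }
            if (marker == '#') {
                auto it = labels.find(label);
                if (it == labels.end())
                    throw std::runtime_error("undefined or circular datum label");
                return it->second;  // shared, not copied
            }
            throw std::runtime_error("malformed datum label");
        }
        if (peek() == '(') {
            pos++;
            auto list = std::make_shared<Node>();
            list->isList = true;
            while (skipSpace(), peek() != ')')
                list->children.push_back(parseDatum());
            pos++;  // consume ')'
            return list;
        }
        auto atom = std::make_shared<Node>();
        atom->value = parseNumber();
        return atom;
    }

private:
    char peek() const { return pos < text.size() ? text[pos] : '\0'; }
    void skipSpace() { while (std::isspace(static_cast<unsigned char>(peek()))) pos++; }
    long parseNumber() {
        std::size_t start = pos;
        while (std::isdigit(static_cast<unsigned char>(peek()))) pos++;
        if (start == pos) throw std::runtime_error("expected a number");
        return std::stol(text.substr(start, pos - start));
    }

    std::string text;
    std::size_t pos = 0;
    std::map<long, NodeRef> labels;
};
```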

Generated LLVM IR is unstable between runs

Generated LLVM IR varies greatly between runs because codegen depends on collections and operations with undefined iteration order. GC code is particularly fragile. This makes comparing generated code for differences needlessly difficult.

Include file and line information in runtime errors

_lliby_signal_error should be extended to accept a file and line number to include in its error messages.

Implement (cond-expand)

Fix runtime threading with GCC

GCC/libstdc++ requires -pthread to be passed to enable C++11 thread support. CMake 2.8 doesn't appear to reflect that when FindThreads is used. As a result the runtime will crash when it first attempts to use C++11 threads.

Categorize runtime exceptions

We only support the exception classes required by R7RS: (file-error?) and (read-error?). This isn't very precise, particularly for functional tests, which can currently only assert that some error occurs during a test. We should:

Choose a logical group of error classes and ensure the compile-time exceptions conform to them. Racket seems to distinguish range, syntax, type, arguments and arity errors which could be a good start.
Add our own error classes that are harmonized with the ones in the compiler and add custom classes and predicates for those errors.
Change the functional test framework to recognize the error classes we introduce regardless of whether they're raised at compile time or run time
Evaluate the existing functional tests and use the more specific error expectations where possible

Typed (case-lambda)

Instead of using the existing R7RS (case-lambda) macro, (case-lambda) should become a primitive expression that explicitly tracks the signature of each case. This provides a number of benefits:

The sanity of the (case-lambda)'s arities can be checked at declaration time with useful diagnostics
The arity of (case-lambda) applications can be validated at compile time
(case-lambda:) with type feedback can be implemented using the existing typed lambda infrastructure
Code generation and closure handling could be improved, particularly with mutable pairs
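The first two benefits come down to simple arity reasoning once each clause's signature is tracked. A sketch, assuming a hypothetical ClauseArity representation (required argument count plus an optional rest argument): selectClause picks the first matching clause for a known argument count, and isShadowed flags a clause that no call can ever reach. isShadowed only compares against single earlier clauses; shadowing by a combination of clauses is not detected.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Arity of one (case-lambda) clause: `required` fixed arguments plus an
// optional rest argument.
struct ClauseArity {
    std::size_t required;
    bool hasRest;

    bool accepts(std::size_t argCount) const {
        return hasRest ? argCount >= required : argCount == required;
    }
};

// Index of the clause selected for a call with `argCount` arguments, if any.
// With a statically known argument count this doubles as a compile-time check.
std::optional<std::size_t> selectClause(const std::vector<ClauseArity> &clauses,
                                        std::size_t argCount) {
    for (std::size_t i = 0; i < clauses.size(); i++)
        if (clauses[i].accepts(argCount)) return i;
    return std::nullopt;
}

// A clause is unreachable if a single earlier clause accepts every argument
// count it does; such a clause deserves a declaration-time diagnostic.
bool isShadowed(const std::vector<ClauseArity> &clauses, std::size_t index) {
    const ClauseArity &later = clauses[index];
    for (std::size_t i = 0; i < index; i++) {
        const ClauseArity &earlier = clauses[i];
        if (later.hasRest) {
            if (earlier.hasRest && earlier.required <= later.required) return true;
        } else if (earlier.accepts(later.required)) {
            return true;
        }
    }
    return false;
}
```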

Implement (letrec)

This needs to properly catch uses of uninitialized values as required by R7RS. Preferably this should also optimize the same as a normal (let) where possible.

Group functional tests in to a single executable

Currently every functional test is contained in a separate executable. It should be possible to combine the (expect)-style tests into a single executable and only split it up in the case of failure.

Support full call/cc

This should be possible, albeit very difficult, with stack copying and meticulous handling of GC roots and dynamic stack entries of captured continuations. If #10 is implemented this does not need to be fast, but it is required for a full-fledged Scheme implementation.

Support for dynamically growing GC heap

We should switch to a segmented heap approach where we initially have a small heap and incrementally add new segments as allocations occur. Ideally the first segment would live in the same page-sized allocation as the World structure.
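A minimal sketch of the growth side of this, assuming a hypothetical SegmentedHeap bump allocator. Collection is not modelled and the first segment is heap-allocated rather than placed next to the World. Each allocation is rounded up to alignof(std::max_align_t). When the current segment is exhausted a new one is appended, and oversized requests get a segment of their own.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical segmented heap: a bump allocator that starts with one small
// segment and appends segments as allocations exhaust it.
class SegmentedHeap {
public:
    explicit SegmentedHeap(std::size_t bytesPerSegment) : segmentBytes(bytesPerSegment) {
        addSegment(segmentBytes);
    }

    // Returns `size` bytes aligned for any fundamental type
    void *allocate(std::size_t size) {
        size = roundUp(size);
        if (size > static_cast<std::size_t>(segmentEnd - cursor)) {
            // Current segment is full; oversized requests get a segment of their own
            addSegment(size > segmentBytes ? size : segmentBytes);
        }
        void *result = cursor;
        cursor += size;
        return result;
    }

    std::size_t segmentCount() const { return segments.size(); }

private:
    static std::size_t roundUp(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        return (size + align - 1) / align * align;
    }

    void addSegment(std::size_t bytes) {
        // Allocate in max_align_t units so the segment start is suitably aligned
        std::size_t units = (bytes + sizeof(std::max_align_t) - 1) / sizeof(std::max_align_t);
        segments.emplace_back(new std::max_align_t[units]);
        cursor = reinterpret_cast<char *>(segments.back().get());
        segmentEnd = cursor + bytes;
    }

    std::size_t segmentBytes;
    std::vector<std::unique_ptr<std::max_align_t[]>> segments;
    char *cursor = nullptr;
    char *segmentEnd = nullptr;
};
```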

Support for multiple return values

Multiple return types can be an additional "meta" return type on top of the existing unit/single return system. Internally it could return a vector and reuse the typed vector machinery from #36.

Support garbage collection

A simple garbage collector should be implemented. The stdlib should be GC safe and register/root all the pointers it uses before potentially entering the GC.

Support user-defined type constructors

It should be possible to define type constructors in the same way Typed Racket does. This would allow type constructors to be defined in the library instead of requiring hardcoded support in the frontend.

Allow custom error messages for (cast) and (ann)

(cast) and (ann) should accept a third parameter indicating which message to produce when the cast fails.

Allow square bracketed lists

Like in Racket, we should allow lists to be delimited with [] in addition to the normal (). This gives the developer more options to visually distinguish program parts. In particular, Typed Racket uses it for type annotations by convention.

This would ideally be disabled in the R7RS dialect but that's not strictly required.
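The main reader rule this adds is that a list must be closed by the delimiter that opened it. A sketch of that check using a stack of expected closers, with a hypothetical delimitersBalanced helper. Strings, comments and character literals such as #\( are ignored for brevity, which a real reader must handle.

```cpp
#include <string>
#include <vector>

// Checks that every list opened with ( or [ is closed by the matching
// delimiter, so "[a b)" is rejected.
bool delimitersBalanced(const std::string &source) {
    std::vector<char> expectedClosers;
    for (char c : source) {
        if (c == '(') {
            expectedClosers.push_back(')');
        } else if (c == '[') {
            expectedClosers.push_back(']');
        } else if (c == ')' || c == ']') {
            if (expectedClosers.empty() || expectedClosers.back() != c) return false;
            expectedClosers.pop_back();
        }
    }
    return expectedClosers.empty();
}
```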

Constrain types with equivalence procedures

(eq?)/(eqv?)/(equal?) should intersect the types of the operands when it returns true. This should allow things like (cond) to properly propagate type information.

Improve trampoline error reporting

Generated trampolines currently produce generic type cast failure messages when the wrong number of arguments are passed to the trampoline. This should be replaced with a human-readable message.
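One shape for the replacement message, as a sketch only: arityErrorMessage is a hypothetical helper and the wording is an assumption. It reports the procedure name, the number of arguments passed, and the expected count, including rest-argument procedures.

```cpp
#include <cstddef>
#include <string>

// Builds a human-readable arity error for a failed trampoline argument check.
std::string arityErrorMessage(const std::string &procName, std::size_t required,
                              bool hasRest, std::size_t passed) {
    std::string expected = hasRest ? "at least " + std::to_string(required)
                                   : std::to_string(required);
    return "Called " + procName + " with " + std::to_string(passed) + " argument" +
           (passed == 1 ? "" : "s") + "; expected " + expected;
}
```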

Proper support for Llambda libraries

Right now linking to liblliby is hardcoded into the compiler and the native function definitions are all included in the compiler itself. One approach would be:

The runtime would be split into libllcore and libllstd. libllcore would contain the GC, bindings, World support, helper procedures for quasiquotation etc. libllstd would contain the R7RS runtime functions.
libllcore would be unconditionally linked by the compiler as liblliby currently is
A directive would be added to (llambda nfi) to require linking against arbitrary libraries. This would be used by the stdlib Scheme library to pull in libllstd.
The stdlib Scheme library would be moved out of the compiler proper and live near its C++ implementation. However this is accomplished, it should become a generalized pattern for Llambda libraries. The only thing special about libllstd would be its default inclusion in the library search path.

Cleaner support for defining procedures in libraries

Currently there are two problems with defining Scheme procedures in libraries like (scheme base):

The InferArgumentTypes unit test assumes there are no planned functions besides __lambda_exec and the tested procedure
Unused procedures are emitted by codegen which will result in clutter if many are defined.

Rename the "unspecific" value to "unit"

The references to unspecific values in R7RS are conceptually similar to unit values in other languages (Haskell, ML, Scala, etc). The current name of "unspecific" is a side effect of the language in the Scheme report and not very clear for outside developers.

Rename the type to and the value to #!unit

Support (plambda)

(plambda) is a deprecated form from Typed Racket that allows polymorphic (lambda)s to be declared. At the moment our frontend only supports polymorphic procedures by type annotating (define) even though the planner supports polymorphic procedures generally.

The current recommended Racket syntax for polymorphic lambdas depends on reader extensions which we have no plans to support.

(pcase-lambda) could be done as part of this work if it's trivial.

Convert trivial call/cc into early returns

When optimization is enabled we should convert trivial (non-capturing, never used as value) continuations to simple early returns. Full continuations would be needlessly slow for this purpose and the planner output already has (untested) support for early returns.

Generalize support for inline data

Strings, symbols, vectors and bytevectors should support inline data for small values. This will require assistance from typegen to do properly. Inline records should be ported to this new system.
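The idea resembles the small-string optimization. A hand-written sketch, assuming a hypothetical StringCell layout and a 16-byte inline capacity (typegen would presumably generate the real layouts): bytes that fit are stored inside the cell itself, and longer data goes to a separate buffer. The length alone decides which storage is in use.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical string cell: short strings live inline in the cell itself,
// longer ones in a separately allocated buffer.
class StringCell {
public:
    static constexpr std::size_t InlineCapacity = 16;

    explicit StringCell(const std::string &value) : byteLength(value.size()) {
        if (isInline()) {
            std::memcpy(inlineData, value.data(), byteLength);
        } else {
            heapData = new char[byteLength];
            std::memcpy(heapData, value.data(), byteLength);
        }
    }
    ~StringCell() {
        if (!isInline()) delete[] heapData;
    }

    StringCell(const StringCell &) = delete;
    StringCell &operator=(const StringCell &) = delete;

    // Inline storage is used whenever the bytes fit, so no separate flag is needed
    bool isInline() const { return byteLength <= InlineCapacity; }

    const char *data() const { return isInline() ? inlineData : heapData; }
    std::string str() const { return std::string(data(), byteLength); }

private:
    std::size_t byteLength;
    union {
        char inlineData[InlineCapacity];
        char *heapData;
    };
};
```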

Support (case-lambda) tail recursion

R7RS requires (case-lambda) to be tail recursive. However, our new (case-lambda) generation has a few issues:

We always generate (case-lambda) with the same signature as TopProcedureType regardless of the types of the clause lambdas. This usually means we'll need to allocate a multiple values list at return which prevents tail recursion. It also uses the C calling convention which isn't required to support tail calls.
Clause lambdas don't have their recursive self values passed so they need to call through a cell which isn't supported
We never produce fixed arguments for (case-lambda)'s signature which means a tail caller will always need to allocate a rest argument list. This doesn't prevent tail recursion but makes it potentially expensive

Integer overflow is undefined

Integer overflow should either silently convert to inexact or raise an error. Currently the behaviour is undefined and varies depending on where the overflow occurred (reducer, generated code or runtime).
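A sketch of the "silently convert to inexact" option for addition in the runtime, assuming a hypothetical Number variant of exact 64-bit integers and doubles. It relies on __builtin_add_overflow, a GCC/Clang builtin rather than standard C++, which reports overflow instead of invoking undefined behaviour. The "raise an error" option would signal at the same point instead of returning a double.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical number representation: exact 64-bit integer or inexact double.
using Number = std::variant<std::int64_t, double>;

// Adds two exact integers, falling back to an inexact result on overflow
// instead of wrapping around.
Number addExact(std::int64_t a, std::int64_t b) {
    std::int64_t result;
    if (__builtin_add_overflow(a, b, &result)) {
        return static_cast<double>(a) + static_cast<double>(b);
    }
    return result;
}
```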

Investigate parser performance

Currently the compiler spends most of its time parsing Scheme code. As the standard library grows and incorporates more functionality implemented in Scheme this will become worse.

At the same time support for datum labels and case sensitivity directives should be considered.

It would be interesting to use a common definition that could generate code both for Scala and for C++. This could be used to implement (read).

Allow typed rest arguments

Allow typed rest arguments for Scheme procedures. Ideally the Typed Racket syntax can be reused.

Import functional tests from Llispy

Llispy should be reviewed for valid functional tests to import into Llambda.

Support (parameterize)

Support (parameterize) before working on the garbage collector to ensure they interact properly.

Allow optional typing for normal forms in llambda dialect

In the llambda Scheme dialect we should allow optional type specifiers in (define), (lambda) etc. to make it easier to opt-in to typing. This shouldn't require importing (llambda typed) although it won't be very useful without the builtin types it defines.

The (define:), (lambda:) etc. forms should still require types on all of their arguments and remain usable from the R7RS dialect.

Remove shadowstack usage

shadowstack is not thread safe. It should be possible to implement something almost identical ourselves using the world pointer. It's not worth the build system overhead of making this a full LLVM plugin. Instead, just directly manipulate the world ourselves as we do for allocations.
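LLVM's shadow-stack GC strategy links frames through a single global root chain, which is why it is not thread safe. A sketch of keeping the chain head in the world instead, assuming hypothetical ShadowStackFrame, World, ScopedRoots and visitRoots definitions (llambda's real World differs). The RAII object stands in for what generated code would do with a store in the prologue and another in the epilogue.

```cpp
#include <cstddef>

// Hypothetical layouts; llambda's real World structure differs.
struct ShadowStackFrame {
    ShadowStackFrame *parent;  // caller's frame, or nullptr at the bottom
    std::size_t rootCount;     // number of GC pointer slots in this frame
    void **roots;              // the slots holding this frame's GC pointers
};

struct World {
    // Head of this world's shadow stack; one per world instead of one global
    ShadowStackFrame *shadowStackHead = nullptr;
};

// Links a frame into the world's shadow stack for the lifetime of a scope.
class ScopedRoots {
public:
    ScopedRoots(World &w, void **slots, std::size_t count)
        : world(w), frame{w.shadowStackHead, count, slots} {
        world.shadowStackHead = &frame;
    }
    ~ScopedRoots() { world.shadowStackHead = frame.parent; }

    ScopedRoots(const ScopedRoots &) = delete;
    ScopedRoots &operator=(const ScopedRoots &) = delete;

private:
    World &world;
    ShadowStackFrame frame;
};

// What a collector would do: visit every root slot in every active frame.
// Slot addresses are passed so a moving collector could update them.
template <typename Visitor>
void visitRoots(const World &world, Visitor visit) {
    for (const ShadowStackFrame *f = world.shadowStackHead; f != nullptr; f = f->parent) {
        for (std::size_t i = 0; i < f->rootCount; i++) {
            visit(&f->roots[i]);
        }
    }
}
```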

Support source level debugging

Now that LLVM IR has proper metadata support we should:

Pass the source location through the planner
Add DWARF metadata helpers in LLVM IR
Generate DWARF metadata if -g is passed to the compiler

Simply being able to produce readable, line-numbered backtraces through Scheme functions would be a huge win. Any other DWARF/GDB features that could be supported would be bonus functionality.