inko's People

Contributors

apahl, awoo-civ, bartlomieju, brettcannon, dundargoc, dusty-phillips, jc00ke, jinyus, matheusrich, mbarbar, nickforall, rex4539, thaodt, uasi, yorickpeterse

inko's Issues

Thread-local Heaps

Currently all objects are allocated on a global heap. While this is a rather simple setup, it also poses a serious problem: one thread allocating objects at a high rate could potentially block other threads. In an early setup Aeon had thread-local heaps, but I removed them because they were a bit too much of a pain in the setup of that time. At some point Aeon should move back to allocating memory on thread-local heaps by default.

Setup

Each thread has its own eden/young/mature heap and objects created in a thread are stored in those heaps. There's also a single global heap that doesn't separate objects into eden/young/mature generations. The reason for this is that global objects usually stick around for most (if not all) of the process' lifetime.

Regular objects can be allocated on the global heap using set_global_object, which otherwise behaves identically to set_object. Thread-local objects added to a global object (e.g. a local integer being stored in a global array) are moved to the global heap. At any given point in time an object can only exist in either the global heap or a thread-local heap.
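
A minimal sketch of this layout, assuming hypothetical Heap and ThreadHeaps types (the names and fields are illustrative, not the actual VM structures):

use std::sync::{Arc, RwLock};

// Sketch only: Heap is a stand-in for whatever allocator the VM ends up using.
struct Heap {
    // bump pointer, block list, etc.
}

// Per-thread heaps, split into generations.
struct ThreadHeaps {
    eden: Heap,
    young: Heap,
    mature: Heap,
}

// A single global heap without generations; objects stored here usually
// stick around for most of the process' lifetime.
struct GlobalHeap {
    objects: RwLock<Heap>,
}

// Every thread owns its own generational heaps and shares the global heap.
struct Thread {
    heaps: ThreadHeaps,
    global: Arc<GlobalHeap>,
}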

GC

To further ensure one thread can't impact the performance of other threads by merely allocating objects, there should be some form of thread-local garbage collection. There are basically three ways of doing so:

  • Run an in-thread GC cycle, blocking the thread while the GC runs
  • Run one GC thread for every regular thread. This could potentially remove the need for blocking threads, at the cost of running twice the number of threads
  • Run a fixed number of GC threads with the amount being configurable

Using merely a single GC thread for all regular threads could mean the GC doesn't release objects fast enough, resulting in increased memory usage.
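
A sketch of the third option, a fixed and configurable pool of GC threads pulling collection requests off a shared channel (all names here are illustrative, not the actual VM design):

use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

// A request to collect the thread-local heaps of a particular mutator thread.
struct CollectRequest {
    thread_id: usize,
}

fn spawn_gc_pool(count: usize) -> Sender<CollectRequest> {
    let (sender, receiver) = channel::<CollectRequest>();

    // mpsc receivers can't be shared between threads directly, hence the Mutex.
    let receiver = Arc::new(Mutex::new(receiver));

    for _ in 0..count {
        let receiver = Arc::clone(&receiver);

        thread::spawn(move || loop {
            let request = receiver.lock().unwrap().recv();

            match request {
                // A real implementation would collect the heaps belonging to
                // request.thread_id here.
                Ok(request) => {
                    let _ = request.thread_id;
                }
                Err(_) => break, // all senders dropped, shut the pool down
            }
        });
    }

    sender
}

fn main() {
    // The pool size would come from a VM configuration option.
    let gc_pool = spawn_gc_pool(2);

    gc_pool.send(CollectRequest { thread_id: 0 }).unwrap();
}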

Remove the need for allocating integers

Integers, being such a fundamental and commonly used type, should not require allocations, or at least not 32 bytes as opposed to the 8 bytes a 64-bit integer would actually take up.
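
One way of achieving this (not necessarily the approach Aeon will take) is pointer tagging: small integers are stored directly in the pointer-sized word with a tag bit set, so no heap allocation happens at all. A rough sketch:

// Sketch of pointer tagging; a real implementation would live inside the
// VM's object pointer type and fall back to heap allocation on overflow.
const TAG_SHIFT: u32 = 1;

fn tag_integer(value: i64) -> usize {
    ((value << TAG_SHIFT) | 1) as usize
}

fn is_tagged_integer(word: usize) -> bool {
    word & 1 == 1
}

fn untag_integer(word: usize) -> i64 {
    // Arithmetic shift preserves the sign of negative integers.
    (word as i64) >> TAG_SHIFT
}

fn main() {
    let word = tag_integer(-42);

    assert!(is_tagged_integer(word));
    assert_eq!(untag_integer(word), -42);
}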

Type System

Aeon will be gradually typed at the compiler level, while the actual virtual machine is dynamically typed. Besides this there are still a lot of things to think about, such as:

  • Support for algebraic data types?
  • Syntax for type hinting
  • Syntax for specifying the types of collections (e.g. Array<String>)
  • A proper type inference system

Use regular arrays and integers for strings

Instead of using a dedicated ByteArray type, strings will just be regular arrays of integers. This makes it possible to re-use generic array VM instructions, as opposed to having to duplicate these for a ByteArray type.

File instructions

The following instructions should be added for working with files, stdout, stderr, etc.

Files

  • file_open: opens a file with a certain mode: dfbf164
  • file_write: writes N bytes to a file: 730a70d
  • file_read: reads N bytes from a file: c2caf3b
  • file_read_line: reads an entire line: 7706e03
  • file_flush: flushes the file: 868de26
  • file_size: returns the size of the file in bytes: 583d471
  • file_seek: db7c3fb

stdout/stderr/etc

Bytecode compiler

A bytecode compiler has to be implemented. This compiler takes the AST of a file and generates the corresponding bytecode. The bytecode is then cached in a file (see #9), which is fed to the VM.

TODO

This is not a final list and is expected to change somewhat as I continue work on the compiler:

  • Add support for all language features in the ModuleBody pass
  • Rework type inference. This is currently a separate pass so methods can refer to each other regardless of order. This however leads to quite a bit of duplicated logic and makes it hard to figure out what the right types are when dealing with raw instructions. Getting this right is crucial to ensure good type analysis for some of the core objects.
    • Preferably this pass is merged into the ModuleBody pass and referring to methods-defined-later-but-on-the-same-receiver is handled in a different way.
    • At the same time keeping this in a separate pass makes the ModuleBody pass simpler
    • A downside of having this in a separate pass is that you can refer to a constant before it's defined, leading to runtime errors
    • Solution: hoisting of constant, object, trait, and method definitions: 8b2e75c
  • Remove TIR::Builder once all code has been ported over to the multi-pass compiler
  • Remove TypeLookup#initialize_type
  • Move the instruction helpers from CodeObject (e.g. set_attribute) into the ModuleBody pass. The separation leads to a rather clunky API
  • Add an optimisation pass with some basic optimisations
    • Constant folding (see the sketch after this list)
    • Tail-call optimisation
      • This will be required to implement loops without having to mess with raw instructions and hard-coded instruction offsets for the various goto VM instructions
    • Inlining methods that simply return instance attributes
      • This ensures simple getter methods don't introduce additional overhead
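
As an illustration of the constant folding item above, here's a minimal sketch over a toy expression type; the real pass would operate on the compiler's TIR, so this is purely illustrative:

enum Expr {
    Integer(i64),
    Add(Box<Expr>, Box<Expr>),
}

fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(left, right) => match (fold(*left), fold(*right)) {
            // Both operands are constants, so the addition can happen at compile time.
            (Expr::Integer(a), Expr::Integer(b)) => Expr::Integer(a + b),
            (left, right) => Expr::Add(Box::new(left), Box::new(right)),
        },
        other => other,
    }
}

fn main() {
    // (1 + 2) + 3 folds to a single integer.
    let expr = Expr::Add(
        Box::new(Expr::Add(Box::new(Expr::Integer(1)), Box::new(Expr::Integer(2)))),
        Box::new(Expr::Integer(3)),
    );

    match fold(expr) {
        Expr::Integer(value) => println!("folded to {}", value),
        Expr::Add(..) => println!("could not fold"),
    }
}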

Use regular arrays and integers for strings

Instead of using a dedicated ByteArray type strings will just be regular arrays of integers. This makes it possible to re-use generic array VM instructions, opposed to having to duplicate these for a ByteArray type.

Method Overloading and Name Mangling

Aeon should support the overloading and delegating of method calls based on their type signatures. For example, the following code is valid:

def add(left: Integer, right: Integer) -> Integer { }
def add(left: Float, right: Float) -> Float {}

add(10, 20)
add(10.5, 20.5)

Because the VM stores methods using their name, the compiler should use some form of name mangling based on the type signature. For example, the method slot names in the above example could be addIntegerIntegerInteger and addFloatFloatFloat. In other words:

slot name = method name + argument types + return type

This does require storing the actual method name ("add") somewhere separately so it can be used for error messages and the likes.
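
A sketch of this mangling scheme, in Rust for illustration; the exact concatenation format is an assumption, only the general scheme is described above:

// Build a method slot name from the name, argument types, and return type.
fn mangle(name: &str, argument_types: &[&str], return_type: &str) -> String {
    let mut slot = String::from(name);

    for argument in argument_types {
        slot.push_str(argument);
    }

    slot.push_str(return_type);
    slot
}

fn main() {
    assert_eq!(
        mangle("add", &["Integer", "Integer"], "Integer"),
        "addIntegerIntegerInteger"
    );
    assert_eq!(mangle("add", &["Float", "Float"], "Float"), "addFloatFloatFloat");
}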

Supporting method overloading does mean you can't just grab a method by its name (during runtime) as there can be multiple methods mapped to the same name. This is only the case if a method is overloaded. If a user tries to grab an overloaded method by its name without specifying the argument types and return type, an error should be raised. Thus:

method('add') # => error
method('add', [Integer, Integer], Integer) # => gives back the Integer based "add" method

Alternatively this could return some kind of "OverloadedMethod" object which behaves like a regular Method object while providing access to the various overloading Method objects.

Traits

Traits will be implemented in the runtime (instead of solely in the compiler), with the compiler having special knowledge of how they're used. That is, the compiler will know that certain methods create traits while others apply them to classes/other traits. The syntax is something I'm still not entirely sure about.

Boolean type/prototype

Certain instructions will produce boolean values (e.g. integer_lt). To support this there should be instructions for setting the prototype of these boolean values, as well as instructions for creating true/false values. Both true and false should have their own prototype. Thus the instructions are as follows:

Just In Time Compilation

One day Aeon will feature a JIT compiler, probably based on LLVM. The JIT will operate using a set of special VM instructions that can be used to specify the types of values and other information useful to the JIT. Ideally as much of the JIT as possible is implemented in Aeon itself.

Re-investigate use of outer locking of structures

Currently structures such as Object use internal RwLocks for mutable fields. While this makes usage of these structs easier, it comes at a memory usage cost: using internal locks means having to use an RwLock for every mutable field. Using a single outer RwLock would cut down the size of Object from 88 bytes to 56 bytes.
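
A sketch of the two layouts, using illustrative field types rather than the actual Object struct:

use std::sync::RwLock;

// Internal locking: every mutable field carries its own RwLock, which adds
// per-field memory overhead.
struct ObjectInnerLocks {
    prototype: RwLock<Option<usize>>,
    attributes: RwLock<Vec<usize>>,
    methods: RwLock<Vec<usize>>,
}

// Outer locking: a single RwLock wraps all mutable state, cutting the
// per-object size at the cost of coarser-grained locking.
struct ObjectState {
    prototype: Option<usize>,
    attributes: Vec<usize>,
    methods: Vec<usize>,
}

struct ObjectOuterLock {
    state: RwLock<ObjectState>,
}

fn main() {
    // The exact numbers depend on the real field types; this just shows the
    // direction of the difference.
    println!("inner locks: {} bytes", std::mem::size_of::<ObjectInnerLocks>());
    println!("outer lock:  {} bytes", std::mem::size_of::<ObjectOuterLock>());
}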

String Instructions

  • string_to_lower: 43baf0a
  • string_to_upper: b5213fb
  • string_to_bytes: should return the bytes of a string, preferably without allocating memory for every fixnum: c5471a5
  • string_from_bytes: ff8f4ef
  • string_equals: 35efc54
  • string_length: returns the number of characters: 0e307ef
  • string_size: returns the number of bytes: 5e1ce42

More will be added when needed/when I can think of them.

Parser setup

Once the VM is in a decent shape it will be time to write a parser for the Aeon syntax. Currently there aren't any decent parser generators in Rust that don't require a nightly build. Either I'll have to write my own or wait for a decent library to show up.

Use an Arc for CallFrame structs

Since Aeon will allow call frames to be used as individual objects (from multiple threads) they should be wrapped in an Arc. This should also remove the need for using Box in CallFrame#set_parent.
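
A sketch of the Arc-based setup; the CallFrame fields used here are assumptions:

use std::sync::Arc;

// A call frame whose parent is shared via an Arc instead of owned via a Box.
struct CallFrame {
    name: String,
    file: String,
    line: u32,
    parent: Option<Arc<CallFrame>>,
}

impl CallFrame {
    fn new(name: &str, file: &str, line: u32) -> Self {
        CallFrame {
            name: name.to_string(),
            file: file.to_string(),
            line,
            parent: None,
        }
    }

    // Equivalent of CallFrame#set_parent without needing a Box: the parent can
    // be cheaply shared with other frames or threads.
    fn set_parent(&mut self, parent: Arc<CallFrame>) {
        self.parent = Some(parent);
    }
}

fn main() {
    let parent = Arc::new(CallFrame::new("main", "main.aeon", 1));
    let mut child = CallFrame::new("add", "main.aeon", 5);

    child.set_parent(parent.clone());

    println!(
        "{} called from {}",
        child.name,
        child.parent.as_ref().unwrap().name
    );
}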

Garbage Collection

Aeon will feature a precise, generational, concurrent garbage collector. This GC will be used for the young/mature generations. The eden generation will probably use a copying collector, although I'm not entirely sure just yet.

Heap-wise, there would be the following heaps:

  • Eden, divided into survivor 1 and survivor 2
  • Young
  • Mature
  • Permanent (= pinned objects, e.g. classes)

Object Instructions

The following instructions should be added for working with generic objects:

  • object_attrs: returns the attribute names of an object
  • object_methods: returns the method names of an object
  • object_prototype: returns the prototype of an object
  • object_add_method: adds a new method to the object
  • object_remove_method: removes a method from an object
  • object_get_method: returns a method of an object

These instructions will require that methods, attributes, and prototypes are somehow represented using RcObject values instead of the raw VM structures.
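
A rough sketch of the kind of representation this implies, using assumed RcObject/ObjectValue definitions rather than the real VM types:

use std::cell::RefCell;
use std::rc::Rc;

// Sketch only: the real VM's RcObject/ObjectValue differ; this just shows the
// idea of exposing attribute names as regular objects instead of raw structs.
type RcObject = Rc<RefCell<Object>>;

enum ObjectValue {
    None,
    String(String),
    Array(Vec<RcObject>),
}

struct Object {
    value: ObjectValue,
}

fn new_object(value: ObjectValue) -> RcObject {
    Rc::new(RefCell::new(Object { value }))
}

// object_attrs: return the attribute names of an object as an Array object.
fn object_attrs(attribute_names: &[String]) -> RcObject {
    let names = attribute_names
        .iter()
        .map(|name| new_object(ObjectValue::String(name.clone())))
        .collect();

    new_object(ObjectValue::Array(names))
}

fn main() {
    let attrs = object_attrs(&["@value".to_string()]);
    let attrs_ref = attrs.borrow();

    if let ObjectValue::Array(items) = &attrs_ref.value {
        println!("{} attribute(s)", items.len());
    }
}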

Examples

Created by: tbu-

Some example code or VM bytecode snippets would be cool. :)

Error Handling

Certain instructions such as file_open and file_flush can produce errors that should be handled by Aeon code, as opposed to instructing the VM to terminate (this should only happen for VM errors). For this to work these operations will need to set either an OK or Error kind of object, much like Rust's Result enum. The Error object would have to contain the following information:

  • A message indicating the error
  • An array containing all the VM call frames as Aeon objects (instead of CallFrame structs). This can basically be an Array of Arrays in the format [[name, file, line], ...].

Errors have to be proper Aeon objects, even though they won't directly be exposed to the language (instead there will be some sort of proper Result enum).

Probably the simplest form is to set either whatever an instruction would produce upon success (true, number of bytes written, etc.) or an error object. Using the is_error instruction the compiler can then generate code for handling instruction errors. For this to work the following instructions should be available:

  • is_error: sets a slot to true if a given slot contains an error
  • error_message: extracts the message
  • error_backtrace: simply extracts the Array of call frames and stores it somewhere

Because these error objects are internal objects (they should not be directly exposed) there's no need for another prototype. However, a new enum member for ObjectValue will be required.
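
A sketch of what this could look like on the VM side, assuming a hypothetical ObjectValue enum gaining an Error member:

// Sketch only: the exact ObjectValue layout is an assumption; only the Error
// member and the is_error/error_message behaviour follow the description above.
enum ObjectValue {
    Boolean(bool),
    Integer(i64),
    Error(VmError),
}

struct VmError {
    message: String,
    // Call frames as plain data in the [[name, file, line], ...] format.
    frames: Vec<(String, String, u32)>,
}

// is_error: true if the given value is an error.
fn is_error(value: &ObjectValue) -> bool {
    matches!(value, ObjectValue::Error(_))
}

// error_message: extract the message of an error value.
fn error_message(value: &ObjectValue) -> Option<&str> {
    match value {
        ObjectValue::Error(error) => Some(&error.message),
        _ => None,
    }
}

fn main() {
    let ok = ObjectValue::Integer(512); // e.g. the number of bytes written
    let err = ObjectValue::Error(VmError {
        message: "No such file or directory".to_string(),
        frames: vec![("main".to_string(), "main.aeon".to_string(), 1)],
    });

    assert!(!is_error(&ok));
    assert!(is_error(&err));
    assert_eq!(error_message(&err), Some("No such file or directory"));
}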

Bytecode serialization

VM bytecode has to be serialized to (and loaded from) a file somehow. This allows the compiler to cache generated bytecode and load it into the VM. A sketch of the general idea follows the list below.

  • Parser: f8d2731
  • Parser tests
  • Parser documentation/specification
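
As an illustration of the general idea only (the encoding below is an assumed toy format, not the format handled by the parser referenced above), serialization boils down to writing instructions as a flat byte stream and reading them back:

use std::io::{self, Read, Write};

// Toy encoding: u16 opcode, u16 argument count, then u32 arguments.
struct Instruction {
    opcode: u16,
    arguments: Vec<u32>,
}

fn serialize<W: Write>(output: &mut W, instructions: &[Instruction]) -> io::Result<()> {
    for ins in instructions {
        output.write_all(&ins.opcode.to_le_bytes())?;
        output.write_all(&(ins.arguments.len() as u16).to_le_bytes())?;

        for argument in &ins.arguments {
            output.write_all(&argument.to_le_bytes())?;
        }
    }

    Ok(())
}

fn deserialize<R: Read>(input: &mut R) -> io::Result<Vec<Instruction>> {
    let mut instructions = Vec::new();
    let mut opcode = [0u8; 2];

    // Keep reading instructions until the end of the stream.
    while input.read_exact(&mut opcode).is_ok() {
        let mut count = [0u8; 2];
        input.read_exact(&mut count)?;

        let mut arguments = Vec::new();

        for _ in 0..u16::from_le_bytes(count) {
            let mut argument = [0u8; 4];
            input.read_exact(&mut argument)?;
            arguments.push(u32::from_le_bytes(argument));
        }

        instructions.push(Instruction {
            opcode: u16::from_le_bytes(opcode),
            arguments,
        });
    }

    Ok(instructions)
}

fn main() -> io::Result<()> {
    let program = vec![Instruction { opcode: 1, arguments: vec![0, 42] }];
    let mut buffer = Vec::new();

    serialize(&mut buffer, &program)?;

    let decoded = deserialize(&mut buffer.as_slice())?;

    assert_eq!(decoded.len(), 1);
    assert_eq!(decoded[0].arguments, vec![0, 42]);
    Ok(())
}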

Enum Support

Aeon as a language should support enums. Enums will be regular objects like any
other, instead of being some kind of special compile-time construct. The
compiler should be able to understand enums and validate their use (e.g. all
variants being matched when pattern matching).

Syntax

Enums are defined using the following syntax:

enum EnumName {
    Variant1
    Variant2

    def method1() {}
    def method2() {}
}

The following alternatives are also valid:

enum EnumName {
    Variant1
    Variant2
}

enum EnumName { Variant1 Variant2 }

Enum variants can also wrap a value using the following syntax:

enum EnumName {
    SomeVariant(String)
}

Enums can also use generics:

enum Result<Ok_T, ERR_T> {
    Ok(Ok_T)
    Error(ERR_T)
}

This leads to the following (simplified) grammar:

const = [A-Z][a-zA-Z0-9_]

types = '<' const (',' const)* '>'

enum  = 'enum' const types '{' enum_body '}'

enum_body = (const | method_def)*

Commas should not be required after enum variants as they are already separated
by whitespace (unless any conflicts arise when writing the actual parser).

Runtime Implementation

Enum classes/methods are defined in an "enum" module provided by the core
library. At the very least this module should provide the following classes:

  • enum.Enum
  • enum.Variant
  • enum.TypedVariant

Every enum is an instance of enum.Enum, and every variant is an instance of an
intermediate class (generated by the compiler) that extends enum.Variant (or
enum.TypedVariant which in turn extends enum.Variant). This is needed as
otherwise code such as the following would not work:

let value = Result::Ok

value.instance_of?(Result::Ok).if_true { ... }

If Result::Ok were to be just an instance of enum.Variant there would be no way
to distinguish between a Result::Ok and a Foo::Bar variant as both would be
instances of the exact same class.

The classes for enum variants are anonymous and generated automatically by the
compiler. For the sake of clarity this document will refer to these classes as
the variant name suffixed by "Class". For example, consider the following enum:

enum Result {
    Ok
    Error(String)
}

The compiler would generate the following classes:

class Result::OkClass : enum.Variant {}

class Result::ErrorClass : enum.TypedVariant {
    def ()(value: String) -> Result::ErrorInstance {
        Result::ErrorInstance.new(value)
    }
}

class Result::ErrorInstance : Result::ErrorClass {
    @value: String

    def construct(value: String) {
        @value = value
    }
}

These would then be set using the following pseudo code:

Result::Ok    = Result::OkClass.new
Result::Error = Result::ErrorClass.new

When wrapping a value in a Result::Error the return value would be an instance
of Result::ErrorInstance. Because this class extends Result::ErrorClass (which
is stored in Result::Error) code such as the following will work perfectly:

let result = Result::Error("yay this works")

result.match(Result::Error) |value| { ... }

This is identical to the following:

let result = Result::ErrorInstance.new("yay this works")

result.match(Result::ErrorClass) |value| { ... }

Methods can also be defined for an enum, for example:

enum Result {
    Ok
    Error

    def ok?()    { ... }
    def error?() { ... }
}

The compiler injects these methods into the class definition of every enum
variant. Class methods can be defined using the following syntax:

enum Result {
    Ok
    Error

    def self.some_method() { ... }
}

Pattern Matching

Pattern matching is done by calling the match method on an enum value. For
example:

enum Letter { A B C }

let letter = Letter::A

letter.match(Letter::A) { ... }
      .match(Letter::B) { ... }
      .match(Letter::C) { ... }

An underscore can be used to match any remaining variants:

letter.match(Letter::A) { ... }
      .match(_)         { ... }

For this to work an underscore should be some kind of special "anything"
type/object that "tricks" match into thinking that whatever the current variant
is matches. The compiler should also prevent the following from being possible:

letter.match(_)         { ... }
      .match(Letter::A) { ... }

This is invalid because the 2nd match would never be reached. This is also
invalid as all arms have already been matched:

letter.match(Letter::A) { ... }
      .match(Letter::B) { ... }
      .match(Letter::C) { ... }
      .match(_)         { ... }

The type passed to match can only be one of the enum variants. This is not
possible:

letter.match(Letter::A)    { ... }
      .match(Option::None) { ... }

When matching a typed variant the wrapped value is exposed as an argument to the
supplied callback function:

let result = Result::Error("nay!")

result.match(Result::Error) (message) { ... }
      .match(_) { ... }

This will most likely require generating a match method for variants that wrap
a value. How exactly this will work is something I'm not sure of just yet.

Compiler Rules

The compiler should enforce the following rules:

  • All enum variants must be matched when using match, or a wildcard must be
    used (a single underscore) for the last match call
  • A wildcard match may only be used as the last possible match
  • Values with the type being a specific enum may only be set to one of the
    variants of said enum

And probably more, but this is all I can think of for now.

TODO

  • Define method signatures for the match method
  • Define layout for the Enum/Variant classes

Module System

Aeon should have a proper module system, as opposed to top-level constants being globally available (like Ruby). By default, a file is the module its code resides in. For example, for the file foo.aeon the module is called "foo". One should also be able to declare modules by calling a module method and passing it a closure of sorts:

module foo {

}

Contents of modules can be imported into another module and optionally renamed. If two modules define the same names in the same scope the programmer is forced to alias one of the two. Importing a module is done using import which takes a module name or pattern to import. Some examples:

import aeon::collections::set                    # content available as "set::Set"
import aeon::collections::set::*                 # content available as "Set"
import aeon::collections::set::Set as SetClass   # content available as "SetClass"
from aeon::collections::set import Set, Foo, Bar

Whether import and from will be methods or special compiler instructions is something I have to think about. Ideally this is done at the runtime level, though this would mean not being able to use the above syntax (due to special keywords such as as); instead the syntax would be as follows:

import 'aeon::collections::set'
import 'aeon::collections::set::*'
import 'aeon::collections::set::Set': 'SetClass'
from('aeon::collections::set').import('Set', 'Foo', 'Bar')

or:

import 'aeon::collections::set'
import 'aeon::collections::set::*'
import 'aeon::collections::set::Set': 'SetClass'
import 'aeon::collections::set', 'Set', 'Foo', 'Bar'

The compiler should be able to understand modules, aliasing, etc, and provide errors in case two or more modules conflict.

Ideally modules also support associating versions and loading newer versions of the same module (replacing the older one). This would allow for hot reloading of code in a similar vein to Erlang. This will most likely not be available in the first release.
