dinfuehr / dora

Dora VM

License: MIT License

Rust 96.53% Ruby 0.82% Shell 0.19% Java 0.92% Go 0.05% JavaScript 1.25% Batchfile 0.01% Raku 0.06% Python 0.01% Perl 0.08% TypeScript 0.08%

rust dora jit compiler aarch64 assembly x86-64

dora's Issues

Do not create constructors for class definitions without ()

Currently

class Foo

allows creating an instance with

Foo()

this should be disallowed, in favor of requiring

class Foo()

as non-instantiable classes are very useful and required for bottom types like Unit and Nothing.

1.not(): thread 'main' panicked at 'unknown intrinsic IntNot'

I assume this should be the same as !1 (which works)?

Adding a reference link for dora

Hii @dinfuehr,

From the code this seems like a great project. I wanted to learn more about the Dora language; could you please add a relevant link to your README.md?

Introduce Hash trait, immutable flag on Class and runtime magic to cache hashcodes

Add a Hash trait:

trait Hash {
  fun hash() -> Int;
}

Add is_immutable to Class and let the compiler set it based on the following rules:

An open class is mutable.
A var field makes the class mutable.
A let field with a primitive type is immutable.
A let field with a reference type which is itself immutable is immutable.

Use one of the unused lower bits of the vtable pointer of an instance (we align by 8 bytes, so we have 3 bits to spare) as a "hash is cached bit".
Move the layout of an immutable class C for which sizeOf<C> % 8 >= 4 by 4 bytes, freeing up 4 bytes between the instance header and the first field in the new layout.
Emit additional code for fun hash: Check whether the "hash is cached bit" is set and ...

if true, return the value of the 4 bytes in the beginning.
if false, compute the hashcode, store the result in the first 4 bytes and set the "hash is cached bit".
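The check-then-cache step described above can be sketched roughly as follows. This is only an illustration in Rust, not Dora's actual object layout: the `Header` type, its field names and the choice of the lowest bit as the flag are all assumptions made for the example.

```rust
/// Hypothetical object header: one word holding an 8-byte-aligned vtable
/// address whose lowest bit doubles as the "hash is cached" flag.
pub struct Header {
    vtable_and_flags: usize,
    cached_hash: u32, // stands in for the 4 bytes freed up behind the header
}

const HASH_CACHED_BIT: usize = 0b001;
const FLAG_MASK: usize = 0b111; // 8-byte alignment leaves 3 spare bits

impl Header {
    pub fn new(vtable_addr: usize) -> Header {
        assert_eq!(vtable_addr & FLAG_MASK, 0, "vtable must be 8-byte aligned");
        Header { vtable_and_flags: vtable_addr, cached_hash: 0 }
    }

    /// The real vtable address, with the flag bits masked off.
    pub fn vtable(&self) -> usize {
        self.vtable_and_flags & !FLAG_MASK
    }

    pub fn is_hash_cached(&self) -> bool {
        self.vtable_and_flags & HASH_CACHED_BIT != 0
    }

    /// Returns the cached hash, or computes, stores and flags it first.
    pub fn hash_with(&mut self, compute: impl FnOnce() -> u32) -> u32 {
        if !self.is_hash_cached() {
            self.cached_hash = compute();
            self.vtable_and_flags |= HASH_CACHED_BIT;
        }
        self.cached_hash
    }
}
```

Every vtable access has to mask the flag bits off, which is the price of this layout. A real runtime would also have to make the store and the flag update safe under concurrent access, which this sketch ignores.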

Implement `cast`?

I think this is a very useful thing for pointer operations e.g:

let array = cast<Array<Int>>(my_pointer)

Or to get value from pointer:

let ptr = cast<long>(pointer + offset);
let integer = cast<Int>(ptr);

macos test - stuck on starting your workflow

Multiple people seem to have this problem. I guess we will just have to wait until it gets fixed. We could disable the macOS test for now if it annoys you :)

https://github.community/t5/GitHub-Actions/Stuck-on-quot-starting-your-workflow-run-quot/td-p/31702/page/2

Implement Sortable for Float/Double

Abstract get and set functions into Get and Set trait

This would require supporting traits with generics such as

trait Get<T, R> {
  fun get(value: T) -> R;
}
trait Set<T, R> {
  fun set(at: T, value: R);
}

and impls like

impl Get<int, R> for Array<R> { ... }
impl Set<int, R> for Array<R> { ... }
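For comparison, here is a rough sketch of what such generic traits look like in a language that already supports them. It is written in Rust; the method names `get_at` and `set_at` and the `Option` return type are assumptions chosen for the example, not part of the proposal.

```rust
/// Hypothetical counterparts of the proposed `Get` and `Set` traits.
pub trait Get<K, V> {
    fn get_at(&self, key: K) -> Option<V>;
}

pub trait Set<K, V> {
    fn set_at(&mut self, key: K, value: V);
}

// Corresponds to `impl Get<int, R> for Array<R>`.
impl<V: Clone> Get<usize, V> for Vec<V> {
    fn get_at(&self, key: usize) -> Option<V> {
        self.get(key).cloned()
    }
}

// Corresponds to `impl Set<int, R> for Array<R>`.
impl<V> Set<usize, V> for Vec<V> {
    fn set_at(&mut self, key: usize, value: V) {
        self[key] = value; // panics when out of bounds, like a plain index
    }
}
```

Because the key type is a trait parameter, the same two traits could also be implemented for maps keyed by strings without changing their definition.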

Calling function on value that doesn't implement what a trait impl defines: Terminated by signal SIGSEGV (Address boundary error)

fun hash() is not implemented (yet) for String, so this should be a compile-time error, not a crash:

fun main() {
  // “target/debug/dora tests/array15…” terminated by signal SIGSEGV (Address boundary error)
  hash::<String>("foo");
}

trait Hash {
  fun hash() -> Int;
  // fun hashTo(hasher: Hasher);
}

impl Hash for String {
  fun hash() -> Int = self.hash(); // method does not exist
}

fun hash<T : Hash>(val: T) -> Int = val.hash();

Do not use RWX memory

ProtType::Executable => libc::PROT_READ | libc::PROT_WRITE | libc::PROT_EXEC in os/mem.rs should be libc::PROT_READ | libc::PROT_EXEC due to https://en.wikipedia.org/wiki/W%5EX

running ./test doesn't run all tests

like the tests in dora-parser

panic with 'no machine mode for ().' on reference equality

Consider this class:

class Foo(i: int) {}

Works:

let f1 = Foo(1);
let f2 = Foo(2);
assert(f1 !== f2);

Panics:

assert(Foo(1) !== Foo(2));

Implement some basic file IO functionality

What would be interesting is a simple File I/O-API such that we could do more interesting stuff with Dora.

That would probably also allow us to run more benchmarks from the computer language benchmark game.

io.dora would probably be a good starting point to add some additional IO operations.

Stack overflow for "string".toString()

stack overflow
16000: toString() -> String: 30
15999: toString() -> String: 30
15998: toString() -> String: 30
...

Support trait bounds on instance functions ...

... such that an existing type originating from an outer scope can be further constrained:

class List[T](...) {
 // this method should convey an additional constraint that `T` needs to be sortable:
  fun sort() { ... }
}

works.

Prior Art

Existing solutions seem to be rather poor/ad-hoc/special-cased:

Java: Hoist the method from an instance method to a static method.
In this example we add the constraint Comparable to a List, to be able to sort it:
static <T extends Comparable<? super T>> void sort(List<T> list)
Scala: Add an implicit parameter Ordering which knows how to sort the list's element type.
def sorted[B >: A](implicit ord: Ordering[B]): List[A]
Rust: Add the constraint Ord in a where clause:
pub fn sort(&mut self) where T: Ord

I feel that none of those approaches is appealing – either they are workarounds (Java), expose a lot of unnecessary machinery (Scala), or feel like an afterthought (Rust).

While constraints can be added on typeclass implementations, like in Rust, I feel that it's worthwhile to be able to implement typeclasses directly inside the class definition to avoid Rust's spread of implementation pieces across different places in the code (see #56).
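To make the Rust variant from the prior-art list concrete, here is a minimal sketch. The `List` type and its helper methods are made up for the example; only the `where T: Ord` clause on `sort` mirrors the signature quoted above.

```rust
/// Hypothetical list type: `T` itself is unconstrained.
pub struct List<T> {
    items: Vec<T>,
}

impl<T> List<T> {
    pub fn new(items: Vec<T>) -> List<T> {
        List { items }
    }

    /// Available for every element type.
    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Only callable when the element type is ordered: the method adds a
    /// constraint to the `T` introduced by the enclosing `impl`.
    pub fn sort(&mut self)
    where
        T: Ord,
    {
        self.items.sort();
    }

    pub fn as_slice(&self) -> &[T] {
        &self.items
    }
}
```

The `where` clause refers to the outer `T` without reintroducing it, which is exactly the distinction between introducing and referring to a type parameter that the rest of this issue is about.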

The Bigger Picture

It's important to remember that the core Issue is purely a syntactic one:

We need a way to distinguish between introducing a type parameter and referring to it.

Traditionally, type parameter declarations are binary, where the left part introduces a new type parameter, while the right part refers to an existing type:

[T : Sortable[T]]
 |      |
 |     Refers to existing types
 |
Introduces new type parameter

Other Approaches

Let's consider a List[T] that should support a method sort only if the contained value of T supports the typeclass Sortable.

(In the next examples I added SomeUnrelatedTypeclass as a class-level constraint to demonstrate the syntax in a bigger context. I also elided the constructor parameters of List.)

Starting point – this doesn't work, because we haven't added the constraint which means we cannot use the methods provided by the typeclass:

class List[T : SomeUnrelatedTypeclass](...) {
  fun sort() { ... x.sortsBefore(y) ... }
}

Option: Introduce introduction-site syntax to distinguish introducing a type parameter from referring to it:

class List[type T : SomeUnrelatedTypeclass](...) {
  fun sort[T : Sortable[T]]() { ... x.sortsBefore(y) ... }
}

Bad: Increases the cost to define type parameters (which happens more often than adding additional constraints later)
Bad: Poor readability.

Option: Lambda-style definition-site type intro

class List[T | T : SomeUnrelatedTypeclass](...) {
  fun sort[T : Sortable[T]]() { ... x.sortsBefore(y) ... }
}

Bad: Type intros without bounds look ugly: [T|T] (or [T|]?)

Option: Prefix operators to distinguish between defining new/referring to existing one:

class List[T :: SomeUnrelatedTypeclass]() {
  fun sort[T : Sortable[T]]() { ... x.sortsBefore(y) ... }
}

Option: Introduce referral-site syntax prefix # to distinguish referring to a type parameter from introducing one:

class List[T](...) {
  fun sort[#T : Sortable[T]]() { ... x.sortsBefore(y) ... }
}

Bad: If we use # to refer to existing types, shouldn't the line be [#T : #Sortable[#T]]?

Option: Introduce referral-site syntax self[] to distinguish referring to a type parameter from introducing one:

class List[T](...) {
  fun sort[self[T] : Sortable[T]]() { ... x.sortsBefore(y) ... }
}

Bad: Special case?

Option: Introduce referral-site syntax prefix Self:: to distinguish referring to a type parameter from introducing one:

class List[T](...) {
  fun sort[Self::T : Sortable[T]]() { ... x.sortsBefore(y) ... }
}

Bad: Special case?

Option: Curly braces for constraints?

class List[T](...) {
  fun sort{T : Sortable[T]}() { ... x.sortsBefore(y) ... }
}

Bad: Overloads {}.
Bad: Inconsistent: Cell[T : Stringable] vs. Cell[T]{T : Stringable}?

Option: Outlaw the shadowing of type parameters; if the outer context scope has a type parameter T, everything inside that scope refers to that T.

Bad: Breaks copy-pastability.

"Direct" implementation of traits

I think it would make sense to allow classes (and structs) to directly implement traits, such that

class Foo(...) {
  fun compareTo(rhs: Foo) -> Int = ...
}

impl Comparable for Foo {
  fun compareTo(other: Foo) -> Int = self.compareTo(other);
}

can be written as

class Foo(...) <keyword> Comparable {
  fun compareTo(rhs: Foo) -> Int = ...
}

In addition to that, I think it would make sense that impls – whose fun-to-be-implemented exists with the exact signature in the class – do not require writing down the function if the class's implementation matches:

class Foo(...) {
  fun compareTo(rhs: Foo) -> Int = ...
}

impl Comparable for Foo {} // no definition of compareTo necessary.

Extend if-statements to if-expressions

Allow

let i = if false { 23; } else { 42; }

as a less imperative version of

var i = 0;
if false {
  i = 23;
} else {
  i = 42;
}

Get rid of procedure syntax

Currently we have two different kinds of functions:

fun function(...): Something { ... }

and

fun procedure() { ... }

I propose removing the second syntax ("procedure syntax") in favor of

fun procedure(): Unit { ... }

for multiple reasons:

It makes the language more regular.
I think it discourages programming without side-effects if side-effecting functions get better syntax than functions without side-effects.

Module design

As promised, here is my design proposal for modules:

Modules are intended to act as

package declarations, like package foo.bar; or similar in traditional languages
the "holders" of static functions in classes (with the effect of being able to drop the @static annotation)
the "holders" of free-standing functions

There is only one keyword module, which introduces modules of two flavors, namely as

an "open" module
a "closed" module

Outline

// file Cell.dora
module foo::bar; // 1. open module

class Cell[T](let value: T) { .... }

module Cell { // closed module
  // 2. this function would have lived as `@static fun ofOne() ...` in `class Cell`
  fun ofOne() -> Cell[Int] = Cell[Int](1);
}

// 3. a free-standing function, now namespaced in `foo::bar`
fun defaultCell() -> Cell[Int] = Cell::ofOne();

Our module foo::bar declaration (1.) at the top declares an open module.

The module declaration takes effect on everything defined in the file.
Instead of Cell living in a global namespace, it now lives under foo::bar::Cell.

Open means that users can add to this module simply by creating a new file, adding the same module declaration at the top and defining additional classes, modules and functions:

// file Person.dora
module foo::bar;
class Person(let name: String, let age: Int) { ... }

Unlike our module declaration, modules declared inside a file are closed modules.

This means that in our case (2.), the only way to add another element to our module Cell is to edit the file "Cell.dora" and add another member:

// modified file Cell.dora
module foo::bar;
...
module Cell {
  fun ofOne() -> Cell[Int] = Cell[Int](1);
  fun ofTwo() -> Cell[Int] = Cell[Int](2); // added another function
}
...

Free-standing functions (3.) look the same, but the module declaration has the same effect on them as it has on classes and modules in the file:

Given the module declaration foo::bar, the full path of defaultCell becomes foo::bar::defaultCell() – the module foo::bar is the "holder" of the free-standing function.
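As a loose analogy only, the outline can be approximated with Rust modules. Rust has no open modules, so this sketch shows just the namespacing and the closed flavor; the lowercase `cell` module name is an assumption needed because Rust would not let a module and a struct share a scope cleanly under one spelling convention.

```rust
pub mod foo {
    pub mod bar {
        /// Per-instance state lives in the type.
        pub struct Cell<T> {
            pub value: T,
        }

        /// Closed module holding what would otherwise be static
        /// functions of `Cell`.
        pub mod cell {
            use super::Cell;

            pub fn of_one() -> Cell<i32> {
                Cell { value: 1 }
            }
        }

        /// Free-standing function, namespaced under `foo::bar`.
        pub fn default_cell() -> Cell<i32> {
            cell::of_one()
        }
    }
}
```

Callers reach the function through its full path, `foo::bar::default_cell()`, which matches the path described for defaultCell above.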

Reasoning

The two points important to me are a) the unification of "packages" and "modules", giving them the same rules and semantics, and b) establishing a clear separation between elements that exist "per-class" and elements that exist "per-instance" of a class.

With (closed) modules rules become very simple:

Everything within a class exists once for each instance of that class.
Everything within a module exists once.

This also eliminates much of the confusion a user might have regarding generics between static and non-static functions (type parameters are only in scope for non-static functions; a type parameter declared on a function with the same name as a type parameter on a class shadows the class's type parameter if it is a non-static function, while for a static function the type parameter of the class is simply not in scope, etc.) and removes many of the hurdles beginners face, like "why can't my static function access other functions/values despite all of them being in the same class?" (answer: they are not static).

Open Questions

how to represent in the AST?
how to deal with @static methods in traits?
free-standing functions are not strictly necessary, should we drop or constrain them?

References

todo: lessons learned from other programming languages with similar features

Module initialization proposal

How to initialize modules?

Semantics

General

Modules are holders for static functions and values (and variables), serving as a replacement for global/free-standing/static members.
Modules are instances, they have vtables, can be passed around as values and may extend classes and support traits.

Initialization

Initializing a module means initializing its members.
Initialized modules and their values/variables are never garbage collected.
Modules should be initialized on their first use, such that loading some library does not require initializing and retaining arbitrary stuff that is kept forever, if it's not even used.

Requirements

Modules should only be initialized on first access.
Modules should only ever be initialized once.
Module members (except variables) should be able to be treated as effectively constant from an optimization POV.

Problems

Guarding every use of a module with a check whether it needs to be initialized is undesirable, because that check would exist for the rest of the application’s run at every use-site (imagine the module is accessed from within a loop, as in the example below).
Guarding every use of a module with a check is undesirable because it acts as an optimization/inlining barrier and would require these techniques to be more sophisticated to optimize through modules.

module Foo {
  let bar: String = "baz";
  fun qux(x: Int) -> Int = x * 2;
  var gof: Double = 123.45;
}

while condition {
  // Foo members (bar, gof) initialized at first use of Foo:
  somethingWith(Foo.bar); 
}

Idea

Instead of emitting code that checks for initialization at every use-site, let the uninitialized module access the zero page, trap it, trigger the initialization and resume execution.

A page fault is massively slower than a null-check, but the difference is that the page fault happens at most once and does not add code to every use-site, while the null check would need to be done at every access.

Additional thought: Pretty much all modules of the standard library should be immutable, such that instead of shipping the standard library as source code or bytecode, it should be possible to memory-map a fully initialized image of the standard library. The module design should enable this.
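For reference, the guarded approach that this proposal wants to avoid looks roughly like the following Rust sketch. It uses the standard library's `OnceLock`; the names `FOO_BAR`, `foo_bar` and the run counter are invented for the example. It satisfies "initialize on first access" and "initialize only once", but every call still pays for the initialization check.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the initializer actually runs.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the member `bar` of module `Foo`.
static FOO_BAR: OnceLock<String> = OnceLock::new();

/// Every call checks the initialization state; only the first call
/// runs the initializer.
pub fn foo_bar() -> &'static str {
    FOO_BAR.get_or_init(|| {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
        String::from("baz")
    })
}

pub fn init_runs() -> usize {
    INIT_RUNS.load(Ordering::SeqCst)
}
```

Calling `foo_bar()` in a loop runs the initializer once, yet the check inside `get_or_init` stays on the hot path, which is the cost the trap-based idea is meant to remove.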

"not-a-number".{parseInt,parseLong} returns 0

We should probably return an Option instead.
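A minimal sketch of the suggested behaviour, written in Rust because its standard parser already reports failure as a value. The function name `parse_int` is an assumption for the example.

```rust
/// Returns `None` instead of 0 when the text is not a valid integer.
pub fn parse_int(text: &str) -> Option<i32> {
    text.parse::<i32>().ok()
}
```

With this shape, a genuine "0" and invalid input are no longer indistinguishable: the first yields `Some(0)`, the second `None`.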

Implement rounding of floating point numbers

This is probably a not-so-easy task given the amount of possible rounding modes and the poor support for rounding on x86, see https://gcc.godbolt.org/z/zEvearfc8.

For completeness, here is a list of signatures for possible rounding methods for Float64:

                                        C                x86                   arm64
fun roundAwayFromZero(): Float64;
fun roundToZero(): Float64;        // trunc             RC 0b11       frintz ("toward zero")
fun roundUp(): Float64;            // ceil              RC 0b10       frintp ("toward plus infinity")
fun roundDown(): Float64;          // floor             RC 0b01       frintm ("toward minus infinity")

fun roundHalfEven(): Float64;      // rint/nearbyint¹   RC 0b00       frintn ("to nearest with ties to even")
fun roundHalfOdd(): Float64;
fun roundHalfUp(): Float64;        // round¹            <magic>       frinta ("to nearest with ties to away")
fun roundHalfDown(): Float64;

working on this issue does not require implementing all of them, even a single one is fine!

¹ with FE_TONEAREST
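As a reference for the expected results, several of the listed modes can be sketched on top of Rust's `f64` methods. The function names are transliterations of the signatures above; `round_half_up` follows the table's pairing with C's `round`, which rounds ties away from zero, and `round_half_even` is a hand-written approximation rather than a hardware rounding mode.

```rust
pub fn round_to_zero(x: f64) -> f64 {
    x.trunc()
}

pub fn round_up(x: f64) -> f64 {
    x.ceil()
}

pub fn round_down(x: f64) -> f64 {
    x.floor()
}

pub fn round_away_from_zero(x: f64) -> f64 {
    if x < 0.0 { x.floor() } else { x.ceil() }
}

/// Ties go away from zero, matching C's `round`.
pub fn round_half_up(x: f64) -> f64 {
    x.round()
}

/// Ties go to the nearest even integer.
pub fn round_half_even(x: f64) -> f64 {
    if (x - x.trunc()).abs() == 0.5 {
        // Exactly halfway: halve, round, double lands on the even neighbour.
        2.0 * (x / 2.0).round()
    } else {
        x.round()
    }
}
```

For example, `round_half_up(2.5)` gives 3.0 while `round_half_even(2.5)` gives 2.0, and both give -2.0 for -1.5.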

Identify all public non-total functions and consider replacements

We should list all functions that throw exceptions on failure and think how we can make them not do that.

In some cases it might be as easy as changing the return type from T|throws to Option[T] or Result[T].

In other cases it might be necessary to think of how to provide the functionality in a different way (e.g. encouraging the use of iterators or higher-order functions instead of indexing into an array).

The goal of this issue is to gather the insight first, such that considerations can cover the complete list of functions in question.

The overarching motivation is to reduce the number of control flow constructs a user has to keep in mind when reading code.
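The kinds of replacement mentioned above could look like this Rust sketch. All three functions are invented examples, not functions from Dora's standard library: one swaps a failing index for `Option`, one swaps a failing division for `Result`, and one avoids indexing through a higher-order function.

```rust
/// Total replacement for indexing: an out-of-range index yields `None`.
pub fn element_at(items: &[i32], index: usize) -> Option<i32> {
    items.get(index).copied()
}

/// Total replacement for division: the failure is reported as a value.
pub fn divide(a: i32, b: i32) -> Result<i32, String> {
    a.checked_div(b)
        .ok_or_else(|| format!("cannot divide {} by {}", a, b))
}

/// Higher-order alternative that never indexes into the array.
pub fn sum_of_squares(items: &[i32]) -> i32 {
    items.iter().map(|x| x * x).sum()
}
```

In each case the caller sees every possible outcome in the signature, so no hidden control flow is left to keep in mind.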

Crash on armv7

Two crates needed fixing to see this :)

Running target/debug/dora-fb285658be299cc5

running 210 tests
test codegen::buffer::tests::test_backward ... ok
test codegen::buffer::tests::test_forward ... ok
test codegen::buffer::tests::test_define_label_twice ... ok
test codegen::buffer::tests::test_label_undefined ... ok
test codegen::buffer::tests::test_backward_with_gap ... ok
test codegen::buffer::tests::test_label ... ok
test codegen::buffer::tests::test_forward_with_gap ... ok
Process didn't exit successfully: `/tmp/dora-rust-master/target/debug/dora-fb285658be299cc5` (signal: 4)

Rerunning in gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb59ff410 (LWP 22446)]
0xb6fd6000 in ?? ()
(gdb) bt
#0  0xb6fd6000 in ?? ()
#1  0x7f5b5874 in dora::codegen::codegen::tests::run<i32> (code=...) at src/codegen/codegen.rs:248
#2  0x7f5b9410 in dora::codegen::codegen::tests::test_add () at <std macros>:354
#3  0x7f66aa7c in boxed::F.FnBox$LT$A$GT$::call_box::h209496025283757317 ()
#4  0x7f66d278 in sys_common::unwind::try::try_fn::h3608748770378271944 ()
#5  0x7f694ec0 in sys_common::unwind::try::inner_try::h4ca2448085e879bcx3s ()
#6  0x7f66d580 in boxed::F.FnBox$LT$A$GT$::call_box::h14964998841116667363 ()
#7  0x7f69abd4 in sys::thread::Thread::new::thread_start::h6b39f0bbcb58105dNox ()
#8  0xb6e7efbc in start_thread (arg=0xb59ff410) at pthread_create.c:314
#9  0xb6e0320c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

I was using the latest nightly on armv7 Linux.

If a nil error is raised, then show which value caused it

This could be done by retaining the path and position information.
If a nil error occurs, then the runtime could read in the file and point to the exact position in it.

Related proposal in Java: http://openjdk.java.net/jeps/8220715

Crash in typeck when calling method on Unit

Vec[Int]().push(1).length(); fails with:

thread 'main' panicked at 'neither object nor trait object: Unit', dora/src/typeck/lookup.rs:81:13
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:77
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1052
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1428
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:204
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:224
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:472
  11: rust_begin_unwind
             at src/libstd/panicking.rs:380
  12: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:334
  13: dora::typeck::lookup::MethodLookup::method
             at dora/src/typeck/lookup.rs:81
  14: dora::typeck::expr::TypeCheck::check_expr_call_method
             at dora/src/typeck/expr.rs:1341
  15: dora::typeck::expr::TypeCheck::check_expr_call
             at dora/src/typeck/expr.rs:1039
  16: <dora::typeck::expr::TypeCheck as dora_parser::ast::visit::Visitor>::visit_expr
             at dora/src/typeck/expr.rs:2149
  17: dora_parser::ast::visit::walk_stmt
             at ./dora-parser/src/ast/visit.rs:249
  18: <dora::typeck::expr::TypeCheck as dora_parser::ast::visit::Visitor>::visit_stmt
             at dora/src/typeck/expr.rs:2180
  19: dora::typeck::expr::TypeCheck::check
             at dora/src/typeck/expr.rs:43
  20: dora::typeck::check
             at dora/src/typeck.rs:39
  21: dora::semck::check
             at dora/src/semck.rs:109
  22: dora::driver::start::start
             at dora/src/driver/start.rs:47
  23: dora::run
             at dora/src/lib.rs:62
  24: dora::main
             at dora/src/main.rs:6
  25: std::rt::lang_start::{{closure}}
             at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/rt.rs:67
  26: std::rt::lang_start_internal::{{closure}}
             at src/libstd/rt.rs:52
  27: std::panicking::try::do_call
             at src/libstd/panicking.rs:305
  28: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:86
  29: std::panicking::try
             at src/libstd/panicking.rs:281
  30: std::panic::catch_unwind
             at src/libstd/panic.rs:394
  31: std::rt::lang_start_internal
             at src/libstd/rt.rs:51
  32: std::rt::lang_start
             at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/rt.rs:67
  33: main
  34: __libc_start_main
  35: _start

Drop break and continue control-flow keywords

Extracted from #39:

I'd like to get rid of those two control-flow keywords. Reasoning:

The alternative of using an additional method feels only slightly clunkier to write and vastly easier to read and understand when coming back after a month.¹
In my experience the implementation complexity and the mental complexity for users has never been worth the "convenience".
There aren't many reasons for break and continue to exist in general – it feels like this is something that got copied from C and keeps getting copied without anyone questioning it much. I certainly think that if break and continue didn't exist today, we wouldn't invent them.

Prior art:

One language that did away with continue is Scala, and in the last 10 years I never heard a single complaint about it. (break in Scala is done as a library (that throws exceptions) – I'd rather not go this way either.)

¹ I ported Java's java.time implementation from Java to Scala and had to deal with a lot of breaks and continues, and the replacements were never too bad. Replacing breaks is almost trivial – continue is a bit more involved, but barely ever used.

The biggest trouble was always understanding what the break or continue was trying to do in the first place – a problem we won't have without break and continue in the first place.
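To illustrate the "additional method" alternative, here is one possible reading sketched in Rust. The function names are made up, and using an early `return` and a filter as the replacements is my assumption about what the rewrite would look like.

```rust
/// Loop version: leaves the loop early with `break`.
pub fn first_negative_with_break(items: &[i32]) -> Option<i32> {
    let mut found = None;
    for &item in items {
        if item < 0 {
            found = Some(item);
            break;
        }
    }
    found
}

/// Method version: the early exit is the method's own `return`.
pub fn first_negative(items: &[i32]) -> Option<i32> {
    for &item in items {
        if item < 0 {
            return Some(item);
        }
    }
    None
}

/// A loop that would `continue` past odd numbers becomes a filter.
pub fn sum_of_evens(items: &[i32]) -> i32 {
    items.iter().filter(|&&x| x % 2 == 0).sum()
}
```

The method version needs no mutable `found` variable, and its name documents what the loop was searching for.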

Deal with equality and identity

Currently == is available on all types, === is only available on reference types.

It would make sense to extend the notion of ==='s "reference equality" to a more general identity operation that would work on all types.

This is especially important to properly deal with floating point numbers in generic contexts: Currently, NaN == NaN is correctly returning false, but NaN === NaN (which should return true) cannot be expressed.

Without this, it's not possible to implement simple things like a contains method that can answer whether a value is contained in some array/container/collection.
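The contains problem can be reproduced in a few lines of Rust. Modelling identity as bitwise equality is an assumption made for this sketch; the issue does not fix how === would be defined for floats.

```rust
/// Value equality as in `==`: NaN never equals itself.
pub fn contains_by_equality(items: &[f64], needle: f64) -> bool {
    items.iter().any(|&x| x == needle)
}

/// Identity modelled as bitwise equality, one possible reading of `===`.
pub fn contains_by_identity(items: &[f64], needle: f64) -> bool {
    items.iter().any(|&x| x.to_bits() == needle.to_bits())
}
```

With equality, a NaN stored in the array can never be found again. With bitwise identity it can, at the price of treating 0.0 and -0.0 as different values.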

Add mechanism to inherit method without implementation

Best example is the clone method:

If a class implements it, we ideally want to enforce that each and every subclass overrides that method with its own implementation, overriding the method's return type co-variantly:

class Foo(var x: Int) {
  fun clone() -> Foo = Foo(self.x);
}
class Bar(x: Int, var y: Int) : Foo(x) {
  @override
  fun clone() -> Bar = Bar(self.x, self.y);
}

In practice, this has multiple issues:

Overriding the implementation in subclasses can simply be forgotten, because the compiler does not complain about missing it.
It's easy to forget to also refine the return type, even if the method is implemented in subclasses.

class Foo(var x: Int) {
  fun clone() -> Foo = Foo(self.x);
}
class Bar1(x: Int, var y: Int) : Foo(x) // 1.
class Bar2(x: Int, var y: Int) : Foo(x) {
  @override
  fun clone() -> Foo = Bar2(self.x, self.y); // 2.
}

Using a Self type is sometimes considered to be the obvious solution to deal with problem number 2., but poses new problems:

class Foo(var x: Int) {
  fun clone() -> Self = Foo(self.x);
}
class Bar(x: Int) : Foo(x) {
  // inherits clone() -> Self, with Self = Bar
  // problem: inherited implementation does not return Bar, but Foo!
}

Multiple ideas:

One could turn Self types' problems into a solution: The code is broken, so report it as a compiler error and demand that the user supplies a working one.
Have the general rule that methods returning(/using?) Self do not inherit their implementation. I'm not sure it's possible to come up with a rule that is easy to specify and easy to implement and easy to verify and easy to understand, though.
Have some general @noImplementationInheritance modifier that allows inheriting the method signature, but not its implementation, of arbitrary methods.

Drop secondary constructors

Having multiple constructors tends to be a source of huge complexity when it comes to deciding whether a field has been initialized, and combined with chained constructor calls or parent constructor calls they enable writing hard to understand code.

I propose dropping secondary constructors, allowing only zero (for things like Nothing) or one primary constructor. This also discards function overloading as a language concept.

Instead, it encourages the use of static factory methods (which all call the single constructor), which have the huge benefit of allowing individual, descriptive names (while constructors are unnamed).

They are also more flexible in what they can return, as they can not only return T, but also things like Option<T> or Result<T>, while constructors can only return T, making error handling quite awkward.
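A small Rust sketch of the factory-method style argued for here. The `Percentage` type and both factory names are invented for the example; the struct literal plays the role of the single primary constructor.

```rust
pub struct Percentage {
    value: u8,
}

impl Percentage {
    /// Named factory that reports invalid input as `None`.
    pub fn from_int(value: i32) -> Option<Percentage> {
        if (0..=100).contains(&value) {
            Some(Percentage { value: value as u8 })
        } else {
            None
        }
    }

    /// Second named factory with a richer error, built on the first.
    pub fn parse(text: &str) -> Result<Percentage, String> {
        let number: i32 = text
            .trim_end_matches('%')
            .parse()
            .map_err(|_| format!("not a number: {}", text))?;
        Percentage::from_int(number).ok_or_else(|| format!("out of range: {}", number))
    }

    pub fn value(&self) -> u8 {
        self.value
    }
}
```

Both factories funnel into the same construction site, so there is only one place where fields get initialized, and each entry point carries a descriptive name and its own failure type.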

impl method return type not checked against trait

trait X {
  fun m() -> Bool;
  fun n() -> Bool;
}

class CX() {
  fun m() -> Int = 0;
  fun n() -> Int = 0;
}

impl X for CX {
  fun m() -> Int = self.m();  // rejected:     good, impl checked against class
  fun n() -> Bool = self.n(); // not rejected: bad,  impl not checked against trait
}

Registering external symbols

This feature would make interfacing with other languages much easier. I know there is a loadFunction function for loading external symbols, but it's not easy to use since you need to convert your objects to the long type. The easiest way to load external symbols is to use dlsym, which is what cranelift does.

use std::ffi::CString; // `libc` below is the external libc crate

#[cfg(not(windows))]
fn lookup_with_dlsym(name: &str) -> *const u8 {
    let c_str = CString::new(name).unwrap();
    let c_str_ptr = c_str.as_ptr();
    let sym = unsafe { libc::dlsym(libc::RTLD_DEFAULT, c_str_ptr) };
    if sym.is_null() {
        panic!("can't resolve symbol {}", name);
    }
    sym as *const u8
}

And this is how the syntax may look for external functions:

extern fun printf(String,...) -> Int;

And this is how static values can be loaded:

extern static SOMETHING: long;

Also, as I understand it, Dora's calling convention for class methods is similar to the C++ calling convention, so maybe this can be used to interface with C++ classes. Example:

extern class SomeClass
{
     extern fun something() -> int;
}

FloatCmp/DoubleCmp intrinsic not handled

(5.0).compareTo(7.0)

fails with

thread 'main' panicked at 'unknown intrinsic DoubleCmp',
  dora/src/baseline/codegen.rs:2050:18

because the intrinsic is missing there.

This issue should be tackled after #115 is merged to avoid unnecessary merge conflicts.

Drop nil from the language

This proposal is based on the idea that we can remove and replace nil, which is untyped, can inhabit every reference type and therefore fail in unexpected places, with something safer.

Progress

Replace nil-producing array creation with alternatives.
Have an Option type.
Runtime can represent options as references|nullpointer.
Replace remaining nil-using code with alternatives.

The fundamental consideration is that it is much easier to disallow the entry points through which nils can occur – than to allow them, and then try and make them safe and put the genie back into the bottle.

There are two ways to get a nil in Dora – either explicitly or implicitly. Let's dissect these cases separately:

explicit nils

Explicit nils often occur as initial assignments to var in the source code and are later reassigned. Example:

var str: String = nil;
if condition {
  str = "foo";
} else {
  str = "bar";
}

Disallowing nil in such cases would require either initializing them to some "dummy" value (not that good) or supporting if-expressions, such that the code above could be written as:

let str: String = if condition {
    "foo";
  } else {
    "bar";
  }

Note that this would also enable us to replace vars with lets in cases where the binding was only mutable to allow initialization.

implicit nils

Implicit nils arise from creating arrays of reference types with a bare length, like in this example:

let array = Array[String](3);

Here my proposal is to further extend our factory functions (arrayEmpty, arrayFill) and make array construction via a bare length only accessible to these factory functions, such that every case where a user might want to create an unsafe, uninitialized array is catered for by a safe function instead.

This means having at least (names are ad-hoc):

fun arrayEmpty[T]() -> Array[T] as a replacement for code like let array = Array[...](0)
fun arrayFill<T>(len: Int, value: T) -> Array[T] as a replacement for code like let array = Array[...](2); array(0) = ...; array(1) = ...
fun arrayGenerate[T](len: Int, func: Int -> T) -> Array[T] as a replacement for code like let array = Array[...](2); someInitFunction(array);
fun arrayDefault[T : Default](len: Int) -> Array[T] as a replacement for getting arrays that are initialized to the T::default for that type. (Probably 0, 0.0 for numbers, "" strings (?) and user-definable for their own types.)
fun arrayZero[T : Zero](len: Int) -> Array[T], where Zero is a built-in trait that is only implemented by primitives and Option and whose core difference to Default is that, unlike the latter, no function needs to be run to initialize the individual elements; it's enough to allocate zeroed memory.

If we assume that Option is a struct and we gain a "free" Option[T] for every T where T is not a struct (as a result of dropping nil, because we can represent the first layer of Option as a null pointer), then the last function gets to shine:

class Foo(let bar: Int) {} // Foo does not impl Zero!
// But struct Option does, such that Option::zero == None

// an array just as dense as an Array[Foo] when Dora had nils, because None == <null pointer>:
let probablySomeFoos = arrayZero[Option[Foo]](5);
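Rust already uses this representation for Option around non-null pointers, so it can serve as a sanity check for the idea. In the sketch below, `Foo` mirrors the class above, `Box<Foo>` plays the role of a Dora reference, and `some_foos` is a hypothetical stand-in for arrayZero[Option[Foo]].

```rust
use std::mem::size_of;

// Stand-in for the Dora class `Foo`; Box<Foo> plays the role of a reference.
struct Foo {
    bar: i32,
}

// Hypothetical analogue of arrayZero[Option[Foo]](len): every slot is None.
fn some_foos(len: usize) -> Vec<Option<Box<Foo>>> {
    (0..len).map(|_| None).collect()
}

fn main() {
    // Rust guarantees that None is encoded as the null pointer here,
    // so the Option wrapper adds no space per element.
    assert_eq!(size_of::<Option<Box<Foo>>>(), size_of::<Box<Foo>>());

    let mut foos = some_foos(5);
    foos[2] = Some(Box::new(Foo { bar: 42 }));

    // Only the initialized slots are visited.
    let present: Vec<i32> = foos.iter().flatten().map(|f| f.bar).collect();
    println!("{:?}", present);
}
```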

Rewrite x(y) to x.get(y) and x(y) = z to x.set(y, z)

After the generic syntax overhaul is done, I want to implement the syntactic rule that x(y) desugars to x.get(y) and x(y) = z to x.set(y, z) if x is not a function.

This allows us to not only recover "nice" syntax for arrays ...

  let e = foo.x.get(2);        // replace with let e = foo.x(2);
  foo.x.set(3, 456);          // replace with foo.x(3) = 456;

... but also allows us to use this syntax for lists, maps, strings, etc.

let firstElem = someList(0);
let secondByte = someString(1);
let capital = countriesAndCapitalsMap("Austria"); // "Vienna"

From a typechecking perspective the idea is to probably introduce a Getter and Setter trait that communicate these possibilities. An alternative would be to do a structural check, which might be more flexible.
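A rough sketch of the trait-based variant, written in Rust since the Dora traits do not exist yet. `Getter`, `Setter`, `get_at` and `set_at` are hypothetical names (the method names avoid clashing with Rust's built-in `get`), and the comments show which Dora expression each call would stand for.

```rust
use std::collections::HashMap;

// Hypothetical traits that a typechecker could look for when it sees x(y) or x(y) = z.
trait Getter<K, V> {
    fn get_at(&self, key: K) -> V;
}

trait Setter<K, V> {
    fn set_at(&mut self, key: K, value: V);
}

impl<T: Clone> Getter<usize, T> for Vec<T> {
    fn get_at(&self, index: usize) -> T {
        self[index].clone()
    }
}

impl<T> Setter<usize, T> for Vec<T> {
    fn set_at(&mut self, index: usize, value: T) {
        self[index] = value;
    }
}

// Maps can opt in with a different key type and an optional result.
impl<'a> Getter<&'a str, Option<String>> for HashMap<String, String> {
    fn get_at(&self, key: &'a str) -> Option<String> {
        self.get(key).cloned()
    }
}

fn main() {
    let mut xs = vec![10, 20, 30, 40];
    let e = xs.get_at(2); // what `xs(2)` would desugar to
    xs.set_at(3, 456); // what `xs(3) = 456` would desugar to

    let mut capitals = HashMap::new();
    capitals.insert(String::from("Austria"), String::from("Vienna"));
    println!("{} {:?} {:?}", e, xs, capitals.get_at("Austria"));
}
```

With nominal traits, a type opts in explicitly; a structural check would instead accept any type that happens to have matching get/set methods.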

Implement {Int,Long,Float,Double,Bool}#toStringHexadecimal ...

... similar to {Int,Long,Float,Double,Bool}#toStringBinary (see #121).
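As a reference point for expected results, here is what Rust's standard formatting produces. This is only a sketch under assumptions: the issue does not say whether Dora should pad with zeros, how negative values should print, or whether Float should print its raw bit pattern; the helper names below are hypothetical.

```rust
// Hypothetical helpers; Rust's `{:x}` prints two's-complement bits, unpadded.
fn int_to_hex(value: i32) -> String {
    format!("{:x}", value)
}

fn long_to_hex(value: i64) -> String {
    format!("{:x}", value)
}

// Assumption: a float's hexadecimal string is its IEEE 754 bit pattern.
fn float_bits_to_hex(value: f32) -> String {
    format!("{:x}", value.to_bits())
}

fn main() {
    println!("{}", int_to_hex(255));
    println!("{}", int_to_hex(-1));
    println!("{}", long_to_hex(-1));
    println!("{}", float_bits_to_hex(1.0));
}
```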

Plans for a minor release?

I don't really know what your goals for this project are, but what do you believe is required to release a minor version?

I'm fine if you don't care about this at all, my personal approach is "language quality and language popularity – pick one", so I'd be perfectly fine with keeping this language at ~2 users/contributors. :-)

But if you intend to do a release in the future, would you be interested in brainstorming and identifying what we think we need to ship a 0.0.x release? Maybe also considering non-core things like a website, documentation or IDE support (LSP)?

Implement operators like +=, -=, etc.

@dinfuehr is able to provide further details on this (I guess? :-D)

Dora not compiling on Rust 1.29 nightly

I need help: I can't compile Dora on the nightly toolchain.

 rustc --version 
rustc 1.29.0-nightly (960f6046c 2018-07-08)

true.not(): thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0'

This should be the same as !true (which works).

Running tests works, debugging them fails

It's not limited to any specific test.

Here is the stacktrace and video from trying to debug one arbitrary test:

thread 'semck::clsdefck::tests::test_class_definition' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:1165:5
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:76
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:60
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1028
   5: std::io::Write::write_fmt
             at /rustc/8431f261dd160021b6af85916f161a13dd101ca0/src/libstd/io/mod.rs:1412
   6: std::io::impls::<impl std::io::Write for alloc::boxed::Box<W>>::write_fmt
             at src/libstd/io/impls.rs:141
   7: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:64
   8: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   9: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:196
  10: std::panicking::default_hook
             at src/libstd/panicking.rs:207
  11: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:473
  12: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:380
  13: rust_begin_unwind
             at src/libstd/panicking.rs:307
  14: core::panicking::panic_fmt
             at src/libcore/panicking.rs:84
  15: core::result::unwrap_failed
             at src/libcore/result.rs:1165
  16: core::result::Result<T,E>::unwrap
             at /rustc/8431f261dd160021b6af85916f161a13dd101ca0/src/libcore/result.rs:933
  17: dora::test::parse_with_errors
             at dora/src/test.rs:61
  18: dora::semck::tests::ok
             at dora/src/semck.rs:403
  19: dora::semck::clsdefck::tests::test_class_definition
             at dora/src/semck/clsdefck.rs:361
  20: dora::semck::clsdefck::tests::test_class_definition::{{closure}}
             at dora/src/semck/clsdefck.rs:360
  21: core::ops::function::FnOnce::call_once
             at /rustc/8431f261dd160021b6af85916f161a13dd101ca0/src/libcore/ops/function.rs:227
  22: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/8431f261dd160021b6af85916f161a13dd101ca0/src/liballoc/boxed.rs:922
  23: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:80
  24: std::panicking::try
             at /rustc/8431f261dd160021b6af85916f161a13dd101ca0/src/libstd/panicking.rs:271
  25: std::panic::catch_unwind
             at /rustc/8431f261dd160021b6af85916f161a13dd101ca0/src/libstd/panic.rs:394
  26: test::run_test_in_process
             at src/libtest/lib.rs:1626
  27: test::run_test::run_test_inner::{{closure}}
             at src/libtest/lib.rs:1504
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Sort out parsing of negative number literals

I believe that

(-1.0).isNan()

really shouldn't need the parentheses.
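Why the choice matters can be seen in Rust, where a method call binds tighter than the unary minus. The sketch below is Rust behaviour only; how Dora currently parses the expression is not stated in this issue, and the two function names are hypothetical.

```rust
// In Rust the minus is applied after the method call: -(1.0.abs()).
fn unparenthesized() -> f64 {
    -1.0_f64.abs()
}

// With parentheses the literal itself is negative: (-1.0).abs().
fn parenthesized() -> f64 {
    (-1.0_f64).abs()
}

fn main() {
    println!("{} {}", unparenthesized(), parenthesized());
}
```

Treating `-1.0` as a single literal token would make both forms agree, which is the behaviour this issue asks for.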

Drop @static annotations

Requires resolving #53 and making sure that we have a viable alternative in every case (especially for traits).

Implement additional bit twiddling intrinsics

(This is probably a good beginner issue.)

There are a few operations that operate on integers that can be implemented manually, but could profit a lot from directly using the CPU's specially-built instruction for it.
All operations listed here have a direct counterpart in assembly.

The goal is to provide methods for these operations that make use of the special CPU instructions:

POPCNT — Return the Count of Number of Bits Set to 1 (#131)
LZCNT — Count the Number of Leading Zero Bits (#131)
TZCNT — Count the Number of Trailing Zero Bits (#131)
ROL, ROR — Rotate Left/Rotate Right (#138)
SHLX, SHRX, SARX — Shift without doing stupid shit (#144)

For each of these instructions the following steps can be taken:

Add the method to Int.dora/Long.dora and mark it as @internal.
Add the method to the list of intrinsic functions in prelude.rs.
From there, you can use existing functions as an example of where and how to wire things up in codegen.rs.
Write tests to verify that the function works as expected. Make sure to identify and cover corner cases as well!

Bit Manipulation Instruction Sets gives a good overview of additional operations that could be of interest.
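For the testing step, it helps to have a slow but obviously correct reference to compare an intrinsic against. The sketch below does this in Rust on u32, checking loop-based versions against the standard library methods (which typically compile down to the instructions above when the target supports them). The `*_naive` names and the choice of corner cases are assumptions for illustration; the Dora tests would call the new Int/Long methods instead.

```rust
// Loop-based reference implementations (hypothetical names).
fn popcount_naive(mut x: u32) -> u32 {
    let mut count = 0;
    while x != 0 {
        count += x & 1;
        x >>= 1;
    }
    count
}

fn leading_zeros_naive(x: u32) -> u32 {
    let mut n = 0;
    let mut mask = 1u32 << 31;
    while mask != 0 && (x & mask) == 0 {
        n += 1;
        mask >>= 1;
    }
    n
}

fn trailing_zeros_naive(x: u32) -> u32 {
    if x == 0 {
        return 32; // all bits are zero
    }
    let mut n = 0;
    let mut v = x;
    while (v & 1) == 0 {
        n += 1;
        v >>= 1;
    }
    n
}

fn rotate_left_naive(x: u32, n: u32) -> u32 {
    let n = n % 32; // the rotation amount wraps around the bit width
    if n == 0 { x } else { (x << n) | (x >> (32 - n)) }
}

// Zero, one, all ones, only the top bit, and a mixed pattern.
const CORNER_CASES: [u32; 5] = [0, 1, u32::MAX, 0x8000_0000, 0x0000_F0F0];

fn main() {
    for &x in CORNER_CASES.iter() {
        assert_eq!(popcount_naive(x), x.count_ones());
        assert_eq!(leading_zeros_naive(x), x.leading_zeros());
        assert_eq!(trailing_zeros_naive(x), x.trailing_zeros());
        for &n in [0u32, 1, 31, 32, 33].iter() {
            assert_eq!(rotate_left_naive(x, n), x.rotate_left(n));
        }
    }
    println!("all corner cases agree");
}
```

Zero is the case most worth covering: leading and trailing zero counts must both report the full bit width.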

Panic: 'fp-register accessed as gp-register'

fun main() {

  // works
  0.0 == 0.0;
  // BUG: thread 'main' panicked at 'fp-register accessed as gp-register.', 
  0.0.equals(0.0);
  // works
  0.0 === 0.0;
  // BUG: thread 'main' panicked at 'fp-register accessed as gp-register.', 
  0.0.identicalTo(0.0);
}

Incorrect floating point comparisons involving NaN

I believe all of these asserts should hold, as every ordered comparison (<, >, <=, >=) involving NaN should return false:

    let nan = 0.0 / 0.0;
    assert(!(nan < 0.0)); // fail
    assert(!(nan > 0.0));
    assert(!(nan < nan)); // fail
    assert(!(nan > nan));

The first and the third fail though.
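The expected IEEE 754 behaviour can be cross-checked in Rust, which follows it. `ordered_comparisons` is a hypothetical helper; the closing remark about the cause is a guess, not something confirmed in this issue.

```rust
// Returns the results of a < b, a > b, a <= b, a >= b in that order.
fn ordered_comparisons(a: f64, b: f64) -> [bool; 4] {
    [a < b, a > b, a <= b, a >= b]
}

fn main() {
    let nan = 0.0_f64 / 0.0;
    assert!(nan.is_nan());

    // Every ordered comparison with NaN on either side is false.
    assert_eq!(ordered_comparisons(nan, 0.0), [false; 4]);
    assert_eq!(ordered_comparisons(0.0, nan), [false; 4]);
    assert_eq!(ordered_comparisons(nan, nan), [false; 4]);

    // Equality is false as well; only != is true.
    assert!(!(nan == nan));
    assert!(nan != nan);
    println!("NaN comparisons behave as IEEE 754 specifies");
}
```

One plausible cause on x86-64, offered only as a guess: UCOMISD sets the carry flag for unordered operands, so a "below" condition used for `<` is taken when NaN is involved, while "above" is not. That would match the pattern of only the `<` asserts failing.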

Report file names on errors

One very important idea would be to report file names for errors.

In the past this wasn't necessary since there was only one file possible.

Reproduce initialization issue in Throwable

Mentioned in: #152 (comment)

Problem appears when trying to replace

  var backtrace: Array[Int] = nil;
  var elements: Array[StackTraceElement] = nil;

with empty arrays as in https://github.com/dinfuehr/dora/pull/128/files#diff-fdd12e40f618e7564137f156b2f8ba69.

It seems that the variables are not initialized on instance creation.

Consistent naming for types

Currently types are written with an uppercase letter at the start, with the exception of primitive types.

I propose that all types should start with an uppercase letter for increased consistency.

This means aligning the naming of primitive types and renaming int, long etc. to Int, Long etc. A diverse set of languages, including Haskell, Scala, Kotlin and Ceylon, has shown that this increased consistency comes without drawbacks.

Additionally, with special types like Unit and Nothing it's not immediately clear whether they should be considered primitive or reference types (let alone structs), so a consistent naming scheme lets us stop worrying about awkward casing choices and avoids elaborate language lawyering to explain each casing decision.