shadelessfox / lang

A toy language I'm making in my spare time.

Java 93.10% AGS Script 6.90%

lang's Issues

Add variadic-length function arguments

Currently RuntimeFunction can only have a fixed set of user-defined arguments. NativeFunction, in contrast, can take an arbitrary number of arguments, but because of that it lacks sanity checks on the passed arguments before its prototype is called.

Proposal

Add an 'is variadic' flag to the Function class
Add special syntax to mark an argument as variadic (like ... in Java?)
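
A minimal sketch of how such a flag could drive the argument check, written in Java because the runtime is Java. VariadicSketch and bind are hypothetical names and not part of the project; the sketch assumes the last declared parameter collects the surplus arguments.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these names come from the project itself.
final class VariadicSketch {
    /**
     * Checks the argument count against the declared parameter count and,
     * for a variadic function, packs trailing arguments into one array.
     * Returns the values to bind to the declared parameters, in order.
     */
    static Object[] bind(int declared, boolean variadic, Object... args) {
        if (!variadic) {
            if (args.length != declared) {
                throw new IllegalArgumentException(
                    "expected " + declared + " arguments, got " + args.length);
            }
            return args.clone();
        }
        if (declared < 1) {
            throw new IllegalArgumentException("a variadic function needs a parameter to collect into");
        }
        // The last declared parameter collects zero or more trailing arguments.
        int fixed = declared - 1;
        if (args.length < fixed) {
            throw new IllegalArgumentException(
                "expected at least " + fixed + " arguments, got " + args.length);
        }
        Object[] bound = Arrays.copyOf(args, declared);
        bound[fixed] = Arrays.copyOfRange(args, fixed, args.length);
        return bound;
    }
}
```

With this shape the same check could serve both runtime and native functions, since it only needs the declared count and the flag.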

Better API for writing native functions

Currently adding a native function is a painful and error-prone process:

lang/src/main/java/com/shade/lang/Launcher.java

Lines 108 to 113 in 0ea3e03

setAttribute("panic", new NativeFunction(this, "panic", (machine, args) -> {
    String message = (String) ((Value) args[0]).getValue();
    boolean recoverable = (Integer) ((Value) args[1]).getValue() != 0;
    machine.panic(message, recoverable);
    return null;
}));

Proposal

Implement an API that supports registering native functions like this:

@Function(name = "panic", arguments = {"condition", "message"})
public void panic(Machine machine, ScriptObject condition, ScriptObject message) {
    ...
}
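
One way the registration side might look, sketched with plain Java reflection. NativeRegistrySketch, Callable, collect and Builtins are hypothetical names; the Machine parameter and the ScriptObject type from the proposal are left out to keep the sketch self-contained, so plain Object stands in for script values.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the class, interface and method names are illustrative only.
final class NativeRegistrySketch {
    /** Marks a Java method as a native function visible to scripts. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Function {
        String name();
        String[] arguments() default {};
    }

    /** Uniform call shape that an interpreter could invoke. */
    interface Callable {
        Object call(Object... args);
    }

    /** Scans the host object for @Function methods and wraps each in a Callable. */
    static Map<String, Callable> collect(Object host) {
        Map<String, Callable> functions = new HashMap<>();
        for (Method method : host.getClass().getDeclaredMethods()) {
            Function meta = method.getAnnotation(Function.class);
            if (meta == null) {
                continue;
            }
            int arity = meta.arguments().length;
            // Early signature check: declared argument names must match the Java parameters.
            if (method.getParameterCount() != arity) {
                throw new IllegalStateException("signature mismatch in " + meta.name());
            }
            method.setAccessible(true);
            functions.put(meta.name(), args -> {
                if (args.length != arity) {
                    throw new IllegalArgumentException(meta.name() + " expects " + arity + " arguments");
                }
                try {
                    return method.invoke(host, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            });
        }
        return functions;
    }

    /** Example host with one native function. */
    static final class Builtins {
        @Function(name = "concat", arguments = {"left", "right"})
        public Object concat(Object left, Object right) {
            return String.valueOf(left) + right;
        }
    }
}
```

The arity check happens once at registration and once per call, which is the kind of sanity check the hand-written lambda above lacks.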

Add unwrap operator

An unwrap (?) operator should return early from the function if the supplied expression evaluates to none. This would help avoid writing if x != none everywhere.

Example:

def read_contents() {
    let file = File.open('test.bin')?; # may return early if File.open resulted in `none`
    let data = file.read()?; # may return early if File.read resulted in `none`
    return data;
}

Alternate syntax: The try keyword can be used for unwrapping as a unary operator.

Add subscript operator

This type of operator is suitable for accessing array elements by index, or dictionary values by key. Although it is not very useful yet because the language lacks any collections, it will allow implementing them in the future.
The proposed syntax is:

expression[expression]

Range syntax (#16) can also be reused to allow subscripting a range of elements:

expression[expression .. expression]

Allow to specify item for import statement

Currently we can only make a qualified import, e.g. import module, and then access its members through the module. We could avoid typing module.attribute by allowing item imports, which import only a specified set of members of the target module. Consider the following example:

def main() {
    import builtin;
    builtin.println('hello');
    builtin.panic('bad mood', true);
}

Of course we can alias the module or introduce a local variable, but it would be nicer to do the following:

def main() {
    import builtin { println, panic };
    println('hello');
    panic('bad mood', true);
}

Nerd note: It would also be faster than a regular import, because we would not need to look up the desired attribute.

Text internationalization

It's nice to have all panic messages and other error messages in plain English, but it would be better if they could be translated into other languages.

Proposal: Extract all the strings from the source code, store them in a ResourceBundle, then look them up when needed.
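
A minimal sketch of that lookup using java.util.ResourceBundle and java.text.MessageFormat. The key name and message text are made up for illustration, and an in-memory ListResourceBundle stands in for what would more likely be a .properties file per locale.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical sketch; the key and message below are illustrative, not taken from the project.
final class MessagesSketch {
    /** In-memory bundle standing in for an English messages file. */
    static final class EnglishMessages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                // MessageFormat treats a doubled single quote as one literal quote.
                {"panic.no_such_attribute", "No such attribute ''{0}'' on {1}"},
            };
        }
    }

    /** Looks up the pattern by key and fills in its placeholders. */
    static String format(ResourceBundle bundle, String key, Object... args) {
        return MessageFormat.format(bundle.getString(key), args);
    }
}
```

A translated bundle would reuse the same keys, so call sites such as a panic would pass a key and arguments instead of a hard-coded English string.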

Add classes

As the title says, we need to add support for class declaration and instantiation, along with bound functions.

Add boolean type

Currently all conditional expressions (such as 1 == 1) simply result in 1 in case of success and 0 in case of failure. This is not ideal, so we need to implement a boolean type that consists of two states: true and false.

Debloat assembler

Currently the Assembler class is filled with a bunch of useful and not very useful things that are not related to the emitted assembly itself: source location information for tracing/debugging and guard regions. These things can be moved to the Context class.

Separate compiler and interpreter

Currently compiler-related code depends on runtime-related code and vice versa. They must not know anything about each other.
The main problem is that the compiler creates and populates classes, functions and modules that are part of the runtime, and because of this #4 is hard to implement properly. Instead, the compiler should only emit the IR of a file (module), which is then loaded and interpreted by the interpreter itself, without mutual dependence.

Implement range-based loops

Currently only infinite and conditional loops are supported. Of course, that is enough to write things like:

let index = 0;
loop while index < 100 {
    ...
    index += 1;
}

but with range-based loops this can be shortened to:

for index in 0..100 {
    ...
}

Compiler optimizations

Add support for implementing optimizers for AST or emitted bytecode.

This may affect the immutability of all existing AST nodes because of the self-transformation requirement. A workaround is to create new nodes instead of modifying existing ones, but this can have an impact on performance and memory consumption.

Better interoperability API

In #27 we implemented a basic API for writing native functions without the need to assign attributes and other stuff manually.
This API is still bad because we don't support automatic type matching, and the bridge mechanism itself uses the Java Reflection API, which is not very performance-friendly. Error reporting is also ugly and sometimes confuses the debugger.

Proposal:

Use the ASM library to generate bridge methods on the fly during module initialization. We can embed early method signature verification, runtime type checks, casting and unwrapping into these bridge methods.

http://asm.ow2.io/

Add debug logging

Add VM flag ash.debug to enable debug logging to easily trace error causes.

Add logging to Optimizer (trace all transformers being executed and their result)
Add logging to Assembler (trace generated bytecode, guards, constants)
Add logging to Machine itself (trace loading modules, import resolution, panic & stack unwinding)
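
A small sketch of how the flag could gate tracing. It assumes ash.debug is passed as a JVM system property (for example -Dash.debug=true), which the issue does not state; DebugLogSketch and its methods are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical sketch; assumes the flag is a JVM system property named ash.debug.
final class DebugLogSketch {
    /** True when the JVM was started with -Dash.debug=true (or the property was set later). */
    static boolean enabled() {
        return Boolean.getBoolean("ash.debug");
    }

    /**
     * Prints a trace line tagged with the component name.
     * The message is a Supplier so that expensive dumps (bytecode, constants)
     * are only built when logging is actually enabled.
     */
    static void debug(String component, Supplier<String> message) {
        if (enabled()) {
            System.err.println("[" + component + "] " + message.get());
        }
    }
}
```

The Optimizer, Assembler and Machine could then call something like DebugLogSketch.debug("assembler", () -> ...) without paying for the message when the flag is off.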

Panic error recovery

Implement a mechanism to recover from a panic.

Add arrays

Implement array type as a native value type.

Proposal

Implement ArrayType with index operator (#19)
Add syntax for making user-defined array literals (maybe [1, 2, 3]?)

Add none type

We need some type that will represent an invalid or absent value. This is suitable for functions, because currently we always return 0 from a function if no user-defined return statement was specified.

Fallback binary operators invocation to function calls

Fall back the invocation of binary operators on non-primitive types to special functions of the instance, e.g. a + b becomes a.<add>(b).
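
A rough sketch of that dispatch order, with primitives handled directly and everything else routed through a special attribute. Instance, the attributes map and the use of BinaryOperator are assumptions made for the sketch; only the <add> name comes from the issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical sketch; Instance and its attribute map are stand-ins for the real runtime types.
final class OperatorFallbackSketch {
    /** Minimal stand-in for a script object with named attributes. */
    static final class Instance {
        final Map<String, BinaryOperator<Object>> attributes = new HashMap<>();
    }

    /** Evaluates left + right: primitive fast path first, then the <add> special function. */
    static Object add(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            return (Integer) left + (Integer) right;
        }
        if (left instanceof Instance) {
            BinaryOperator<Object> special = ((Instance) left).attributes.get("<add>");
            if (special != null) {
                // The receiver is passed explicitly, like self in the language.
                return special.apply(left, right);
            }
        }
        throw new UnsupportedOperationException("operator + is not defined for these operands");
    }
}
```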

Multithreading support

Currently Machine is itself the interpreter. By the nature of multithreading, each thread must run its own interpreter, so we need to distinguish between Machine and Interpreter.

A Machine is a singleton instance of the language's virtual machine that manages all resources and threads.
An Interpreter must execute code chunks and must not know about the environment it executes in.

So a single Machine may have multiple Interpreters running in separate threads to achieve multithreading.

Unicode & Hexadecimal escapes in strings

Add support for Unicode \u{FFFF} and hexadecimal \x{FF} character escapes inside strings.
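
A minimal sketch of decoding the two escape forms. It assumes the braces are mandatory and that \x is limited to a single byte value; neither detail is settled by the issue, and EscapeSketch is a hypothetical name.

```java
// Hypothetical sketch; handles only the \u{...} and \x{...} forms and copies everything else through.
final class EscapeSketch {
    static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            boolean escape = ch == '\\'
                && i + 2 < text.length()
                && (text.charAt(i + 1) == 'u' || text.charAt(i + 1) == 'x')
                && text.charAt(i + 2) == '{';
            if (!escape) {
                out.append(ch);
                i++;
                continue;
            }
            int close = text.indexOf('}', i + 3);
            if (close < 0) {
                throw new IllegalArgumentException("unterminated escape at index " + i);
            }
            int codePoint = Integer.parseInt(text.substring(i + 3, close), 16);
            // Assumption: \x{..} denotes a single byte value, so cap it at FF.
            if (text.charAt(i + 1) == 'x' && codePoint > 0xFF) {
                throw new IllegalArgumentException("\\x escape out of range at index " + i);
            }
            // appendCodePoint emits a surrogate pair for values above U+FFFF.
            out.appendCodePoint(codePoint);
            i = close + 1;
        }
        return out.toString();
    }
}
```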

Debugger suite

Implement debugging tools for interpreter

Add lambda functions

Implement lambda functions that can capture variables (locals) from outer scope.

Reuse the syntax of a standard function declaration, but without the name:

let add = def(a, b) { return a + b; }
assert add(5, 7) == 12;

Also, since lambdas are expressions, IIFE (immediately invoked function expressions) are supported:

(def() { import builtin; return builtin.print; })('hello');

Enhance number literals parsing

Enhance tokenizer to recognize floating-point numbers, scientific notation and digit separator.

Add binary, octal and hexadecimal number literals

Allow to specify numbers in source code using base of 2, 8 and 16.

Proposal:

Introduce the prefixes 0b for base 2 (binary), 0o for base 8 (octal) and 0x for base 16 (hexadecimal).
(Optional): Introduce hexadecimal floating-point literals
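
A small sketch of how a tokenizer helper might turn such a literal into a value. NumberLiteralSketch is a hypothetical name, the sketch covers integers only, and case-insensitive prefixes are an assumption.

```java
// Hypothetical sketch; integer literals only, no sign, no floating-point forms.
final class NumberLiteralSketch {
    static long parseInteger(String literal) {
        int radix = 10;
        // A prefix needs at least one digit after it, hence the length check.
        if (literal.length() > 2 && literal.charAt(0) == '0') {
            switch (Character.toLowerCase(literal.charAt(1))) {
                case 'b': radix = 2; break;
                case 'o': radix = 8; break;
                case 'x': radix = 16; break;
                default: break; // plain decimal with a leading zero
            }
        }
        String digits = radix == 10 ? literal : literal.substring(2);
        // Long.parseLong rejects digits that are invalid for the radix.
        return Long.parseLong(digits, radix);
    }
}
```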

REPL/CLI interface

Add fancy CLI interface with REPL shell

Support combining 'not' keyword with other operators

Currently there's only one operator that can be combined with not: is. By supporting combinations, instead of having

if not (value is Class) { ... }

we can achieve a more English-friendly sentence:

if value is not Class { ... }

String interpolation

Add support for string interpolation, e.g. expressions inside string literals: 'hello, \{username}'.

Bytecode verifier is failing with try-catch statements

A bytecode verification feature was added in aa6aab1, but the code we're emitting for try-catch statements (#22, a9e3cbe) is not valid. Consider the following code snippet:

try {
    return 1;
} finally {
    return 2;
}

It will generate the following bytecode:

00000000 01 00 00 PUSH         2
00000003 19       RETURN       
00000004 01 00 01 PUSH         1
00000007 19       RETURN       
00000008 01 00 00 PUSH         2
0000000b 19       RETURN       
0000000c 01 00 00 PUSH         2
0000000f 19       RETURN       
00000010 07 00    GET_LOCAL    0
00000012 01 00 02 PUSH         none
00000015 1d       CMP_EQ       
00000016 1b 00 03 JUMP_IF_TRUE 3 (#L01)
00000019 07 00    GET_LOCAL    0
0000001b 2a       THROW

Notice how odd this is and how many times the body of finally is repeated. The verifier complains about the jump at 16, which points nowhere (beyond the end of the chunk itself). This should not happen.

Add try-finally statement

The try-finally statement is useful in situations where resources need to be cleaned up after they were used.
For example: a socket was opened and read() was called inside a try block. If a panic occurs, the socket stays open, but with a finally block the socket can be closed regardless of whether the function completed abruptly or not. It can also be combined with recover, making combinations such as try-recover-finally possible.

Better module importing

Currently only loaded modules can be imported.

We need to implement module import & loading from path, e.g.

import std.path;

will look up, among the roots registered in the interpreter, the path that corresponds to

{root}/std/path[.ext]

and then parse it and load it into the interpreter.

This also means that:

Support for fully-qualified symbols must be added: a.b.c
Root registration functionality must be added to the public API.
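
A minimal sketch of the lookup, assuming roots are plain directories and that the first root containing the file wins. ModuleResolverSketch and its methods are hypothetical; the extension is passed in rather than fixed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; not the project's loader, only the name-to-path mapping.
final class ModuleResolverSketch {
    /** Maps a dotted module name such as std.path to {root}/std/path{extension}. */
    static Path candidate(Path root, String module, String extension) {
        return root.resolve(module.replace('.', '/') + extension);
    }

    /** Returns the first existing candidate among the registered roots, if any. */
    static Optional<Path> resolve(List<Path> roots, String module, String extension) {
        return roots.stream()
            .map(root -> candidate(root, module, extension))
            .filter(Files::isRegularFile)
            .findFirst();
    }
}
```

For example, with a root of lib and the .ash extension used elsewhere in these issues, std.path would map to lib/std/path.ash.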

Immutable attributes

Currently all attributes are mutable, with no way to make them immutable. Immutable attributes are useful for builtin types with predefined attributes, such as the length of a string.

Add Iterator object

Currently for-loop's range (start..end) is limited to integer number literals only due to explicit ascending/descending check at compile-time:

lang/src/main/java/com/shade/lang/parser/Parser.java

Lines 149 to 156 in eb35c29

// TODO: This will fail because number can be either integer or float.
// This could be fixed in the future with addition of range objects
// that will assert type at run-time, but currently it will just explode
// if floating-point value was used.
int rangeBegin = (Integer) expect(Number).getNumberValue();
boolean rangeInclusive = expect(Range, RangeInc).getKind() == RangeInc;
int rangeEnd = (Integer) expect(Number).getNumberValue();
boolean rangeDescending = rangeEnd < rangeBegin;

As of #9, we now have objects, so we can make range an object and compute its bounds at run-time.

Proposal

Implement Range class that will store the starting and ending indices.
Implement Iterator class to be used as an interface by other classes allowing iterating through them.
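
A sketch of what a run-time range could look like on the Java side. RangeSketch is a hypothetical name and reuses java.util.Iterator instead of a language-level Iterator class; the ascending/descending decision moves from the parser into the object, as the proposal suggests.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; integer bounds only, and overflow for extreme bounds is not handled.
final class RangeSketch implements Iterable<Integer> {
    private final int begin;
    private final int end;
    private final boolean inclusive;

    RangeSketch(int begin, int end, boolean inclusive) {
        this.begin = begin;
        this.end = end;
        this.inclusive = inclusive;
    }

    @Override
    public Iterator<Integer> iterator() {
        // Direction is decided when iteration starts, not at compile time.
        final int step = end < begin ? -1 : 1;
        final int count = Math.abs(end - begin) + (inclusive ? 1 : 0);
        return new Iterator<Integer>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < count;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return begin + step * produced++;
            }
        };
    }
}
```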

Incremental code caching

Implement caching of code chunks in an intermediate binary format, to avoid re-parsing and improve performance after the first execution.

This will also require verifying the loaded bytecode and the loader mechanism itself.

Add assignments operators

Add compound assignment operators such as +=, -=, *=, /=, etc.

Add loops & range statements

Add loops and range statements like while and for.

Allow to continue panic after handling it

Currently, after recovering from a panic we cannot continue it, i.e. throw it further.

We can either implement a rethrow statement that only continues the current panic, or implement a throw statement that supports throwing custom objects.

Top-level bytecode execution

Currently the only executable unit is the function. We can allow executing bytecode at module level to:

Create functions at run-time rather than at compile-time
Create classes at run-time and resolve inheritance at run-time rather than at compile-time (this is a very bad limitation)
Resolve imports during execution rather than during module loading
Allow arbitrary statements to be placed at top-level (this will introduce a cyclic include problem) ???

Proposal:

Introduce the following two opcodes to create functions and classes at run-time (1, 2):
1.1. MAKE_FUNCTION %const(name) %const(chunk)
1.2. MAKE_CLASS %const(name) %imm8(basenum)
Make the module emit executable code (3)
Provide an API to execute only one call frame (and its child call frames) and return its result

Replace the 'try-unwrap' operator with '?'

In #29 we added a try operator that propagates the none value of an expression to the caller if it evaluates to none. Because this is a unary operator and the parser forbids chaining without a parenthesized primary expression (such as (expr)), it is impossible to write things like try try try fail1().fail2().fail3() (for a good reason). Also, by reusing the try keyword we make parsing harder, because it is the starting keyword of both the try-recover-finally statement and the try-unwrap expression.

Proposal:

We should replace the prefix try with a postfix ? that comes after the expression it should unwrap: fail1()?.fail2()?.fail3()

Loop labels

Add labels that can be used with continue and break inside loops to allow breaking outer loops.

Example:

outer:
loop {
    loop {
      break outer;
    }
}

Recoverable parsing

Currently the parser fails after encountering any syntax error. We can make it try to recover from the invalid state and continue parsing, collecting all the syntax errors to be reported later.

Extension for classes

Note: This will primarily help to lighten native types.

Extensions, also known as traits in some languages, can help with extending existing types with user-defined functionality.
For example, take the native value called StringValue. Currently all functions and attributes must be added to it via the native interface, with no way to add them at runtime, because all native types are immutable by default. With extensions, it becomes possible to provide only a basic native interface for StringValue, featuring for example just the length property and index access to its contents, while all std-related functions such as lower, upper, trim, etc. can be implemented by the user.

Example

File 'string.ash':

class String {
    constructor(self, value) {
        self.value = value;
    }
}

File 'string_ext.ash':

extension String {
    def trim(self) { ... }
    def upper(self) { ... }
    def lower(self) { ... }
}

File 'test.ash':

import string;
import string_ext; # Add special explicit syntax to use extensions?

def main() {
    let str = new String('hello');
    assert str.upper() == 'HELLO';
}

Add soft keywords

Add context-dependent keywords that can be either keywords or symbols.

Call super class' overridden functions

Currently we only have syntax for calling the super class' constructor (#9, a6a2769). The user is allowed to override any of the super class' functions by replacing it, which makes calling the original implementation impossible.

Draft:

class A {
    def get_value(self) { return 5; }
}

class B : A {
    def get_value(self) { return super.get_value() + 7; }
}

Problems:

Currently an instance stores bound leaf attributes, so we will need to look up the super class and call its function with the current class' instance as the owner.
We need a solution for calling the function when multiple base classes are present.

Use visitor pattern for compiler

Currently every AST node has its own implementation of the compile method that emits some bytecode. There is also an accept method in the Node class that allows a node to be transformed into another node with new operands, which is used during AST optimization. We can reuse this approach and emit bytecode using a specialized visitor, thereby cleaning up the AST-related model.
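
A compact sketch of the idea with two node kinds. All class names here are hypothetical and the accept signature is simplified compared to whatever the project's Node class actually declares; PUSH appears in the bytecode listing of another issue, while ADD is an assumed opcode name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; nodes carry no compile method, the visitor owns code generation.
final class VisitorSketch {
    interface Visitor<R> {
        R visitNumber(NumberNode node);
        R visitAdd(AddNode node);
    }

    interface Node {
        <R> R accept(Visitor<R> visitor);
    }

    static final class NumberNode implements Node {
        final int value;

        NumberNode(int value) {
            this.value = value;
        }

        @Override
        public <R> R accept(Visitor<R> visitor) {
            return visitor.visitNumber(this);
        }
    }

    static final class AddNode implements Node {
        final Node left;
        final Node right;

        AddNode(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public <R> R accept(Visitor<R> visitor) {
            return visitor.visitAdd(this);
        }
    }

    /** Emits stack-machine instructions as text; operands first, then the operator. */
    static final class Emitter implements Visitor<Void> {
        final List<String> code = new ArrayList<>();

        @Override
        public Void visitNumber(NumberNode node) {
            code.add("PUSH " + node.value);
            return null;
        }

        @Override
        public Void visitAdd(AddNode node) {
            node.left.accept(this);
            node.right.accept(this);
            code.add("ADD");
            return null;
        }
    }
}
```

An optimizer could implement the same Visitor interface with Node as the result type, so transformation and emission would share one traversal mechanism.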
