shadelessfox / lang
A toy language I'm making in my spare time.
Currently RuntimeFunction can only have a fixed set of user-defined arguments, whereas NativeFunction can take an arbitrary number of arguments and, because of that, lacks sanity checks on the passed arguments before its prototype is called.
Function class...
in Java?)
Currently adding a native function is a painful and error-prone process:
lang/src/main/java/com/shade/lang/Launcher.java
Lines 108 to 113 in 0ea3e03
Implement an API that supports registering native functions like this:
@Function(name = "panic", arguments = {"condition", "message"})
public void panic(Machine machine, ScriptObject condition, ScriptObject message) {
...
}
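As a rough illustration of how such a registration API could work, here is a minimal sketch built on plain Java reflection. Every name in it (NativeFunctionSketch, NativeRegistrySketch, BuiltinsSketch) is hypothetical and not taken from the project, and the Machine parameter from the proposal is left out to keep the example self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed @Function annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NativeFunctionSketch {
    String name();
    String[] arguments() default {};
}

// Hypothetical registry that discovers annotated methods via reflection.
final class NativeRegistrySketch {
    private final Map<String, Object> owners = new HashMap<>();
    private final Map<String, Method> methods = new HashMap<>();

    void register(Object owner) {
        for (Method method : owner.getClass().getDeclaredMethods()) {
            NativeFunctionSketch info = method.getAnnotation(NativeFunctionSketch.class);
            if (info == null) {
                continue; // not a native function
            }
            // Early sanity check: declared argument names must match the Java signature.
            if (method.getParameterCount() != info.arguments().length) {
                throw new IllegalStateException("Signature mismatch for " + info.name());
            }
            method.setAccessible(true);
            owners.put(info.name(), owner);
            methods.put(info.name(), method);
        }
    }

    Object call(String name, Object... args) {
        Method method = methods.get(name);
        if (method == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        // Arity check performed before any native code runs.
        if (args.length != method.getParameterCount()) {
            throw new IllegalArgumentException("Wrong argument count for " + name);
        }
        try {
            return method.invoke(owners.get(name), args);
        } catch (InvocationTargetException e) {
            throw new IllegalStateException(e.getCause());
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Example holder of native functions (hypothetical).
final class BuiltinsSketch {
    @NativeFunctionSketch(name = "add", arguments = {"a", "b"})
    Object add(Object a, Object b) {
        return (Integer) a + (Integer) b;
    }
}
```

With this shape, registering a whole class of builtins is a single register(new BuiltinsSketch()) call, and the argument count is validated in one place rather than inside each native function.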
An unwrap (?) operator should return early from the function if the supplied expression results in none. This could help avoid writing if x != none everywhere.
Example:
def read_contents() {
let file = File.open('test.bin')?; # may return early if File.open resulted in `none`
let data = file.read()?; # may return early if File.read resulted in `none`
return data;
}
Alternate syntax: The try keyword can be used for unwrapping as a unary operator.
This type of operator is suitable for accessing array elements by index, or dictionary values by their keys. Although this is not very useful yet because the language itself lacks any collections, it will allow implementing them in the future.
Proposed syntax is:
expression[expression]
Range syntax (#16) can also be reused to allow subscripting a range of elements:
expression[expression .. expression]
Currently we can only make a qualified import, e.g. import module, to access its members. We can avoid typing module.attribute by allowing item imports, which import only a specified set of members of the target module. Consider the following example:
def main() {
import builtin;
builtin.println('hello');
builtin.panic('bad mood', true);
}
Of course we can alias the module or introduce a local variable, but it would be nicer to do the following:
def main() {
import builtin { println, panic };
println('hello');
panic('bad mood', true);
}
Nerd note: It would also be faster than a regular import, because we don't need to look up the desired attribute.
It's nice to have all panic messages and other error messages in plain English, but it would be even nicer if they could be translated into other languages.
Proposal: Extract all the strings from the source code, store them in a ResourceBundle, then look them up when needed.
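A minimal sketch of that lookup, assuming messages are keyed by an identifier and formatted with java.text.MessageFormat. The bundle classes, the key name and the message texts are invented for illustration; a real setup would more likely load .properties files through ResourceBundle.getBundle.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical English catalogue; the key and text are made up for this example.
final class EnglishMessagesSketch extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            {"panic.no_attribute", "No such attribute: {0}"},
        };
    }
}

// Hypothetical German catalogue with the same key.
final class GermanMessagesSketch extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            {"panic.no_attribute", "Attribut nicht gefunden: {0}"},
        };
    }
}

final class MessageLookupSketch {
    // Looks the pattern up by key and substitutes the arguments.
    static String format(ResourceBundle bundle, String key, Object... args) {
        return MessageFormat.format(bundle.getString(key), args);
    }
}
```

Call sites would then pass a key and arguments instead of a hard-coded English string, for example MessageLookupSketch.format(bundle, "panic.no_attribute", "length").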
As the title says, we need to add support for class declarations and their instantiation, along with bound functions.
Currently all conditional expressions (such as 1 == 1) simply result in 1 on success and 0 on failure. This is not ideal, so we need to implement a boolean type that consists of two states: true and false.
Currently the Assembler class is filled with a bunch of things, some more useful than others, that are not related to the emitted assembly itself: source location information for tracing/debugging and guard regions. These things can be moved to the Context class.
Currently compiler-related code depends on runtime-related code and vice versa. They must not know anything about each other.
The main problem is that the compiler creates and populates classes, functions and modules that are part of the runtime, and because of this #4 is hard to implement properly. Instead, the compiler should only emit the IR of a file (module), which will then be loaded and interpreted by the interpreter itself, without mutual dependence.
Currently only infinite and conditional loops are supported. Of course, that's enough to write things like:
let index = 0;
loop while index < 100 {
...
index += 1;
}
but with range-based loops this can be shortened to:
for index in 0..100 {
...
}
Add support for implementing optimizers for AST or emitted bytecode.
This may affect the immutability of all existing AST nodes because of the self-transformation requirement. A workaround is to create new nodes instead of modifying existing ones, but this can have an impact on performance and memory consumption.
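To make the "create new nodes instead of modifying" workaround concrete, here is a small constant-folding sketch over immutable nodes. The node classes are hypothetical and far simpler than the project's AST. Subtrees that did not change are returned as-is, which limits the extra allocations mentioned above.

```java
// Hypothetical immutable AST nodes; not the project's actual classes.
interface ExprSketch {}

final class NumberSketch implements ExprSketch {
    final int value;
    NumberSketch(int value) { this.value = value; }
}

final class NameSketch implements ExprSketch {
    final String name;
    NameSketch(String name) { this.name = name; }
}

final class AddSketch implements ExprSketch {
    final ExprSketch left;
    final ExprSketch right;
    AddSketch(ExprSketch left, ExprSketch right) {
        this.left = left;
        this.right = right;
    }
}

final class ConstantFolderSketch {
    // Returns a folded copy of the tree; existing nodes are never mutated.
    static ExprSketch fold(ExprSketch node) {
        if (!(node instanceof AddSketch)) {
            return node; // leaves are already as simple as they get
        }
        AddSketch add = (AddSketch) node;
        ExprSketch left = fold(add.left);
        ExprSketch right = fold(add.right);
        if (left instanceof NumberSketch && right instanceof NumberSketch) {
            int sum = ((NumberSketch) left).value + ((NumberSketch) right).value;
            return new NumberSketch(sum);
        }
        // Reuse the original node when neither operand changed.
        if (left == add.left && right == add.right) {
            return add;
        }
        return new AddSketch(left, right);
    }
}
```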
We've implemented a basic API for writing native functions without the need to assign attributes and other stuff manually in #27.
This API is still bad because we don't support automatic type matching, and the bridge mechanism itself uses the Java Reflection API, which is not very performance-friendly. Also, error reporting is ugly and sometimes confuses the debugger.
Use the ASM library to automatically generate bridge methods on the fly during module initialization. We can embed early method signature verification, runtime type checks, casting and unwrapping into these bridge methods.
Add a VM flag ash.debug to enable debug logging to easily trace error causes.
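One simple way to wire up such a flag, assuming it is exposed as a JVM system property (passed as -Dash.debug=true), is Boolean.getBoolean. The DebugLogSketch class and its output format are hypothetical.

```java
// Hypothetical helper; assumes the flag is passed as -Dash.debug=true.
final class DebugLogSketch {
    // Boolean.getBoolean returns true only if the system property exists and equals "true".
    static boolean enabled() {
        return Boolean.getBoolean("ash.debug");
    }

    // Writes to stderr only when debug logging is enabled.
    static void debug(String message) {
        if (enabled()) {
            System.err.println("[debug] " + message);
        }
    }
}
```

Reading the property on every call keeps the sketch short; a real implementation would probably cache the value once at startup.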
Implement a mechanism to recover from a panic.
Implement an array type as a native value type.
ArrayType with index operator (#19)
[1, 2, 3]?)
We need some type that will represent an invalid value or something like that. This is suitable for functions because currently we're always returning 0 from functions if no user-defined return statement was specified.
For non-primitive types, fall back from binary operators to the instance's special functions, e.g. a + b becomes a.<add>(b).
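The fallback could look roughly like the sketch below: primitives are handled directly, and anything else is dispatched to a special function looked up on the left operand. InstanceSketch, BinaryOpsSketch and the way special functions are stored are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical script object that carries its special functions by name.
final class InstanceSketch {
    final Map<String, BinaryOperator<Object>> specials = new HashMap<>();
}

final class BinaryOpsSketch {
    static Object add(Object a, Object b) {
        // Fast path for primitive integers.
        if (a instanceof Integer && b instanceof Integer) {
            return (Integer) a + (Integer) b;
        }
        // Fallback: dispatch to the instance's <add> special function, if any.
        if (a instanceof InstanceSketch) {
            BinaryOperator<Object> special = ((InstanceSketch) a).specials.get("<add>");
            if (special != null) {
                return special.apply(a, b);
            }
        }
        throw new IllegalArgumentException("Unsupported operands for +");
    }
}
```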
Currently Machine is the interpreter itself. Because of the nature of multithreading, each thread must run its own interpreter, so we need to distinguish between Machine and Interpreter.
A Machine is a singleton instance of the language's virtual machine that manages all resources and threads.
An Interpreter must execute code chunks and must not know about the environment it executes in.
So a single Machine may have multiple Interpreters running in separate threads to achieve multithreading.
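A minimal sketch of that split, under the assumption that a "chunk" can be modelled as an array of integers that the interpreter sums up. The class names and the thread-pool strategy are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Executes a chunk and knows nothing about the machine that owns it.
final class InterpreterSketch {
    int execute(int[] chunk) {
        int result = 0;
        for (int value : chunk) {
            result += value; // stand-in for real bytecode execution
        }
        return result;
    }
}

// Singleton that owns the threads and hands each one its own interpreter.
final class MachineSketch {
    private static final MachineSketch INSTANCE = new MachineSketch();

    private MachineSketch() {}

    static MachineSketch getInstance() {
        return INSTANCE;
    }

    List<Integer> runAll(List<int[]> chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, chunks.size()));
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int[] chunk : chunks) {
                // One fresh interpreter per task, so no interpreter state is shared.
                Callable<Integer> task = () -> new InterpreterSketch().execute(chunk);
                pending.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```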
Add support for Unicode \u{FFFF} and hexadecimal \x{FF} character escapes inside strings.
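A sketch of how a tokenizer might decode these two escape forms, assuming the braces contain hexadecimal digits and that the x form is limited to a single byte (0 to FF). Both the class name and the one-byte limit are assumptions.

```java
// Hypothetical decoder for the two proposed brace-delimited escape forms.
final class EscapeDecoderSketch {
    static String decode(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 2 < raw.length()
                    && (raw.charAt(i + 1) == 'u' || raw.charAt(i + 1) == 'x')
                    && raw.charAt(i + 2) == '{';
            if (!isEscape) {
                out.append(c);
                i++;
                continue;
            }
            int close = raw.indexOf('}', i + 3);
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated escape at index " + i);
            }
            // parseInt rejects empty or non-hexadecimal digit sequences.
            int value = Integer.parseInt(raw.substring(i + 3, close), 16);
            // Assumption: the x form may only encode a single byte.
            if (raw.charAt(i + 1) == 'x' && value > 0xFF) {
                throw new IllegalArgumentException("Hex escape out of range at index " + i);
            }
            out.appendCodePoint(value); // rejects invalid code points
            i = close + 1;
        }
        return out.toString();
    }
}
```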
Implement debugging tools for interpreter
Implement lambda functions that can capture variables (locals) from the outer scope.
Reuse the syntax of a standard function declaration, but without a name:
let add = def(a, b) { return a + b; }
assert add(5, 7) == 12;
Also, since lambdas are expressions, IIFEs (immediately invoked function expressions) are supported:
(def() { import builtin; return builtin.print; })('hello');
Enhance the tokenizer to recognize floating-point numbers, scientific notation and digit separators.
Allow specifying numbers in source code in bases 2, 8 and 16.
Proposal:
0b for base 2 (binary), 0o for base 8 (octal), 0x for base 16 (hexadecimal).
Add a fancy CLI with a REPL shell
Currently there's only one operator that can be combined with not: is. By supporting combinations, instead of having
if not (value is Class) { ... }
we can write a more English-like sentence:
if value is not Class { ... }
Add support for string interpolation, i.e. expressions inside string literals: 'hello, \{username}'.
A bytecode verification feature was added in aa6aab1, but the code we're emitting for try-catch statements (#22, a9e3cbe) is not valid. Consider the following code snippet:
try {
return 1;
} finally {
return 2;
}
It will generate the following bytecode:
00000000 01 00 00 PUSH 2
00000003 19 RETURN
00000004 01 00 01 PUSH 1
00000007 19 RETURN
00000008 01 00 00 PUSH 2
0000000b 19 RETURN
0000000c 01 00 00 PUSH 2
0000000f 19 RETURN
00000010 07 00 GET_LOCAL 0
00000012 01 00 02 PUSH none
00000015 1d CMP_EQ
00000016 1b 00 03 JUMP_IF_TRUE 3 (#L01)
00000019 07 00 GET_LOCAL 0
0000001b 2a THROW
Notice how weird it is and how many times the body of finally is repeated. The verifier itself complains about the jump at 16, which points literally nowhere (beyond the chunk itself); this should not happen.
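The check the verifier performs here can be sketched as follows. This assumes the jump operand is relative to the end of the jump instruction, which matches the dump above (0x16 + 3 + 3 = 0x1c, one past the last byte at 0x1b) but is not confirmed by the source. JumpCheckSketch is a hypothetical name.

```java
import java.util.Set;

// Hypothetical verifier helper; the relative-offset convention is an assumption.
final class JumpCheckSketch {
    // Absolute target of a jump located at `offset` that is `length` bytes long.
    static int target(int offset, int length, int relative) {
        return offset + length + relative;
    }

    // A target is valid only if it lands on the first byte of some instruction.
    static boolean isValidTarget(int target, Set<Integer> instructionStarts) {
        return instructionStarts.contains(target);
    }
}
```

Checking against the set of instruction start offsets, rather than just the chunk length, also catches jumps into the middle of an instruction's operands.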
The try-finally statement is useful in situations where resources need to be cleaned up after they have been used.
For example: a socket was opened and read() was called inside a try block. If a panic happens, the socket will stay open, but with a finally block the socket can be closed regardless of whether the function completed abruptly or not. It can also be combined with recover, making combinations such as try-recover-finally possible.
Currently only loaded modules can be imported.
We need to implement module import and loading from a path, e.g.
import std.path;
will search the roots registered in the interpreter for a path that corresponds to
{root}/std/path[.ext]
and then parse and load it into the interpreter.
This also means that:
a.b.c
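The root lookup described above could be sketched with java.nio.file as shown below. The resolver class is hypothetical, and the file extension is passed in as a parameter because the source leaves it unspecified.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical resolver; the extension is a parameter because the source does not fix it.
final class ModuleResolverSketch {
    // Maps a dotted module name such as std.path to {root}/std/path{extension}.
    static Path candidate(Path root, String moduleName, String extension) {
        String[] parts = moduleName.split("\\.");
        Path path = root;
        for (int i = 0; i < parts.length - 1; i++) {
            path = path.resolve(parts[i]);
        }
        return path.resolve(parts[parts.length - 1] + extension);
    }

    // Returns the first existing candidate among the registered roots, in order.
    static Optional<Path> resolve(List<Path> roots, String moduleName, String extension) {
        for (Path root : roots) {
            Path path = candidate(root, moduleName, extension);
            if (Files.isRegularFile(path)) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }
}
```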
Currently all attributes are mutable, with no way to make them immutable. Immutable attributes are useful for builtin types with predefined attributes, such as the length of a string.
Currently the for-loop's range (start..end) is limited to integer literals only, due to an explicit ascending/descending check at compile time:
lang/src/main/java/com/shade/lang/parser/Parser.java
Lines 149 to 156 in eb35c29
Range class that will store the starting and ending indices.
Iterator class to be used as an interface by other classes, allowing iteration through them.
Implement caching of code chunks in an intermediate binary format to avoid excess parsing, for better performance after the first execution.
This will also require verifying the loaded bytecode and the loader mechanism itself.
Add assignment operators such as +=, -=, *=, /=, etc.
Add loops and range statements like while and for.
Currently, after recovering from a panic we cannot continue it, i.e. throw it further.
We can either implement a rethrow statement, solely to be able to continue panicking, or implement a throw statement to support throwing custom objects.
Currently the only executable unit is the function. We can allow executing bytecode at module level to:
MAKE_FUNCTION %const(name) %const(chunk)
MAKE_CLASS %const(name) %imm8(basenum)
In #29 we've added a try operator that propagates the none value of an expression to the caller if the expression evaluates to none. Because this is a unary operator and the parser forbids chaining without evaluating a primary expression (such as (expr)), it is impossible to write things like try try try fail1().fail2().fail3() (for a good reason). Also, by reusing the try keyword we're making parsing harder, because it is the starting keyword of both the try-recover-finally statement and the try-unwrap expression.
We should replace the prefix try with a postfix ? that comes after the expression it should unwrap: fail1()?.fail2()?.fail3()
Add labels that can be used with continue and break inside loops to allow breaking out of outer loops.
outer:
loop {
loop {
break outer;
}
}
Currently the parser fails after encountering any syntax error. We can make it try to recover from the invalid state and continue parsing, collecting all the syntax errors to be reported later.
Note: This will primarily help to lighten native types.
Extensions, also known as traits in some languages, can help with extending existing types with user-defined functionality.
For example, let's take the native value called StringValue. Currently all functions and attributes must be added to it via the native interface, with no way to add them at runtime, because all native types are immutable by default. But with extensions it is possible to provide a basic native interface for StringValue that will feature, for example, only a length property and index access to its contents, while all std-related functions such as lower, upper, trim, etc. can be implemented by users themselves.
class String {
constructor(self, value) {
self.value = value;
}
}
extension String {
def trim(self) { ... }
def upper(self) { ... }
def lower(self) { ... }
}
import string;
import string_ext; # Add special explicit syntax to use extensions?
def main() {
let str = new String('hello');
assert str.upper() == 'HELLO';
}
Add context-dependent keywords that can be either keywords or symbols.
Currently we only have a syntax to call the super class's constructor (#9, a6a2769), and the user is allowed to override any of the super class's functions by replacing them, which makes calling the original implementation impossible.
class A {
def get_value(self) { return 5; }
}
class B : A {
def get_value(self) { return super.get_value() + 7; }
}
Currently every AST node has its own implementation of the compile method that emits some bytecode. There's also an accept method in the Node class that allows a node to be transformed into another node with new operands, which is used during AST optimization. We can reuse this approach and emit bytecode using a specialized visitor, thereby cleaning up the AST-related model.
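A compact sketch of a visitor-based emitter, using the classic visitor pattern rather than the project's actual accept signature. The node classes are hypothetical, and of the two mnemonics only PUSH appears in the project's bytecode dumps; ADD is assumed for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface with one method per node type.
interface NodeVisitorSketch<R> {
    R visitNumber(NumberNodeSketch node);
    R visitAdd(AddNodeSketch node);
}

// Nodes only describe structure and dispatch to the visitor; they emit nothing themselves.
interface NodeSketch {
    <R> R accept(NodeVisitorSketch<R> visitor);
}

final class NumberNodeSketch implements NodeSketch {
    final int value;
    NumberNodeSketch(int value) { this.value = value; }

    @Override
    public <R> R accept(NodeVisitorSketch<R> visitor) {
        return visitor.visitNumber(this);
    }
}

final class AddNodeSketch implements NodeSketch {
    final NodeSketch left;
    final NodeSketch right;
    AddNodeSketch(NodeSketch left, NodeSketch right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public <R> R accept(NodeVisitorSketch<R> visitor) {
        return visitor.visitAdd(this);
    }
}

// All bytecode emission lives here instead of in per-node compile methods.
final class EmitterSketch implements NodeVisitorSketch<Void> {
    final List<String> code = new ArrayList<>();

    @Override
    public Void visitNumber(NumberNodeSketch node) {
        code.add("PUSH " + node.value);
        return null;
    }

    @Override
    public Void visitAdd(AddNodeSketch node) {
        node.left.accept(this);  // operands first
        node.right.accept(this);
        code.add("ADD");         // assumed mnemonic
        return null;
    }
}
```

An optimizer could implement the same interface with NodeSketch as the result type, so optimization and emission would share one traversal mechanism.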