fascinatedbox / lily


Interpreted language focused on expressiveness and type safety.

Home Page: http://lily-lang.org

License: MIT License

C 96.03% Python 0.71% CMake 0.65% C++ 1.21% Lua 0.72% Ruby 0.67%

embeddable interpreter language lily scripting-languages

lily's Issues

Need a rich comparator for all classes

This is largely to allow for object == object. But this will also allow lists to compare by value, instead of by reference (because [1] != [1] makes no sense to me). This will need to have some sort of strategy for protecting against circular references.
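
One possible strategy for the circular-reference problem is a plain depth limit. The sketch below is only an illustration: the `value` struct, `MAX_COMPARE_DEPTH`, and `value_eq` are hypothetical names, not part of the interpreter, and the limit of 100 is an arbitrary assumption.

```c
#include <stddef.h>

/* Hypothetical, simplified value: either an integer or a list of values. */
typedef struct value value;
struct value {
    int is_list;
    long integer;     /* used when is_list == 0 */
    value **elems;    /* used when is_list == 1 */
    size_t count;
};

#define MAX_COMPARE_DEPTH 100   /* assumed limit, chosen arbitrarily */

/* Compares by value. Returns 1 if equal, 0 if not equal, and -1 if the
   depth limit was exceeded (which is what a circular list would cause). */
int value_eq(const value *a, const value *b, int depth)
{
    if (depth > MAX_COMPARE_DEPTH)
        return -1;
    if (a == b)
        return 1;                 /* same object: trivially equal */
    if (a->is_list != b->is_list)
        return 0;
    if (!a->is_list)
        return a->integer == b->integer;
    if (a->count != b->count)
        return 0;
    for (size_t i = 0; i < a->count; i++) {
        int r = value_eq(a->elems[i], b->elems[i], depth + 1);
        if (r != 1)
            return r;             /* propagate "not equal" or "too deep" */
    }
    return 1;
}
```

A caller would start with `value_eq(a, b, 0)` and could raise an error when it gets -1 back. Tracking already-visited pairs would be more precise, at the cost of extra bookkeeping.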

Internal types need a rewrite WRT inner value handling

Here's how things are done right now:

Objects carry a signature and their inner value, but no flags. This is why objects check for 'object->sig' before 'object->sig->cls->is_refcounted'. This is a problem because everything else already has a signature made for it.
Lists carry flags and a value. Things like subscript and subscript assign are complicated because they don't have a register into which to put their values. If they did, subscript assignment could be looked at as a generic assignment from a list value to a right value. Subscript is that but in reverse (from list, to the other register).
Hashes have flags, but they're for the value. The key has no flags, because the key is not allowed to be nil. Again, subscript and subscript assign would be easy, but there's no registers to work with.

Put simply, the problem is that code reuse is impossible, which is a shame because there are a lot of common things that these types do.

The solution is as follows:

lily_value will become something else, like lily_value_union.
lily_vm_register will become the new lily_value
A new function will be made called generic_assign, or something similar. This will handle assigning from one lily_value to another.
Objects will now have a lily_value inside them, and use the SYM_IS_NIL flag to check if values are nil, like other type handling does.
Lists will now have an array of lily_value inside them. This will allow list subscript and subscript assign to use generic assignment.
Hash elements will now have a lily_value elem_key, and a lily_value elem_value.

Why this needs to be done:

Code reuse. Lots and lots of code reuse. This means that there will be fewer bugs that are triggered in strange situations.
Fewer headaches trying to reason about subscript assignment and subscript. These two things have been a huge problem since lists were introduced.
I would like to see o_get_const set a flag (which will be called VAL_IS_PROTECTED) when a value is loaded (such as a literal string). The vm would check for either this or SYM_IS_NIL when deciding if something should be ref'd/deref'd. This is important both for reducing unnecessary ref/deref calls, as well as allowing (in the future) multiple threads to share the pool of literals without worrying about a thread race from unnecessary ref/deref calls. This would come at hardly any performance cost, since SYM_IS_NIL is already checked a lot.
Future builtin classes would have to do some gymnastics or include extra stuff for doing common functions, but with flag+value instead of a register.
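
A rough sketch of what generic_assign could look like once every value carries flags plus a union. The struct layout, the flag bit values, the `refcounted` stand-in, and the `is_refcounted` field (standing in for sig->cls->is_refcounted) are all assumptions made for illustration; only the names lily_value, lily_value_union, generic_assign, SYM_IS_NIL, and VAL_IS_PROTECTED come from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag names are from the issue; the bit values are assumptions. */
#define SYM_IS_NIL       0x1
#define VAL_IS_PROTECTED 0x2

/* Hypothetical stand-in for any refcounted payload (str, list, hash). */
typedef struct refcounted { int refcount; } refcounted;

typedef union {
    int64_t integer;
    double number;
    refcounted *ptr;
} lily_value_union;

typedef struct {
    int flags;
    int is_refcounted;        /* stand-in for sig->cls->is_refcounted */
    lily_value_union value;
} lily_value;

/* A value only takes part in ref/deref if it is refcounted, not nil,
   and not a protected literal. */
static int needs_ref(const lily_value *v)
{
    return v->is_refcounted && !(v->flags & (SYM_IS_NIL | VAL_IS_PROTECTED));
}

/* Assigns right to left. The right side is ref'd before the left side is
   deref'd so that assigning a value to itself stays safe. */
void generic_assign(lily_value *left, const lily_value *right)
{
    if (needs_ref(right))
        right->value.ptr->refcount++;
    if (needs_ref(left))
        left->value.ptr->refcount--;  /* a real vm would destroy at zero */
    *left = *right;
}
```

With this shape, list subscript assign reduces to `generic_assign(&list->elems[i], rhs)`, and subscript is the same call with the arguments swapped.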

Lambdas via {|args| stmt }

I've changed my mind as to how I want to do this since I first opened this bug. To begin with, I think the language should adopt support for anonymous blocks which would start with { and end with }. An anonymous block would be a series of statements (so if and the like would be allowed), with a known single result.

I like the idea of lambdas not specifying the types of their arguments when possible, but requiring those types in cases where no determination can be made about the types of the arguments.

I'm also coming to like Ruby's syntax for lambdas: {|args| => expression}. Or at least the idea is borrowed from Ruby. In the case of a lambda, only one expression would be allowed (and it would be the value returned if a value is needed). Multi-line lambdas would work by making the expression a block (I believe that Java or maybe Scala does something to this effect), with the block being able to do all of the nice stuff like if's, loops, etc.

This requires a few things to be done, however:

The emitter needs a way to say something like "Attempt to emit this so I can figure out what type you want (for the lambda's args). When you're done, rewind the code state".
The ast needs to stop using ast->result for values except for tree_local_var. This prevents the emitter from running an expression twice.
The parser/emitter needs a way to create new callable blocks without a name (but with some sort of uniqueness, like 'lambda_%d' % counter).

Add a file class with basic file operations.

The interpreter is in need of a file class with some basic file operations. This would allow print and printfmt to finally start being phased out in favor of some actual IO abilities. Here's what it could start off with:

(static) function file::open(str filename, str mode): file
function file::close():nil
function file::write(str text):nil
function file::readlines():list[str]

Obviously, this would just be a starting point. The main point of this is to get a cool file class started so Lily can be more useful.

== and != for list and hash.

Lists and hashes that have the same type should be comparable using '==' and '!='. This would be a comparison by pointer, not by value.

str::trim(str self):str

This would remove whitespace from both sides of a given string, returning the result. This shouldn't be too hard, because there's already an lstrip and an rstrip to work with.
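
A minimal sketch of the trimming logic on a plain C string, assuming that "whitespace" means whatever isspace() accepts. `str_trim` is a hypothetical helper name; the real function would take and return interpreter values rather than raw char pointers.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `self` with whitespace removed from
   both ends. The caller frees the result. Returns NULL if malloc fails. */
char *str_trim(const char *self)
{
    const char *start = self;
    while (*start && isspace((unsigned char)*start))
        start++;                          /* the lstrip half */

    const char *end = self + strlen(self);
    while (end > start && isspace((unsigned char)end[-1]))
        end--;                            /* the rstrip half */

    size_t len = (size_t)(end - start);
    char *result = malloc(len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, start, len);
    result[len] = '\0';
    return result;
}
```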

Strings should be subscriptable

What the title says. A string should be subscriptable (and maybe subscript assignable too). Using Python as a model, a subscript of a string should return a string. If the string has a utf-8 sequence, then the returned string should have that entire sequence (rather than returning a single byte of it).

Strings will be adjusted to carry an 'is_ascii' part. If a string has no utf-8 parts, then subscripts can be done as a single simple index. Otherwise, subscripts will need to loop over the string to make sure that they return the entire utf-8 sequence if they need to.
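
The two paths described above can be sketched as follows. This assumes the string is valid utf-8 and that its byte length is known; `str_subscript` and `utf8_seq_len` are hypothetical names, and the function returns a pointer plus a length instead of building a new string value.

```c
#include <stddef.h>

/* Byte length of the utf-8 sequence that starts with `lead`.
   Assumes the input is valid utf-8. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Finds character number `index` in `s`. On success, returns a pointer to
   its first byte and stores its byte length in *out_len. Returns NULL if
   the index is out of range. */
const char *str_subscript(const char *s, size_t byte_len, int is_ascii,
                          size_t index, size_t *out_len)
{
    if (is_ascii) {
        /* Fast path: one byte per character, so a simple index works. */
        if (index >= byte_len)
            return NULL;
        *out_len = 1;
        return s + index;
    }

    /* Slow path: walk the string one whole sequence at a time. */
    size_t pos = 0;
    while (pos < byte_len) {
        size_t n = utf8_seq_len((unsigned char)s[pos]);
        if (index == 0) {
            *out_len = n;
            return s + pos;
        }
        index--;
        pos += n;
    }
    return NULL;
}
```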

Allow static hash creation.

Lists can currently be created in a static manner, such as this:

list[integer] l = [1, 2, 3]

but the same is not available for hashes. Hash creation would need key and value pairs, rather than plain values. These pairs would be split up by arrows. It would look something like this:

hash[str, integer] h = ["a" => 123, "b" => 456, "c" => 789]

Hashes are a bit complicated though, because duplicate keys can exist in a static hash. In this case, the interpreter will pick the right-most value given. This is similar to how Python, Ruby, and Lua all handle this sort of situation. Additionally, static hashes must ensure that the key given does not default to object (objects are not valid keys), and that it is itself a valid key type. The values should be able to default to object though.

Unary fails with deep subscripts, calls

Note: The fix seems to be wrapping the absorb merge in unary with saving the active and restoring the active. Absorb merge thinks it has the real current, so it updates current and messes up deep subscripts.

Also, the absorb should work for all things, not just subscript.

'@(object: 1 + 1)' -> Invalid operation: object + integer

This is caused by binary taking over the typecast's values. I think this can be solved by making tree_typecast a tree that can be entered. This would keep binary ops from reparenting things.

README-stdlib.md

I need a readme that describes the built-in commands. This will be nice for me, as well as for any new people trying to figure out what the language is capable of.

Dig through lily_cls_str.c and lily_cls_list.c. For the globals, poke at lily_seed_symtab.h (I think). This is easy, and also really important.

lexer: lily_lexer should use a char * for iteration, not subscripts.

lily_lexer within lily_lexer.c is currently full of this sort of thing:

ch = lexer->lex_buffer[lex_bufpos];

This is far more complicated than it needs to be. It should instead have ch as a pointer to a location within the lexer's lex_buffer, by just doing ch++ to increase as needed.

This can start off by just being in lily_lexer, and possibly used throughout the code in lexer later. This will make lexer simpler, and hopefully cut down the size of lily_lexer.
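
To illustrate the pointer style, here is a small scanner that advances a char * instead of subscripting the buffer. `scan_identifier` is a hypothetical helper, and the identifier rule (letters, digits, underscore) is an assumption, not taken from lily_lexer.c.

```c
#include <ctype.h>
#include <stddef.h>

/* Scans identifier characters starting at *ch_ptr, leaves *ch_ptr on the
   first character that is not part of the identifier, and returns the
   number of characters consumed. */
size_t scan_identifier(const char **ch_ptr)
{
    const char *ch = *ch_ptr;
    const char *start = ch;

    while (isalnum((unsigned char)*ch) || *ch == '_')
        ch++;                     /* no lex_buffer[lex_bufpos] lookups */

    *ch_ptr = ch;
    return (size_t)(ch - start);
}
```

The buffer position is recoverable at any time as `ch - lexer->lex_buffer`, so nothing that needs an index is lost.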

Multiple runs are horribly broken :(

When the vm is done running, it clears values in main which is...wrong. So wrong. This clearing needs to be done after the final vm pass.

Package set needs to account for compound ops

I'll need to add something like sys::intmax to actually test it though. Might as well add sys::intmin too, if I decide to do that. Anyway, the problem is that this:

sys::intmax -= 1

will fail because the emitter's package assign doesn't handle compound ops.

Lexer and parser token-related info should be autogenerated

Currently, adding a new token or changing tokens around is a -huge- hassle. A few steps need to be taken:

Is the new token in a group? Okay, add that and change all the offsets.
It isn't? Okay, write a special case for it. Yay!
Make sure the token is inserted in the right place. Good luck remembering that.
Did the token not get added to the end? Great, now find that position in is_start_val and bin_op_for_token in parser and add the right value.
Cross fingers, and hope that the token string has been added to the right place. Token printing currently runs off of "I hope this is in the right spot" to make sure the tokens are printed out right.

I've tried this before, but generally lose interest because most of the lexer issues have been gone for a while now. However, I'm starting to think that this is important because it's really hard to modify the lexer to add new tokens/new binary ops. lily_lexer (the call) has become a mess of sorts as well. It would be nice to have something autogenerated to take care of adding new tokens, ensuring the tokens have the right string, and more.
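
One way to get this without an external generator is the C "X-macro" technique, where a single table expands into the enum, the token strings, and the parser's lookup table. This is a sketch under assumptions: the token list, the strings, and the third column are invented for illustration, and only tk_integer and is_start_val are names that appear above.

```c
#include <stddef.h>

/* One row per token: enum name, printable string, and whether the token
   can start a value. Adding a token means adding one row here. */
#define TOKEN_TABLE(X)                       \
    X(tk_left_parenth, "(",           1)     \
    X(tk_integer,      "an integer",  1)     \
    X(tk_plus,         "+",           0)     \
    X(tk_minus,        "-",           1)     \
    X(tk_eof,          "end of file", 0)

typedef enum {
#define X(name, str, starts_value) name,
    TOKEN_TABLE(X)
#undef X
    tk_count
} token_t;

static const char *token_names[] = {
#define X(name, str, starts_value) str,
    TOKEN_TABLE(X)
#undef X
};

static const int is_start_val[] = {
#define X(name, str, starts_value) starts_value,
    TOKEN_TABLE(X)
#undef X
};

/* Both tables are indexed by the enum, so they cannot drift out of order. */
const char *tokname(token_t t)
{
    return t < tk_count ? token_names[t] : "?";
}

int token_starts_value(token_t t)
{
    return t < tk_count ? is_start_val[t] : 0;
}
```

Because every table is produced from the same rows, inserting a token in the middle no longer requires fixing offsets by hand.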

str -> string rename (internal issue)

Need a cool readme

A readme that explains how to use the language would be a nice start. Some documentation and demos would add to that.

integer::to_string():string

This would convert an integer to a string representation, because why not?

`list[object] x = [1, 2, 3]` should work.

I consider this a bug because the left side is a list of object, and lists of object are supposed to be able to take in any type. Emitter currently will report that list[integer] cannot be assigned to list[object] and give up.

Emitter needs to do a new failsafe check that goes something like this:

Is the left side a list of object? If yes, check if the right side is an ast of type tree_list.
Go back into the right tree again. Move the code emitting position to before the o_build_list was written. This should be 'm->pos -= ast->args_collected + 4;', or something similar.
Go through each of the tree elements and do an object conversion on them.
Write a new o_build_list, and indicate that the caller can run as normal.

If this is done for assignment, then it also needs to be done for values passed to methods/functions, and possibly when returning a value from a method. It must be all or none to keep the language consistent.

Join methods and functions into just functions (internal issue)

So...methods and functions are essentially the same thing: The first is a native block of code that can be called, and the second is a foreign block of code that can be called. Due to silly decisions in the past, these two have been split up. But they really shouldn't be. First, the terms are wrong, but also:

Functions need to put their arguments in a list and "unbox" them. Since they don't, it's currently impossible to pass varargs from native code to foreign code.

Functions hack around not using the stack (see vm's err_function).

Lots of duplicate code to work with methods/functions.

Using the proper terms for things is nice.

...

It's not so much that this is difficult, but more that it involves carefully slashing out a bunch of code (yay) without breaking anything (boo). This should have been fixed earlier.

Create a garbage collector

It's finally time. Lily has been able to make circular references work due to a lot of trickery and (in one case) a completely blind fix. This has worked so far because lists are fairly basic to traverse (simply go over all elements and check for finding an already-found value). This must change if Lily is to get user-defined classes later, because user-defined classes will be very expensive to traverse.

The solution is the one that I haven't wanted to do: Write a garbage collector. I have a bit of it started, but here's what I intend to do:

Anything that can be circular should have a gc entry attached to it. The gc entry will, in turn, have the object inside of it. These gc entries will be chained together.
Anything not circular will be refcounted. Entries with a gc pointer on them may be deref'd to oblivion, and symtab will need to update the gc pointer to let it know that the associated value is now NULL and non-reachable.
The gc will be split into two generations, with thresholds for each.

I have never made a gc before, so I'm going to make a total guess of how it should work. Wish me luck! :)

No more literals in code (internal issue)

When I converted the vm from using addresses to using code, I left the code typed as uintptr_t. Yes, that's right. The actual container for the bytecode is typed as uintptr_t *. At best, that's 8 bytes wide. This was necessary in the past, because addresses of values were left in the bytecode. Now, only literals are written into the bytecode.

I'd like to make code into a short int (2 bytes). Since the smallest method has 8 slots for code, this would mean a reduction of 48 bytes.

Anyway, this will require that literals get a positional id (easy, because symtab used to do that), and that literals get their own section in the vm as an indexable array. This requires a bit of tinkering around with debug (show), changing how vm gets literals, and making literals get an id. So there's a fair amount to change, but most of the changes are pretty basic.
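
A sketch of the idea: bytecode becomes an array of 2-byte slots, and an opcode that needs a literal carries the literal's positional id, which indexes a separate array held by the vm. The `literal` and `vm_state` structs, the `run` function, and the opcode numbering are hypothetical; o_get_const is the only name taken from the interpreter.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode numbering is an assumption for this example. */
enum { o_get_const = 1, o_return = 2 };

typedef struct { const char *text; } literal;

typedef struct {
    const literal *literals;   /* indexable array owned by the vm */
    size_t literal_count;
} vm_state;

/* Runs a tiny program made of "o_get_const <id>" pairs ending in o_return.
   Returns the last literal loaded, or NULL on a bad id or if nothing
   was loaded. */
const literal *run(const vm_state *vm, const uint16_t *code)
{
    const literal *last = NULL;
    size_t pos = 0;

    for (;;) {
        switch (code[pos]) {
        case o_get_const: {
            uint16_t id = code[pos + 1];   /* an id, not an address */
            if (id >= vm->literal_count)
                return NULL;
            last = &vm->literals[id];
            pos += 2;
            break;
        }
        case o_return:
        default:
            return last;
        }
    }
}
```

The 48-byte figure above corresponds to `8 * (sizeof(uintptr_t) - sizeof(uint16_t))` on a platform where uintptr_t is 8 bytes wide.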

ast: Need ast dumping utilities.

Debugging ast problems is currently much harder than it needs to be. Since asts are internal, there's no 'show' command that works for them. The ast pool handles subexpressions by saving the current/root trees to the saved trees inside ast pool. This works, but it's also hard to actually figure out what the current state of the ast pool is.

The ast pool should have a dumping function that is only enabled when a debug flag is set when compiling. This would dump out all trees, their types, and their subtrees. This would make it easy to determine if problems are occurring within the ast pool or if they come from elsewhere.

Postfix typecasts

Typecasts currently look like this:

object o = 10
integer a = @(integer: o)

Simple enough, but what about when some more complicated stuff is thrown in?

list[integer] integer_list = [1, 2, 3]
object v = 10
list[object] obj_list = [v, integer_list]
list[integer] integer_list_two = @(list[integer]: obj_list[1])

I don't like this. If someone is reading left to right, they read the cast, then have to read the operation to find out what's being cast. This will only get more complicated if, in the future, it becomes more common to chain method calls. What I'd like to see looks like this:

object o = 10
integer a = o.@(integer)

Simple. 'o is cast to an integer'. The use of @(...) makes it so the syntax is still unambiguous. Now, for the other example:

This:
@(list[integer]: obj_list[1])
Becomes:
obj_list[1].@(list[integer])

In this version, a person can read it as a series of operations from left to right.
"Take object list, subscript the first value, then cast it as a list of integer."
seems more natural than:
"Cast to list of integer the value of subscripting object list at the first value."

I've tagged this as medium difficulty because it involves reworking the parser a bit. However, it may be possible to have ast and emitter unchanged. If they did need to be changed, it would be a simple act of pulling arguments from different locations. The vm and debug would be unchanged, so this won't be hard.

sanity.ly will need to be updated, but that's just a search-and-replace job.

Crasher! Setting a global object to a string

Here's the code:

object o = 10
method m():nil { o = "10" }
m()
show o

This causes a crash in debug because o_set_global just checks for stuff being refcounted, and thus places a string where the object's value should be (instead of inside the object). This results in show crashing because the object is laid out improperly.

o_set_global needs to account for objects, and o_get_global MAY need to account for that as well. I haven't tested pulling values from a global object, but I can see that it might fail.

Static hash values should be able to default to object.

They currently don't, but they should since that's what lists do when a list has different types.

blastmaster: A typical run should not aft all files

Blastmaster currently runs lily_aft for all files that it's given. This means that all tests are checked for a memory failure at any particular point. However, I've never found any bugs through doing aft on the failing files. If the failing tests result in memory-related issues, then I usually find out through sanity.ly (if it's something in the vm) or in hello_world.ly (if it's when booting up the interpreter).

This may require making a new dir for aft tests (though as of now, I would only want to put sanity.ly in there, since it's the most comprehensive test).

This would make blastmaster go faster, which is also nice.

Convert "<@lily" to "<?lily"

str lib: In-place modify if self has 1 ref and it's not a literal.

In theory, there should be circumstances where the string library has a 'self' (the first string passed) that has one reference and isn't a literal (aka protected). In this case, the string library should do in-place modifications to avoid creating a new resulting string. This idea came from reading some Perl documentation a while ago. It makes sense: If something has one ref, then an in-place modification won't hurt anything else.
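
The check itself is small. The sketch below uses an uppercase conversion as the example operation; `str_val`, its fields, and `str_upper` are hypothetical and only stand in for the interpreter's real string value and library functions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string value with a refcount and a "literal" marker. */
typedef struct {
    int refcount;
    int is_protected;   /* set for literals, which must never be changed */
    char *text;
} str_val;

/* Uppercases self. Works in place when self has exactly one reference and
   is not a literal; otherwise works on a fresh copy. Returns the value
   that holds the result, or NULL if allocation fails. */
str_val *str_upper(str_val *self)
{
    str_val *result = self;

    if (self->refcount != 1 || self->is_protected) {
        size_t n = strlen(self->text) + 1;
        result = malloc(sizeof *result);
        if (result == NULL)
            return NULL;
        result->text = malloc(n);
        if (result->text == NULL) {
            free(result);
            return NULL;
        }
        memcpy(result->text, self->text, n);
        result->refcount = 1;
        result->is_protected = 0;
    }

    for (char *p = result->text; *p; p++)
        *p = (char)toupper((unsigned char)*p);
    return result;
}
```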

hash::keys[A, B](hash[A, B] => list[A])

This is a function that would return a list of all keys currently in the hash. When combined with list::apply, this can be rather powerful:

hash[string, string] config_dict = ["abc" => "123", "def" => "456"]

function print_hash_key(string key)
{
printfmt("hash key %s is %s.\n", key, config_dict[key])
}

config_dict.keys().apply(print_hash_key)

This is probably not the best example though, because show already can print out hashes rather nicely!

1+1 does not work.

I've known about this for a long time, and I keep forgetting to fix it. The problem is pretty simple: The lexer sees '1+1' as two tokens: '1' (integer), and '+1' (integer).

To fix this:

Lexer's state needs to hold the sign of the last integer/number scanned, with it set to '\0' if the last integer/number didn't have a sign.
Parser's binary op needs to check for tk_integer and tk_number. If it finds one of these and the lexer has a sign, then the + or - is part of a binary op, and the integer/number doesn't have a sign. Push a binary op, and push a fixed integer/number.
The parser must also check that stripping the sign does not result in an overflow (-int_min as a positive is higher than +int_max).
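
The overflow check in the last step can be sketched like this, assuming integers are stored as int64_t. `strip_sign` is a hypothetical helper; `sign` is the character the lexer remembered ('+', '-', or '\0').

```c
#include <stdint.h>

/* Given the signed value the lexer produced and the sign it saw, stores
   the unsigned magnitude in *out so the parser can push "<binary op>
   <magnitude>" instead. Returns 1 on success, 0 if stripping the sign
   would overflow. */
int strip_sign(int64_t value, char sign, int64_t *out)
{
    if (sign == '-') {
        if (value == INT64_MIN)
            return 0;       /* -INT64_MIN is one more than INT64_MAX */
        *out = -value;
    } else {
        *out = value;       /* '+' or no sign: nothing to strip */
    }
    return 1;
}
```

For `1-5` the lexer would hand over -5 with sign '-', and the parser would push a minus op followed by 5.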

apache: Bind server's POST vars as server::post::*

Essentially the same thing as binding the GET vars. Internally, apache stores the POST vars as...some sort of struct in an array, I think. It's a bit more complicated than server::get, and so should be done after it.

Show namespace info for functions

There are two areas where this is an issue:

In emitter, when there is an error calling a function
In vm, when printing a stack trace.

This will require adding a class name to each function, and then having emitter/vm print it as needed.

number -> double

This conversion was suggested by a couple people (at least, I think) when I asked for feedback on the language. I think it's a fair complaint, so I'll get to it. I think this will be easier than converting strings, because the number class isn't specifically used that often (whereas strings had their own library to convert over).

object -> any

show should be able to dump packages

This would be enormously helpful in the future when trying to debug what's inside of a package and/or just know what all is inside of a package.

Create a sys package to wrap argv

This involves creating a few things, so I'll start off at the beginning.

First, a sys package. A package should be thought of as a closed namespace, in that new vars cannot be added to it. However, vars can be taken from it and assigned to/from. sys itself won't be assignable to or from.

Vars inside a package will be accessed through 'package::var', meaning that the new token '::' will be made.

This calls for binding argv, which will be available as sys::argv. The type for it will be list[str] (similar to how Python binds argv as a list of strings).

apache: Bind server's GET vars as server::get::*

The module currently binds the server's environment as server::env, which has type 'hash[str, str]'. However, GET vars are more complicated:

Consider the following query strings:

?a = 1

a could be either a string or an integer. So hash[str, str] is limiting here.

?b[0] = 10

I'd like to see this work, but hash[str, str] won't support it. I could do hash[str, object], but that means accesses to GET vars become something like 'server::get["b"].@(hash[str,str])'. I don't like that.

As an alternative, I'd like to have GET vars created as values within a 'get' package that's within the 'server' package. The benefit of this approach is that vars of the appropriate type can be easily created.

Caveats:

The module shall not create GET variables where the names are not valid identifiers.
The module shall not create a var as two different types: If it has already been declared as a string, then creating it as a hash shall cause an error.
(So '?a=1&a[0]=1' would fail).
Initially, support for hashes shall be one level deep. So '?a[0][0]=1' shall fail. This may be extended to two levels later. The reason for this is to keep someone from attacking the server with a query like '?a[0][0][0][0][0][0]...' causing the interpreter to create an excessively deep hash.
Hash values will always be created, even if all elements are consecutive. Similarly, even if a value seems like it could be an integer/number, it will be created as a string. My thinking here is that I don't want to stump people by surprising them (oh, in this special case it makes an integer instead of a str). Hash values are also easier to add members to since they're internally a linked list.
Values created should never go anywhere other than server::get::*.
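
The naming and depth caveats can be checked before anything is created. The sketch below classifies a single query-string key; `classify_get_key` and the `get_kind` enum are hypothetical, and the identifier rule (a letter or underscore, then letters, digits, or underscores) is an assumption about what counts as a valid identifier.

```c
#include <ctype.h>

typedef enum { GET_INVALID, GET_STRING, GET_HASH } get_kind;

/* Classifies a key such as "a" (string var) or "a[0]" (hash var, one
   level deep). Anything else, including "a[0][0]", is rejected. */
get_kind classify_get_key(const char *key)
{
    const char *p = key;

    if (!(isalpha((unsigned char)*p) || *p == '_'))
        return GET_INVALID;               /* not a valid identifier */
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;

    if (*p == '\0')
        return GET_STRING;
    if (*p != '[')
        return GET_INVALID;

    p++;
    while (*p && *p != ']' && *p != '[')
        p++;
    if (*p != ']')
        return GET_INVALID;               /* unterminated or nested '[' */
    p++;

    /* Anything after the first [...] would be a deeper level. */
    return *p == '\0' ? GET_HASH : GET_INVALID;
}
```

The "two different types" rule would sit on top of this: the module would remember the kind returned for a name and raise an error if a later key for the same name returns a different kind.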

Line numbers are wrong

I noticed this when handling #11 and viewing the stack traces generated. Almost all line numbers printed are wrong. I took a peek at debug, and it seems to be off on the numbers as well.

My guess is that it's one (or more) of the following:

Debug is guessing the numbers wrong, since I don't entirely trust the branching.
Ast is being fed wrong numbers (lexer issue?), and passing those on, but vm is not always fixing line numbers.
vm isn't fixing the line numbers of the top of the stack when raising.

This is obviously -really- bad, so it's getting fixed next. I hate not being able to rely on line numbers in a stack trace, and debug being wrong also sucks.

Create a tuple class

It's time to make a tuple class because lists and hashes are rather restricted without one.

Tuple will take an arbitrary number of extra types within it (obviously), and be declared like lists/hashes (tuple[<types>] name).

Tuple will initially only support subscripting by literal index so that the emitter can do type verification. The emitter will also be responsible for ensuring sane indexes into a tuple. Later on, I intend to allow tuples to be subscripted arbitrarily into an any.

Tuple values will be creatable through a new syntax: <[ ... ]>

Tuple is important because it's necessary for representing stack trace: Stack trace should be a file name, a function name, and a line number. So the entire stack trace could be represented via the type list[tuple[string, string, integer]].

Difficulty: Medium, because the framework for lists and hashes has been well tested. This will involve adding a new subscriptable entry, checking for a literal index (the index tree should be tree_readonly), and the vm part which should be super easy (thanks to lily_assign_value).

isnil()

There needs to be a call to determine if a given value has been set to nil. 1 if nil, 0 otherwise. Simple enough.

Need another comment roundup.

The comments need to be looked over again. I did this once a very long time ago. Here are the problems that I'd like to see resolved:

Almost every C function should have a comment saying not only what it does, but the params it takes.
I'd like to see functions grouped by what they do, but I'm becoming less particular about that as the codebase grows.
Some of the big comment blocks at the top of files need updating, because things have changed.
Comments relating to older stuff need to go away.
Some of the functions have a group prefix (like ast), but others don't. There needs to be consistency, but it should be a small prefix, since many function names are already long as-is.

The first one is the most important one. I'm not really sure about the necessity of the others.

str::strip(str self, str tostrip):str

This should do lstrip and rstrip on 'self' with 'tostrip', returning the result. Not too hard, because both lstrip and rstrip have already been written.

Add tests for cliexec

lily_cliexec isn't tested at all. It needs at least some basic tests to ensure that it's working properly.

Allow deeper packages (such as server::get::*) to work.

I did a pretty poor job of implementing packages initially. Package support only works one level deep (Allowing sys::argv to work), but doesn't go any deeper. The opcodes implementing package get and set expect a global variable for the package (incorrect in this case). Emitter's merge functions don't account for deep packages either.

I intend to use the vm's o_get_item/o_set_item for getting/setting values within deeper packages.

Parser: Join the multiline and non-multiline handling code.

Parser has far, far too much very similar code for handling single-line if statements and multi-line if statements. This code needs to be merged together because it's only going to become a bigger problem.

string::format(string self, any args...):string

Lily's version of sprintf. Should support everything that printfmt does, and maybe replace printfmt in the future. But for now, just having a method to do sprintf-like stuff would be awesome. In this case, self would also be the format string. Ex:

str x = "%d%d%d".format(123, 456, 789)

This is set to medium because it involves creating a new command that's fairly complex.

Redo lexer's file handling

Instead of switching around an internal state, the lexer should create a lexer file object that wraps either a file or a string. This would come with a reader function depending on what the file is.

I'm imagining a struct like this:
struct lily_lexer_entry_t {
    FILE *f;
    char *str;
    int hit_eof;
    int (*readline_fn)(struct lily_lexer_entry_t *);
    int line_num;
    ...
};

The aim of this is to make each lexer entry independent of the last one. This is important for adding package loading in the future if the lexer starts off reading a string. Also, a repl-like invocation might make use of this.
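
A compilable sketch of such an entry, with one reader for strings and one for files. It departs from the struct above in a few assumed ways: the reader takes an output buffer and its size, `str` is const, and the helper names (`read_str_line`, `read_file_line`, `entry_from_string`, `entry_from_file`) are invented for the example.

```c
#include <stdio.h>
#include <string.h>

typedef struct lily_lexer_entry_t {
    FILE *f;              /* set when wrapping a file */
    const char *str;      /* set when wrapping a string */
    int hit_eof;
    int (*readline_fn)(struct lily_lexer_entry_t *, char *, size_t);
    int line_num;
} lily_lexer_entry_t;

/* Copies the next line of the string into buf (without the newline).
   Returns 1 if a line was read, 0 at the end. size must be at least 1. */
static int read_str_line(lily_lexer_entry_t *e, char *buf, size_t size)
{
    if (*e->str == '\0') {
        e->hit_eof = 1;
        return 0;
    }
    size_t n = strcspn(e->str, "\n");
    if (n >= size)
        n = size - 1;     /* overlong lines are split across reads */
    memcpy(buf, e->str, n);
    buf[n] = '\0';
    e->str += n;
    if (*e->str == '\n')
        e->str++;
    e->line_num++;
    return 1;
}

/* Same contract as read_str_line, but backed by a FILE. */
static int read_file_line(lily_lexer_entry_t *e, char *buf, size_t size)
{
    if (fgets(buf, (int)size, e->f) == NULL) {
        e->hit_eof = 1;
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';
    e->line_num++;
    return 1;
}

lily_lexer_entry_t entry_from_string(const char *s)
{
    lily_lexer_entry_t e = { NULL, s, 0, read_str_line, 0 };
    return e;
}

lily_lexer_entry_t entry_from_file(FILE *f)
{
    lily_lexer_entry_t e = { f, NULL, 0, read_file_line, 0 };
    return e;
}
```

The lexer would only ever call `entry->readline_fn(entry, buf, size)`, so each entry keeps its own position, line number, and eof state regardless of what it wraps.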

list::apply(method(T):nil):nil

The list class needs to be able to call a given method for each element within it. This is VERY hard, because it involves creating a way to call the vm again from within a function. This needs to ensure that the list doesn't get deleted as it works, resulting in it stepping over invalid elements. This is very hard because it involves vm and vm internals knowledge.

The signature for the method isn't that hard to craft though.

I have the code for list::apply done "for the most part". I wrote this before I rewrote lily_value to carry sig+flags, so it's somewhat out of date. I made this work once, but I never committed it because I had to do some nasty hacks to get it working.

The sys package needs a "not assignable" flag.

This flag would be checked before doing an assignment. The reason for this is that I'd like to add more packages. However, as it stands, I can't let sys get assigned to another package, because package access is done by raw index (for speed). It's also...really not something I want to encourage (Imagine including a package which overwrites sys and it DOESN'T have argv. No, just no).

This will allow sys to not be assignable, and it's part of making packages able to be passed to show.
