fascinatedbox / lily
Interpreted language focused on expressiveness and type safety.
Home Page: http://lily-lang.org
License: MIT License
This is largely to allow for object == object. But this will also allow lists to compare by value, instead of by reference (because [1] != [1] makes no sense to me). This will need to have some sort of strategy for protecting against circular references.
Here's how things are done right now:
Put simply, the problem is that code reuse is impossible, which is a shame because there are a lot of common things that these types do.
The solution is as follows:
Why this needs to be done:
I've changed my mind as to how I want to do this since I first opened this bug. To begin with, I think the language should adopt support for anonymous blocks which would start with { and end with }. An anonymous block would be a series of statements (so if and the like would be allowed), with a known single result.
I like the idea of lambdas not specifying the types of their arguments when possible, but requiring those types in cases where no determination can be made about the types of the arguments.
I'm also coming to like Ruby's syntax for lambdas: {|args| => expression}. Or at least the idea is borrowed from Ruby. In the case of a lambda, only one expression would be allowed (and it would be the value returned if a value is needed). Multi-line lambdas would work by making the expression a block (I believe that Java or maybe Scala does something to this effect), with the block being able to do all of the nice stuff like if's, loops, etc.
This requires a few things to be done, however:
The interpreter is in need of a file class with some basic file operations. This would allow print and printfmt to finally start being phased out in favor of some actual IO abilities. Here's what it could start off with:
(static) function file::open(str filename, str mode): file
function file::close():nil
function file::write(str text):nil
function file::readlines():list[str]
Obviously, this would just be a starting point. The main point of this is to get a cool file class started so Lily can be more useful.
Lists and hashes that have the same type should be comparable using '==' and '!='. This would be a comparison by pointer, not by value.
This would remove whitespace from both sides of a given string, returning the result. This shouldn't be too hard, because there's already an lstrip and an rstrip to work with.
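A minimal sketch of the idea in C, assuming a plain NUL-terminated string rather than Lily's real string value. The name `sketch_trim` is hypothetical, and the two loops only stand in for the existing lstrip and rstrip logic:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `s` with leading
   and trailing whitespace removed. The caller frees the result.
   Returns NULL if the allocation fails. */
char *sketch_trim(const char *s)
{
    assert(s != NULL);

    /* lstrip half: skip leading whitespace. */
    const char *start = s;
    while (*start != '\0' && isspace((unsigned char)*start))
        start++;

    /* rstrip half: back up over trailing whitespace. */
    const char *end = start + strlen(start);
    while (end > start && isspace((unsigned char)end[-1]))
        end--;

    size_t len = (size_t)(end - start);
    char *result = malloc(len + 1);
    if (result == NULL)
        return NULL;

    memcpy(result, start, len);
    result[len] = '\0';
    return result;
}
```

Doing both halves in one pass means only one result string is allocated, instead of one for lstrip and another for rstrip.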
What the title says. A string should be subscriptable (and maybe subscript assignable too). Using Python as a model, a subscript of a string should return a string. If the string has a utf-8 sequence, then the returned string should have that entire sequence (rather than returning a single byte of that sequence).
Strings will be adjusted to carry an 'is_ascii' part. If a string has no utf-8 parts, then subscripts can be done as a single simple index. Otherwise, subscripts will need to loop over the string to make sure that they return the entire utf-8 sequence if they need to.
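The two paths could be sketched in C roughly like this. The `sketch_str` layout and the function names are assumptions made for illustration, not the interpreter's actual types, and the walk assumes the bytes are well-formed utf-8:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string layout; `is_ascii` is the flag described above. */
typedef struct {
    const char *data;
    size_t size;      /* length in bytes */
    int is_ascii;     /* 1 if no byte has the high bit set */
} sketch_str;

/* Number of bytes in the utf-8 sequence that starts with `lead`.
   Assumes well-formed input, so `lead` is never a continuation byte. */
static size_t sketch_seq_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Locate the character at `index`. On success, store its byte offset and
   byte length and return 1. Return 0 if the index is out of range. */
int sketch_str_subscript(const sketch_str *s, size_t index,
                         size_t *offset, size_t *length)
{
    assert(s != NULL && offset != NULL && length != NULL);

    if (s->is_ascii) {
        /* Fast path: one byte per character, so index directly. */
        if (index >= s->size)
            return 0;
        *offset = index;
        *length = 1;
        return 1;
    }

    /* Slow path: step over whole sequences until the index is reached. */
    size_t pos = 0;
    while (pos < s->size) {
        size_t len = sketch_seq_length((unsigned char)s->data[pos]);
        if (index == 0) {
            if (pos + len > s->size)
                return 0; /* sequence is cut off at the end */
            *offset = pos;
            *length = len;
            return 1;
        }
        index--;
        pos += len;
    }
    return 0;
}
```

The caller would then copy `length` bytes starting at `offset` into the new string, so a multi-byte character always comes back whole.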
Lists can currently be created in a static manner, such as this:
list[integer] l = [1, 2, 3]
but the same is not available for hashes. Hash creation would need key and value pairs, rather than plain values. These pairs would be split up by arrows. It would look something like this:
hash[str, integer] h = ["a" => 123, "b" => 456, "c" => 789]
Hashes are a bit complicated though, because duplicate keys can exist in a static hash. In this case, the interpreter will pick the right-most value given. This is similar to how Python, Ruby, and Lua all handle this sort of situation. Additionally, static hashes must ensure that the key given does not default to object (objects are not valid keys), and that it is itself a valid key type. The values should be able to default to object though.
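One way to get the right-most-wins behavior is to insert the pairs in source order and let a later insert overwrite an earlier one. A rough C sketch, with made-up names and a fixed-size linear table standing in for the real hash:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_PAIRS 16

typedef struct {
    const char *key;
    int value;
} sketch_pair;

/* Not a real hash table: a small array with linear lookup is enough to
   show the ordering rule. */
typedef struct {
    sketch_pair pairs[SKETCH_MAX_PAIRS];
    int count;
} sketch_hash;

/* Return a pointer to the value stored under `key`, or NULL if absent. */
const int *sketch_hash_get(const sketch_hash *h, const char *key)
{
    for (int i = 0; i < h->count; i++) {
        if (strcmp(h->pairs[i].key, key) == 0)
            return &h->pairs[i].value;
    }
    return NULL;
}

/* Store `value` under `key`, overwriting any existing entry.
   Returns 0 if the table is full, 1 otherwise. */
int sketch_hash_set(sketch_hash *h, const char *key, int value)
{
    assert(h != NULL && key != NULL);

    for (int i = 0; i < h->count; i++) {
        if (strcmp(h->pairs[i].key, key) == 0) {
            h->pairs[i].value = value;
            return 1;
        }
    }
    if (h->count == SKETCH_MAX_PAIRS)
        return 0;

    h->pairs[h->count].key = key;
    h->pairs[h->count].value = value;
    h->count++;
    return 1;
}

/* Build a hash from literal pairs, left to right. Because each set
   overwrites, the right-most value for a duplicate key is the one kept. */
int sketch_hash_build(sketch_hash *h, const sketch_pair *src, int n)
{
    assert(h != NULL && src != NULL);

    h->count = 0;
    for (int i = 0; i < n; i++) {
        if (!sketch_hash_set(h, src[i].key, src[i].value))
            return 0;
    }
    return 1;
}
```

With this approach no separate duplicate check is needed: the build step is the same operation as a normal subscript assignment.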
Note: The fix seems to be wrapping the absorb merge in unary with saving the active and restoring the active. Absorb merge thinks it has the real current, so it updates current and messes up deep subscripts.
Also, the absorb should work for all things, not just subscript.
This is caused by binary taking over the typecast's values. I think this can be solved by making tree_typecast a tree that can be entered. This would keep binary ops from reparenting things.
I need a readme that describes the built-in commands. This will be nice for me, as well as for any new people trying to figure out what the language is capable of.
Dig through lily_cls_str.c and lily_cls_list.c. For the globals, poke at lily_seed_symtab.h (I think). This is easy, and also really important.
lily_lexer within lily_lexer.c is currently full of this sort of thing:
ch = lexer->lex_buffer[lex_bufpos];
This is far more complicated than it needs to be. It should instead have ch as a pointer to a location within the lexer's lex_buffer, by just doing ch++ to increase as needed.
This can start off by just being in lily_lexer, and possibly used throughout the code in lexer later. This will make lexer simpler, and hopefully cut down the size of lily_lexer.
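For comparison, here is a small sketch of both styles scanning an identifier. The function names are hypothetical and the buffer is assumed to be NUL-terminated; only `lex_buffer` and `lex_bufpos` come from the existing code:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Index-based scan, in the style the lexer uses today: every step
   re-reads the buffer through the position. */
size_t sketch_scan_word_indexed(const char *lex_buffer, size_t lex_bufpos)
{
    assert(lex_buffer != NULL);

    char ch = lex_buffer[lex_bufpos];
    while (isalnum((unsigned char)ch) || ch == '_') {
        lex_bufpos++;
        ch = lex_buffer[lex_bufpos];
    }
    return lex_bufpos;
}

/* Pointer-based scan: `ch` points into the buffer and is advanced with
   ch++. The position is recovered once at the end by subtraction. */
size_t sketch_scan_word_pointer(const char *lex_buffer, size_t lex_bufpos)
{
    assert(lex_buffer != NULL);

    const char *ch = lex_buffer + lex_bufpos;
    while (isalnum((unsigned char)*ch) || *ch == '_')
        ch++;
    return (size_t)(ch - lex_buffer);
}
```

Both return the position of the first character after the word, so the pointer version can replace the indexed one a case at a time.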
When the vm is done running, it clears values in main which is...wrong. So wrong. This clearing needs to be done after the final vm pass.
I'll need to add something like sys::intmax to actually test it though. Might as well add sys::intmin too, if I decide to do that. Anyway, the problem is that this:
sys::intmax -= 1
will fail because the emitter's package assign doesn't handle compound ops.
Currently, adding a new token or changing tokens around is a -huge- hassle. A few steps need to be taken:
I've tried this before, but generally lose interest because most of the lexer issues have been gone for a while now. However, I'm starting to think that this is important because it's really hard to modify the lexer to add new tokens/new binary ops. lily_lexer (the call) has become a mess of sorts as well. It would be nice to have something autogenerated to take care of adding new tokens, ensuring the tokens have the right string, and more.
A readme that explains how to use the language would be a nice start. Some documentation and demos would add to that.
This would convert an integer to a string representation, because why not?
I consider this a bug because the left side is a list of object, and lists of object are supposed to be able to take in any type. Emitter currently will report that list[integer] cannot be assigned to list[object] and give up.
Emitter needs to do a new failsafe check that goes something like this:
If this is done for assignment, then it also needs to be done for values passed to methods/functions, and possibly when returning a value from a method. It must be all or none to keep the language consistent.
So...methods and functions are essentially the same thing: The first is a native block of code that can be called, and the second is a foreign block of code that can be called. Due to silly decisions in the past, these two have been split up. But they really shouldn't be. First, the terms are wrong, but also:
Functions need to put their arguments in a list and "unbox" them. Since they don't, it's currently impossible to pass varargs from native code to foreign code.
Functions hack around not using the stack (see vm's err_function).
Lots of duplicate code to work with methods/functions.
Using the proper terms for things is nice.
...
It's not so much that this is difficult, but more that it involves carefully slashing out a bunch of code (yay) without breaking anything (boo). This should have been fixed earlier.
It's finally time. Lily has been able to make circular references work due to a lot of trickery and (in one case), a completely blind fix. This has worked so far because lists are fairly basic to traverse (simply go over all elements and check for finding an already-found value). This must change if Lily is to get user-defined classes in later, because user-defined classes will be very expensive to traverse.
The solution is the one that I haven't wanted to do: Write a garbage collector. I have a bit of it started, but here's what I intend to do:
I have never made a gc before, so I'm going to make a total guess of how it should work. Wish me luck! :)
When I converted the vm from using addresses to using code, I left the code typed as uintptr_t. Yes, that's right. The actual container for the bytecode is typed as uintptr_t *. On a 64-bit build, each slot is 8 bytes wide. This was necessary in the past, because addresses of values were left in the bytecode. Now, only literals are written into the bytecode.
I'd like to make code into a short int (2 bytes). Since the smallest method has 8 slots for code, this would mean a reduction of 48 bytes (6 bytes saved per slot, assuming 8-byte slots).
Anyway, this will require that literals get a positional id (easy, because symtab used to do that), and that literals get their own section in the vm as an indexable array. This requires a bit of tinkering around with debug (show), changing how vm gets literals, and making literals get an id. So there's a fair amount to change, but most of the changes are pretty basic.
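A rough sketch of what the narrower code plus a literal table might look like. Every name here (`sketch_vm`, `SKETCH_OP_LOAD_LITERAL`, and so on) is invented for illustration, and literals are reduced to plain integers to keep it short:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified literal: the real one would hold any literal type. */
typedef struct {
    int64_t integer;
} sketch_literal;

/* The vm keeps literals in their own indexable array, so the bytecode
   only needs to store a small positional id. */
typedef struct {
    const uint16_t *code;
    const sketch_literal *literals;
    uint16_t literal_count;
} sketch_vm;

/* Made-up opcode: code[pos] is the opcode, code[pos + 1] the literal id. */
enum { SKETCH_OP_LOAD_LITERAL = 1 };

/* Fetch the literal referenced by the load instruction at `code_pos`. */
int64_t sketch_load_literal(const sketch_vm *vm, size_t code_pos)
{
    assert(vm != NULL);
    assert(vm->code[code_pos] == SKETCH_OP_LOAD_LITERAL);

    uint16_t id = vm->code[code_pos + 1];
    assert(id < vm->literal_count);
    return vm->literals[id].integer;
}

/* Bytes saved for `slots` code slots when each slot shrinks from
   uintptr_t to a 2-byte integer. */
size_t sketch_bytes_saved(size_t slots)
{
    return slots * (sizeof(uintptr_t) - sizeof(uint16_t));
}
```

Where uintptr_t is 8 bytes, `sketch_bytes_saved(8)` works out to the 48 bytes mentioned above. One tradeoff to keep in mind: a 2-byte id caps the literal table at 65536 entries.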
Debugging ast problems is currently much harder than it needs to be. Since asts are internal, there's no 'show' command that works for them. The ast pool handles subexpressions by saving the current/root trees to the saved trees inside ast pool. This works, but it's also hard to actually figure out what the current state of the ast pool is.
The ast pool should have a dumping function that is only enabled when a debug flag is set when compiling. This would dump out all trees, their types, and their subtrees. This would make it easy to determine if problems are occurring within the ast pool or if they come from elsewhere.
Typecasts currently look like this:
object o = 10
integer a = @(integer: o)
Simple enough, but what about when some more complicated stuff is thrown in?
list[integer] integer_list = [1, 2, 3]
object v = 10
list[object] obj_list = [v, integer_list]
list[integer] integer_list_two = @(list[integer]: obj_list[1])
I don't like this. If someone is reading left to right, they read the cast, then have to read the operation to find out what's being cast. This will only get more complicated if, in the future, it becomes more common to chain method calls. What I'd like to see looks like this:
object o = 10
integer a = o.@(integer)
Simple. 'o is cast to an integer'. The use of @(...) makes it so the syntax is still unambiguous. Now, for the other example:
This:
@(list[integer]: obj_list[1])
Becomes:
obj_list[1].@(list[integer])
In this version, a person can read it as a series of operations from left to right.
"Take object list, subscript the first value, then cast it as a list of integer."
seems more natural than:
"Cast to list of integer the value of subscripting object list at the first value."
I've tagged this as medium difficulty because it involves reworking the parser a bit. However, it may be possible to have ast and emitter unchanged. If they did need to be changed, it would be a simple act of pulling arguments from different locations. The vm and debug would be unchanged, so this won't be hard.
sanity.ly will need to be updated, but that's just a search-and-replace job.
Here's the code:
object o = 10
method m():nil { o = "10" }
m()
show o
This causes a crash in debug because o_set_global just checks for stuff being refcounted, and thus places a string where the object's value should be (instead of inside the object). This results in show crashing because the object is laid out improperly.
o_set_global needs to account for objects, and o_get_global MAY need to account for that as well. I haven't tested pulling values from a global object, but I can see that it might fail.
They currently don't, but they should since that's what lists do when a list has different types.
Blastmaster currently runs lily_aft for all files that it's given. This means that all tests are checked for a memory failure at any particular point. However, I've never found any bugs through doing aft on the failing files. If the failing tests result in memory-related issues, then I usually find out through sanity.ly (if it's something in the vm) or in hello_world.ly (if it's when booting up the interpreter).
This may require making a new dir for aft tests (though as of now, I would only want to put sanity.ly in there, since it's the most comprehensive test).
This would make blastmaster go faster, which is also nice.
In theory, there should be circumstances where the string library has a 'self' (the first string passed) that has one reference and isn't a literal (aka protected). In this case, the string library should do in-place modifications to avoid creating a new resulting string. This idea came from reading some Perl documentation a while ago. It makes sense: If something has one ref, then an in-place modification won't hurt anything else.
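The check could look something like the following C sketch. The `sketch_strval` struct and `sketch_upper` are hypothetical stand-ins, with upper-casing picked only as an example of a string operation:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string value with the two facts the check needs. */
typedef struct {
    char *data;
    int refcount;
    int is_literal;   /* literals are protected and must not be changed */
} sketch_strval;

/* Upper-case `self`. If `self` has exactly one reference and is not a
   literal, it is modified in place and returned. Otherwise a new value
   is allocated. Returns NULL if an allocation fails. */
sketch_strval *sketch_upper(sketch_strval *self)
{
    assert(self != NULL && self->data != NULL);

    sketch_strval *result = self;

    if (self->refcount != 1 || self->is_literal) {
        /* Shared or protected: work on a private copy instead. */
        size_t len = strlen(self->data);

        result = malloc(sizeof(*result));
        if (result == NULL)
            return NULL;

        result->data = malloc(len + 1);
        if (result->data == NULL) {
            free(result);
            return NULL;
        }
        memcpy(result->data, self->data, len + 1);
        result->refcount = 1;
        result->is_literal = 0;
    }

    for (char *p = result->data; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);

    return result;
}
```

This only works for operations whose result fits in the existing buffer. Anything that grows the string would still need a new allocation.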
This is a function that would return a list of all keys currently in the hash. When combined with list::apply, this can be rather powerful:
hash[string, string] config_dict = ["abc" => "123", "def" => "456"]
function print_hash_key(string key)
{
printfmt("hash key %s is %s.\n", key, config_dict[key])
}
config_dict.keys().apply(print_hash_key)
This is probably not the best example though, because show already can print out hashes rather nicely!
I've known about this for a long time, and I keep forgetting to fix it. The problem is pretty simple: The lexer sees '1+1' as two tokens: '1' (integer), and '+1' (integer).
To fix this:
Essentially the same thing as binding the GET vars. Internally, apache stores the POST vars as...some sort of struct in an array, I think. It's a bit more complicated than server::get, and so should be done after it.
There are two areas where this is an issue:
This will require adding a class name to each function, and then having emitter/vm print it as needed.
This conversion was suggested by a couple people (at least, I think) when I asked for feedback on the language. I think it's a fair complaint, so I'll get to it. I think this will be easier than converting strings, because the number class isn't specifically used that often (whereas strings had their own library to convert over).
This would be enormously helpful in the future when trying to debug what's inside of a package and/or just know what all is inside of a package.
This involves creating a few things, so I'll start off at the beginning.
First, a sys package. A package should be thought of as a closed namespace, in that new vars cannot be added to it. However, vars can be taken from it and assigned to/from. sys itself won't be assignable to or from.
Vars inside a package will be accessed through 'package::var', meaning that the new token '::' will be made.
This calls for binding argv, which will be available as sys::argv. The type for it will be list[str] (similar to how Python binds argv as a list of strings).
The module currently binds the server's environment as server::env, which has type 'hash[str, str]'. However, GET vars are more complicated:
Consider the following query strings:
?a = 1
a could be either a string or an integer. So hash[str, str] is limiting here.
?b[0] = 10
I'd like to see this work, but hash[str, str] won't support it. I could do hash[str, object], but that means accesses to GET vars become something like 'server::get["b"].@(hash[str,str])'. I don't like that.
As an alternative, I'd like to have GET vars created as values within a 'get' package that's within the 'server' package. The benefit of this approach is that vars of the appropriate type can be easily created.
Caveats:
I noticed this when handling #11 and viewing the stack traces generated. Almost all line numbers printed are wrong. I took a peek at debug, and it seems to be off on the numbers as well.
My guess is that it's one (or more) of the following:
This is obviously -really- bad, so it's getting fixed next. I hate not being able to rely on line numbers in a stack trace, and debug being wrong also sucks.
It's time to make a tuple class because lists and hashes are rather restricted without one.
Tuple will take an arbitrary number of extra types within it (obviously), and be declared like lists/hashes (tuple[<types>] name).
Tuple will initially only support subscripting by literal index so that the emitter can do type verification. The emitter will also be responsible for ensuring sane indexes into a tuple. Later on, I intend to allow tuples to be subscripted arbitrarily into an any.
Tuple values will be creatable through a new syntax: <[ ... ]>
Tuple is important because it's necessary for representing stack trace: Stack trace should be a file name, a function name, and a line number. So the entire stack trace could be represented via the type list[tuple[string, string, integer]].
Difficulty: Medium, because the framework for lists and hashes has been well tested. This will involve adding a new subscriptable entry, checking for a literal index (the index tree should be tree_readonly), and the vm part which should be super easy (thanks to lily_assign_value).
There needs to be a call to determine if a given value has been set to nil. 1 if nil, 0 otherwise. Simple enough.
The comments need to be looked over again. I did this once a very long time ago. Here are the problems that I'd like to see resolved:
The first one is the most important one. I'm not really sure about the necessity of the others.
This should do lstrip and rstrip on 'self' with 'tostrip', returning the result. Not too hard, because both lstrip and rstrip have already been written.
lily_cliexec isn't tested at all. It needs at least some basic tests to ensure that it's working properly.
I did a pretty poor job of implementing packages initially. Package support only works one level deep (Allowing sys::argv to work), but doesn't go any deeper. The opcodes implementing package get and set expect a global variable for the package (incorrect in this case). Emitter's merge functions don't account for deep packages either.
I intend to use the vm's o_get_item/o_set_item for getting/setting values within deeper packages.
Parser has far, far too much very similar code for handling single-line if statements and multi-line if statements. This code needs to be merged together because it's only going to become a bigger problem.
Lily's version of sprintf. Should support everything that printfmt does, and maybe replace printfmt in the future. But for now, just having a method to do sprintf-like stuff would be awesome. In this case, self would also be the format string. Ex:
str x = "%d%d%d".format(123, 456, 789)
This is set to medium because it involves creating a new command that's fairly complex.
Instead of switching around an internal state, the lexer should create a lexer file object that wraps either a file or a string. Each object would come with a reader function that depends on what it wraps.
I'm imagining a struct like this:
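struct lily_lexer_entry_t {
    FILE *f;        /* source file, if this entry wraps a file */
    char *str;      /* source text, if this entry wraps a string */
    int hit_eof;
    /* reader chosen according to what the entry wraps */
    int (*readline_fn)(struct lily_lexer_entry_t *);
    int line_num;
    ...
};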
The aim of this is to make each lexer entry independent of the last one. This is important for adding package loading in the future if the lexer starts off reading a string. Also, a repl-like invocation might make use of this.
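As a rough, self-contained sketch of that idea: each entry carries its own reader, so the caller never needs to know whether it wraps a file or a string. The `sketch_` names and the reader's `buf`/`size` parameters are assumptions added to make the example complete, not part of the struct above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct sketch_lexer_entry {
    FILE *f;            /* set when wrapping a file, else NULL */
    const char *str;    /* next unread char when wrapping a string */
    int hit_eof;
    int line_num;
    /* Read one line into buf; returns chars read, or 0 at the end. */
    int (*readline_fn)(struct sketch_lexer_entry *, char *buf, int size);
} sketch_lexer_entry;

static int sketch_read_file_line(sketch_lexer_entry *e, char *buf, int size)
{
    assert(size > 0);
    if (fgets(buf, size, e->f) == NULL) {
        e->hit_eof = 1;
        buf[0] = '\0';
        return 0;
    }
    e->line_num++;
    return (int)strlen(buf);
}

static int sketch_read_str_line(sketch_lexer_entry *e, char *buf, int size)
{
    assert(size > 0);
    int i = 0;
    /* Copy up to and including the newline, or until the text ends. */
    while (i < size - 1 && *e->str != '\0') {
        char ch = *e->str++;
        buf[i++] = ch;
        if (ch == '\n')
            break;
    }
    buf[i] = '\0';
    if (i == 0) {
        e->hit_eof = 1;
        return 0;
    }
    e->line_num++;
    return i;
}

/* Set up an entry that reads from an already-open file. */
void sketch_entry_from_file(sketch_lexer_entry *e, FILE *f)
{
    assert(e != NULL && f != NULL);
    e->f = f;
    e->str = NULL;
    e->hit_eof = 0;
    e->line_num = 0;
    e->readline_fn = sketch_read_file_line;
}

/* Set up an entry that reads from a NUL-terminated string. */
void sketch_entry_from_str(sketch_lexer_entry *e, const char *text)
{
    assert(e != NULL && text != NULL);
    e->f = NULL;
    e->str = text;
    e->hit_eof = 0;
    e->line_num = 0;
    e->readline_fn = sketch_read_str_line;
}
```

The lexer would then just call `entry->readline_fn(entry, buf, size)` in a loop. Since all state lives in the entry, entries can be stacked (a string that loads a file, for example) without one clobbering the other. Note that this sketch counts a line per read, so a line longer than the buffer would be counted more than once.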
The list class needs to be able to call a given method for each element within it. This is VERY hard, because it involves creating a way to call the vm again from within a function. This needs to ensure that the list doesn't get deleted as it works, resulting in it stepping over invalid elements. This is very hard because it involves vm and vm internals knowledge.
The signature for the method isn't that hard to craft though.
I have the code for list::apply done "for the most part". I wrote this before I rewrote lily_value to carry sig+flags, so it's somewhat out of date. I made this work once, but I never committed it because I had to do some nasty hacks to get it working.
This flag would be checked before doing an assignment. The reason for this is that I'd like to add more packages. However, as it stands, I can't let sys get assigned to another package, because package access is done by raw index (for speed). It's also...really not something I want to encourage (Imagine including a package which overwrites sys and it DOESN'T have argv. No, just no).
This will allow sys to not be assignable, and it's part of making packages able to be passed to show.