velipso / sink
Minimal programming language for embedding small scripts in larger programs
License: BSD Zero Clause License
Seems kind of silly to create temporary lists for situations like this:
def add a, b
return a + b
end
say add 1, 2
The compiler should recognize that add takes exactly two parameters, and ensure the correct number are passed. Perhaps instead of OP_CALL, have OP_CALL0, OP_CALL1, etc., up to, say, 15, and then provide a general-purpose OP_CALLVAR or something.
This will take extra logic to figure out @-calling, but that logic already exists for opcode commands, so we just need to re-purpose it.
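A minimal sketch of how the compiler could pick between fixed-arity call opcodes and the general-purpose fallback. The opcode names follow the proposal above; selectCallOpcode, checkArity, and MAX_FIXED_ARGS are hypothetical names used only for illustration, and rejecting a mismatched argument count is an assumption about the desired behavior.

```typescript
// Hypothetical limit taken from the "up to, say, 15" suggestion above.
const MAX_FIXED_ARGS = 15;

interface CallOpcode {
  op: string;
  argCount: number;
}

// Pick a fixed-arity opcode when the argument count is small enough,
// otherwise fall back to the general-purpose variant.
function selectCallOpcode(argCount: number): CallOpcode {
  if (!Number.isInteger(argCount) || argCount < 0) {
    throw new Error(`invalid argument count: ${argCount}`);
  }
  if (argCount <= MAX_FIXED_ARGS) {
    return { op: `OP_CALL${argCount}`, argCount };
  }
  return { op: "OP_CALLVAR", argCount };
}

// Compile-time arity check: `declared` comes from the def,
// `passed` from the call site. (Assumption: a mismatch is an error.)
function checkArity(cmdName: string, declared: number, passed: number): void {
  if (declared !== passed) {
    throw new Error(`${cmdName} expects ${declared} argument(s), got ${passed}`);
  }
}
```

With this shape, a call like add 1, 2 would compile to OP_CALL2 without building a temporary list.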
Need to clean up the JavaScript API, and might as well use TypeScript while we're at it.
Putting JavaScript implementations of new features on hold until finishing the C implementation, then porting to TypeScript in one go.
Performing permutation and combination of list elements. Kind of useful.
Should there be tasks?
This can be implemented in the host system, but there is a benefit of implementing it directly in sink: we can share the same compile-time file loading.
Something like this isn't working:
file1.sink
declare foo1 'foo1'
delcare foo2 'foo2' # yes, this is intentionally misspelled
declare foo3 'foo3'
say 'hello'
file2.sink
include 'foo'
say 'hi'
$ sink -D foo file1.sink file2.sink
The error in file1.sink doesn't stop the include process and halt the script.
This seems like it will be a normal occurrence:
namespace AnythingButItHasToBeUnique
include 'somefile.sink'
end
using AnythingButItHasToBeUnique
The idea here is that 'somefile.sink' might include definitions that were already included. By putting it in its own namespace, the second include will still work. However, we want access to the new definitions inside of 'somefile.sink' without having to type the wrapping namespace (that was only there to allow the multiple includes).
Currently we have:
include foo 'foo.sink'
To be short for:
namespace foo
include 'foo.sink'
end
I think the next step is to allow something like:
include . 'foo.sink'
To be short for:
namespace SomeUniqueNamespaceThatCanNeverHaveCollisions
include 'foo.sink'
end
using SomeUniqueNamespaceThatCanNeverHaveCollisions
Where the unique namespace isn't even addressable in any way.
I think we can remove SINK_PANIC entirely from the C implementation.
If allocation fails during object creation, then simply return NULL and let the user figure out what to do.
If allocation fails during context execution, then abort with an error that the context is out of memory.
This compiles, then complains about invalid program code:
declare foo
say foo 1
namespace foo
include 'shell'
end
This doesn't work because program_genBody doesn't correctly handle AST_INCLUDE's :-(.
The C implementation of strings is pretty dumb and straightforward (and slow).
We should probably implement a form of rope under the hood. Notice that node.js's v8 already does this in some form, because ./perf/cat.sink runs significantly faster in the JS implementation.
See: https://en.wikipedia.org/wiki/Rope_(data_structure)
We can likely keep the C API the same, and just use ropes internally, and flatten ropes when sink_caststr is called.
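To make the rope idea concrete, here is a minimal sketch (written in TypeScript for brevity, not the C implementation). Concatenation only allocates a small node, and the bytes are copied once when the rope is flattened, which is the point where something like sink_caststr would run. All type and function names here are hypothetical.

```typescript
// A rope is either a flat leaf or a concatenation node over two ropes.
type Rope =
  | { kind: "leaf"; text: string }
  | { kind: "cat"; left: Rope; right: Rope; size: number };

function ropeLeaf(text: string): Rope {
  return { kind: "leaf", text };
}

function ropeSize(r: Rope): number {
  return r.kind === "leaf" ? r.text.length : r.size;
}

// O(1): no characters are copied at concatenation time.
function ropeCat(left: Rope, right: Rope): Rope {
  return { kind: "cat", left, right, size: ropeSize(left) + ropeSize(right) };
}

// Flatten once, iteratively, so long chains do not overflow the call stack.
function ropeFlatten(r: Rope): string {
  const parts: string[] = [];
  const stack: Rope[] = [r];
  while (stack.length > 0) {
    const node = stack.pop() as Rope;
    if (node.kind === "leaf") {
      parts.push(node.text);
    } else {
      stack.push(node.right); // pushed first so the left side is visited first
      stack.push(node.left);
    }
  }
  return parts.join("");
}
```

A real implementation would probably also rebalance or cap the depth, which this sketch leaves out.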
I think, with the new precedence rules that put equals at the very bottom, we can now implement multiple lvalue assignment, like:
var x, y
x, y = 1, 2
x, y = y, x
# etc
Lots of things can fall under this:
say 1 + 2
var x = 10
say x + 20
say ((1 + 2) * 3) - 4
say {1, 2, 3} * 4
say {{1, 2, 3}, 2, 3}[0][1] + 4
x = 1
say {1, 2, 3}[x]
etc, etc,
Not sure how far down this rabbit hole I want to go.
This should work:
enum foo = 5 + 5
etc...
The current C implementation copies over the string table into the context during context_new (via sink_str_newblob) -- this is unnecessary and should be avoided (especially now since embed can mean large strings).
Instead, it should allocate the strings but point the *bytes pointer to the string table (which will need 0-terminated bytes). Then it should never free the strings, and simply let the program free the string table when it's freed.
(I won't implement this now, but I wanted to document the idea I just had.)
The "problem" is that there are only 256 registers per frame. However, there are 256 frame levels... Most likely, a user will run out of registers before frame levels. One way to perform a trade-off is to hard-code a different ratio in the bytecode (i.e., 1024 registers per frame with 64 frame levels, etc), but there is another way:
We can transform code like this:
def foo
var a, b, c, ... more than 256 variables....
<body>
end
To this:
def foo
var a, b, c, ... 256 variables...
def foo_2
var aa, bb, cc, ... 256 variables...
def foo_3
var aaa, bbb, ccc, ... 256 variables...
... more frames as needed...
<body>
end
return foo_3
end
return foo_2
end
This transformation always works, and gives us an additional 256 registers for every new frame.
And, more importantly, I don't have to do anything right now! If register space becomes a problem, adding this feature to the compiler will magically expand register space at the cost of frame levels -- without changing the bytecode or virtual machine.
As it stands, I think 256 registers should be fine... but who knows in the future.
var x = {1, 2, 3, 4}
say (x[1:2] = {5, 6, 7})
Current output: {5, 6}
Desired output: {5, 6, 7}
but not sure if it should be the rvalue, or a copy.
Currently, OP_CAT works two operands at a time. This could be really inefficient for large concatenations, i.e.,
'a' ~ 'b' ~ 'c' ~ 'd' ~ 'e' ~ 'f' ~ ... ~ 'z'
It will create a ton of temporary strings, first:
'ab' ~ 'c' ~ 'd' ~ ... ~ 'z'
'abc' ~ 'd' ~ 'e' ~ ... ~ 'z'
'abcd' ~ 'e' ~ 'f' ~ ... ~ 'z'
etc...
Might want to fix that.
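One possible fix, sketched under assumptions: flatten a chain of binary ~ nodes into a single operand list at compile time, so the result is built in one step instead of producing a temporary string per pair. The CatExpr shape and the function names are hypothetical and not the actual sink AST.

```typescript
// Hypothetical mini-AST: string literals joined by binary concatenation nodes.
type CatExpr =
  | { kind: "str"; value: string }
  | { kind: "cat"; left: CatExpr; right: CatExpr };

const strLit = (value: string): CatExpr => ({ kind: "str", value });
const catOf = (left: CatExpr, right: CatExpr): CatExpr => ({ kind: "cat", left, right });

// Walk the nested chain and collect the operands in left-to-right order.
function collectCatOperands(e: CatExpr, out: string[] = []): string[] {
  if (e.kind === "cat") {
    collectCatOperands(e.left, out);
    collectCatOperands(e.right, out);
  } else {
    out.push(e.value);
  }
  return out;
}

// An n-ary concatenation builds the result once from all operands,
// rather than creating n - 1 intermediate strings.
function evalCatChain(e: CatExpr): string {
  return collectCatOperands(e).join("");
}
```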
It would be nice if, when a runtime error occurs, the opcode could be traced back to a source file and line number. Right now, it just aborts with the error, with no source/line/char information. Kind of annoying.
Seems like this would be useful:
rand.range [start, ]stop[, step]
I.e., a random number 0-9 would be rand.range 10, etc.
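A rough sketch of the proposed semantics, assuming the same convention as the signature above: one argument means stop, two mean start and stop, and step defaults to 1. The random source is passed in so the behavior can be checked deterministically. The name randRange and the choice to throw on an empty range are assumptions.

```typescript
// rng must return a float in [0, 1), like Math.random.
function randRange(rng: () => number, a: number, b?: number, step: number = 1): number {
  const start = b === undefined ? 0 : a;
  const stop = b === undefined ? a : b;
  if (step === 0) {
    throw new Error("step must not be zero");
  }
  // Number of values in the sequence start, start + step, ... before stop.
  const count = Math.ceil((stop - start) / step);
  if (count <= 0) {
    throw new Error("empty range"); // assumption: an empty range is an error
  }
  return start + step * Math.floor(rng() * count);
}
```

For example, randRange(Math.random, 10) picks one of 0 through 9.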
After having a body of sink code to test against, I should start testing for what opcodes would produce the best speed improvements.
One idea (that should first be confirmed with real data) is that getting/setting lists at static offsets is something that will happen a lot.
To set a list at position 0 to a value, the current opcode sequence is something like:
r0 = get the list somehow
r1 = NUMP8 0
r2 = get the value somehow
SETAT r0, r1, r2
I wonder if it would be faster to do something like:
r0 = get the list somehow
r1 = get the value somehow
SETAT0 r0, r1
Where I introduce a SETAT0, SETAT1, SETAT2, up until (say) SETAT15. Likewise for GETAT0-GETAT15.
The idea here being that this will be a common operation, especially since lists are our only compound value.
What is the point of @ calling? Seems unnecessary and could be removed.
Functions cannot be stored in variables, so currying is impossible, which would be the primary use case for @ calling... If there is a need for dynamic arguments, just code it as a list in the first place.
I used to have @ calling because it was free, but after issue #17's changes, it is no longer free. Might as well remove it?
Should probably allow something like this:
var x = {{1, 2, 3}} * 2
say x # {{2, 4, 6}}
Right now, it errors out.
Lots of code like this:
var something = ctx.ops[ctx.pc++];
Without checking that PC hasn't run off the end.
Currently not a problem because all bytecode comes from the code-generator, but once we start saving bytecode to disk, corrupt scripts become a problem.
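A minimal sketch of a checked fetch that could replace the bare ctx.ops[ctx.pc++] pattern. The VmContext shape and the fetchOp name are assumptions for illustration; only the ops and pc fields appear in the snippet above.

```typescript
// Hypothetical, reduced view of the interpreter context.
interface VmContext {
  ops: Uint8Array;
  pc: number;
}

// Read the next byte and advance pc, failing loudly if pc has run
// off either end of the bytecode instead of reading garbage.
function fetchOp(ctx: VmContext): number {
  const value = ctx.ops[ctx.pc];
  if (value === undefined) {
    // Typed arrays yield undefined for any out-of-range index.
    throw new Error(`program counter out of range: ${ctx.pc}`);
  }
  ctx.pc += 1;
  return value;
}
```

A bytecode validator run at load time could make the per-fetch check unnecessary, at the cost of a separate pass.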
Create a new compile-time command (like pick), called embed, that embeds the contents of a file as a string.
var foo = embed "somefile.png"
The C implementation uses a goofy SINK_ASYNC/sink_ctx_asyncresult/sink_ctx_onasync mess.
Instead, let's just implement a basic Promise system, just like the TypeScript version. Much easier to explain, and keeps the APIs more similar.
Need a test that checks how slicing and splicing work past list/string boundaries: the JavaScript implementation performs the wrapping and limits automatically, but the C version needs to calculate the results exactly.
Ex:
var x = {1, 2, 3, 4, 5}
say x[-100:1000]
etc
var x = 'hello'
x[1:2] = 'yoyo'
Current result: crash
Desired result: boy it would be nice if x could be set to 'hyoyolo', but not sure if that's correct..?
One annoyance with sink is that there aren't closures. There might be a way to add a pseudo-closure that could be useful. Not sure if I want to do this, but here's the idea.
In order to get around no closures, I end up doing something like:
enum CMD1, CMD2
def onCmd1 a, b
end
def onCmd2 a, b, c, d
end
def dispatch cmd, args
if cmd == CMD1; onCmd1 args[0], args[1]
elseif cmd == CMD2; onCmd2 args[0], args[1], args[2], args[3]
end
abort 'Bad command'
end
The downside is obvious: I have to maintain this wiring code to dispatch messages to the right command.
Suppose, instead, that I introduced two new ideas:
1. @ unary operator, for converting a function to a unique number
2. eval command, that takes a function-number and a list, and dispatches the command for us
Something like this:
def onCmd1 a, b
end
def onCmd2 a, b, c, d
end
var cmd1 = @onCmd1
var cmd2 = @onCmd2
eval cmd1, {1, 2}
eval cmd2, {1, 2, 3, 4}
This is totally doable. We could have a list of "addressable commands" as part of the sink binary in the header. They map function-number to address location.
The var cmd1 = @onCmd1 simply becomes (in bytecode) var cmd1 = 0 (or whatever the function-number becomes).
Then, eval 0, {1, 2} performs a lookup for the label in the addressable-commands table, checks to make sure the lexical level is okay, unpacks the arguments, and performs the call.
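The lookup-and-dispatch part of this idea can be sketched as a small table that maps function-numbers to commands. This is only a host-language model of the proposal: the class and method names are hypothetical, and the lexical-level check is deliberately left out.

```typescript
type SinkValue = number | string | null | SinkValue[];
type SinkCommand = (...args: SinkValue[]) => SinkValue;

// Models the "addressable commands" table from the binary header.
class CommandTable {
  private commands: SinkCommand[] = [];

  // Models the @ operator: returns the function-number for a command,
  // registering the command on first use.
  address(cmd: SinkCommand): number {
    const existing = this.commands.indexOf(cmd);
    if (existing >= 0) {
      return existing;
    }
    this.commands.push(cmd);
    return this.commands.length - 1;
  }

  // Models eval: look up the command, unpack the list, perform the call.
  evalCommand(fnNumber: number, args: SinkValue[]): SinkValue {
    const cmd = this.commands[fnNumber];
    if (cmd === undefined) {
      throw new Error(`bad function-number: ${fnNumber}`);
    }
    return cmd(...args);
  }
}
```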
Some issues:
1. @ native commands?
2. @ opcode commands?
Issue 1 and 2 are technical issues, which could be overcome in a variety of ways (none obvious now, but doable).
Issue 3 is annoying. Maybe a user tries this:
def adder a, b
def add
return a + b
end
return @add
end
var add12 = adder 1, 2 # so far so good...
eval add12 # now we have an error about lexical level,
# which is probably really confusing
The fact that this is a pseudo-closure is now suddenly annoying.
Think about this.
Seems like the OP_CALL level and the var_get frame diff can be calculated more efficiently.
The lexical level of a command is static and doesn't need to be calculated at run-time. Should be able to hard-code it directly (which should also make bytecode validation easier).
opi_abortformat(ctx, "Expecting list for argument %d", index + 1);
=>
opi_abortformat(ctx, "Expecting list for item %d", index + 1);
This way, users can use things other than args (like, decoding values in a list).
If a variable isn't needed, this would be nice:
for: range 5
# loop 5 times
end
I suspect the following code allocates two frame variables:
do
var a = 1
end
do
var b = 2
end
But really, b could reuse a's register.
Double check and fix.
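A sketch of the intended behavior, assuming a simple stack-like allocator: leaving a block releases the registers declared inside it, so a sibling block starts from the same index, while the frame size is the high-water mark. The class and method names are hypothetical and not taken from the sink compiler.

```typescript
class RegisterAllocator {
  private next = 0;
  private highWater = 0;
  private scopeStarts: number[] = [];

  // Entering a block remembers where its registers start.
  pushScope(): void {
    this.scopeStarts.push(this.next);
  }

  // Leaving a block frees every register declared inside it.
  popScope(): void {
    const start = this.scopeStarts.pop();
    if (start === undefined) {
      throw new Error("no open scope");
    }
    this.next = start;
  }

  // Allocate a register for a new variable (256 per frame, as noted elsewhere).
  declareVar(): number {
    if (this.next >= 256) {
      throw new Error("out of registers");
    }
    const reg = this.next;
    this.next += 1;
    this.highWater = Math.max(this.highWater, this.next);
    return reg;
  }

  // Number of registers the frame actually needs.
  frameSize(): number {
    return this.highWater;
  }
}
```

With this scheme the two do blocks above would need one frame register instead of two.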
Now that there are enums, we should change structured templates from lists of strings to lists of enums.
{ struct.U8, struct.UL16, ... }
Should consider having a basic compression of large strings in the bytecode. Then uncompress upon load.
More esoteric math functions now have wide browser support:
log1p
expm1
cosh
sinh
tanh
acosh
asinh
atanh
fround
emod (Euclidean mod)
Also might want to add some extra int commands:
pop (Hamming weight)
ctz (Count trailing zeros)
bswap (Reverse ordering of bytes)
emod (Euclidean mod)
rol (Rotate left)
ror (Rotate right)
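For reference, these int commands could be sketched as follows, assuming they operate on 32-bit integers and return signed 32-bit results. The function names are hypothetical; Math.clz32 is a standard JavaScript function.

```typescript
// pop: Hamming weight (number of set bits).
function intPop(x: number): number {
  let v = x | 0;
  let count = 0;
  while (v !== 0) {
    v &= v - 1; // clears the lowest set bit
    count += 1;
  }
  return count;
}

// ctz: count trailing zeros (32 for zero, by assumption).
function intCtz(x: number): number {
  const v = x | 0;
  return v === 0 ? 32 : 31 - Math.clz32(v & -v);
}

// bswap: reverse the order of the four bytes.
function intBswap(x: number): number {
  return (
    ((x & 0xff) << 24) | ((x & 0xff00) << 8) | ((x >>> 8) & 0xff00) | ((x >>> 24) & 0xff)
  );
}

// rol / ror: rotate left / right by n bits (n taken modulo 32).
function intRol(x: number, n: number): number {
  const s = n & 31;
  return (x << s) | (x >>> ((32 - s) & 31));
}

function intRor(x: number, n: number): number {
  const s = n & 31;
  return (x >>> s) | (x << ((32 - s) & 31));
}

// emod: Euclidean mod, result is never negative (NaN when b is 0).
function intEmod(a: number, b: number): number {
  const m = Math.abs(b);
  return ((a % m) + m) % m;
}
```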
Maybe an easy test for native little/big endian... struct.endian or something.
Maybe some CSV functions in shell? csv.list str and csv.str list...
I keep trying to find a way around it, but I can't -- I think sink needs enums, i.e., compile-time number constants.
The problem is that I only have 256 registers per frame, so if I use var's as pseudo-constants, I run out of register space. I could use strings, but then the C side will need a mapping from strings to numbers. It's just a pain.
Much easier to have compile-time number constants.
If so, will need to change num.tau, num.pi, num.e to use them.
Also:
If we do this (probably) then it would be nice if int.or, int.and, etc, take multiple arguments.
int.or SOMETHING, ANOTHER, ONEMORE, FOURTH
etc, etc,
Whether OP_NUM_ADD, OP_NUM_DIV, etc, take multiple, is left open.
There appears to be a problem with lexical scoping in relation to loop blocks. Consider the following program:
def calculate rows
var sum = 0
for var row: rows
var largest, smallest
say "begin loop: largest = $largest, smallest = $smallest"
for var number: row
if largest == nil || number > largest
largest = number
end
if smallest == nil || number < smallest
smallest = number
end
end
say "end loop: largest = $largest, smallest = $smallest"
sum += largest - smallest
end
return sum
end
say "result: ${calculate { { 5, 1, 9, 5}, {7, 5, 3}, {2, 4, 6, 8} }}" # returns 24, but should be 18
At the beginning of the first iteration of the outer loop, largest and smallest are both nil. With each iteration of the outer loop, I would expect largest and smallest to be nil. However, what happens instead is that with subsequent iterations of the loop, they retain their values from the previous iteration. I can work around this by assigning nil to these variables when they are declared, but it doesn't seem like I should have to do this.
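The two behaviors can be reproduced outside of sink to check the arithmetic. This sketch is a TypeScript model, not sink code: the resetEachIteration flag switches between the expected semantics (variables start as nil on every outer iteration) and the observed one (values carry over).

```typescript
function calculateRows(rows: number[][], resetEachIteration: boolean): number {
  let sum = 0;
  let largest: number | null = null;
  let smallest: number | null = null;
  for (const row of rows) {
    if (resetEachIteration) {
      // Expected: a declaration inside the loop body starts out as nil each time.
      largest = null;
      smallest = null;
    }
    for (const n of row) {
      if (largest === null || n > largest) {
        largest = n;
      }
      if (smallest === null || n < smallest) {
        smallest = n;
      }
    }
    if (largest !== null && smallest !== null) {
      sum += largest - smallest;
    }
  }
  return sum;
}

const sampleRows = [[5, 1, 9, 5], [7, 5, 3], [2, 4, 6, 8]];
console.log(calculateRows(sampleRows, true)); // 18 (8 + 4 + 6)
console.log(calculateRows(sampleRows, false)); // 24 (8 + 8 + 8)
```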
Create a new opcode for tail recursion and detect it statically.
def factorial a
def step left, total
if left <= 0
return total
end
return step left - 1, total * left
end
return step a, 1
end
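What a tail-call opcode buys can be illustrated by writing the step command above as a loop: since the recursive call is the last thing step does, the call can rebind the parameters and jump back to the top instead of pushing a new frame. This is a hand-written TypeScript equivalent for illustration, not compiler output.

```typescript
function factorialTail(a: number): number {
  // Parameters of step: left, total.
  let left = a;
  let total = 1;
  for (;;) {
    if (left <= 0) {
      return total;
    }
    // Tail call "step left - 1, total * left":
    // evaluate all new arguments first, then rebind and loop.
    const nextLeft = left - 1;
    const nextTotal = total * left;
    left = nextLeft;
    total = nextTotal;
  }
}
```

The static detection would only need to recognize a return whose expression is a direct call, which is the pattern in step.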
int.or, int.and, and int.xor should take multiple arguments:
int.or A, B, C, D, E
etc.
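A minimal sketch of the variadic versions as a fold over the arguments. The function names are hypothetical, and the result for zero arguments (the identity of each operation) is an assumption.

```typescript
// Fold with the identity element of each operation:
// 0 for OR and XOR, all bits set (-1) for AND.
function intOr(...args: number[]): number {
  return args.reduce((acc, v) => acc | v, 0);
}

function intAnd(...args: number[]): number {
  return args.reduce((acc, v) => acc & v, -1);
}

function intXor(...args: number[]): number {
  return args.reduce((acc, v) => acc ^ v, 0);
}
```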
Should there be a way to specify arbitrary metadata associated with a sink script?
Something like:
meta 'key1', 'value1'
meta 'key2', 'value2'
# ...
...that could be queried from the host without executing the script.
REPL:
1: say 'hi
fish: Job 1, './sink' terminated by signal SIGSEGV (Address boundary error)
What about this:
const x = 'hello'
str.split causes Sink to crash when presented with the following input:
var input = "6592822488931338589815525425236818285229555616392928433262436847386544514648645288129834834862363847542262953164877694234514375164927616649264122487182321437459646851966649732474925353281699895326824852555747127547527163197544539468632369858413232684269835288817735678173986264554586412678364433327621627496939956645283712453265255261565511586373551439198276373843771249563722914847255524452675842558622845416218195374459386785618255129831539984559644185369543662821311686162137672168266152494656448824719791398797359326412235723234585539515385352426579831251943911197862994974133738196775618715739412713224837531544346114877971977411275354168752719858889347588136787894798476123335894514342411742111135337286449968879251481449757294167363867119927811513529711239534914119292833111624483472466781475951494348516125474142532923858941279569675445694654355314925386833175795464912974865287564866767924677333599828829875283753669783176288899797691713766199641716546284841387455733132519649365113182432238477673375234793394595435816924453585513973119548841577126141962776649294322189695375451743747581241922657947182232454611837512564776273929815169367899818698892234618847815155578736875295629917247977658723868641411493551796998791839776335793682643551875947346347344695869874564432566956882395424267187552799458352121248147371938943799995158617871393289534789214852747976587432857675156884837634687257363975437535621197887877326295229195663235129213398178282549432599455965759999159247295857366485345759516622427833518837458236123723353817444545271644684925297477149298484753858863551357266259935298184325926848958828192317538375317946457985874965434486829387647425222952585293626473351211161684297351932771462665621764392833122236577353669215833721772482863775629244619639234636853267934895783891823877845198326665728659328729472456175285229681244974389248235457688922179237895954959228638193933854787917647154837695422429184757725387589969781672596568421191236374563718951738499591454571
728641951699981615249635314789251239677393251756396"
say (str.split input, "")
Smaller strings are split correctly, so I'm guessing it's something to do with the length of the string.
Toying with the idea that a single compiled sink script could provide all the resources/behavior of a game.
There are a few things I would need:
embed, but for any large strings, really)
Given these features, I could completely eliminate the idea of the host environment requiring a file system -- everything could be driven by a single sink binary file.
Need to serialize and deserialize the program data structure, and validate the program using the validator.
It's very annoying that the codebase has checkPromise and isPromise exposed to the end user.
Another idea is to have async/sync versions of the same function. For example,
export interface inc_st {
f_fstype: fstype_f; // always returns a Promise
f_fstypeSync: fstypeSync_f; // always returns results
f_fsread: fsread_f; // always returns a Promise
f_fsreadSync: fsreadSync_f; // always returns results
user?: any;
}
This way a user always knows what they're getting back. Either a result (using Sync versions), or a Promise (normal versions). There's no need for checkPromise or isPromise.
The only downside is that native functions would have to be implemented in both ways. If you implement a Sync version, that's easily convertible to async, but if you only have async, then requesting ctx_runSync would have to abort in error.
I don't think that's that bad.
It pushes the annoying distinction off to the library implementer, instead of the end user.
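The asymmetry described above (Sync converts to async, but not the reverse) can be sketched like this. The hook signatures are simplified guesses and the helper names are hypothetical; the real fsread_f and fsreadSync_f types may differ.

```typescript
// Hypothetical, simplified hook signatures.
type FsReadSyncFn = (file: string) => string | null;
type FsReadFn = (file: string) => Promise<string | null>;

interface IncludeHooks {
  f_fsread?: FsReadFn;
  f_fsreadSync?: FsReadSyncFn;
}

// A Sync implementation is trivially convertible to a Promise-returning one.
function toAsyncRead(fn: FsReadSyncFn): FsReadFn {
  return (file: string) =>
    new Promise<string | null>((resolve, reject) => {
      try {
        resolve(fn(file));
      } catch (err) {
        reject(err);
      }
    });
}

// Async callers can be served as long as either version exists.
function getAsyncRead(hooks: IncludeHooks): FsReadFn {
  if (hooks.f_fsread) {
    return hooks.f_fsread;
  }
  if (hooks.f_fsreadSync) {
    return toAsyncRead(hooks.f_fsreadSync);
  }
  throw new Error("no fsread implementation provided");
}

// Sync callers must abort when only the async version was implemented.
function getSyncRead(hooks: IncludeHooks): FsReadSyncFn {
  if (hooks.f_fsreadSync) {
    return hooks.f_fsreadSync;
  }
  throw new Error("fsread has no Sync version; cannot run synchronously");
}
```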
Search the JavaScript source for fsread, and you'll see a bunch of spots where I need to Promise-ify the code.
I think I want to include heredocs... the format should be similar to Markdown:
test ```
one
two
three
```
It should start with 3 or more backticks, followed by a newline (or EOL comment), followed by the (possibly tabbed) data, followed by closing backticks of the same count.
The amount of tabs to remove should be determined by how much the closing fence is tabbed. I.e.,
test ```
one
two
three
```
This should only remove four spaces from each line, because the closing fence is 4 spaces over.
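The indentation rule can be sketched as follows: measure the leading whitespace of the closing fence, then strip at most that much from each body line. The function names are hypothetical, and treating a tab as one unit of indentation is an assumption.

```typescript
// Count leading spaces and tabs.
function leadingWhitespace(line: string): number {
  let i = 0;
  while (i < line.length && (line.charAt(i) === " " || line.charAt(i) === "\t")) {
    i += 1;
  }
  return i;
}

// bodyLines: the lines between the fences.
// closingFenceLine: the full line that holds the closing fence.
function dedentHeredoc(bodyLines: string[], closingFenceLine: string): string {
  const indent = leadingWhitespace(closingFenceLine);
  return bodyLines
    // Never strip more than the line actually has.
    .map((line) => line.slice(Math.min(indent, leadingWhitespace(line))))
    .join("\n");
}
```

So with a closing fence indented by four spaces, a body line indented by eight keeps four of them.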
The range command should generate a list of numbers. Special logic should exist to detect for var v: range foo to skip creating the list.
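A sketch of the two paths, assuming range takes start, stop, and step: the general command materializes a list, while the loop special case walks the same sequence without allocating one. All names here are hypothetical.

```typescript
// Number of values in start, start + step, ... before reaching stop.
function rangeCount(start: number, stop: number, step: number): number {
  if (step === 0) {
    throw new Error("step must not be zero");
  }
  return Math.max(0, Math.ceil((stop - start) / step));
}

// General case: build the list.
function rangeList(start: number, stop: number, step: number = 1): number[] {
  const count = rangeCount(start, stop, step);
  const out: number[] = [];
  for (let i = 0; i < count; i++) {
    out.push(start + i * step);
  }
  return out;
}

// Loop special case: same values, no list is created.
function forRange(start: number, stop: number, step: number, body: (v: number) => void): void {
  const count = rangeCount(start, stop, step);
  for (let i = 0; i < count; i++) {
    body(start + i * step);
  }
}
```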
Something like this:
if 1
say 'hi'
end
Should transform to:
do
say 'hi'
end
I would just do this now, but there is additional complexity dealing with elseif and else.
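One way the elseif/else cases might be handled, as a sketch over a hypothetical mini-AST (not the sink AST): drop branches whose condition is a compile-time false, and treat the first compile-time true branch as the new else. If no dynamic branches remain, the whole statement collapses to a do block. Whether a constant is truthy is assumed to be decided elsewhere.

```typescript
type Cond = { kind: "const"; truthy: boolean } | { kind: "dynamic"; source: string };

interface Branch {
  cond: Cond;
  body: FoldStmt[];
}

interface SayStmt { kind: "say"; text: string }
interface DoStmt { kind: "do"; body: FoldStmt[] }
interface IfStmt { kind: "if"; branches: Branch[]; elseBody: FoldStmt[] }
type FoldStmt = SayStmt | DoStmt | IfStmt;

function foldConstantIf(stmt: IfStmt): FoldStmt {
  const kept: Branch[] = [];
  let elseBody = stmt.elseBody;
  for (const branch of stmt.branches) {
    if (branch.cond.kind === "dynamic") {
      kept.push(branch); // must still be tested at run time
      continue;
    }
    if (branch.cond.truthy) {
      // Always taken when reached, so it replaces the else
      // and makes every later branch unreachable.
      elseBody = branch.body;
      break;
    }
    // Constant false: the branch can never run, drop it.
  }
  if (kept.length === 0) {
    return { kind: "do", body: elseBody };
  }
  return { kind: "if", branches: kept, elseBody };
}
```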