velipso / sink

49.0 3.0 4.0 3.31 MB

Minimal programming language for embedding small scripts in larger programs

License: BSD Zero Clause License

Shell 0.21% JavaScript 37.27% HTML 0.34% C 36.37% TypeScript 25.81% Batchfile 0.01%

programming-language scripting-language language embeddable

sink's People

haifenghuang jddurand cfurter watch-later

sink's Issues

Should provide static binding for local commands with fixed argument length

Seems kind of silly to create temporary lists for situations like this:

def add a, b
    return a + b
end
say add 1, 2

The compiler should recognize that add takes exactly two parameters, and ensure that the correct number of arguments is passed. Perhaps, instead of OP_CALL, have OP_CALL0, OP_CALL1, etc., up to, say, 15, and then provide a general-purpose OP_CALLVAR or something.

This will take extra logic to figure out @-calling, but that logic already exists for opcode commands, so we just need to re-purpose it.
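A minimal sketch of the compile-time decision described above, in TypeScript rather than sink's actual compiler. The opcode names OP_CALL0..OP_CALL15 and OP_CALLVAR come from the issue; the `planCall` function, its inputs, and the rule that @-calls always fall back to OP_CALLVAR are assumptions for illustration.

```typescript
// Hypothetical opcode selection for calls to local commands whose parameter
// count is known at compile time.
const MAX_FIXED_CALL = 15;

type CallPlan =
  | { kind: "fixed"; opcode: string; args: number }
  | { kind: "var"; opcode: "OP_CALLVAR" };

function planCall(paramCount: number, argCount: number, usesAtCall: boolean): CallPlan {
  if (usesAtCall) {
    // An @-call passes a list whose length is only known at run time.
    return { kind: "var", opcode: "OP_CALLVAR" };
  }
  if (argCount !== paramCount) {
    // The static arity check the issue asks the compiler to perform.
    throw new Error(`Expecting ${paramCount} argument(s), got ${argCount}`);
  }
  if (paramCount <= MAX_FIXED_CALL) {
    // Arguments go straight into the callee's registers; no temporary list.
    return { kind: "fixed", opcode: `OP_CALL${paramCount}`, args: paramCount };
  }
  return { kind: "var", opcode: "OP_CALLVAR" };
}
```

For `say add 1, 2`, `planCall(2, 2, false)` would select OP_CALL2.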

Rewrite JavaScript implementation in TypeScript

Need to clean up the JavaScript API, and might as well use TypeScript while we're at it.

Putting JavaScript implementations of new features on hold until finishing the C implementation, then porting to TypeScript in one go.

Add `list.permute`, `list.combine` commands

These would perform permutations and combinations of list elements. Kind of useful.

Tasks

Should there be tasks?

This can be implemented in the host system, but there is a benefit of implementing it directly in sink: we can share the same compile-time file loading.

Doesn't propagate errors during -D include

Something like this isn't working:

file1.sink

declare foo1 'foo1'
delcare foo2 'foo2' # yes, this is intentionally misspelled
declare foo3 'foo3'
say 'hello'

file2.sink

include 'foo'
say 'hi'

$ sink -D foo file1.sink file2.sink

The error in file1.sink doesn't stop the include process or halt the script.

Include + using file syntax

This seems like it will be a normal occurrence:

namespace AnythingButItHasToBeUnique
    include 'somefile.sink'
end
using AnythingButItHasToBeUnique

The idea here is that 'somefile.sink' might include definitions that were already included. By putting it in its own namespace, the second include will still work. However, we want access to the new definitions inside of 'somefile.sink' without having to type the wrapping namespace (that was only there to allow the multiple includes).

Currently we have:

include foo 'foo.sink'

To be short for:

namespace foo
    include 'foo.sink'
end

I think the next step is to allow something like:

include . 'foo.sink'

To be short for:

namespace SomeUniqueNamespaceThatCanNeverHaveCollisions
    include 'foo.sink'
end
using SomeUniqueNamespaceThatCanNeverHaveCollisions

Where the unique namespace isn't even addressable in any way.

Remove SINK_PANIC from C impl

I think we can remove SINK_PANIC entirely from the C implementation.

If allocation fails during object creation, then simply return NULL and let the user figure out what to do.

If allocation fails during context execution, then abort with an error that the context is out of memory.

Programs compile if commands are declared but not defined

This compiles, then complains about invalid program code:

declare foo
say foo 1

includes don't work inside bodies

namespace foo
  include 'shell'
end

This doesn't work because program_genBody doesn't correctly handle AST_INCLUDE nodes :-(.

Optimization: use ropes under the hood

The C implementation of strings is pretty dumb and straightforward (and slow).

We should probably implement a form of rope under the hood. Node.js's V8 engine already does this in some form, which is why ./perf/cat.sink runs significantly faster in the JS implementation.

See: https://en.wikipedia.org/wiki/Rope_(data_structure)

We can likely keep the C API the same, and just use ropes internally, and flatten ropes when sink_caststr is called.
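A minimal TypeScript sketch of the rope idea, not sink's code: leaves hold flat strings, concatenation nodes hold two children plus a cached length, so `~` becomes O(1) and the bytes are only copied when a flat string is needed (the role the issue assigns to sink_caststr in C). A real implementation would probably also cache the flattened result and rebalance very deep trees.

```typescript
type Rope =
  | { kind: "leaf"; text: string }
  | { kind: "cat"; left: Rope; right: Rope; length: number };

function leaf(text: string): Rope {
  return { kind: "leaf", text };
}

function ropeLength(r: Rope): number {
  return r.kind === "leaf" ? r.text.length : r.length;
}

// O(1): no characters are copied, only a new node is created.
function concat(a: Rope, b: Rope): Rope {
  return { kind: "cat", left: a, right: b, length: ropeLength(a) + ropeLength(b) };
}

// Flatten with an explicit stack so long left-leaning chains (such as
// repeated `~` in a loop) cannot overflow the call stack.
function flatten(r: Rope): string {
  const parts: string[] = [];
  const stack: Rope[] = [r];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.kind === "leaf") {
      parts.push(node.text);
    } else {
      stack.push(node.right); // pushed first so the left child is visited first
      stack.push(node.left);
    }
  }
  return parts.join("");
}
```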

Multi-assignment

I think, with the new precedence rules that put equals at the very bottom, we can now implement multiple lvalue assignment, like:

var x, y
x, y = 1, 2
x, y = y, x
# etc
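The subtle case is `x, y = y, x`: every right-hand value has to be captured before any target is written, or the swap clobbers itself. A sketch of one possible lowering, assuming each right-hand expression has already been evaluated into a register; the `MOVE` instruction, register model and function names are hypothetical.

```typescript
type Instr = { op: "MOVE"; dst: number; src: number };

// Lower `t0, t1, ... = s0, s1, ...` into two phases of register moves.
function lowerMultiAssign(targets: number[], sources: number[], firstTemp: number): Instr[] {
  if (targets.length !== sources.length) {
    throw new Error("Mismatched number of values in multi-assignment");
  }
  const code: Instr[] = [];
  // Phase 1: snapshot every source into its own temporary register.
  sources.forEach((src, i) => code.push({ op: "MOVE", dst: firstTemp + i, src }));
  // Phase 2: copy the snapshots into the targets.
  targets.forEach((dst, i) => code.push({ op: "MOVE", dst, src: firstTemp + i }));
  return code;
}

// Tiny interpreter over a plain array standing in for a register frame.
function run(code: Instr[], regs: unknown[]): void {
  for (const ins of code) regs[ins.dst] = regs[ins.src];
}
```

With x in register 0 and y in register 1, `lowerMultiAssign([0, 1], [1, 0], 2)` swaps them.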

Optimization: constant propagation

Lots of things can fall under this:

say 1 + 2
var x = 10
say x + 20
say ((1 + 2) * 3) - 4
say {1, 2, 3} * 4
say {{1, 2, 3}, 2, 3}[0][1] + 4
x = 1
say {1, 2, 3}[x]

etc., etc.

Not sure how far down this rabbit hole I want to go.
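A sketch of the shallow end of the rabbit hole: constant folding over a hypothetical expression tree, covering cases like `say ((1 + 2) * 3) - 4`. It only folds operations whose operands are both literals; propagating `var x = 10` into `x + 20` would additionally need data-flow analysis to prove x is not reassigned, which this sketch does not attempt. Division and list operations are deliberately left out so sink-specific semantics are not guessed at.

```typescript
type Expr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "bin"; op: "+" | "-" | "*"; left: Expr; right: Expr };

// Fold bottom-up: children first, then this node if both sides are literals.
function fold(e: Expr): Expr {
  if (e.kind !== "bin") return e;
  const left = fold(e.left);
  const right = fold(e.right);
  if (left.kind === "num" && right.kind === "num") {
    const a = left.value;
    const b = right.value;
    switch (e.op) {
      case "+": return { kind: "num", value: a + b };
      case "-": return { kind: "num", value: a - b };
      case "*": return { kind: "num", value: a * b };
    }
  }
  // Something is not constant: keep the operation with folded children.
  return { kind: "bin", op: e.op, left, right };
}
```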

Enums should be more flexible about constant numbers

This should work:

enum foo = 5 + 5

etc...

Constant strings shouldn't duplicate string table

The current C implementation copies over the string table into the context during context_new (via sink_str_newblob) -- this is unnecessary and should be avoided (especially now since embed can mean large strings).

Instead, it should allocate the strings but point the *bytes pointer to the string table (which will need 0-terminated bytes). Then it should never free the strings, and simply let the program free the string table when it's freed.

Add frames to commands that need more register space

(I won't implement this now, but I wanted to document the idea I just had.)

The "problem" is that there are only 256 registers per frame. However, there are 256 frame levels... Most likely, a user will run out of registers before frame levels. One way to perform a trade-off is to hard-code a different ratio in the bytecode (e.g., 1024 registers per frame with 64 frame levels, etc.), but there is another way:

We can transform code like this:

def foo
    var a, b, c, ... more than 256 variables....
    <body>
end

To this:

def foo
    var a, b, c, ... 256 variables...
    def foo_2
        var aa, bb, cc, ... 256 variables...
        def foo_3
            var aaa, bbb, ccc, ... 256 variables...
            ... more frames as needed...
            <body>
        end
        return foo_3
    end
    return foo_2
end

This transformation always works, and gives us an additional 256 registers for every new frame.

And, more importantly, I don't have to do anything right now! If register space becomes a problem, adding this feature to the compiler will magically expand register space at the cost of frame levels -- without changing the bytecode or virtual machine.

As it stands, I think 256 registers should be fine... but who knows in the future.
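The arithmetic behind the transformation, as a small sketch: with 256 registers per frame, the i-th variable of a command would land in nested frame floor(i / 256) at register i % 256, and each extra nested frame costs one frame level. The constant is from the issue; the functions are hypothetical and ignore any registers the compiler needs for temporaries.

```typescript
const REGS_PER_FRAME = 256;

// Where the variable with this declaration index would live after splitting.
function placeVariable(index: number): { frame: number; register: number } {
  return { frame: Math.floor(index / REGS_PER_FRAME), register: index % REGS_PER_FRAME };
}

// Frame levels consumed by a command declaring `count` variables.
function framesNeeded(count: number): number {
  return Math.max(1, Math.ceil(count / REGS_PER_FRAME));
}
```

For example, the 301st variable (index 300) would end up in register 44 of the first nested frame (foo_2 in the example above).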

Splicing returns wrong value

var x = {1, 2, 3, 4}
say (x[1:2] = {5, 6, 7})

Current output: {5, 6}

Desired output: {5, 6, 7} but not sure if it should be the rvalue, or a copy.

Optimization: Might want concat to operate on multiple values

Currently, OP_CAT works on two operands at a time. This could be really inefficient for large concatenations, e.g.,

'a' ~ 'b' ~ 'c' ~ 'd' ~ 'e' ~ 'f' ~ ... ~ 'z'

It will create a ton of temporary strings, first:

'ab' ~ 'c' ~ 'd' ~ ... ~ 'z'
'abc' ~ 'd' ~ 'e' ~ ... ~ 'z'
'abcd' ~ 'e' ~ 'f' ~ ... ~ 'z'
etc...

Might want to fix that.
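A sketch of what an N-ary concatenation opcode could do instead (OP_CATN is an assumed name, not an existing opcode). Working on byte strings, as the C implementation does, it sums the operand lengths, allocates the result once and copies each operand in. Pairwise concatenation of n single bytes copies roughly n²/2 bytes in total; this copies n.

```typescript
// Concatenate all operands with a single allocation and no intermediates.
function catN(operands: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const op of operands) total += op.length;
  const out = new Uint8Array(total); // the only allocation
  let offset = 0;
  for (const op of operands) {
    out.set(op, offset); // copy this operand into its final position
    offset += op.length;
  }
  return out;
}
```

The compiler would then emit one OP_CATN over all the operands of a `~` chain, rather than one OP_CAT per `~`.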

Mapping of runtime errors to line numbers

It would be nice if, when a runtime error occurs, the failing opcode could be traced back to a source file and line number. Right now, it just aborts with the error, with no source/line/char information. Kind of annoying.
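One common way to do this, sketched under assumptions (the table layout and names are hypothetical): the code generator appends an entry whenever it starts emitting code for a new source position, so entries are sorted by bytecode offset. On a runtime error, a binary search finds the last entry at or before the failing pc.

```typescript
interface PosEntry { pc: number; file: string; line: number; chr: number }

// Last entry whose pc is <= the failing pc, or null if there is none.
function lookupPos(table: PosEntry[], pc: number): PosEntry | null {
  let lo = 0;
  let hi = table.length - 1;
  let found: PosEntry | null = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (table[mid].pc <= pc) {
      found = table[mid]; // candidate; keep looking to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

function formatError(table: PosEntry[], pc: number, msg: string): string {
  const pos = lookupPos(table, pc);
  return pos ? `${pos.file}:${pos.line}:${pos.chr}: ${msg}` : msg;
}
```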

Add `rand.range` command

Seems like this would be useful:

rand.range [start, ]stop[, step]

E.g., a random integer from 0 to 9 would be rand.range 10, etc.
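A sketch of the proposed semantics, modeled on Python's `randrange` (an assumption, since the issue only gives the signature): pick uniformly among the values that the corresponding range would produce. The random source is passed in so the logic can be tested deterministically; it is assumed to return a float in [0, 1), like `Math.random`.

```typescript
function randRange(rnd: () => number, a: number, b?: number, step = 1): number {
  // One argument means `stop`; two or three mean `start, stop[, step]`.
  const start = b === undefined ? 0 : a;
  const stop = b === undefined ? a : b;
  if (step === 0) throw new Error("rand.range: step cannot be zero");
  const count = Math.ceil((stop - start) / step); // how many values are in range
  if (count <= 0) throw new Error("rand.range: empty range");
  return start + step * Math.floor(rnd() * count);
}
```

`randRange(Math.random, 10)` would then give 0 through 9, matching the example above.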

Create opcodes for getting/setting lists at static positions

After having a body of sink code to test against, I should start testing for what opcodes would produce the best speed improvements.

One idea (that should first be confirmed with real data) is that getting/setting lists at static offsets is something that will happen a lot.

To set a list at position 0 to a value, the current opcode sequence is something like:

r0 = get the list somehow
r1 = NUMP8 0
r2 = get the value somehow
SETAT r0, r1, r2

I wonder if it would be faster to do something like:

r0 = get the list somehow
r1 = get the value somehow
SETAT0 r0, r1

Where I introduce a SETAT0, SETAT1, SETAT2, up until (say) SETAT15. Likewise for GETAT0-GETAT15.

The idea here being that this will be a common operation, especially since lists are our only compound value.

Remove `@` calling?

What is the point of @ calling? Seems unnecessary and could be removed.

Functions cannot be stored in variables, so currying is impossible, which would be the primary use case for @ calling... If there is a need for dynamic arguments, just code it as a list in the first place.

I used to have @ calling because it was free, but after issue #17's changes, it is no longer free. Might as well remove it?

Number operations on nested lists

Should probably allow something like this:

var x = {{1, 2, 3}} * 2
say x # {{2, 4, 6}}

Right now, it errors out.

Need to validate bytecode before running

Lots of code like this:

var something = ctx.ops[ctx.pc++];

Without checking that PC hasn't run off the end.

Currently not a problem because all bytecode comes from the code generator, but once we start saving bytecode to disk, corrupt scripts become a real possibility.
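A sketch of the guard, not sink's actual interpreter: every read goes through a bounds-checked `next` instead of a bare `ctx.ops[ctx.pc++]`, and a validator walks the whole program once before running it. The context shape and the single made-up instruction (`MOVE dst, src` as opcode 0x01) are assumptions; a real validator would also check register numbers, jump targets and frame levels.

```typescript
interface Ctx { ops: Uint8Array; pc: number }

// Bounds-checked fetch: fail cleanly instead of running off the end.
function next(ctx: Ctx): number {
  if (ctx.pc >= ctx.ops.length) {
    throw new Error(`Unexpected end of bytecode at offset ${ctx.pc}`);
  }
  return ctx.ops[ctx.pc++];
}

// Decode the hypothetical three-byte instruction `MOVE dst, src`.
function readMove(ctx: Ctx): { dst: number; src: number } {
  const op = next(ctx);
  if (op !== 0x01) throw new Error(`Unknown opcode ${op}`);
  return { dst: next(ctx), src: next(ctx) };
}

// Walk every instruction once before execution.
function validate(ops: Uint8Array): void {
  const ctx: Ctx = { ops, pc: 0 };
  while (ctx.pc < ops.length) readMove(ctx);
}
```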

Compile-time `embed` command for embedding files as strings

Create a new compile-time command (like pick), called embed, that embeds the contents of a file as a string.

var foo = embed "somefile.png"

Create a promise object for C implementation

The C implementation uses a goofy SINK_ASYNC/sink_ctx_asyncresult/sink_ctx_onasync mess.

Instead, let's just implement a basic Promise system, just like the TypeScript version. Much easier to explain, and keeps the APIs more similar.

Test for slicing/splicing past boundaries

Need a test that checks how slicing and splicing work past list/string boundaries: the JavaScript implementation handles the wrapping and limits automatically, but the C version needs to calculate the results exactly.

Ex:

var x = {1, 2, 3, 4, 5}
say x[-100:1000]

etc
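A sketch of the kind of bounds calculation the C version would need, for `x[start:len]` on a list or string of length `size`. The rules here are assumptions for illustration, not confirmed sink behaviour: a negative start counts back from the end, the slice is `len` items long (omitted means "to the end"), and everything is clamped rather than raising an error. The test the issue asks for would pin down the real rules.

```typescript
// Returns the half-open range [s, e) of the slice after clamping.
function sliceBounds(size: number, start: number, len?: number): [number, number] {
  let s = start < 0 ? start + size : start; // wrap negative starts
  s = Math.min(Math.max(s, 0), size);       // clamp into [0, size]
  const want = len === undefined ? size - s : len;
  const e = Math.min(Math.max(s + want, s), size); // never before s, never past size
  return [s, e];
}
```

Under these assumptions, `x[-100:1000]` on the five-element list above selects the whole list.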

Splicing strings crashes

var x = 'hello'
x[1:2] = 'yoyo'

Current result: crash

Desired result: boy, it would be nice if x could be set to 'hyoyolo', but not sure if that's correct?
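The 'hyoyolo' result corresponds to ordinary splice semantics: remove `len` items starting at `start` and insert the new value, which may be longer or shorter than what it replaces. A sketch for both strings and lists, assuming indices are already clamped; the function names are hypothetical.

```typescript
function spliceStr(s: string, start: number, len: number, repl: string): string {
  return s.slice(0, start) + repl + s.slice(start + len);
}

function spliceList<T>(xs: T[], start: number, len: number, repl: T[]): T[] {
  const out = xs.slice();           // work on a copy
  out.splice(start, len, ...repl);  // remove `len` items, insert `repl`
  return out;
}
```

With these, `x[1:2] = 'yoyo'` on 'hello' gives 'hyoyolo', and `x[1:2] = {5, 6, 7}` on `{1, 2, 3, 4}` gives `{1, 5, 6, 7, 4}`.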

Create `eval` command (pseudo-closures)

One annoyance with sink is that there are no closures. There might be a way to add a pseudo-closure that could be useful. Not sure if I want to do this, but here's the idea.

Problem

In order to get around no closures, I end up doing something like:

enum CMD1, CMD2

def onCmd1 a, b
end

def onCmd2 a, b, c, d
end

def dispatch cmd, args
    if     cmd == CMD1; onCmd1 args[0], args[1]
    elseif cmd == CMD2; onCmd2 args[0], args[1], args[2], args[3]
    end
    abort 'Bad command'
end

The downside is obvious: I have to maintain this wiring code to dispatch messages to the right command.

Solution?

Suppose, instead, that I introduced two new ideas:

An @ unary operator, for converting a function to a unique number
An eval command, that takes a function-number, and a list, and dispatches the command for us

Something like this:

def onCmd1 a, b
end

def onCmd2 a, b, c, d
end

var cmd1 = @onCmd1
var cmd2 = @onCmd2

eval cmd1, {1, 2}
eval cmd2, {1, 2, 3, 4}

This is totally doable. We could have a list of "addressable commands" as part of the sink binary in the header. They map function-number to address location.

The var cmd1 = @onCmd1 simply becomes (in bytecode) var cmd1 = 0 (or whatever the function-number becomes).

Then, eval 0, {1, 2} performs a lookup for the label in the addressable-commands table, checks to make sure the lexical level is okay, unpacks the arguments, and performs the call.
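A sketch of the addressable-commands table and the `eval` lookup, with commands modeled as TypeScript functions. The field names and the level rule are assumptions: `level` is the lexical level whose frame the command needs as its parent, and the check only compares it with the deepest level currently on the stack (a simplification that does not prove it is the *same* frame).

```typescript
interface AddressableCmd {
  level: number;                           // lexical level of the parent frame it needs
  body: (...args: unknown[]) => unknown;
}

// `eval fnNum, args`: look up, check the lexical level, unpack, call.
function evalCmd(table: AddressableCmd[], fnNum: number, args: unknown[], liveLevel: number): unknown {
  const cmd = table[fnNum];
  if (cmd === undefined) throw new Error(`eval: unknown function number ${fnNum}`);
  if (cmd.level > liveLevel) {
    // The frame the command closes over is gone (the adder/add12 case).
    throw new Error("eval: command's enclosing frame is not active");
  }
  return cmd.body(...args); // unpack the list into positional arguments
}
```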

Issues

Some issues:

How to deal with @ native commands?
How to deal with @ opcode commands?
Will users attempt to use these as proper closures and get screwed up?

Issues 1 and 2 are technical issues, which could be overcome in a variety of ways (none obvious now, but doable).

Issue 3 is annoying. Maybe a user tries this:

def adder a, b
    def add
        return a + b
    end
    return @add
end

var add12 = adder 1, 2 # so far so good...
eval add12             # now we have an error about lexical level,
                       # which is probably really confusing

The fact that this is a pseudo-closure is now suddenly annoying.

Think about this.

Optimization: Calculate frame diffs differently

Seems like the OP_CALL level and the var_get frame diff can be calculated more efficiently.

The lexical level of a command is static and doesn't need to be calculated at run-time. Should be able to hard-code it directly (which should also make bytecode validation easier).

Change error message in `_arg_` functions from `argument` to `item`

Change:

opi_abortformat(ctx, "Expecting list for argument %d", index + 1);

To:

opi_abortformat(ctx, "Expecting list for item %d", index + 1);

This way, users can use things other than args (like, decoding values in a list).

For loop without variables

If a variable isn't needed, this would be nice:

for: range 5
    # loop 5 times
end

Reuse stack frame variables when they go out of scope

I suspect the following code allocates two frame variables:

do
    var a = 1
end
do
    var b = 2
end

But really, b could reuse a's register.

Double check and fix.
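A sketch of the fix on the allocator side, assuming nothing can reference a block's registers after the block ends: each scope remembers how many registers were in use when it opened and releases everything allocated inside it when it closes, so sibling `do` blocks share slots. The class and method names are hypothetical.

```typescript
class FrameAllocator {
  private inUse = 0;
  private scopes: number[] = [];
  highWater = 0; // registers the frame actually needs

  openScope(): void {
    this.scopes.push(this.inUse);
  }

  closeScope(): void {
    const saved = this.scopes.pop();
    if (saved === undefined) throw new Error("closeScope without openScope");
    this.inUse = saved; // everything allocated in the scope becomes free
  }

  alloc(): number {
    if (this.inUse >= 256) throw new Error("Out of registers in this frame");
    const reg = this.inUse++;
    this.highWater = Math.max(this.highWater, this.inUse);
    return reg;
  }
}
```

For the example above, `a` and `b` would both get register 0, and the frame would need only one register.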

Change structured template strings to enums

Now that there are enums, we should change structured templates from lists of strings to lists of enums.

{ struct.U8, struct.UL16, ... }

Store large strings with compression

Should consider basic compression of large strings in the bytecode, decompressing them on load.

More native functions (num, int, etc)

More esoteric math functions now have wide browser support:

log1p
expm1
cosh
sinh
tanh
acosh
asinh
atanh
fround
emod (Euclidean mod)

Also might want to add some extra int commands:

pop (Hamming weight)
ctz (Count trailing zero)
bswap (Reverse ordering of bytes)
emod (Euclidean mod)
rol (Rotate left)
ror (Rotate right)
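Sketches of the proposed int commands using JavaScript's 32-bit bitwise operators. Treating values as unsigned 32-bit (normalized with `>>> 0`) is an assumption about how sink's int commands would behave; `emod` is written for plain numbers so it could serve both num and int.

```typescript
function pop(x: number): number {          // Hamming weight
  let v = x >>> 0;
  let n = 0;
  while (v !== 0) { v &= v - 1; n++; }     // clear the lowest set bit each step
  return n;
}

function ctz(x: number): number {          // count trailing zeros; 32 for zero
  const v = x >>> 0;
  return v === 0 ? 32 : 31 - Math.clz32(v & -v); // v & -v isolates the lowest set bit
}

function bswap(x: number): number {        // reverse the order of the 4 bytes
  const v = x >>> 0;
  return (((v & 0xff) << 24) | ((v & 0xff00) << 8) | ((v >>> 8) & 0xff00) | (v >>> 24)) >>> 0;
}

function rol(x: number, n: number): number { // rotate left by n (mod 32)
  const s = n & 31;
  return ((x << s) | (x >>> ((32 - s) & 31))) >>> 0;
}

function ror(x: number, n: number): number { // rotate right by n (mod 32)
  const s = n & 31;
  return ((x >>> s) | (x << ((32 - s) & 31))) >>> 0;
}

function emod(a: number, b: number): number { // result always in [0, |b|)
  const r = a % b;
  return r < 0 ? r + Math.abs(b) : r;
}
```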

Maybe an easy test for native little/big endian... struct.endian or something.

Maybe some CSV functions in shell? csv.list str and csv.str list...

Enums

I keep trying to find a way around it, but I can't -- I think sink needs enums, i.e., compile-time number constants.

The problem is that I only have 256 registers per frame, so if I use var's as pseudo-constants, I run out of register space. I could use strings, but then the C side will need a mapping from strings to numbers. It's just a pain.

Much easier to have compile-time number constants.

If so, will need to change num.tau, num.pi, num.e to use them.

Also:

If we do this (probably) then it would be nice if int.or, int.and, etc, take multiple arguments.

int.or SOMETHING, ANOTHER, ONEMORE, FOURTH

etc., etc.

Whether OP_NUM_ADD, OP_NUM_DIV, etc, take multiple, is left open.

Lexical scoping bug with loops when variable declaration occurs before assignment

There appears to be a problem with lexical scoping in relation to loop blocks. Consider the following program:

def calculate rows
    var sum = 0
    for var row: rows
        var largest, smallest
        say "begin loop: largest = $largest, smallest = $smallest"
        for var number: row
            if largest == nil || number > largest
                largest = number
            end
            if smallest == nil || number < smallest
                smallest = number
            end
        end
        say "end loop: largest = $largest, smallest = $smallest"
        sum += largest - smallest
    end
    return sum
end

say "result: ${calculate { { 5, 1, 9, 5}, {7, 5, 3}, {2, 4, 6, 8} }}" # returns 24, but should be 18

At the beginning of the first iteration of the outer loop, largest and smallest are both nil. With each iteration of the outer loop, I would expect largest and smallest to be nil. However, what happens instead is that with subsequent iterations of the loop, they retain their values from the previous iteration. I can work around this by assigning nil to these variables when they are declared, but it doesn't seem like I should have to do this.
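For comparison, the behaviour the report expects, written in TypeScript: a `let` declared inside the loop body is a fresh binding on every iteration, so it starts out `undefined` (sink's nil) each time, and the function returns 18. One possible compiler-side fix, assumed here rather than taken from sink, is to emit an explicit "set to nil" for every `var` declared without an initializer, so a reused register cannot leak the previous iteration's value.

```typescript
function calculate(rows: number[][]): number {
  let sum = 0;
  for (const row of rows) {
    // Fresh, uninitialized bindings on every outer iteration.
    let largest: number | undefined;
    let smallest: number | undefined;
    for (const n of row) {
      if (largest === undefined || n > largest) largest = n;
      if (smallest === undefined || n < smallest) smallest = n;
    }
    if (largest !== undefined && smallest !== undefined) sum += largest - smallest;
  }
  return sum;
}
```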

Implement tail recursion

Create a new opcode for tail recursion and detect it statically.

def factorial a
  def step left, total
    if left <= 0
      return total
    end
    return step left - 1, total * left
  end
  return step a, 1
end
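What tail-call elimination does to `step`, written out by hand: since the recursive call is the last thing `step` does, the call can reuse the current frame, which amounts to rebinding the parameters and jumping back to the top. A new opcode would let the VM do this automatically; its name and encoding are not specified by the issue, so this sketch only shows the resulting control flow.

```typescript
function factorial(a: number): number {
  // Parameters of `step`, initialized as in `return step a, 1`.
  let left = a;
  let total = 1;
  for (;;) {
    if (left <= 0) return total;
    // `return step left - 1, total * left`: compute both new values from
    // the old ones before overwriting either parameter, then loop.
    const nextLeft = left - 1;
    const nextTotal = total * left;
    left = nextLeft;
    total = nextTotal;
  }
}
```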

Bit operators should take multiple arguments

int.or, int.and, and int.xor should take multiple arguments:

int.or A, B, C, D, E

etc.
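A sketch of the variadic versions: fold the 32-bit operator over all arguments, starting from the operator's identity element. Unsigned 32-bit results are an assumption, and so is the zero-argument behaviour (each returns its identity); sink might prefer to require at least one argument.

```typescript
function intOr(...xs: number[]): number {
  return xs.reduce((acc, x) => (acc | x) >>> 0, 0);
}

function intAnd(...xs: number[]): number {
  return xs.reduce((acc, x) => (acc & x) >>> 0, 0xffffffff); // all bits set
}

function intXor(...xs: number[]): number {
  return xs.reduce((acc, x) => (acc ^ x) >>> 0, 0);
}
```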

Metadata

Should there be a way to specify arbitrary metadata associated with a sink script?

Something like:

meta 'key1', 'value1'
meta 'key2', 'value2'
# ...

...that could be queried from the host without executing the script.

Crash in REPL with unclosed string

REPL:

 1: say 'hi
fish: Job 1, './sink' terminated by signal SIGSEGV (Address boundary error)

Constants?

What about this:

const x = 'hello'

Splitting a long string causes Sink to crash

str.split causes Sink to crash when presented with the following input:

var input = "6592822488931338589815525425236818285229555616392928433262436847386544514648645288129834834862363847542262953164877694234514375164927616649264122487182321437459646851966649732474925353281699895326824852555747127547527163197544539468632369858413232684269835288817735678173986264554586412678364433327621627496939956645283712453265255261565511586373551439198276373843771249563722914847255524452675842558622845416218195374459386785618255129831539984559644185369543662821311686162137672168266152494656448824719791398797359326412235723234585539515385352426579831251943911197862994974133738196775618715739412713224837531544346114877971977411275354168752719858889347588136787894798476123335894514342411742111135337286449968879251481449757294167363867119927811513529711239534914119292833111624483472466781475951494348516125474142532923858941279569675445694654355314925386833175795464912974865287564866767924677333599828829875283753669783176288899797691713766199641716546284841387455733132519649365113182432238477673375234793394595435816924453585513973119548841577126141962776649294322189695375451743747581241922657947182232454611837512564776273929815169367899818698892234618847815155578736875295629917247977658723868641411493551796998791839776335793682643551875947346347344695869874564432566956882395424267187552799458352121248147371938943799995158617871393289534789214852747976587432857675156884837634687257363975437535621197887877326295229195663235129213398178282549432599455965759999159247295857366485345759516622427833518837458236123723353817444545271644684925297477149298484753858863551357266259935298184325926848958828192317538375317946457985874965434486829387647425222952585293626473351211161684297351932771462665621764392833122236577353669215833721772482863775629244619639234636853267934895783891823877845198326665728659328729472456175285229681244974389248235457688922179237895954959228638193933854787917647154837695422429184757725387589969781672596568421191236374563718951738499591454571
728641951699981615249635314789251239677393251756396"

say (str.split input, "")

Smaller strings are split correctly, so I'm guessing it's something to do with the length of the string.

Expand sink so a single script can provide all resources/behavior for a game

Toying with the idea that a single compiled sink script could provide all the resources/behavior of a game.

There are a few things I would need:

~~Metadata; see: #29~~
~~Tasks; see: #28~~
Embed; see #11
(optional) Compressed strings (for embed, but for any large strings, really)

Given these features, I could completely eliminate the idea of the host environment requiring a file system -- everything could be driven by a single sink binary file.

Save/load bytecode to disk

Need to serialize and deserialize the program data structure, and validate the program using the validator.

Change TS to use async/await entirely

It's very annoying that the codebase has checkPromise and isPromise exposed to the end user.

Another idea is to have async/sync versions of the same function. For example,

export interface inc_st {
  f_fstype: fstype_f; // always returns a Promise
  f_fstypeSync: fstypeSync_f; // always returns results
  f_fsread: fsread_f; // always returns a Promise
  f_fsreadSync: fsreadSync_f; // always returns results
  user?: any;
}

This way a user always knows what they're getting back. Either a result (using Sync versions), or a Promise (normal versions). There's no need for checkPromise or isPromise.

The only downside is that native functions would have to be implemented in both ways. If you implement a Sync version, that's easily convertible to async, but if you only have async, then requesting ctx_runSync would have to abort in error.

I don't think that's that bad.

It pushes the annoying distinction off to the library implementer, instead of the end user.
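A sketch of that split for a single hook, loosely following `inc_st` above; the type and helper names are hypothetical. Writing only the Sync version yields the async one for free, while an async-only host cannot serve a synchronous run and has to fail.

```typescript
type FsReadSync = (file: string) => string | null;
type FsRead = (file: string) => Promise<string | null>;

interface ReadHooks {
  fsread: FsRead;           // always returns a Promise
  fsreadSync?: FsReadSync;  // always returns results, if provided
}

// Derive both hooks from a synchronous implementation.
function hooksFromSync(readSync: FsReadSync): ReadHooks {
  return {
    // Deferring through then() also turns a thrown error into a rejection.
    fsread: (file) => Promise.resolve().then(() => readSync(file)),
    fsreadSync: readSync,
  };
}

// What a synchronous run would call; it fails if only async is available.
function readForSyncRun(hooks: ReadHooks, file: string): string | null {
  if (!hooks.fsreadSync) {
    throw new Error("Synchronous run requested, but only an async fsread is available");
  }
  return hooks.fsreadSync(file);
}
```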

(JavaScript) Deal with `fsread` returning a Promise

Search the JavaScript source for fsread, and you'll see a bunch of spots where I need to Promise-ify the code.

Heredocs

I think I want to include heredocs... the format should be similar to Markdown:

test ```
    one
    two
    three
    ```

It should start with 3 or more backticks, followed by a newline (or EOL comment), followed by the (possibly tabbed) data, followed by closing backticks of the same count.

The amount of indentation to remove should be determined by how far the closing fence is indented. E.g.,

test ```
        one
        two
        three
    ```

This should only remove four spaces from each line, because the closing fence is 4 spaces over.
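A sketch of the dedent rule, taking the raw lines between the fences plus the closing fence line. Assumptions: a tab counts as a single indentation character (no tab-width expansion), and a line indented less than the fence only loses the whitespace it actually has.

```typescript
function dedentHeredoc(contentLines: string[], closingFence: string): string {
  // Indentation in front of the closing fence decides how much to strip.
  const fenceIndent = /^[ \t]*/.exec(closingFence)![0].length;
  return contentLines
    .map((line) => {
      let i = 0;
      // Strip at most `fenceIndent` leading spaces/tabs from this line.
      while (i < fenceIndent && i < line.length && (line[i] === " " || line[i] === "\t")) i++;
      return line.slice(i);
    })
    .join("\n");
}
```

For the second example above, each line keeps four of its eight leading spaces.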

Create a `range` command

The range command should generate a list of numbers. Special logic should exist to detect for var v: range foo to skip creating the list.
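A sketch of both halves: `range` builds a real list, while `for var v: range ...` could be compiled into a counting loop that produces the same values without building the list. Python-style arguments (`[start, ]stop[, step]`) are an assumption about the command's signature.

```typescript
function rangeArgs(a: number, b?: number, step = 1): [number, number, number] {
  if (step === 0) throw new Error("range: step cannot be zero");
  return b === undefined ? [0, a, step] : [a, b, step];
}

// `range ...` used as a value: materialize the list.
function range(a: number, b?: number, step = 1): number[] {
  const [start, stop, st] = rangeArgs(a, b, step);
  const out: number[] = [];
  for (let v = start; st > 0 ? v < stop : v > stop; v += st) out.push(v);
  return out;
}

// `for var v: range ...`: same values, no list allocated.
function forRange(body: (v: number) => void, a: number, b?: number, step = 1): void {
  const [start, stop, st] = rangeArgs(a, b, step);
  for (let v = start; st > 0 ? v < stop : v > stop; v += st) body(v);
}
```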

Optimization: static conditionals should modify code

Something like this:

if 1
    say 'hi'
end

Should transform to:

do
    say 'hi'
end

I would just do this now, but there is additional complexity dealing with elseif and else.
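A sketch of how the elseif/else complexity could be handled, over a hypothetical representation where each branch's condition is either a known constant or dynamic. Branches that can never run are dropped; the first branch that always runs becomes the final `else` and cuts off everything after it. If nothing is left, the statement disappears; if only an unconditional branch is left, it becomes `do ... end`. Which sink values count as true is left to the caller via the `truthy` flag.

```typescript
type Cond = { kind: "const"; truthy: boolean } | { kind: "dynamic"; src: string };
interface Branch { cond: Cond | null; body: string[] } // cond null means `else`

type Folded =
  | { kind: "none" }                    // whole statement removed
  | { kind: "do"; body: string[] }      // emit `do ... end`
  | { kind: "if"; branches: Branch[] }; // still needs runtime tests

function foldIf(branches: Branch[]): Folded {
  const kept: Branch[] = [];
  for (const br of branches) {
    if (br.cond !== null && br.cond.kind === "const") {
      if (!br.cond.truthy) continue;            // can never run: drop it
      kept.push({ cond: null, body: br.body }); // always runs: acts as `else`
      break;                                    // later branches are dead
    }
    kept.push(br);
    if (br.cond === null) break;                // `else` ends the chain
  }
  if (kept.length === 0) return { kind: "none" };
  if (kept[0].cond === null) return { kind: "do", body: kept[0].body };
  return { kind: "if", branches: kept };
}
```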