
asterius's People

Contributors

acondolu, angerman, bollu, curiousleo, ento, ggreif, gkaracha, hsyl20, igrep, majkrzak, mboes, mrkkrp, nrnrnr, rszibele, terrorjack, timjb, ylecornec


asterius's Issues

Proper narrowing for W8/W16 integers

The cmm language has W8/W16/W32/W64 integers, but wasm32 only supports Int32/Int64. Currently the codegen deals with W8/W16 operands in a pretty ad hoc way: it pretends they're all W32. This is problematic when emitting Load/Store instructions, since the unused bits are likely to be corrupted.

A naive fix would bloat the codegen. Some degree of refactoring (related to type inference for CmmExpr, marshalCmmExpr, etc.) is required to fix this without adding several hundred more lines of code.

Investigate unaligned load/store instructions in output wasm binary code

In the current code generator, when we perform a cmm load/store, we emit one wasm load/store instruction with memarg offset and align both set to 0, allowing a potentially unaligned load/store at an arbitrary address (which is permitted by the WebAssembly spec).

To my amazement (or horror), when checking the binary code output by the previous binaryen backend, I found that memarg.align was actually set to 3, which leads to unspecified behavior when an unaligned load/store occurs. I might have misused BinaryenLoad/BinaryenStore, and if the binaryen C API's align means something different from memarg.align in the WebAssembly spec, the documentation should improve a bit on that end.

I'm investigating the fix and impact of this problem right now.

Implement decoupled GHC API for asterius

Currently, we have ghc-toolkit to hijack a part of ghc's post-Core pipeline, and it's implemented by pasting & adapting a part of ghc. We are not fully in control of ghc behavior as long as we have a transitive dependency on the host ghc's ghc-the-package.

We need to implement a decoupled GHC API for asterius, satisfying the following goals:

  • Given a ghc source tree, configure options, and a target package database, build a stage-1 ghc into that database
  • The ghc-* packages are renamed to prevent collisions with their counterparts in the host ghc's global package database
  • Besides building & installing, it supports building only the non-Haskell parts (e.g. genprimopcode) and then loading the Haskell parts via a ghci session. That way it'll be possible to use a ghci-based IDE to hack on ghc code

How to use it?

Could you please add more instructions to your README? (I'm using the docker version.)

Thanks!

Investigate a weird input case which crashes node v8-canary

Input (stripped down fib):

foreign import ccall unsafe "print_f64" print_f64 :: Double -> IO ()

main :: IO ()
main = print_f64 $ cos 0.5

This works fine with a release version of node.js, but crashes the node.js v8-canary version with exit code -11. Interesting.

Efficient data marshaling between Haskell/JavaScript

Directly related: #23, #40

The current method of moving a piece of data between Haskell(WebAssembly)/JavaScript is both slow and troublesome:

  • The set of JSFFI basic types is pretty limited (mainly to make GHC's typechecker happy, since we don't have our own yet). So there's the extra overhead of adding a StablePtr/JSRef entry whenever we move anything that isn't a simple unboxed value.
  • When the marshaling requires traversal of the value, the "driver" code is written in Haskell (e.g. using foldl' over a String). Even if GHC manages to compile the "driver" to fast worker/wrapper pairs with tight inner loops, there's still overhead.
  • Imported JavaScript functions are called a lot; there's overhead in crossing the wasm/js boundary, and we'd like to call them only a few times per conversion.
  • The conversion functions aren't currently compiled & tested in the boot libs; they're pasted into every input project as a home module. Seriously!?

When implementing marshaling logic for bytestring, the above problems arose again, so it's time to improve the marshaling story, whether for bytestring, regular Strings, or any other data structure:

  • The interfaces of the data marshaling functions in Haskell should live in a standalone module in a boot lib, importable from any input home module of ahc-link. We should probably add something like an asterius-prim package, similar to ghcjs-prim (a possible interface is sketched after this list).
  • The driver code should really be in JavaScript/WebAssembly, and directly access the Haskell heap, abusing knowledge about heap object layouts. Take JSString -> String as an example; previously we issued a JavaScript call when moving every code point to Haskell, but we really should just build the fully-evaluated Haskell string on the heap directly and return it.
  • [HARD] When passing data into Haskell, we should avoid allocating a StablePtr; it's going to be dereferenced and freed soon anyway. This suggests that something like foreign import javascript "f()" f :: SomeHaskellType -> IO SomeHaskellType is possible.
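
As a rough illustration of the first bullet, the user-facing surface of such a boot package might look like the sketch below. The module name, the handle representation, and the idea that the bodies are supplied by the runtime/linker rather than by Haskell driver code are all assumptions for illustration, not the actual asterius API:

-- Hypothetical asterius-prim-style interface; the bodies would be provided by the
-- runtime (JavaScript/WebAssembly driver code walking the Haskell heap directly),
-- so the Haskell definitions here are placeholders only.
module Asterius.Prim
  ( JSRef
  , toJSString
  , fromJSString
  ) where

-- Opaque handle to a JavaScript value; the concrete representation is up to the runtime.
newtype JSRef = JSRef Int

-- Build a JavaScript string from a Haskell String in a single runtime call.
toJSString :: String -> JSRef
toJSString = error "placeholder: implemented by the runtime, not in Haskell"

-- Build a fully-evaluated Haskell String on the heap from a JavaScript string.
fromJSString :: JSRef -> String
fromJSString = error "placeholder: implemented by the runtime, not in Haskell"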

Improve JavaScript generation of ahc-link

Currently, when we need custom behavior of generated wrapper .js files, we can only use the --asterius-instance-callback flag to supply a JavaScript callback which takes an already initialized instance as a parameter. We really should provide a more user-friendly interface.

Some requirements to keep in mind here:

  • Multiple .js scripts generated from different ahc-link runs may co-exist in a page. Make sure we don't accidentally pollute the global scope.
  • Compilation/instantiation of an asterius instance should be separated. There are opportunities to reuse stateless compiled results (wasm modules support structured cloning), e.g. sending them to a Worker, storing them in IndexedDB, etc.
  • If we use ES6 modules here, we must implement a fallback flag to generate a "self-contained" .js file.

Help exporting hello_world function from Haskell to JavaScript

First of all, thanks for this great initiative. There is no good compiler from any functional language to Wasm, and it would be great to have one for Haskell. Unfortunately, I'm still quite a noob with Haskell, compilers, and all that, but I'm trying to dig into Wasm. Still, I succeeded in using Asterius through Docker, thanks again.

I'm trying to export from Haskell a hello_world function which takes a string as its only argument and returns another one. The mult example worked fine. Unfortunately, exporting a function which takes a String as an argument always failed, whatever workaround I tried. I managed to make it work returning a String, but not receiving one.

Here is the code I have:

toJSString :: String -> JSRef
toJSString = foldl (\s c -> js_concat s (js_string_fromchar c)) js_string_empty

fromJSString :: JSRef -> String
fromJSString s = [js_string_tochar s i | i <- [0 .. js_length s - 1]]

foreign import javascript "\"\"" js_string_empty
  :: JSRef

foreign import javascript "${1}.concat(${2})" js_concat
  :: JSRef -> JSRef -> JSRef

foreign import javascript "${1}.length" js_length
  :: JSRef -> Int

foreign import javascript "String.fromCodePoint(${1})" js_string_fromchar
  :: Char -> JSRef

foreign import javascript "${1}.codePointAt(${2})" js_string_tochar
  :: JSRef -> Int -> Char

foreign import javascript "console.log(${1})" js_print :: JSRef -> IO ()

foreign export javascript "hello_world" hello_world :: JSRef -> JSRef

hello_world :: JSRef -> JSRef
hello_world name = toJSString ("Hello, " ++ (fromJSString name))

main :: IO ()
main = do js_print 42

But it fails with:

root@b31716e46fdd:~/2048# ahc-link --input mult.hs --export-function=hello_world --asterius-instance-callback=cb
[INFO] Loading boot library store from "/root/.stack-work/install/x86_64-linux/ghc-8.7/8.7.20181027/share/x86_64-linux-ghc-8.7.20181027/asterius-0.0.1/.boot/asterius_lib/asterius_store"
[INFO] Populating the store with builtin routines
[INFO] Compiling mult.hs to Cmm
[INFO] Marshalling from Cmm to WebAssembly
[INFO] Marshalling "Main" from Cmm to WebAssembly
[INFO] Attempting to link into a standalone WebAssembly module
ahc-link: No match in record selector hsTyCon

I tried importing JavaScript callbacks to change the signatures; I managed to pass the compilation step, but never succeeded in receiving my String as an argument.

Fix leaking `Unique` into serialized object files

Some Uniques are leaked into current Asterius.CodeGen output modules, which is highly dangerous. A proper fix requires a careful redesign of UnresolvedSymbol, making it contain the source module and namespace.

Proper Node.js/Haskell communication

Currently, ahc-link has a --run flag which runs the generated stub js to load the compiled wasm code, given that the compilation target is Node.js. The flag is used by all unit tests, and that's the only type of Node.js/Haskell interaction we have: only the node exit code can be automatically checked, and one needs to manually read/grep the debug logs. (There could be golden tests to check textual outputs, but that's still far from satisfactory.)

This is the tracking issue for proper Node.js/Haskell communication. Short-term objectives:

  • Enable passing structured data between the node process and asterius processes (ahc-link, unit tests, etc.)
  • The unit tests properly check their outputs, not by comparing text.
  • The asterius rts emits structured debug log entries, which can be inspected in asterius.

The heavy work is already mostly finished in the new generation of inline-js; the updated readme of inline-js explains the motivation behind the rewrite in detail.

This work will also be the keystone of future TH/GHCi/Plugins support.

Improve memory trap of debugging mode

We already have a memory trap, enabled via ahc-link --debug, which captures all Load/Store instructions and checks for invalid pointers. It has proven extremely useful in the arduous debugging marathons.

A current drawback of the memory trap: it's a wired-in function in Asterius.Builtins, it only checks for a few known invalid memory segments, and each check incurs high overhead via a long chain of nested pointer comparisons. The improved memory trap function should:

  • Be generated after symbol resolution is complete, when we have full knowledge of which segments in the linear memory are valid and which are not.
  • Implement read-only segments, mainly for info tables.
  • Use an efficient data structure (e.g. a binary trie) to determine invalid addresses (see the sketch after this list).
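
For the last point, one possible shape of the lookup, sketched in plain Haskell for illustration (the real check would be generated wasm code, and the exact data structure is still an open choice):

import qualified Data.IntMap.Strict as IM

-- Valid segments keyed by start address; the value is the (exclusive) end address.
type ValidSegments = IM.IntMap Int

-- An address is valid iff it falls inside the segment starting at or below it,
-- giving a single O(log n) lookup instead of a chain of pointer comparisons.
addressValid :: ValidSegments -> Int -> Bool
addressValid segs addr =
  case IM.lookupLE addr segs of
    Just (start, end) -> addr >= start && addr < end
    Nothing           -> False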

Implement "foreign import javascript wrapper" notion for creating JavaScript callbacks from Haskell closures

Note: this is a medium-priority issue which has a significant influence on user experience. Will be picked up after next announcement.

To convert a dynamically created Haskell closure to a JavaScript callback, we currently rely on two canned rts magic functions, makeHaskellCallback/makeHaskellCallback1, limiting the supported Haskell closure types to IO () and JSVal -> IO (). These cover a lot of use cases, but we really should support Haskell functions of any arity. The standard C FFI mechanism already has a "foreign import wrapper" notion for this:

foreign import ccall "wrapper" some_callback_factory :: ft -> IO (FunPtr ft)

Where ft can be anything valid in a vanilla foreign import ccall signature. We should steal this syntax to allow users to create a JavaScript callback from Haskell function closures of any arity.
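
Carried over to the JSFFI, the intended surface syntax would presumably mirror the C form above; a hypothetical example (the result type is an assumption; conceptually the factory returns a JSVal holding a JSFunction rather than a FunPtr):

foreign import javascript "wrapper" makeCallback :: (JSVal -> IO ()) -> IO JSVal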

How to implement:

  • Rewrite all foreign import javascript "wrapper" clauses in our JSFFI preprocessor. This is harder than vanilla foreign import javascript: when desugaring, GHC emits code which assumes createAdjustor in the RTS does fancy pointer hacking, but we don't want that, so we need extra logic to handle it. Also, FunPtr doesn't make any sense here; we want to fetch a JSFunction instead.
  • Change the serialized JSFFI info to include all "dynamic" foreign type signatures in the current module.
  • In the linker, dynamically generate all js/wasm magic functions for all collected "dynamic" callback factories.

Implement support for C structs in rts

The cmm code emitted by ghc assumes knowledge of several C structs in rts, most notably:

  • BaseReg
  • bdescr
  • StgTSO
  • StgStack
  • Capability

and so on. The asterius codegen needs built-in support for these structs, including knowledge of:

  • The size and alignment of the struct and of every field
  • Whether values of certain fields need to be faked (a possible encoding is sketched below)
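
One possible encoding of that built-in knowledge, sketched in Haskell (type and field names are illustrative, not actual asterius code):

-- Per-field knowledge: where the field lives and whether the codegen only fakes its value.
data FieldInfo = FieldInfo
  { fieldOffset :: Int   -- byte offset within the struct
  , fieldFaked  :: Bool  -- True if we only ever write a placeholder value
  }

-- Per-struct knowledge consumed by the codegen when translating Cmm field accesses.
data StructInfo = StructInfo
  { structSize   :: Int                    -- total size in bytes, including padding
  , structAlign  :: Int                    -- required alignment in bytes
  , structFields :: [(String, FieldInfo)]  -- field name mapped to its layout info
  }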

Improve linker performance

The linker is undoubtedly the performance bottleneck right now, and a huge nuisance when linking large modules. Improvement is not hard though. General principle: avoid full AST traversals/rewritings as much as we can; cache more info in the store to trade space for time.

How to use ahc-link and compiled output?

Hi, this is a bit of an echo of #1, but I'm pretty sure I followed the docs as well as I could. (Spoiler: I didn't.)

As a user of Haste, I'm excited to try using Asterius in my personal project. Haste is a great tool, but it has issues with stack overflows in loops, and its outlook is bad (I think the project might be dead). A year later I ran into the same Haste issue porting my Haskell project to the browser.

Anyway, I followed the Tweag blog post and made a file essentially identical to the "JS calling HS" section. (I need main, otherwise GHC complains. If I replace return () with a hello world, node fails to compile the .wasm, but maybe that's another issue?)

foreign export javascript "mult_hs" (*) :: Int -> Int -> Int
main = return ()

Then I ran

ahc-link --input browser.hs --asterius-instance-callback='i => {
    i.wasmInstance.exports.hs_init();
    console.log(i.wasmInstance.exports.mult_hs(6, 7));
}'

which produced a file with the expected callback. However, running it in node, I see that i.wasmInstance.exports doesn't include the foreign exports I specified. I get this error: TypeError: i.wasmInstance.exports.mult_hs is not a function

I'm using the latest docker image.

Fix broken `stg_enter_ret` and `stg_ap_0_fast`

Right now, the codegen outputs broken code for stg_enter_ret, which makes stg_returnToStackTop accidentally ignore Main_main_closure on the stack. We hardcoded a version of stg_enter_ret that's known to work into Asterius.Builtins, but the hack doesn't make the underlying issue go away.

Another potentially broken function is stg_ap_0_fast: it prematurely jumps to 0, which aborts execution without even entering StgReturn. This error often shows up when compiling code related to GHC.List.

Fix Cmm narrowing operators for `Int8#`/`Word8#`

Note: Int8#/Word8# aren't widely used in the boot libs yet, so this is a medium-priority issue and will be looked at only after #44.

Cmm has narrowing operators MO_SS_Conv/MO_UU_Conv which narrow a 64-bit Int#/Word# to 32 bits or fewer, while still using a 64-bit local/global register for storage. The codegen already handles 64-to-32 narrowing, but the newly added Int8#/Word8# require more aggressive 64-to-8 narrowing. Note that wasm doesn't have native opcodes for this purpose, so we need a pair of store/load opcodes here. When we later Workerify the runtime, we also need extra care not to introduce a race condition, since this involves a pinned temporary memory region.
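
For reference, the semantics the codegen has to implement is plain zero- or sign-extension from the narrow width; a standalone illustration in Haskell (this is only the reference semantics, not the codegen itself, which would emit the store/load pair mentioned above):

import Data.Bits ((.&.))
import Data.Int (Int64, Int8)
import Data.Word (Word64)

-- MO_UU_Conv to 8 bits: clear the unused high bits.
narrowU8 :: Word64 -> Word64
narrowU8 = (.&. 0xFF)

-- MO_SS_Conv to 8 bits: round-trip through Int8 to sign-extend from bit 7.
narrowS8 :: Int64 -> Int64
narrowS8 x = fromIntegral (fromIntegral x :: Int8)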

Implement the "persistent vault" feature

Note: this feature is not provided by the original ghc runtime. I intend to implement it since it has a reasonable difficulty level, and is also useful for certain use cases, see explanations below.

Motivation

The typical method of using asterius is: compile some .hs files to a .wasm/.js, initiate an asterius "instance", then call exported functions from that instance from the regular js world. What do we do in case an unrecoverable error pops up? (e.g. failing to grow the linear memory and allocate new blocks)

A naive solution would be: wipe the instance completely, initiate another one and start from ground zero. This surely works if whatever exported function we're calling is stateless. But what if we'd like the function to share some state across invocations, even when an instance is wiped and rebooted? Between the old & new instance, something must be transferred, and it should be accessible in the Haskell world as well.

Interface

Let's abstract all the persistent state across different instances to a single "vault". The initialization of an asterius instance can either initiate an empty vault or take an existing one. The global vault can be accessed by all Haskell code currently being executed in that instance.

What can be saved in a vault? Arbitrary Haskell closures aren't an option, since the linear memory will be wiped, and if we implement closure copying, then we already have a gc, making vaults a lot less useful. Nor are compact regions, since we haven't really tested whether the Compact#-related primops work with our custom sm (I suspect it'll take a lot of work). However, saving Haskell values which are explicitly serializable via bytestring is possible; this makes the vault effectively a mutable KV store (making it immutable is possible, but not worth the effort). Another thing which can be saved: a JSRef table which stores JS values "accessible" in Haskell.

These two can actually be unified: when saving Haskell state, we don't enforce a Binary constraint or the like, so users aren't tied to a specific serialization framework; all keys & values are explicitly ByteStrings, and a ByteString can easily be converted from/to an ArrayBuffer. So we only need to implement the second kind of vault: a JSRef table.

The Haskell interface is something like:

type Key = JSArrayBuffer
type Value = JSVal

vaultInsert :: Key -> Value -> IO ()
vaultLookup :: Key -> IO (Maybe Value)
vaultDelete :: Key -> IO ()
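
A hypothetical usage sketch on top of that interface; the toJSArrayBuffer helper and the way the JSVal payload is obtained are assumptions, not part of the proposal:

import Data.ByteString.Char8 (pack)

-- Persist a JS value under a well-known key, so it survives an instance reboot.
saveGreeting :: JSVal -> IO ()
saveGreeting v = do
  k <- toJSArrayBuffer (pack "greeting")  -- hypothetical ByteString -> JSArrayBuffer conversion
  vaultInsert k v

-- After re-initializing an instance with the same vault, read the value back.
restoreGreeting :: IO (Maybe JSVal)
restoreGreeting = do
  k <- toJSArrayBuffer (pack "greeting")
  vaultLookup k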

Implement missing WebAssembly shims for rts routines

Codegen from pure Haskell to binaryen IR is mostly complete, but when attempting to link things into a standalone module, we encounter missing rts routines.

The logic of linking is in Asterius.LinkStart. Given a Haskell program and the "root symbols" to look at, it traverses the symbol database and attempts to collect all relevant data sections/programs. The asterius:link-fact test suite performs linking for a simple program that calculates a factorial, then links and outputs the result (or errors).

Here's the factorial program:

module Fact where

fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n - 1)

facts :: [Int]
facts = scanl (*) 1 [1 ..]

root :: Int
root = fact 5

Ideally we can use Fact_root_closure as the root symbol, and after generating a standalone module we can evaluate it and observe how the linear memory is mutated. The error message (already reduced a lot after patching rts):

link-fact.exe: ChaseException {unseen = fromList ["allocBlock_lock"], unavailable = fromList []}

What if we introduce a dep on facts? Plain old Prelude stuff, isn't it? Well... let's try root = fact 5 + facts !! 5:

link-fact.exe: ChaseException {unseen = fromList ["stg_ap_pppv_info","u_towtitle","stg_ap_v_fast","runCFinalizers","stg_ap_p_fast","isDoubleNaN","isDoubleInfinite","u_iswspace","memmove","isDoubleNegativeZero","u_iswalnum","stg_ap_pppp_fast","stg_ap_ppp_info","isFloatNaN","u_iswalpha","stg_ap_pv_fast","free","stg_ap_v_info","memcpy","__hsbase_MD5Update","stg_ap_ppv_fast","malloc","__hsbase_MD5Final","u_towlower","dirty_MUT_VAR","realloc","calloc","stg_ap_ppp_fast","stg_ap_pp_info","__decodeDouble_2Int","__word_encodeFloat","__decodeFloat_Int","stg_ap_pp_fast","stg_ap_p_info","__word_encodeDouble","allocBlock_lock","__hsbase_MD5Init","isFloatNegativeZero","u_towupper","u_gencat","isFloatInfinite","barf"], unavailable = fromList ["stg_getMaskingStatezh","stg_maskAsyncExceptionszh",".Lc2ulW","s6LoP_entry",".Lc1CN5","r2o2_entry","stg_newByteArrayzh",".Lc1CSr",".Lc2vvd",".Lc6MXu",".Lc4tt",".LcuB","s6Lpy_entry",".Lc1ep",".Lc1CPo","stg_unmaskAsyncExceptionszh","stg_raisezh",".Lc2vGs",".Lc2uuv",".Lc1CQQ",".Lc1F79",".Lc1CUz",".Lc2ukg",".Lc2usO",".Lc1CKQ",".Lcvz",".Lc2vTw",".Lc1CLF",".Lc1F8W",".Lc1EAW",".Lc1Fep","base_GHCziStackziCCS_currentCallStack1_entry","stg_newPinnedByteArrayzh",".Lc1Fcq",".Lc1CK4",".Lc4v1",".Lc5R8Q","r2o1_entry","stg_maskUninterruptiblezh",".Lc1CTx",".Lc6MVJ",".Lc1Fax"]}

So, right now the bottleneck of this project is implementing WebAssembly shims for missing bits and pieces in rts, which are mostly related to the storage manager.

Some approaches now being taken:

  • Comment out threads/GC related logic in the .cmm files of rts, since there are no threads and no GC in the prototype.
  • In Asterius.CodeGen, directly shadow the invocations of stg_gc* Cmm functions and replace them with WebAssembly unreachables.
  • Write the shims.

Handle JavaScript FFI syntactic sugar in renamer/typechecker

As a sub-issue of #23.

The current way of handling foreign import javascript/foreign export javascript is a bit silly: we process the parsed AST, recognize JSFFI basic types by string, and after converting to our own FFIMarshalState which will later be used at link-time, we rewrite JavaScript FFI to C FFI, so ghc will happily typecheck them and generate relevant cmm code.

The disadvantages are very obvious:

  • Users cannot properly define newtypes over existing JSFFI basic types, since we won't recognize them, and newtypes are sort of a standard practice when Haskellers work with the conventional C FFI.
  • When we invoke ahc to compile boot libraries using Cabal, if Haskell modules contain JSFFI declarations, linking will fail because the symbols aren't really in any native object file. So, when we need something like asterius-prim which contains common utilities for Haskell/JavaScript interop, it has to be shipped as a home module pasted into every project, instead of as a pre-compiled boot package.

There is only one advantage: working with the parsed AST is easy, since it's totally context-free; no loading of dependent modules is required, nor do we need to learn about TcRnMonad and such.

We should really handle the syntactic sugar in the renamer/typechecker instead, and do better than disguising JSFFI as C FFI in order to make the ghc codegen happy. ghcjs implements and uses hooks for typechecking foreign decls, and that's a good starting point.

Linker error regarding `stg_enter_info`

When stg_stop_thread_info is introduced as a dependency, the linker complains that stg_enter_info is not found. However, stg_enter_info and stg_enter_ret both compile fine from HeapStackCheck.cmm.

Printing each iteration of linker dependency analysis starting from the direct parent node of stg_enter_info yields:

(fromList ["stg_stop_thread_ret"],ChaseResult {directDepBy = fromList [], statusMap = fromList [(Unfound,fromList []),(Unavailable,fromList []),(Available (),fromList [])]})
(fromList ["_asterius_TSO","stg_enter_info","StgReturn"],ChaseResult {directDepBy = fromList [("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList []),(Unavailable,fromList []),(Available (),fromList ["stg_stop_thread_ret"])]})
(fromList ["stg_TSO_info","_asterius_Stack"],ChaseResult {directDepBy = fromList [("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList
["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("_asterius_Stack",fromList ["_asterius_TSO"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["_asterius_TSO","stg_stop_thread_ret","StgReturn"])]})
(fromList ["stg_STACK_info","stg_TSO_entry","_asterius_Stack"],ChaseResult {directDepBy = fromList [("stg_STACK_info",fromList ["_asterius_Stack"]),("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("stg_TSO_entry",fromList ["stg_TSO_info"]),("_asterius_Stack",fromList ["_asterius_TSO","_asterius_Stack"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["_asterius_TSO","stg_stop_thread_ret","stg_TSO_info","_asterius_Stack","StgReturn"])]})
(fromList ["stg_STACK_info","stg_STACK_entry","stg_TSO_entry"],ChaseResult {directDepBy = fromList [("stg_STACK_info",fromList ["_asterius_Stack"]),("stg_STACK_entry",fromList ["stg_STACK_info"]),("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("stg_TSO_entry",fromList ["stg_TSO_info","stg_TSO_entry"]),("_asterius_Stack",fromList ["_asterius_TSO","_asterius_Stack"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["stg_STACK_info","_asterius_TSO","stg_stop_thread_ret","stg_TSO_info","stg_TSO_entry","_asterius_Stack","StgReturn"])]})
(fromList ["stg_STACK_entry"],ChaseResult {directDepBy = fromList [("stg_STACK_info",fromList ["_asterius_Stack"]),("stg_STACK_entry",fromList ["stg_STACK_info","stg_STACK_entry"]),("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("stg_TSO_entry",fromList ["stg_TSO_info","stg_TSO_entry"]),("_asterius_Stack",fromList ["_asterius_TSO","_asterius_Stack"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["stg_STACK_info","stg_STACK_entry","_asterius_TSO","stg_stop_thread_ret","stg_TSO_info","stg_TSO_entry","_asterius_Stack","StgReturn"])]})
ChaseResult {directDepBy = fromList [("stg_STACK_info",fromList ["_asterius_Stack"]),("stg_STACK_entry",fromList ["stg_STACK_info","stg_STACK_entry"]),("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("stg_TSO_entry",fromList ["stg_TSO_info","stg_TSO_entry"]),("_asterius_Stack",fromList ["_asterius_TSO","_asterius_Stack"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["stg_STACK_info","stg_STACK_entry","_asterius_TSO","stg_stop_thread_ret","stg_TSO_info","stg_TSO_entry","_asterius_Stack","StgReturn"])]}

So stg_enter_info is mistakenly marked as Unfound starting from the 3rd iteration, indicating a hidden bug inside the dependency analyzer. This is a major blocker for M1.

Improve documentation for developers

Currently, we have out-of-line docs which are built on CI and serve as a wiki for devs, and other than that we don't have haddock or ghc-style source notes. This will probably drive away some potential contributors, given the growing size of the codebase.

Another concern is: as the main dev of asterius, I'm not dogfooding the docs enough; when I can't recall some implementation details, I usually navigate to relevant code and browse the implementation quickly. Ideally, I should be looking at comments or docs instead.

We should improve docs for developers; the objective is:

  • For newcomers looking for low-hanging fruit, there should be some high-level overview sections which give a clear explanation of the basic ideas and demonstrate how to get their hands dirty; this is the "how" part.
  • For colleagues and myself, there should be explanatory sections which explain design choices, compare different approaches and their tradeoffs, etc. This is the "why" part.

Efforts have begun on this front:

  • To utilize fragmented time, I'm adding new documentation sections (see ir.md for example)
  • I'm also keeping an eye on my regular workflow. When I run into a cache miss in my brain, I'll record it and see if it indicates some flaw in current documentation.

Utilize `memarg.offset` in load/store instructions

All load/store instructions in WebAssembly take a static memarg immediate which can specify align/offset. We used neither in the past: whenever we accessed a base address plus an offset, we performed a runtime addition first. This was handy for developing the "memory trap" for the debug mode, since we only needed to care about the address.

However, in almost all cases, the offset is a statically known constant, and by utilizing memarg.offset we can reduce code size and also get rid of some unnecessary runtime overhead. We just need to make the memory trap take an extra offset parameter and take that into account.

This issue goes hand in hand with #26.
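
A sketch of the intended folding over a hypothetical miniature AST (not the actual Asterius IR types): a load of base plus a constant becomes a load of base with the constant folded into memarg.offset.

data Expr
  = Load Int Expr  -- Load offset ptr: the Int is the static memarg.offset
  | Add Expr Expr
  | ConstI64 Int
  | Symbol String
  deriving (Show)

-- Fold "load (base + k)" into "load{offset += k} base", recursing so that chains of
-- additions collapse into a single static offset.
foldOffset :: Expr -> Expr
foldOffset (Load o (Add base (ConstI64 k))) = foldOffset (Load (o + k) base)
foldOffset (Load o p)                       = Load o (foldOffset p)
foldOffset (Add a b)                        = Add (foldOffset a) (foldOffset b)
foldOffset e                                = e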

Implement proper exception handling

Status quo of exceptions: When a Haskell exception is thrown, the scheduler returns with an error code, which is later checked by rts_checkSchedStatus and re-thrown as a JavaScript exception. We don't really know what is thrown; and when rts_checkSchedStatus finds out the Haskell thread hasn't exited gracefully, there's no chance to get back to the crime scene and fix stuff. Besides Haskell exceptions, there are also errors signaled in the runtime itself, and the same problems apply as well.

We definitely need to improve the exception story. Types of exceptions I can come up with at the moment:

  • Asynchronous exceptions: there's really no need to consider async exceptions at the moment.
  • Synchronous exceptions:
    • Explicit throw/throwIO calls in Haskell
    • Heap/stack check fails
  • Runtime errors which aren't re-wrapped as a Haskell exception:
    • Failure to grow linear memory and allocate blocks
    • Encountered unimplemented stub rts interfaces
    • JavaScript exceptions thrown from foreign import javascript code, or rts.js code
    • Fatal errors signaled by barf() in rts cmm code

All possible exceptions can be roughly grouped into three kinds:

  • Can be caught and handled in Haskell.
  • Can't be handled in Haskell, but when signaled, can be handled by JavaScript and carry on Haskell execution.
  • Fatal enough to either log & crash, or wipe the current runtime instance & restart.

The key to handling all three kinds of exceptions is improving the current one-shot scheduler. When an StgRun loop yields back to scheduleWaitThread, we need some (potentially async) JavaScript logic to check and fix stuff, then re-enter the loop and get back to Haskell execution.

A rough roadmap for this issue:

  1. Improve scheduleWaitThread, so Haskell execution may be automatically resumed for simple cases like heap/stack check failures. The storage manager won't need to allocate a large fixed-size nursery/object pool; it can take advantage of the recently implemented fast block allocator and only request as many blocks as needed. Also, the createThread wrapper functions perform extra bookkeeping, so when the thread is created from a static closure and a fatal error is signaled, it's possible to simply re-initiate a new instance and restart execution from there. This one will be delivered this week.

  2. Refactor the throw/catch primitives in PrimOps.cmm/Exception.cmm to fit the new scheduler, enabling exceptions to be caught and handled in Haskell.

  3. Well, let's first see if 1 & 2 work as expected.

Avoid the need for a custom-built GHC

The docs currently state

asterius requires a custom ghc which:

  • Disables TABLES_NEXT_TO_CODE. It's hard to attach executable code to an info table on the WebAssembly platform.
  • Uses integer-simple instead of integer-gmp. Porting integer-gmp to WebAssembly requires extra work and is not currently scheduled.

In the long run it would probably be desirable to make asterius compile against ghc-the-library as is. This may require changes to GHC, in particular, to make these two aspects configurable at run-time (e.g. part of the DynFlags) rather than a compile-time option.

How much have you investigated whether that is possible? I am happy to review a patch to GHC that would enable that…

Support tables-next-to-code

Currently asterius assumes the host ghc is built with tables-next-to-code switched off; however, that configuration has been breaking recent builds of ghc-head, forcing us to stick to a particular revision, ghc-8.5.20180413. Rather than taking time to investigate the ghc build issue, we can instead choose to implement support for tables-next-to-code in the host ghc before proceeding.

Prune unwanted `Typeable` related bindings

GHC automatically generates Typeable-related bindings alongside the closures for user-defined datatypes. This rapidly bloats the set of required rts routines and causes the linker to fail, even if we don't need anything related to Typeable at all.

We need to identify and prune the Typeable-related bindings generated in ghc's TcTypeable, either at the codegen stage or at link time.

Implement opaque garbage-collected references in Cmm

Background:

Currently, asterius intercepts raw Cmm from ghc's regular native codegen pipeline and compiles to wasm from that. It means we're working with something which:

  • Already CPS'ed and all calls become tail calls. Parameters are passed via cmm global regs like R1, or on the stack, instead of being passed via wasm's native function calls.
  • Besides symbols (which will resolve to addresses at link-time), we only know about bit-patterns and raw pointers. The info tables generated during StgCmm are already serialized.
  • Assumes the existence and semantics of a lot of rts C APIs. In this case, the block allocator, the hierarchy of mblocks/blocks, the nursery/object pools, and structs like Capability/StgTSO/InCall/Tasks and functions operating on them are the things we're mainly concerned about.

To target a garbage-collected platform (in our case, post-MVP WebAssembly, but lots of other platforms too), one should either resort to compiling from STG, or add the feature of opaque garbage-collected references in Cmm. The minimal requirements here are:

  1. We need gc reference types in the Cmm type system. A gc reference is completely opaque and its bit pattern is hidden; there are no reinterpret-casts between gc references and regular machine words.
  2. There exist gc arrays and structs, which can be accessed via/populated by gc references. Structs are for simple STG heap objects with a fixed schema defined in rts/storage/Closures.h, and arrays are needed for stuff like Array#.
  3. Given requirements of 1 and 2, allocation of a gc object is completely decoupled from the linear memory; no heap checks, no bumping of Hp pointers, no branching to stg_gc* routines, etc. Allocating a gc object is done via wired-in instructions.
  4. To support additional stuff like Weak#, there should be a notion of "finalizer" for gc objects. What kind of "finalizer" to support, however, is left as an open problem as of now.

Sadly, the ghc codegen and rts are currently very tightly coupled here. Although Cmm does have some notion of gc-ed pointers (one can distinguish gc-ed pointers and regular bits from CmmType, which is inferrable from any CmmExpr), it doesn't factor out the treatment of gc objects; instead, gc references are plain pointers, everything happens in the linear memory, and the rts moves objects around and rewrites pointers when the gc routines are entered.

It would be a shame to ditch the current Cmm-based codegen, and we should seek to enhance Cmm & the ghc codegen to achieve our goals.

Objectives:

  • Add extra constructs in Cmm IR to represent allocation/load/store of garbage-collected data which happens outside the regular memory model. We call this extended Cmm language "garbage collected Cmm".
  • Add some extensibility to the ghc codegen. There are STG-to-Cmm and Cmm-to-raw-Cmm passes, and there should be some higher-level passes here which generate garbage collected Cmm first. Perhaps we can implement something similar to Core plugins which allows inserting/modifying codegen passes, so we can filter out passes we don't need and add passes we're interested in. (Comment from @mchakravarty: CoreTodos existed before core plugins, so we might only need to implement something like CmmTodos without a new plugin mechanism, and passes may be adjusted based on command-line arguments.)
  • The performance of regular generated native code shouldn't see a visible impact. During native code generation, garbage collected allocation/load/store operations of Cmm should be quickly rewritten to ordinary Cmm which makes allocations happen on the regular rts heap.

Possible steps:

  1. Implement "garbage collected Cmm" in ghc, which satisfies basic requirements 1,2,3. This requires modifications to both the Cmm IR types and the codegen logic. This is the hardest part.
  2. Simulate a no-op gc based on the WebAssembly gc proposal. Since gc references are completely opaque and can be implemented in many possible ways, we can simulate a table containing anyrefs, and simulate anyrefs with regular pointers.
  3. At this point, we may be missing some features originally provided by ghc's original rts (e.g. finalizers), so we refactor our prototype, and help improve the WebAssembly gc proposal during the process.
  4. Even in the absence of the WebAssembly gc feature in V8, it's still much easier to roll our own, given the efforts in the previous steps, since there's no need to obey the rts storage manager interfaces.

Proposed extensions to Cmm language as implemented in current GHC:

  • In CmmType, promote GcPtrCat to be a standalone constructor of CmmType. It means a garbage collected reference is no longer tied to a machine word (so we don't know its Width). To prevent breakage with current code, we may also add a new constructor representing opaque references instead of moving GcPtrCat out.
  • In CmmExpr, we add CmmGcLoad constructor, to represent loading from a garbage collected reference. We also need to extend CmmRegs, because general registers like R1 can't hold those references, so we need additional register types to hold them.
  • In CmmNode, we add CmmGcStore and CmmGcAlloc. CmmGcStore represents storing to a reference, and CmmGcAlloc represents allocating a gc struct/array. We don't need to generate heap/stack checks when emitting CmmGcAlloc code.
  • The newly added CmmGcLoad/CmmGcStore/CmmGcAlloc constructs are all accompanied by a "schema"; the "schema" comes from info tables.
  • The above are required extensions in the AST of Cmm. We also need some changes in the Cmm pipeline itself.
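
A rough sketch of those additions in pseudo-GHC Haskell (constructor shapes and the GcSchema type are guesses for illustration, and the type parameters stand in for GHC's real CmmType/CmmExpr/CmmReg; in reality these would be new constructors grafted onto the existing CmmExpr/CmmNode types):

-- A "schema" describes the layout of a gc object; it is derived from info tables.
data GcSchema ty
  = GcStruct [ty]  -- fixed-shape STG heap objects (rts/storage/Closures.h)
  | GcArray ty     -- variable-length objects such as Array#

-- Extra expression form: read field i through an opaque gc reference.
data CmmGcExpr expr ty
  = CmmGcLoad expr (GcSchema ty) Int

-- Extra node forms: write to a field, and allocate without heap checks or Hp bumping.
data CmmGcNode reg expr ty
  = CmmGcStore expr (GcSchema ty) Int expr  -- destination ref, schema, field index, value
  | CmmGcAlloc reg (GcSchema ty)            -- result register, schema of the new object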

Q & A:

  • Why not just cross-compile rts? We already have emscripten.

The pros & cons of a cross-compiled rts versus a custom rts are worth another long write-up, so I'll just make one thing clear: this is a matter of choice. Implementing this feature grants us the extra power of basing generated code on the host platform's gc functionality, but we can still cross-compile the original rts (albeit heavily patched), or even roll our own and compare different gc strategies/workloads.

  • When will this get finished?

Since quite a bit of additional research on the codegen & rts needs to be done, this issue is currently a "20% time" one: some fragmented time will be invested in small experiments, and findings will be recorded here. Once we gain a clearer understanding of the ghc backend's big picture, we will put up a concrete roadmap and begin working on this front.

Related readings:
Commentary/Rts/Storage - GHC

Obtain hidden sizes & offsets for rts C structs

When implementing WebAssembly versions of rts C functions, some size & offset constants are not present in DynFlags (or, worse, not even in DerivedConstants.h). For instance:

  • No constants for StgIndStatic are available
  • SIZEOF macros only exist for a few types
  • Some OFFSET macros are obviously wrong, e.g. #define OFFSET_StgInd_indirectee 0

We need to find a way to calculate the constants in a portable manner. Different rts ways may yield different constants. We only care about the vanilla way for now.

Implement memory traps for debug mode

In debug mode, we wish to mark certain regions in the static data segments as "trapped", so any load/store instruction touching those regions will immediately trap. The trapped regions are mostly uninitialized fields of singleton structs like MainCapability or RTS_FLAGS; since we didn't properly initialize them in init_rts_asterius, reading/writing those fields is undefined behavior.

Implement proper boot package database

Currently, when booting, we merely run Setup configure and Setup build for the boot packages, collect the wasm IR, and throw away other outputs like .hi/.o files. There isn't a separate package database for the boot libs. This means that:

  • When ahc-link invokes GHC API to compile end user's programs, it uses ghc's global package database. We can't patch packages like base to suit our own purposes.
  • JSFFI can only work by processing the parsed AST, before it goes through the renamer/typechecker, since GHC doesn't know what a JSRef really means; and users cannot define newtype wrappers for JSRef.
  • Weird interface errors can show up when compiling base, see recent breakage on wip-ghc-8.7.

So we shall take some time to properly set up a ghc pkgdb along with our own asterius_store, and make ahc-link use that pkgdb. Some known difficulties:

  • ahc needs to load a frontend plugin, which itself depends on the global/project pkgdb. There's no way to avoid pollution from those two pkgdbs.

So the real work here is figuring out a way to implement ahc which doesn't rely on GHC's plugin system (whether a regular or frontend plugin, since they all rely on ghci's mechanism and introduce the global/project pkgdbs anyway). Luckily our logic is all contained in a Hooks, so it's possible to introduce an in-tree ghc-bin, patch it, and add our own Hooks.

Investigate hidden binaryen workarounds

When comparing the binary code emitted by the binaryen backend and the new backend, I discovered that binaryen doesn't simply do a post-order traversal of the expression IR as expected; there are quite a few hidden transformations/workarounds here and there, mostly undocumented, and there's no way to opt out even when the optimize/shrink level is set to 0. A non-comprehensive list of workarounds I've discovered so far:

  • When binaryen adds default passes, some are always added regardless of optimize/shrink level
  • Not reporting an error on receiving zero alignments; instead, falling back to "natural" alignment depending on byte length, without a single warning.
  • Combining and remapping locals in the code section, so no more than 4 locals buckets appear in a function definition. Non-parameter local indices don't correspond to input expression IR (although there does exist a bijection)
  • Flattening anonymous blocks.
  • Emitting unreachable instructions not present in the original IR.

I've patched binaryen and shrunk the list above to contain only the last entry. As soon as the last piece fits into the jigsaw puzzle, we'll base everything on the new backend right away.

We're not accusing binaryen of deficiencies here (though the caveats probably should be mentioned in its docs); however, this debugging experience does reveal some additional mismatches between what binaryen offers and what we expect:

  • When we pass expression IR to the serializer, we don't expect ANY additional magic; any optimization is supposed to be done (and can be done) pre-serialization.
  • The linker/debugger performs aggressive IR rewriting and assumes the IR isn't twisted in some way we aren't aware of. There is likely some dangerous chemistry between binaryen's rewriting and our own.
  • When we serialize the expression IR, there may be validation errors hiding somewhere, in which case:
    • If validation is only done upon real execution: fine.
    • If a validation pass exists before serialization: find and report them. It's fine to have some false negatives.
    • Status quo: binaryen implements the workaround of inserting the stack-polymorphic unreachable instruction in quite a few places. Hand it a malformed expression IR and it uses this trick to make the IR automagically pass validation...
    • What we expect: If such a trick is really needed, it indicates some fundamental flaw in how we engineer the expression IR. If we don't change the IR, at least the trick should be done pre-serialization.
    • Anyway, it's not okay if we thought we generated a well-typed expression but we really didn't, and when we step out of the comfort zone binaryen once offered, we're thrown into a chaotic and cold universe.

Errors attempting to reproduce local build

Hello again, I think I might have a PR to fix a bug, but I can't test it because I haven't gotten local builds to work yet.

I'm building with the same script/command as the resolved #28 (except with stack exec). ahc-link will run without issue but the generated wasm fails.

I get [ERROR] Uninitialized memory trapped at 0x00000000 (and subsequently, RuntimeError: unreachable which apparently always happens). --debug says it entered mult_hs and did some work before it failed.

From here I'm not really sure how to debug. Any help would be appreciated!

I'm running this in OpenSUSE Tumbleweed.

Uniform handling of `CurrentNursery` and `CurrentAlloc`

Currently, the storage manager separates the block group for the heap and the group for allocate calls. For simplicity when implementing the MVP, it's favourable to:

  • Statically allocate and initialize a megablock group. This is handled with the static data relocator in the linker and doesn't involve any initialization code at runtime.
  • All blocks in the megablock group form one single block group, and Hp/HpLim point to the whole area. The same area is also used for handling allocate/allocatePinned invocations.
  • The allocBlock*/allocGroup* family of functions are temporarily marked as traps; we assume allocation only takes place by either bumping Hp or invoking allocate/allocatePinned.

It's also easier to grow the allocation area with this uniform handling:

  • Instead of invoking mmap-like functions when calling alloc_mega_group, we use the grow_memory operator. With a bit of bookkeeping, the linear memory grows by a factor of k, which ensures amortized O(1) cost when allocating a block group.
  • Upon an entry of stg_gc*, heap growth is performed.

After the fixed-size heap is confirmed to work, we'll make the heap growable, and gc comes last.

Separate vanilla/debug "way" in JavaScript runtime

Currently, wasm code generated in debug mode shares the very same runtime script as vanilla mode. As debug-mode improvements proceed, the debug runtime script will become even more bloated, which is a huge waste for the vanilla mode.

So there should be two versions of rts.js, as well as two JavaScript code generation modes. In the vanilla mode, we can drop the symbol table and error message pool from the generated stub scripts (or put them in a separate .json file, so that not shipping that file won't hurt normal execution, but only result in less informative error messages).

Prototype JavaScript FFI

Right now it's already possible to implement a prototypical JavaScript FFI, without needing to wait for the host-bindings or gc proposals of WebAssembly to land in V8.

The main trick is: instead of directly passing JavaScript references across the WebAssembly/JavaScript boundary, we only pass a handle, and the JavaScript runtime maintains the mapping from handles to real objects. The JavaScript runtime and our own storage manager do not collaborate in any way, so manually calling something like freeHaskellFunPtr is required to avoid leaks on the JavaScript side.
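
The handle-table idea, sketched generically in Haskell for illustration (in asterius the table lives on the JavaScript side of the boundary, and the names here are made up):

import Data.IORef
import qualified Data.IntMap.Strict as IM

-- Handles are plain Ints; wasm code only ever sees the handle, never the object.
data Table v = Table { nextHandle :: Int, entries :: IM.IntMap v }

newTable :: IO (IORef (Table v))
newTable = newIORef (Table 1 IM.empty)

-- Register an object and hand back its handle.
insertValue :: IORef (Table v) -> v -> IO Int
insertValue ref v = atomicModifyIORef' ref $ \(Table n m) ->
  (Table (n + 1) (IM.insert n v m), n)

lookupValue :: IORef (Table v) -> Int -> IO (Maybe v)
lookupValue ref h = IM.lookup h . entries <$> readIORef ref

-- Nothing is freed automatically; an explicit free (a freeHaskellFunPtr-like call)
-- is required to avoid leaking entries.
freeValue :: IORef (Table v) -> Int -> IO ()
freeValue ref h = modifyIORef' ref $ \(Table n m) -> Table n (IM.delete h m)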

A brief list of what work it takes and what we can achieve:

  • Without touching the type-checking procedure of ghc, we can't have foreign import javascript syntax working in the ghcjs way. Instead, we'll supply a separate file to ahc-link describing what to import, and ahc-link will make the relevant bindings available.
  • foreign import is relatively easy to implement: we can import any JavaScript object as a handle, and "calling" a JavaScript function is simply sending a handle list to the JavaScript side and the real work happens out of wasm.
  • foreign export is a lot more difficult since it requires the scheduler to properly support interrupting/resuming Haskell threads.
  • It's possible to calculate a JavaScript snippet in the wasm world, then use that snippet as the import schema. This requires finishing the CString-related utils first.

Some test suites demonstrating importing Web API functions will be implemented.

Improve current AST rewriting mechanism

Currently, we resolve local/global regs by AST rewriting when marshalling a CmmProc, then serialize the AST. During linking, other rewriting passes are carried out, e.g. symbol resolution, debugging, etc.

The current approach is a bit ad hoc: we need to carefully restrict post-serialization rewriting, because register allocation is already done when the AST is serialized, so when a fresh register is requested later, we need to hard-wire it in resolveLocalRegs and remember its absolute wasm local id.

The current AST rewriting mechanism needs improving. The basic idea shall be: serialize early & rewrite late. Register allocation shouldn't be performed when marshalling CmmProc.

Improve CI setup

CI setup is a bit manual and ad-hoc currently. To improve the situation the following steps are proposed:

  • Asterius currently needs GHC compiled with a custom config. I created a new repo here: https://github.com/tweag/asterius-ghc-gen.
  • I'll enable CircleCI for it and make it run periodically. It'll clone GHC, put the necessary config in place, and compile it.
  • A bindist will be produced and collected by ghc-artifact-collector that we use with vanilla GHC already. The job will be called asterius-x86_64-linux.
  • In Asterius CI we download the latest bindist created this way and use it.

Implement proper garbage collection

Background

Right now, the runtime is capable of handling heap overflow situations by moving the nursery/object pool to new blocks. This will surely exhaust available memory for a long-running application, and it's about the right time to implement proper garbage collection.

In #35, we discussed the possibility of relying on wasm gc proposal to implement garbage collection. While that possibility is not totally rejected yet, a more pragmatic choice would still be starting from a single-generation copying GC, then use that as a starting point for gradual improvement.

Notable differences from regular ghc's gc

  • The initial version is written in JavaScript, so it's convenient for it to do its own bookkeeping. It's still possible to switch to a pure wasm implementation later, but we'll focus on delivering a working version first.
  • No notion of generations yet.
  • The GC roots are simply individual StgTSOs and their stacks, plus the StablePtr table. We don't use the "run queue" of Capabilitys.
  • No special handling of CAFs is required. The statically allocated closures are placed in "static" blocks and can be moved around just like ordinary closures. The static blocks are data segments of the linear memory, and can be reclaimed like ordinary blocks. This simplifies the gc logic quite a bit.

Plan

The plan goes as follows (each phase takes roughly one person-week, barring unknown difficulties):

  • Traverse the heap
    • Goal: Traverse the heap and visit every live object when handling a heap overflow. We don't move the objects yet; we just collect statistics and emit them as eventlogs.
    • Implement runtime functions to read info tables, recognize different types of closures and the pointer fields in them.
    • Upon heap overflow, traverse the heap and count number of live objects, grouped by their headers. The stats are directed to eventlogs, and can be viewed under debug mode.
  • Implement evacuate/scavenge
    • Goal: Move the live objects to new blocks and reclaim old blocks.
    • Remove the current hack of coercing between lifted type pointers and regular Addr#s across FFI boundaries. We must pass StablePtr#s from now on.
    • Implement separate nursery/object pool for allocating large or pinned objects.
    • Implement freeGroup for reclaiming free block groups. The free groups may be reused later for allocGroup calls, and can be zeroed to eliminate a potential attack surface.
    • Implement "static" blocks to contain the statically allocated closures. The static blocks can be reclaimed just like blocks returned by allocGroup.
    • Implement evacuate.
    • Implement scavenge.
  • Handle JSVal in GC
    • Goal: Automatically free the JSVals when they're out of scope, so the references won't cause leakage in the JavaScript world.
    • Make JSVal a newtype to StablePtr, recognizable by a tag bit.
    • Handle JSVal objects in GC, and remove unseen entries in the JSVal table after each run.

After the above todos are all ticked off, we'll have eliminated all known sources of memory leaks and gained a garbage collector. There'll still be plenty of room for improvement, but a working gc is a significant milestone.
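
For a sense of what evacuate/scavenge boil down to, here is a heavily simplified, purely illustrative model of single-generation copying collection over an abstract heap (objects are reduced to their pointer fields; the real collector additionally has to decode info tables and deal with block descriptors, large/pinned objects, the StablePtr table, and so on):

import qualified Data.Map.Strict as M

type Addr = Int
type Heap = M.Map Addr [Addr]  -- each object is modelled as just its pointer fields

-- Copy the transitive closure of the roots into a fresh, compacted heap.
-- "Evacuating" an object assigns it a new address (a forwarding entry);
-- "scavenging" rewrites the pointer fields of every copied object.
copyCollect :: Heap -> [Addr] -> (Heap, [Addr])
copyCollect from roots = (to, map (fwd M.!) roots)
  where
    fwd = evacuate roots M.empty
    evacuate [] acc = acc
    evacuate (a : rest) acc
      | a `M.member` acc = evacuate rest acc      -- already forwarded, skip
      | otherwise =
          let acc' = M.insert a (M.size acc) acc  -- "copy" a to the next free slot
          in  evacuate (M.findWithDefault [] a from ++ rest) acc'
    to = M.fromList
      [ (new, map (fwd M.!) (M.findWithDefault [] old from))
      | (old, new) <- M.toList fwd ]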

Binaryen on the single-function approach

Hi! Reading this interesting post I saw

One possible solution is to transform the program by collecting all Cmm functions into one WebAssembly function [..] We experimented with this approach and discovered that with a large number of blocks in a single function, binaryen is not sufficiently scalable for this approach to be practical.

Is that approach still interesting for you? Would it be useful if we looked into improving binaryen in that area?

Transparent & uniform handling of i64 FFI

Currently, WebAssembly as specified and implemented in V8 doesn't support passing i64 across the WebAssembly/JavaScript boundary, and the i64-via-BigInt proposal isn't landing anytime soon. Until we properly generate 32-bit code (which will be a standalone issue later), we still need to polyfill this feature.

Previously, multiple different workarounds co-existed in asterius:

  • When generating JSFFI stub functions, we always chop off the higher 32 bits when passing an i64 from wasm to js. See Asterius.JSFFI.
  • We generate wrappers for RTS API functions exported to js, using the same workaround (but via different functions, see Asterius.Builtins).
  • For the debug mode, however, when passing an i64 from wasm to js, we pass the higher/lower 32 bits as two parameters and reassemble them at the other end. See Asterius.Builtins.cutI64 (a standalone sketch of the split/join follows below).
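
The split/join underlying the cutI64 trick amounts to the following pure semantics (a standalone illustration, not the actual Asterius.Builtins code):

import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32, Word64)

-- Pass an i64 across the boundary as two i32 halves (high, low)...
cutI64 :: Word64 -> (Word32, Word32)
cutI64 x = (fromIntegral (x `shiftR` 32), fromIntegral x)

-- ...and reassemble it on the other side.
joinI64 :: Word32 -> Word32 -> Word64
joinI64 hi lo = (fromIntegral hi `shiftL` 32) .|. fromIntegral lo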

For more reliable code generation, i64 FFI should be handled in a transparent & uniform manner:

  • At the level of pre/post-linking IR, we do not generate any wrapper functions at all; we assume i64 FFI works out of the box at this level. No ugly __asterius_*_wrapper functions exist here.
  • When generating wasm binary code/js stub code, we inspect the imports/exports and insert wrappers where they are needed. When V8 implements the i64-via-BigInt proposal, we can remove the wrappers and the behavior of the generated code shouldn't change.

Improve runtime error messages

Right now, when a runtime error is encountered, we simply invoke unreachable, which makes V8 panic and output a dummy stack trace that isn't helpful at all.

The runtime error messages should be improved incrementally:

  • Implement a panic interface which outputs an error code via console.error before crashing. Upon a crash, we'll get more clues about what's going wrong. This is being worked on.
  • Properly implement barf, so we get string-formed error messages.
  • Implement our own stack trace mechanism, without relying on ghc's profiling way

Improve the current WebAssembly-in-Haskell EDSL

Note: this is currently a low-priority issue, and won't be worked on before the next public announcement.

There is a monadic EDSL to construct WebAssembly code in Haskell and it's used extensively in Asterius.Builtins to write a significant part of our custom runtime. While writing WebAssembly in that EDSL is far more pleasant than writing tree-formed AST directly, there is still a lot of room for improvement:

Improving inline functions

Inline functions are really just reusable pieces of EDSL code disguised as Haskell functions. While they allow saving some function call overhead without hurting code reuse, there is one drawback: the debugger only sees inlined function bodies, so it can't emit correct context info (which function are we currently running) when entering an inline function. This is a minor annoyance when debugging a big inline function (e.g. schedule or scheduleHandleThreadFinished), especially when they are nested.

We shall improve inline functions a bit, so that in "normal" mode we still don't emit standalone wasm functions and call instructions, but in debug mode we can tell when an inline function is entered.

Implementing static variables and buffers

Currently the EDSL doesn't implement "static" variables or buffers which persist across function calls. This means it's hard for our hand-written wasm code to maintain internal state. It's still possible to do this by declaring a piece of state outside the function body as an AsteriusStatics and then referring to it by its symbol, but that's troublesome and not modular. We shall implement some EDSL primitives to allocate static variables/buffers.

Removing redundant type declarations

The EDSL desugars to a tree-formed AST which closely models the binaryen AST. There is a lot of redundant type information in binaryen AST nodes, which makes the binaryen authors' lives much easier (lots of passes don't have to carry around a huge context). The AST in Asterius.Types and consequently Asterius.EDSL inherited the same redundancy, but we don't really need it; for instance:

someWasmFunc = runEDSL [I64] $ do
    setReturnType I64
    r <- call' "someOtherWasmFunc" [] I64
    emit r

Here, someWasmFunc and someOtherWasmFunc both have type [] -> [i64], yet the type information exists in multiple places. I'd really love to eliminate that redundancy: fewer keystrokes and fewer type errors when refactoring those functions (it's easy to forget to update one or two places).

Fusing "one-shot" immutable locals

The call'/callIndirect' primitives in the EDSL all allocate a temporary local, assign the result to the local, and carry on. This is essential to avoid multiple calls when the result is used multiple times, but a common scenario is using the result only once in the remaining function code, in which case we should fuse away the local and embed the call instruction directly in place. This saves some bytes in the wasm binary.

Generate empty stubs for SRTs in the MVP

SRTs are not used for executing vanilla Haskell code in a no-gc setting, and they tend to introduce a lot of redundant dependencies (see Sole_srt in the compiled artifact of GHC.List for example). With a bit of work in the codegen, SRTs shall be eliminated in the compilation phase.

Handle blackholing related calls

Any non-trivial program compiled by asterius involves creating & updating thunks, which brings in dependencies on functions like updateWithIndirection, updateThunk, checkBlockingQueues, messageBlackhole, etc.

We need to patch the headers/cmm sources in the bundled rts of ghc-toolkit to simplify the handling of blackholes. Notable differences include:

  • There's no gc yet, so no need to keep track of mutable objects
  • There's no multi-Capability yet, so no issue of synchronization between blackhole evaluators.
  • There's no "scheduler", only a run-to-completion StgRun, so a Haskell thread cannot be suspended and resumed, any logic involving interruption of threads must be removed in a surgery.
