tweag / asterius
DEPRECATED in favor of ghc wasm backend, see https://www.tweag.io/blog/2022-11-22-wasm-backend-merged-in-ghc
The cmm language has W8/W16/W32/W64 integers, but wasm32 only supports Int32/Int64. Currently the codegen deals with W8/W16 operands in a pretty ad hoc way: it pretends they're all W32. This is problematic when emitting Load/Store instructions, since the unused bits are likely to be corrupted.
A naive fix would bloat the codegen. Some degree of refactoring (related to type inference for CmmExpr, marshalCmmExpr, etc.) is required before this can be fixed without adding several hundred more lines of code.
In the current code generator, when we perform a cmm load/store, we emit one wasm load/store instruction whose memarg has offset and align both set to 0, allowing a potentially unaligned load/store at an arbitrary address (which is permitted by the WebAssembly spec).
To my amazement (or horror), when checking the binary code output by the previous binaryen backend, I found that memarg.align was actually set to 3, which leads to unspecified behavior when an unaligned load/store occurs. I might have misused BinaryenLoad/BinaryenStore, and if the binaryen C API's align means something different from memarg.align in the WebAssembly spec, the documentation should improve a bit on that end.
Investigating fix & impact of this problem right now.
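A small sketch of the unit mismatch suspected above. In the WebAssembly binary format, memarg.align holds the base-2 exponent of the alignment, so a stored 3 means 8-byte alignment. If the binaryen C API takes its align argument as a byte count (an assumption here, not checked against its documentation), the two values are related as follows. The function names are hypothetical.

```haskell
import Data.Bits (countTrailingZeros, popCount)

-- | Convert an alignment in bytes (1, 2, 4, 8, ...) to the base-2 exponent
-- stored in memarg.align. Returns Nothing when the byte count is not a
-- positive power of two.
alignExponent :: Int -> Maybe Int
alignExponent bytes
  | bytes > 0 && popCount bytes == 1 = Just (countTrailingZeros bytes)
  | otherwise = Nothing

-- | Inverse direction: exponent back to a byte count.
alignBytes :: Int -> Int
alignBytes e = 2 ^ e

main :: IO ()
main = do
  mapM_ (print . alignExponent) [1, 2, 4, 8, 3]
  print (alignBytes 3)
```

Under this reading, an exponent of 0 (1-byte alignment) is the value that makes no alignment promise for arbitrary addresses.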
Currently, we have ghc-toolkit to hijack a part of ghc's post-Core pipeline, and it's implemented by pasting & adapting a part of ghc. We are not fully in control of ghc behavior if we must have a transitive dependency of the host ghc's ghc-the-package.
We need to implement a decoupled GHC API for asterius, satisfying the following goals:
- [...] ghc tree, configure options and target package database, builds a Stage-1 ghc into that database
- ghc-* packages are renamed to prevent collision with their counterparts in the host ghc's global package database
- [...] genprimopcode) then loading the Haskell parts via a ghci session. In that way it'll be possible to use a ghci-based IDE to hack ghc code
Could you please add more instructions to your README? (I'm using the docker version.)
Thanks!
Input (stripped down fib):
foreign import ccall unsafe "print_f64" print_f64 :: Double -> IO ()
main :: IO ()
main = print_f64 $ cos 0.5
This works fine with a release version of node.js, but crashes the node.js v8-canary version with exit code -11. Interesting.
The current method of moving a piece of data between Haskell (WebAssembly) and JavaScript is both slow and troublesome:
- [...] StablePtr/JSRef entry whenever moving anything that isn't a simple unboxed value.
- [...] foldl' over a String). Even if GHC manages to compile the "driver" to fast worker/wrapper pairs with tight inner loops, there's still overhead.
When implementing marshaling logic for bytestring, the above problems arose again, so it's time to improve the marshaling story, for bytestring, regular Strings, or any other data structure:
- [...] imported into any input home module of ahc-link. We should probably add something like an asterius-prim package, similar to ghcjs-prim.
- [...] JSString -> String as an example; previously we issued a JavaScript call when moving every code point to Haskell, but we really should just directly build the fully-evaluated Haskell string on the heap and return it.
- [...] StablePtr; it's going to be dereferenced and freed soon anyway. This indicates something like foreign import javascript "f()" f :: SomeHaskellType -> IO SomeHaskellType is possible.
Currently, when we need custom behavior in the generated wrapper .js files, we can only use the --asterius-instance-callback flag to supply a JavaScript callback which takes an already initialized instance as a parameter. We really should provide a more user-friendly interface.
Some requirements to keep in mind here:
- .js scripts generated from different ahc-link runs may co-exist in a page. Make sure we don't accidentally pollute the global scope.
- [...] Worker, storing in IndexedDB, etc.
- [...] .js file.
First of all, thanks for this great initiative. There is no good compiler from any functional language to Wasm, and it would be great to have one for Haskell. Unfortunately, I'm still very much a beginner with Haskell, compilers and all, but I'm trying to dig into Wasm. Still, I succeeded in using Asterius through Docker, thanks again.
I'm trying to export from Haskell a hello_world function which takes a string as its only argument and returns another one. The mult example worked fine. Unfortunately, exporting a function which takes a String as an argument always failed, whatever workaround I tried. I managed to make it work when returning a String, but not when receiving one.
Here is the code I have:
toJSString :: String -> JSRef
toJSString = foldl (\s c -> js_concat s (js_string_fromchar c)) js_string_empty
fromJSString :: JSRef -> String
fromJSString s = [js_string_tochar s i | i <- [0 .. js_length s - 1]]
foreign import javascript "\"\"" js_string_empty :: JSRef
foreign import javascript "${1}.concat(${2})" js_concat :: JSRef -> JSRef -> JSRef
foreign import javascript "${1}.length" js_length :: JSRef -> Int
foreign import javascript "String.fromCodePoint(${1})" js_string_fromchar :: Char -> JSRef
foreign import javascript "${1}.codePointAt(${2})" js_string_tochar :: JSRef -> Int -> Char
foreign import javascript "console.log(${1})" js_print :: JSRef -> IO ()
foreign export javascript "hello_world" hello_world :: JSRef -> JSRef
hello_world :: JSRef -> JSRef
hello_world name = toJSString ("Hello, " ++ (fromJSString name))
main :: IO ()
main = do js_print 42
But it fails with:
root@b31716e46fdd:~/2048# ahc-link --input mult.hs --export-function=hello_world --asterius-instance-callback=cb
[INFO] Loading boot library store from "/root/.stack-work/install/x86_64-linux/ghc-8.7/8.7.20181027/share/x86_64-linux-ghc-8.7.20181027/asterius-0.0.1/.boot/asterius_lib/asterius_store"
[INFO] Populating the store with builtin routines
[INFO] Compiling mult.hs to Cmm
[INFO] Marshalling from Cmm to WebAssembly
[INFO] Marshalling "Main" from Cmm to WebAssembly
[INFO] Attempting to link into a standalone WebAssembly module
ahc-link: No match in record selector hsTyCon
I tried importing JavaScript callbacks to change the signatures. I managed to get past the compilation step, but never managed to receive my String as an argument.
Some Uniques are leaked into current Asterius.CodeGen output modules, which is highly dangerous. A proper fix requires a careful redesign of UnresolvedSymbol, making it contain the source module and namespace.
Currently, ahc-link has a --run flag which runs the generated stub js to load the compiled wasm code, given that the compilation target is Node.js. The flag is used by all unit tests. So far that is the only type of Node.js/Haskell interaction we have: only the node exit code can be checked automatically, and one needs to manually read/grep the debug logs. (There could be golden tests to check textual outputs, but that's still far from satisfactory.)
This is the tracking issue for proper Node.js/Haskell communication. Short-term objectives:
- [...] node process and asterius processes (ahc-link, unit tests, etc.)
- rts emits structured debug log entries, which can be inspected in asterius.
The heavy work is already mostly finished in the new generation of inline-js; the updated readme page of inline-js explains the motivation behind the rewrite in detail.
This work will also be the keystone of future TH/GHCi/Plugins support.
We already have a memory trap enabled when ahc-link --debug is used, which captures all Load/Store instructions and checks for invalid pointers. It has proven to be extremely useful in the arduous debugging marathons.
A current drawback of the memory trap: it's a wired-in function in Asterius.Builtins that only checks for a few known invalid memory segments, and each check incurs very high overhead from a long chain of nested pointer comparisons. The improved memory trap function should:
Note: this is a medium-priority issue which has a significant influence on user experience. Will be picked up after next announcement.
To convert a dynamically created Haskell closure to a JavaScript callback, we currently rely on two canned rts magic functions, makeHaskellCallback/makeHaskellCallback1, which limit the supported Haskell closure types to IO () and JSVal -> IO (). They surely cover a lot of use cases, but we really should support Haskell functions of any arity. The standard C FFI mechanism already has a "foreign import wrapper" notion for this:
foreign import ccall "wrapper" some_callback_factory :: ft -> IO (FunPtr ft)
Where ft can be anything valid in a vanilla foreign import ccall signature. We should steal this syntax to allow users to create a JavaScript callback from Haskell function closures of any arity.
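For reference, here is how the existing C FFI "wrapper" import behaves on a native GHC, which is the behavior the proposed JavaScript variant would mirror. This is a minimal sketch using only the standard FFI; the names Callback, mkCallback and callCallback are illustrative, and nothing here is asterius-specific.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CInt)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- The foreign type "ft" from the text: any valid ccall signature works.
type Callback = CInt -> CInt -> IO CInt

-- "wrapper" turns a Haskell closure into a C function pointer.
foreign import ccall "wrapper"
  mkCallback :: Callback -> IO (FunPtr Callback)

-- "dynamic" goes the other way, so the pointer can be called from Haskell.
foreign import ccall "dynamic"
  callCallback :: FunPtr Callback -> Callback

main :: IO ()
main = do
  fp <- mkCallback (\x y -> pure (x * y))
  r <- callCallback fp 6 7
  print r -- prints 42
  -- Wrapper pointers are not garbage collected; free them explicitly.
  freeHaskellFunPtr fp
```

The explicit freeHaskellFunPtr call hints at a design question for a JavaScript counterpart: some way of releasing the callback would presumably be needed there too.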
How to implement:
- [...] foreign import javascript "wrapper" clauses in our JSFFI preprocessor. This is harder compared to vanilla foreign import javascript: when desugaring, GHC emits code which assumes createAdjustor in the RTS does fancy pointer hacking, but we don't really want that, so we need extra logic to handle it. Also, FunPtr doesn't make any sense; we want to fetch a JSFunction instead.
The cmm code emitted by ghc assumes knowledge of several C structs in rts, most notably:
- BaseReg
- bdescr
- StgTSO
- StgStack
- Capability
and so on. The asterius codegen needs built-in support for these structs, including knowledge of:
I'm looking to contribute to this project.
The linker is undoubtedly the performance bottleneck right now, and a huge nuisance when linking large modules. Improvement is not hard though. General principle: avoid full AST traversals/rewritings as hard as we can; cache more info in the store to trade space for time.
Hi, this is a bit of an echo of #1 but I'm pretty sure I followed the docs as well as I could. spoiler: I didn't
As a user of Haste, I'm excited to try using Asterius in my personal project. Haste is a great tool but has issues with stack overflows in loops which have a bad outlook (I think the project might be dead). A year later I ran into the same Haste issue porting my Haskell project to the browser.
Anyway, I followed the Tweag blog post and made a file essentially identical to the "JS calling HS" section: (I need main, otherwise GHC complains. If I replace return () with a hello world, node fails to compile the .wasm, but maybe that's another issue?)
foreign export javascript "mult_hs" (*) :: Int -> Int -> Int
main = return ()
Then I ran
ahc-link --input browser.hs --asterius-instance-callback='i => {
i.wasmInstance.exports.hs_init();
console.log(i.wasmInstance.exports.mult_hs(6, 7));
}'
which produced a file with the expected callback. However, running it in node I see i.wasmInstance.exports doesn't include the foreign exports I specified. I get this error: TypeError: i.wasmInstance.exports.mult_hs is not a function
I'm using the latest docker image.
Right now, the codegen outputs broken code for stg_enter_ret, which makes stg_returnToStackTop accidentally ignore Main_main_closure on the stack. We hardcoded a version of stg_enter_ret that's known to work in Asterius.Builtins, but the hack doesn't make the underlying issue go away.
Another potentially broken function is stg_ap_0_fast: it prematurely jumps to 0, which aborts execution without even entering StgReturn. This error often shows up when compiling code related to GHC.List.
Note: Int8#/Word8# isn't widely used in the boot libs yet, so this is a medium-priority issue and will be checked only after #44
Cmm has narrowing operators MO_SS_Conv/MO_UU_Conv which narrow a 64-bit Int#/Word# to 32 bits or fewer, while still using a 64-bit local/global register for storage. The codegen already handles 64-to-32 narrowing, but the newly added Int8#/Word8# require more aggressive 64-to-8 narrowing. Note that wasm doesn't have native opcodes for this purpose, so we need a pair of store/load opcodes here. When we later Workerify the runtime, we also need to take extra care not to introduce a race condition, since it involves a pinned temporary memory region.
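As a reference model for what 64-to-8 narrowing is expected to compute, the sketch below expresses it with a mask and a shift pair in plain Haskell. This is a different technique from the store/load pair described above; whether the codegen could emit the corresponding wasm i64.and / i64.shl / i64.shr_s sequence instead is an assumption, not something this issue establishes. The names narrowU8 and narrowS8 are hypothetical.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Int (Int64)
import Data.Word (Word64)

-- | Unsigned narrowing to 8 bits: keep the low byte, zero the upper 56 bits.
narrowU8 :: Word64 -> Word64
narrowU8 w = w .&. 0xFF

-- | Signed narrowing to 8 bits: keep the low byte and sign-extend bit 7.
-- shiftR on Int64 is an arithmetic shift, so the sign bit is replicated.
narrowS8 :: Int64 -> Int64
narrowS8 i = (i `shiftL` 56) `shiftR` 56

main :: IO ()
main = do
  print (narrowU8 0x1FF)
  print (narrowS8 0x1FF)
  print (narrowS8 0x7F)
```

Because this form touches no memory, it would also sidestep the pinned temporary region and the race-condition concern, if it turns out to be applicable.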
Note: this feature is not provided by the original ghc runtime. I intend to implement it since it has a reasonable difficulty level, and is also useful for certain use cases, see explanations below.
The typical method of using asterius is: compile some .hs files to a .wasm/.js pair, initiate an asterius "instance", then call exported functions of that instance from the regular js world. What do we do in case an unrecoverable error pops up? (e.g. failing to grow the linear memory and allocate new blocks)
A naive solution would be: wipe the instance completely, initiate another one and start from ground zero. This surely works if whatever exported function we're calling is stateless. But what if we'd like the function to share some state across invocations, even when an instance is wiped and rebooted? Between the old & new instance, something must be transferred, and it should be accessible in the Haskell world as well.
Let's abstract all the persistent state across different instances to a single "vault". The initialization of an asterius instance can either initiate an empty vault or take an existing one. The global vault can be accessed by all Haskell code currently being executed in that instance.
What can be saved in a vault? Arbitrary Haskell closures aren't an option since the linear memory will be wiped, and if we implement closure copying, then we already have a gc, making vaults a lot less useful. Nor are compact regions, since we haven't really tested whether the Compact#-related primops work for our custom sm (I suspect it'll take a lot of work). However, saving Haskell values which are explicitly serializable via bytestring is possible; this makes the vault effectively a mutable KV store (making it immutable is possible, but not worth the effort). Another thing which can be saved: a JSRef table which stores JS values "accessible" in Haskell.
These two can actually be unified: when saving a Haskell state, we don't enforce a Binary constraint or whatever, so users aren't tied to a specific serialization framework; all keys & values are explicitly ByteStrings, and a ByteString can be easily converted from/to an ArrayBuffer, so we need only implement the second kind of vault: a JSRef table.
The Haskell interface is something like:
type Key = JSArrayBuffer
type Value = JSVal
vaultInsert :: Key -> Value -> IO ()
vaultLookup :: Key -> IO (Maybe Value)
vaultDelete :: Key -> IO ()
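To make the intended semantics concrete, here is a pure-Haskell model of the interface above, runnable on a native GHC. It is only a sketch under assumptions: the real vault would live on the JavaScript side and be global to the instance, whereas this model uses an explicit Vault handle backed by an IORef, ByteString keys in place of JSArrayBuffer, and a type parameter in place of JSVal. The Vault type and newVault are hypothetical names.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- | Stand-in for the per-instance vault: a mutable key/value table.
newtype Vault v = Vault (IORef (Map.Map BS.ByteString v))

-- | Start with an empty vault (the "initiate an empty vault" case).
newVault :: IO (Vault v)
newVault = Vault <$> newIORef Map.empty

vaultInsert :: Vault v -> BS.ByteString -> v -> IO ()
vaultInsert (Vault ref) k v = modifyIORef' ref (Map.insert k v)

vaultLookup :: Vault v -> BS.ByteString -> IO (Maybe v)
vaultLookup (Vault ref) k = Map.lookup k <$> readIORef ref

vaultDelete :: Vault v -> BS.ByteString -> IO ()
vaultDelete (Vault ref) k = modifyIORef' ref (Map.delete k)

main :: IO ()
main = do
  vault <- newVault
  vaultInsert vault (BS.pack "counter") (1 :: Int)
  vaultLookup vault (BS.pack "counter") >>= print
  vaultDelete vault (BS.pack "counter")
  vaultLookup vault (BS.pack "counter") >>= print
```

Handing the same Vault value to a freshly created instance would model the "take an existing one" case from the description.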
Codegen from pure Haskell to binaryen IR is mostly complete, but when attempting to link things into a standalone module, we encounter missing rts routines.
The logic of linking is in Asterius.LinkStart. Given a Haskell program and the "root symbols" to look at, it traverses the symbol database and attempts to collect all relevant data sections/programs. The asterius:link-fact test suite performs linking for a simple program that calculates factorial, then outputs the result (or errors).
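The traversal described above can be sketched as a worklist over a symbol database. This is an illustration only, not the code in Asterius.LinkStart: the SymbolDb type, the chase function and the dependency edges in main are all hypothetical, and the real linker tracks more states than "found" and "missing".

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Symbol = String

-- | Hypothetical symbol database: each known symbol maps to the symbols
-- it references directly.
type SymbolDb = Map.Map Symbol [Symbol]

-- | Starting from the root symbols, collect every reachable symbol.
-- Returns (symbols found in the database, symbols referenced but absent).
chase :: SymbolDb -> [Symbol] -> (Set.Set Symbol, Set.Set Symbol)
chase db = go Set.empty Set.empty
  where
    go found missing [] = (found, missing)
    go found missing (s : rest)
      -- Skip symbols that were already classified.
      | s `Set.member` found || s `Set.member` missing = go found missing rest
      | otherwise = case Map.lookup s db of
          Nothing -> go found (Set.insert s missing) rest
          Just deps -> go (Set.insert s found) missing (deps ++ rest)

main :: IO ()
main = do
  -- Made-up dependency edges, for illustration only.
  let db =
        Map.fromList
          [ ("Fact_root_closure", ["Fact_fact_closure"]),
            ("Fact_fact_closure", ["allocBlock_lock"])
          ]
  print (chase db ["Fact_root_closure"])
```

Each symbol is classified exactly once, so the traversal terminates even when the dependency graph contains cycles.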
Here's the factorial program:
module Fact where
fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n - 1)
facts :: [Int]
facts = scanl (*) 1 [1 ..]
root :: Int
root = fact 5
Ideally we can use Fact_root_closure as the root symbol, and after generating a standalone module we can evaluate it and observe how the linear memory is mutated. The error message (already reduced a lot after patching rts):
link-fact.exe: ChaseException {unseen = fromList ["allocBlock_lock"], unavailable = fromList []}
What if we introduce a dep on facts? Plain old Prelude stuff, isn't it? Well... let's try root = fact 5 + facts !! 5:
link-fact.exe: ChaseException {unseen = fromList ["stg_ap_pppv_info","u_towtitle","stg_ap_v_fast","runCFinalizers","stg_ap_p_fast","isDoubleNaN","isDoubleInfinite","u_iswspace","memmove","isDoubleNegativeZero","u_iswalnum","stg_ap_pppp_fast","stg_ap_ppp_info","isFloatNaN","u_iswalpha","stg_ap_pv_fast","free","stg_ap_v_info","memcpy","__hsbase_MD5Update","stg_ap_ppv_fast","malloc","__hsbase_MD5Final","u_towlower","dirty_MUT_VAR","realloc","calloc","stg_ap_ppp_fast","stg_ap_pp_info","__decodeDouble_2Int","__word_encodeFloat","__decodeFloat_Int","stg_ap_pp_fast","stg_ap_p_info","__word_encodeDouble","allocBlock_lock","__hsbase_MD5Init","isFloatNegativeZero","u_towupper","u_gencat","isFloatInfinite","barf"], unavailable = fromList ["stg_getMaskingStatezh","stg_maskAsyncExceptionszh",".Lc2ulW","s6LoP_entry",".Lc1CN5","r2o2_entry","stg_newByteArrayzh",".Lc1CSr",".Lc2vvd",".Lc6MXu",".Lc4tt",".LcuB","s6Lpy_entry",".Lc1ep",".Lc1CPo","stg_unmaskAsyncExceptionszh","stg_raisezh",".Lc2vGs",".Lc2uuv",".Lc1CQQ",".Lc1F79",".Lc1CUz",".Lc2ukg",".Lc2usO",".Lc1CKQ",".Lcvz",".Lc2vTw",".Lc1CLF",".Lc1F8W",".Lc1EAW",".Lc1Fep","base_GHCziStackziCCS_currentCallStack1_entry","stg_newPinnedByteArrayzh",".Lc1Fcq",".Lc1CK4",".Lc4v1",".Lc5R8Q","r2o1_entry","stg_maskUninterruptiblezh",".Lc1CTx",".Lc6MVJ",".Lc1Fax"]}
So, right now the bottleneck of this project is implementing WebAssembly shims for missing bits and pieces in rts, which are mostly related to the storage manager.
Some approaches now being taken:
- [...] .cmm files of rts, since there's no threads and no GC for the prototype.
- [...] Asterius.CodeGen, directly shadow the invocations to stg_gc* Cmm functions and replace with WebAssembly unreachables.
As a sub-issue of #23.
The current way of handling foreign import javascript/foreign export javascript is a bit silly: we process the parsed AST, recognize JSFFI basic types by string, and after converting to our own FFIMarshalState which will later be used at link-time, we rewrite JavaScript FFI to C FFI, so ghc will happily typecheck them and generate relevant cmm code.
The disadvantages are very obvious:
- [...] newtypes to existing JSFFI basic types since we don't recognize them, and newtypes are sort of a standard practice when Haskellers work with conventional C FFI.
- [...] ahc to compile boot libraries using Cabal, if Haskell modules contain JSFFI declarations, linking will fail because the symbols aren't really in any native object file. So, when we need something like asterius-prim which contains common utilities for Haskell/JavaScript interop, it has to be shipped as a home module pasted into every project, instead of as a pre-compiled boot package.
There is only one advantage: working with the parsed AST is easy, since it's totally context-free: no loading of dependent modules is required, nor do we need to learn about TcRnMonad and such.
We should really handle the syntactic sugar in the renamer/typechecker instead, and do better than disguising JSFFI as C FFI in order to make the ghc codegen happy. ghcjs implements and uses hooks for type checking foreign decls, and that's a good starting point.
Directly related: #43
The current behavior of fromJSString (either the strict version in the runtime or the lazy version in pasted AsteriusPrim modules) is wrong when a code point exceeds 65535. Reason: I thought String.prototype.codePointAt takes a code point index, but the index really counts UTF-16 code units, and so does String.prototype.length.
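To illustrate the distinction, the sketch below decodes a sequence of UTF-16 code units into Haskell Chars, combining surrogate pairs into a single code point above 65535. It is a standalone model with a hypothetical name (decodeUtf16), not the fix applied to fromJSString.

```haskell
import Data.Char (chr)
import Data.Word (Word16)

-- | Decode UTF-16 code units into code points. A high surrogate followed
-- by a low surrogate forms one code point; unpaired surrogates are passed
-- through unchanged.
decodeUtf16 :: [Word16] -> String
decodeUtf16 (hi : lo : rest)
  | isHigh hi && isLow lo =
      chr (0x10000 + (fromIntegral hi - 0xD800) * 0x400 + (fromIntegral lo - 0xDC00))
        : decodeUtf16 rest
decodeUtf16 (u : rest) = chr (fromIntegral u) : decodeUtf16 rest
decodeUtf16 [] = []

isHigh, isLow :: Word16 -> Bool
isHigh u = u >= 0xD800 && u <= 0xDBFF
isLow u = u >= 0xDC00 && u <= 0xDFFF

main :: IO ()
main =
  -- Three code units, two code points: 'H' and U+1F600.
  print (map fromEnum (decodeUtf16 [0x48, 0xD83D, 0xDE00])) -- prints [72,128512]
```

The input has length 3 in code units but decodes to 2 characters, which is exactly the mismatch between String.prototype.length and the number of code points.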
Currently the MVP example works fine on Windows, but on Linux there is a high chance of a broken stg_ap_v_ret which triggers a barf call. The bug is nondeterministic; a successful build is seen at https://circleci.com/gh/tweag/asterius/1070.
When stg_stop_thread_info is introduced as a dependency, the linker complains that stg_enter_info is not found. However, stg_enter_info and stg_enter_ret both compile fine in HeapStackCheck.cmm.
Printing each iteration of linker dependency analysis starting from the direct parent node of stg_enter_info yields:
(fromList ["stg_stop_thread_ret"],ChaseResult {directDepBy = fromList [], statusMap = fromList [(Unfound,fromList []),(Unavailable,fromList []),(Available (),fromList [])]})
(fromList ["_asterius_TSO","stg_enter_info","StgReturn"],ChaseResult {directDepBy = fromList [("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList []),(Unavailable,fromList []),(Available (),fromList ["stg_stop_thread_ret"])]})
(fromList ["stg_TSO_info","_asterius_Stack"],ChaseResult {directDepBy = fromList [("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("_asterius_Stack",fromList ["_asterius_TSO"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["_asterius_TSO","stg_stop_thread_ret","StgReturn"])]})
(fromList ["stg_STACK_info","stg_TSO_entry","_asterius_Stack"],ChaseResult {directDepBy = fromList [("stg_STACK_info",fromList ["_asterius_Stack"]),("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("stg_TSO_entry",fromList ["stg_TSO_info"]),("_asterius_Stack",fromList ["_asterius_TSO","_asterius_Stack"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["_asterius_TSO","stg_stop_thread_ret","stg_TSO_info","_asterius_Stack","StgReturn"])]})
(fromList ["stg_STACK_info","stg_STACK_entry","stg_TSO_entry"],ChaseResult {directDepBy = fromList [("stg_STACK_info",fromList ["_asterius_Stack"]),("stg_STACK_entry",fromList ["stg_STACK_info"]),("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("stg_TSO_entry",fromList ["stg_TSO_info","stg_TSO_entry"]),("_asterius_Stack",fromList ["_asterius_TSO","_asterius_Stack"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["stg_STACK_info","_asterius_TSO","stg_stop_thread_ret","stg_TSO_info","stg_TSO_entry","_asterius_Stack","StgReturn"])]})
(fromList ["stg_STACK_entry"],ChaseResult {directDepBy = fromList [("stg_STACK_info",fromList ["_asterius_Stack"]),("stg_STACK_entry",fromList ["stg_STACK_info","stg_STACK_entry"]),("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("stg_TSO_entry",fromList ["stg_TSO_info","stg_TSO_entry"]),("_asterius_Stack",fromList ["_asterius_TSO","_asterius_Stack"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["stg_STACK_info","stg_STACK_entry","_asterius_TSO","stg_stop_thread_ret","stg_TSO_info","stg_TSO_entry","_asterius_Stack","StgReturn"])]})
ChaseResult {directDepBy = fromList [("stg_STACK_info",fromList ["_asterius_Stack"]),("stg_STACK_entry",fromList ["stg_STACK_info","stg_STACK_entry"]),("_asterius_TSO",fromList ["stg_stop_thread_ret"]),("stg_enter_info",fromList ["stg_stop_thread_ret"]),("stg_TSO_info",fromList ["_asterius_TSO"]),("stg_TSO_entry",fromList ["stg_TSO_info","stg_TSO_entry"]),("_asterius_Stack",fromList ["_asterius_TSO","_asterius_Stack"]),("StgReturn",fromList ["stg_stop_thread_ret"])], statusMap = fromList [(Unfound,fromList ["stg_enter_info"]),(Unavailable,fromList []),(Available (),fromList ["stg_STACK_info","stg_STACK_entry","_asterius_TSO","stg_stop_thread_ret","stg_TSO_info","stg_TSO_entry","_asterius_Stack","StgReturn"])]}
So stg_enter_info is mistakenly marked as Unfound from the 3rd iteration onwards, indicating a hidden bug inside the dependency analyzer. This is a major blocker for M1.
Currently, we have out-of-line docs which are built on CI and serve as a wiki for devs, and other than that we don't have haddock or ghc-style source notes. This will probably drive away some potential contributors, given the growing size of the codebase.
Another concern is: as the main dev of asterius, I'm not dogfooding the docs enough; when I can't recall some implementation details, I usually navigate to relevant code and browse the implementation quickly. Ideally, I should be looking at comments or docs instead.
We should improve docs for developers; the objective is:
Efforts have begun on this front:
- [...] ir.md for example)
It would be nice to develop at least parts of the Data.ByteString API in Asterius.
In the future, maybe merge it with main.
All load/store instructions in WebAssembly take a static memarg immediate which can specify align/offset. We used neither in the past, and whenever we accessed a base address plus an offset, we performed a runtime addition first. This was handy for developing the "memory trap" for debug mode, since we only needed to care about the address.
However, in almost all cases the offset is a statically known constant, and by utilizing memarg.offset we can reduce code size and also get rid of some unnecessary runtime overhead. We just need to make the memory trap take an extra offset parameter and take that into account.
This is a collaborative issue with #26.
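The idea of moving a constant out of the address computation and into memarg.offset can be sketched on a toy address expression type. Everything here (Addr, splitOffset) is hypothetical and much simpler than the real IR. One assumption to keep in mind: memarg.offset is an unsigned immediate, so only non-negative constants are folded, and overflow of the summed constants is not handled.

```haskell
import Data.Word (Word32)

-- | Toy address expressions: a named register, a constant, or a sum.
data Addr
  = Reg String
  | Const Word32
  | Add Addr Addr
  deriving (Eq, Show)

-- | Split an address into a dynamic base and a static offset that could be
-- placed in memarg.offset instead of being added at runtime.
splitOffset :: Addr -> (Addr, Word32)
splitOffset (Add base (Const k)) =
  let (base', k') = splitOffset base in (base', k' + k)
splitOffset (Add (Const k) base) =
  let (base', k') = splitOffset base in (base', k' + k)
splitOffset (Const k) = (Const 0, k)
splitOffset e = (e, 0)

main :: IO ()
main = do
  print (splitOffset (Add (Add (Reg "Sp") (Const 8)) (Const 16)))
  print (splitOffset (Reg "R1"))
```

A memory trap that receives both components can still reconstruct the effective address as base plus offset, which matches the extra offset parameter mentioned above.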
Status quo of exceptions: when a Haskell exception is thrown, the scheduler returns with an error code, which is later checked by rts_checkSchedStatus and re-thrown as a JavaScript exception. We don't really know what was thrown, and when rts_checkSchedStatus finds out the Haskell thread hasn't exited gracefully, there's no chance to get back to the crime scene and fix things. Besides Haskell exceptions, there are also errors signaled in the runtime itself, and the same problems apply to them.
We definitely need to improve the exception story. Types of exceptions I can come up with at the moment:
- throw/throwIO calls in Haskell
- [...] foreign import javascript code, or rts.js code
- barf() in rts cmm code
All possible exceptions can be roughly grouped into three kinds:
The key to handling all three kinds of exceptions is improving the current one-shot scheduler. When an StgRun loop yields back to scheduleWaitThread, we need some (potentially async) JavaScript logic to check and fix things, then re-enter the loop and get back to Haskell execution.
A rough roadmap for this issue:
1. Improve scheduleWaitThread, so Haskell execution may be automatically resumed for simple cases like heap/stack check failures. The storage manager won't need to allocate a large fixed-size nursery/object pool; it can take advantage of the recently implemented fast block allocator and only request as many blocks as needed. Also, the createThread wrapper functions perform extra bookkeeping, so when the thread is created from a static closure and a fatal error is signaled, it's possible to simply re-initiate a new instance and restart execution from there. This one will be delivered this week.
2. Refactor the throw/catch primitives in PrimOps.cmm/Exception.cmm to fit the new scheduler, enabling exceptions to be caught and handled in Haskell.
3. Well, let's first see if 1 & 2 work as expected.
The docs currently state:
asterius requires a custom ghc which:
- Disables TABLES_NEXT_TO_CODE. It's hard to attach executable code to an info table on the WebAssembly platform.
- Uses integer-simple instead of integer-gmp. Porting integer-gmp to WebAssembly requires extra work and is not currently scheduled.
In the long run it would probably be desirable to make asterius compile against ghc-the-library as is. This may require changes to GHC, in particular to make these two aspects configurable at run-time (e.g. as part of the DynFlags) rather than compile-time options.
How much have you investigated if that is possible? I am happy to review a patch to GHC that would enable that…
We located one place in the ghc codegen which leads to runtime errors that may be triggered when an StgMutArrPtrs object is allocated in the nursery and multiple writeArray# instructions follow.
The relevant part in ghc: https://github.com/TerrorJack/ghc/blob/5c4b51195b408e9636f8b6b7cd7af50f81e61bd5/compiler/codeGen/StgCmmPrim.hs#L1587. There's likely nothing wrong with the ghc part (otherwise it'll be an obvious memleak).
Currently asterius assumes the host ghc is built with tables-next-to-code switched off; however, this has been breaking recent builds of ghc-head, forcing us to stick to a particular revision at ghc-8.5.20180413. Rather than taking time to investigate the ghc build issue, we can instead choose to implement support for tables-next-to-code in the host ghc before proceeding.
GHC automatically generates Typeable-related bindings in the closures for user-defined datatypes. This rapidly bloats the range of related rts routines and causes the linker to fail, even if we don't need anything related to Typeable at all.
We need to identify and prune the Typeable-related bindings generated in TcTypeable of ghc, either at the codegen stage or at linking.
Background:
Currently, asterius intercepts raw Cmm from ghc's regular native codegen pipeline and compiles to wasm from that. It means we're working with something which:
- [...] R1, or on the stack, instead of being passed via wasm's native function calls.
- [...] StgCmm are already serialized.
- [...] Capability/StgTSO/InCall/Tasks and functions operating on them are the things we're mainly concerned about.
To target a garbage-collected platform (in our case, post-MVP WebAssembly, but lots of other platforms too), one should either resort to compiling from STG, or add the feature of opaque garbage-collected references in Cmm. The minimal requirements here are:
- [...] rts/storage/Closures.h, and arrays are needed for stuff like Array#.
- [...] Hp pointers, no branching to stg_gc* routines, etc. Allocating a gc object is done via wired-in instructions.
- [...] Weak#, there should be a notion of "finalizer" for gc objects. What kind of "finalizer" to support, however, is left as an open problem as of now.
Sadly, the ghc codegen and rts are currently very tightly coupled here. Although Cmm does have some notion of gc-ed pointers (one can distinguish gc-ed pointers and regular bits from CmmType, which is inferrable from any CmmExpr), it doesn't factor out the treatment of gc objects; instead, gc references are plain pointers, everything happens in the linear memory, and the rts moves objects around and rewrites pointers when the gc routines are entered.
It would be a shame to ditch the current Cmm-based codegen, and we should seek to enhance Cmm & the ghc codegen to achieve our goals.
Objectives:
- [...] CoreTodos existed before core plugins, so we might only need to implement something like CmmTodos without a new plugin mechanism, and passes may be adjusted based on command line arguments)
Possible steps:
- [...] anyrefs, and simulate anyrefs with regular pointers.
Proposed extensions to Cmm language as implemented in current GHC:
- [...] CmmType, promote GcPtrCat to be a standalone constructor of CmmType. It means a garbage collected reference is no longer tied to a machine word (so we don't know its Width). To prevent breakage with current code, we may also add a new constructor representing opaque references instead of moving GcPtrCat out.
- [...] CmmExpr, we add CmmGcLoad constructor, to represent loading from a garbage collected reference. We also need to extend CmmRegs, because general registers like R1 can't hold those references, so we need additional register types to hold them.
- [...] CmmNode, we add CmmGcStore and CmmGcAlloc. CmmGcStore represents storing to a reference, and CmmGcAlloc represents allocating a gc struct/array. We don't need to generate heap/stack checks when emitting CmmGcAlloc code.
- CmmGcLoad/CmmGcStore/CmmGcAlloc constructs are all accompanied by a "schema"; the "schema" comes from info tables.
Q & A:
The pros & cons of a cross-compiled rts versus a custom rts deserve a separate long write-up, so I'll just make one thing clear: this is a matter of choice. Implementing this feature grants us the extra power of basing generated code on the host platform's gc functionality, but we can still cross-compile the original rts (albeit heavily patched), or even roll our own and compare different gc strategies/workloads.
Since quite a lot of additional research on the codegen & rts needs to be done, this issue is currently a "20% time" one; some fragmented time will be invested into small experiments, and findings will be recorded here, but once we gain a clearer understanding of the ghc backend's big picture, we will put up a concrete roadmap and begin working on this front.
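To make the proposed Cmm extensions above a bit more concrete, here is a minimal, self-contained sketch. It does not use GHC's real Cmm modules: every type below is a simplified stand-in, and GcRefType, GcReg, Schema and its fields are hypothetical names chosen only for illustration.

```haskell
-- Simplified stand-ins for GHC's Cmm types; NOT the real definitions.
data Width = W8 | W16 | W32 | W64
  deriving (Eq, Show)

-- Hypothetical: layout of a gc object, as it might be derived from an info table.
data Schema = Schema
  { schemaPtrFields   :: Int -- number of gc reference fields
  , schemaNonPtrWords :: Int -- number of unboxed payload words
  } deriving (Eq, Show)

-- A type is either a machine type with a known width,
-- or an opaque gc reference that has no width at all.
data CmmType
  = BitsType Width
  | FloatType Width
  | GcRefType
  deriving (Eq, Show)

data CmmReg
  = GeneralReg Int -- stand-in for R1, R2, ...
  | GcReg Int      -- hypothetical register class that can hold gc references
  deriving (Eq, Show)

data CmmExpr
  = CmmLit Integer
  | CmmRegExpr CmmReg
  | CmmLoad CmmExpr CmmType      -- ordinary linear-memory load
  | CmmGcLoad Schema CmmExpr Int -- load field n from a gc reference
  deriving (Eq, Show)

data CmmNode
  = CmmAssign CmmReg CmmExpr
  | CmmStore CmmExpr CmmExpr
  | CmmGcStore Schema CmmExpr Int CmmExpr -- store into field n of a reference
  | CmmGcAlloc Schema CmmReg              -- allocate an object, no heap check
  deriving (Eq, Show)

-- | Width of a type, if it has one. A gc reference has none.
typeWidth :: CmmType -> Maybe Width
typeWidth (BitsType w)  = Just w
typeWidth (FloatType w) = Just w
typeWidth GcRefType     = Nothing
```

The point of the sketch is only that every gc-aware construct carries its schema, and that code asking for a Width has to handle the case where there is none.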
Related readings:
Commentary/Rts/Storage - GHC
When implementing WebAssembly versions of rts C functions, some size & offset constants are not present in DynFlags (or worse, DerivedConstants.h). For instance:
StgIndStatic is available
SIZEOF macros only exist for a few types
OFFSET macros are obviously wrong, e.g. #define OFFSET_StgInd_indirectee 0
We need to find a way to calculate the constants in a portable manner. Different rts ways may yield different constants. We only care about the vanilla way for now.
In debug mode, we wish to mark certain regions in the static data segments as "trapped", so any load/store instructions involved with those regions will immediately trap. The trapped regions are mostly uninitialized fields of singleton structs like MainCapability or RTS_FLAGS, and since we didn't properly initialize them in init_rts_asterius, reading/writing those fields is undefined behavior.
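The check described above can be modelled in a few lines. This is a sketch under assumptions, not the actual debug-mode implementation: Region, overlaps and checkAccess are hypothetical names, and regions are taken to be half-open byte ranges.

```haskell
import Data.List (find)
import Data.Word (Word32)

-- | A half-open address range [start, end) inside the static data segment.
data Region = Region
  { regionStart :: Word32
  , regionEnd   :: Word32
  } deriving (Eq, Show)

-- | Does an access of `size` bytes at `addr` touch the region?
-- Arithmetic is done on Integer so that addr + size cannot wrap around.
overlaps :: Word32 -> Word32 -> Region -> Bool
overlaps addr size (Region s e) =
  let a = toInteger addr
      b = a + toInteger size
   in a < toInteger e && toInteger s < b

-- | Return the first trapped region hit by a load/store, if any.
-- A debug build would trap when this returns Just.
checkAccess :: [Region] -> Word32 -> Word32 -> Maybe Region
checkAccess trapped addr size = find (overlaps addr size) trapped
```

For instance, with a single trapped region covering bytes 16 to 23, a 4-byte access at address 20 is reported, while a 4-byte access at address 12 is not.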
Currently, when performing booting, we merely run Setup configure and Setup build for boot packages, collect wasm IR and throw away other outputs like .hi/.o files. There isn't a separate package database for boot libs. This means that:
When ahc-link invokes GHC API to compile end user's programs, it uses ghc's global package database. We can't patch packages like base to suit our own purposes.
JSRef really means. And users cannot define newtype wrappers for JSRef.
base, see recent breakage on wip-ghc-8.7.
So we shall take some time to properly set up a ghc pkgdb along with our own asterius_store, and make ahc-link use that pkgdb. Some known difficulties:
ahc needs to load a frontend plugin, which itself depends on the global/project pkgdb. There's no way to avoid pollution from those two pkgdbs.
So the real work here is figuring out a way to implement ahc which doesn't rely on GHC's plugin system (regardless of regular/frontend plugin, since they all rely on ghci's mechanism and introduce the global/project pkgdb anyway). Luckily our logic is all contained in Hooks, so it's possible to introduce an in-tree ghc-bin, patch it and add our own Hooks.
When comparing binary code emitted by the binaryen and new backends, I discovered that binaryen doesn't simply do a post-order traversal on the expression IR as expected; there are quite a few hidden transformations/workarounds here and there, which are mostly undocumented, and there's no way to opt out even when the optimize/shrink level is set to 0. A non-comprehensive list of the workarounds I discovered so far:
locals buckets appear in a function definition. Non-parameter local indices don't correspond to input expression IR (although there does exist a bijection)
unreachable instructions not present in the original IR.
I've patched binaryen and shrank the list above to only contain the last entry. As long as the last piece is fit into the jigsaw puzzle, we'll base everything on the new backend right away.
We're not accusing binaryen of deficiencies here (though the caveats probably should be mentioned in docs); however, this debugging experience does reveal some additional mismatches between what binaryen offers and what we expect:
unreachable instruction in quite some places. Hand it a malformed expression IR, it uses this trick to make the IR automagically pass validation...
Hello again, I think I might have a PR to fix a bug, but I can't test it because I haven't gotten local builds to work yet.
I'm building with the same script/command as the resolved #28 (except with stack exec). ahc-link will run without issue but the generated wasm fails.
I get [ERROR] Uninitialized memory trapped at 0x00000000 (and subsequently, RuntimeError: unreachable which apparently always happens). --debug says it entered mult_hs and did some work before it failed.
From here I'm not really sure how to debug. Any help would be appreciated!
I'm running this in OpenSUSE Tumbleweed.
Currently, the storage manager separates the block group for the heap and the group for allocate calls. For simplicity when implementing the MVP, it's favourable to:
Hp/HpLim points to the whole area. The same area is also used for handling allocate/allocatePinned invocations.
allocBlock*/allocGroup* family of functions are temporarily marked as traps, we assume allocation only takes place by either bumping Hp or invoking allocate/allocatePinned.
It's also easier to grow the allocation area with this uniform handling:
mmap-like functions when calling alloc_mega_group, we use the grow_memory operator. With a bit of bookkeeping, the linear memory grows with a factor of k, which ensures amortized O(1) when allocating a block group.
stg_gc*, heap growth is performed.
After fixed size heap is confirmed to work, we'll make the heap growable, and gc comes last.
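The bump allocation and growth-by-a-factor-of-k scheme described above can be modelled as a pure function. This is only a sketch: the Heap record and allocate function are hypothetical, the heap is assumed to sit at the end of linear memory so that growing memory simply raises the limit, and k is assumed to be at least 2. The only fixed fact used is the 64 KiB WebAssembly page size.

```haskell
-- | Size of one WebAssembly linear memory page in bytes.
wasmPageSize :: Int
wasmPageSize = 65536

-- | Hypothetical allocator state: heap pointer, heap limit, pages owned.
data Heap = Heap
  { hp    :: Int
  , hpLim :: Int
  , pages :: Int
  } deriving (Eq, Show)

-- | Allocate n bytes by bumping hp. When the area is exhausted, grow it by
-- enough pages to satisfy the request, and by at least (k - 1) times the
-- current page count, so the total size grows geometrically.
-- Returns the address of the new object and the updated state.
allocate :: Int -> Int -> Heap -> (Int, Heap)
allocate k n h
  | hp h + n <= hpLim h = (hp h, h { hp = hp h + n })
  | otherwise =
      let shortfall = hp h + n - hpLim h
          needed    = (shortfall + wasmPageSize - 1) `div` wasmPageSize
          delta     = max needed ((k - 1) * pages h) -- pages to request
          grown     = h { hpLim = hpLim h + delta * wasmPageSize
                        , pages = pages h + delta }
       in (hp grown, grown { hp = hp grown + n })
```

In a real runtime, delta is the operand that would be passed to grow_memory. Because each growth step at least multiplies the owned pages by k, the number of growth operations is logarithmic in the final heap size, which is where the amortized O(1) bound comes from.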
Currently, wasm code generated in debug mode shares the very same runtime script of vanilla mode. As the improvement of debug mode proceeds, the debug runtime script will be even more bloated, which is a huge waste for the vanilla mode.
So there should be two versions of rts.js, as well as JavaScript code generation modes. In the vanilla mode, we can drop the symbol table or error message pool in generated stub scripts (or put it in a separate .json file so not shipping that file won't hurt normal execution, but only results in less informative error messages).
Right now it's already possible to implement a prototypical JavaScript FFI, without needing to wait for the host-bindings or gc proposals of WebAssembly to land in V8.
The main trick is: instead of directly passing JavaScript references across the WebAssembly/JavaScript boundary, we only pass a handle and the JavaScript runtime maintains the mapping from handles to real objects. The JavaScript runtime and our own storage manager do not collaborate in any way, and manually calling something like freeHaskellFunPtr is required to avoid leaking on the JavaScript side.
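The handle trick can be illustrated with a small model. In the design above the table lives on the JavaScript side; the sketch below models it in Haskell purely for illustration, and HandleTable, newHandle, deref and freeHandle are hypothetical names. It relies on Data.IntMap from the containers package that ships with GHC.

```haskell
import qualified Data.IntMap.Strict as IM

-- | Maps integer handles to host objects. Only the Int crosses the
-- wasm/JavaScript boundary; the object itself stays on the host side.
data HandleTable a = HandleTable
  { nextHandle :: Int
  , entries    :: IM.IntMap a
  }

emptyTable :: HandleTable a
emptyTable = HandleTable 1 IM.empty

-- | Register an object and return a fresh handle for it.
newHandle :: a -> HandleTable a -> (Int, HandleTable a)
newHandle obj (HandleTable n m) = (n, HandleTable (n + 1) (IM.insert n obj m))

-- | Look up the object behind a handle, if it is still alive.
deref :: Int -> HandleTable a -> Maybe a
deref h = IM.lookup h . entries

-- | Drop an entry. Nothing calls this automatically, so forgetting to
-- free a handle leaks the object it refers to.
freeHandle :: Int -> HandleTable a -> HandleTable a
freeHandle h t = t { entries = IM.delete h (entries t) }
```

Handles are never reused in this sketch, so a stale handle dereferences to Nothing instead of silently pointing at a different object.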
A brief list of what work it takes and what we can achieve:
foreign import javascript syntax working in the ghcjs way. Instead, we'll supply a separate file to ahc-link describing what to import, and ahc-link will make the relevant bindings available.
foreign import is relatively easy to implement: we can import any JavaScript object as a handle, and "calling" a JavaScript function is simply sending a handle list to the JavaScript side and the real work happens out of wasm.
foreign export is a lot more difficult since it requires the scheduler to properly support interrupting/resuming Haskell threads.
Some test suites demonstrating importing Web API functions will be implemented.
Currently, we resolve local/global regs by AST rewriting when marshalling a CmmProc, then serialize the AST. During linking, other rewriting passes are carried out, e.g. symbol resolution, debugging, etc.
The current approach is a bit ad hoc: we need to carefully restrict post-serialization rewriting because register allocation is already done when the AST is serialized, so when fresh registers are requested later, we need to hard-wire it in resolveLocalRegs and remember its absolute wasm local id.
The current AST rewriting mechanism needs improving. The basic idea shall be: serialize early & rewrite late. Register allocation shouldn't be performed when marshalling CmmProc.
CI setup is a bit manual and ad hoc currently. To improve the situation, the following steps are proposed:
ghc-artifact-collector that we use with vanilla GHC already. The job will be called asterius-x86_64-linux.
Right now, the runtime is capable of handling heap overflow situations by moving the nursery/object pool to new blocks. This will surely exhaust available memory for a long-running application, and it's about the right time to implement proper garbage collection.
In #35, we discussed the possibility of relying on wasm gc proposal to implement garbage collection. While that possibility is not totally rejected yet, a more pragmatic choice would still be starting from a single-generation copying GC, then use that as a starting point for gradual improvement.
StgTSOs and their stacks, plus the StablePtr table. We don't use the "run queue" of Capabilities.
The plan goes as follows (each phase roughly takes one man week, if not stumbled upon unknown difficulties):
Addr#s across FFI boundaries. We must pass StablePtr#s from now on.
freeGroup for reclaiming free block groups. The free groups may be reused later for allocGroup calls, and can be zeroed to eliminate a potential attack surface.
allocGroup.
JSVal in GC
JSVals when they're out of scope, so the references won't cause leakage in the JavaScript world.
JSVal a newtype to StablePtr, recognizable by a tag bit.
JSVal objects in GC, and remove unseen entries in the JSVal table after each run.
After the above todos are all ticked, we'll eliminate all known sources of memory leak and get a garbage collector. There'll still be plenty of room for improvement, but having a working gc is still a significant milestone.
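The JSVal part of the plan above (a tag bit to recognize JSVals, then dropping table entries that were not seen during a GC run) might look roughly like the sketch below. The encoding is an assumption made up for this example: the lowest bit marks a JSVal and the remaining bits hold the table index. All function names are hypothetical, and the table is modelled with Data.IntMap from containers.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.|.))
import qualified Data.IntMap.Strict as IM
import qualified Data.IntSet as IS

-- | Hypothetical encoding: index shifted left by one, lowest bit set.
toJSVal :: Int -> Int
toJSVal ix = (ix `shiftL` 1) .|. 1

-- | A word is a JSVal if its tag bit is set.
isJSVal :: Int -> Bool
isJSVal w = testBit w 0

-- | Recover the table index from a tagged word.
jsvalIndex :: Int -> Int
jsvalIndex w = w `shiftR` 1

-- | Indices of all JSVals among the words visited by the collector.
seenJSVals :: [Int] -> IS.IntSet
seenJSVals ws = IS.fromList [jsvalIndex w | w <- ws, isJSVal w]

-- | Keep only the entries that were seen during this GC run.
sweepJSVals :: IS.IntSet -> IM.IntMap a -> IM.IntMap a
sweepJSVals seen table = IM.restrictKeys table seen
```

With a table holding entries 1, 2 and 3, and a traversal that visits the tagged words for 1 and 3 plus an untagged word, only entries 1 and 3 survive the sweep.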
Hi! Reading this interesting post I saw
One possible solution is to transform the program by collecting all Cmm functions into one WebAssembly function [..] We experimented with this approach and discovered that with a large number of blocks in a single function, binaryen is not sufficiently scalable for this approach to be practical.
Is that approach still interesting for you? Would it be useful if we looked into improving binaryen in that area?
Currently, WebAssembly as in spec/V8 doesn't support passing i64 across the WebAssembly/JavaScript boundary, and the i64 via BigInt proposal isn't landing anytime soon. Before we properly generate 32-bit code (which is going to be a standalone issue later), we still need to polyfill this feature.
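One way to polyfill this, and one of the workarounds listed in this issue, is to pass the higher and lower 32 bits as two separate values and re-assemble them on the other side. The arithmetic is sketched below; splitI64 and joinI64 are hypothetical helper names and are not claimed to match what Asterius.Builtins does internally.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Int (Int64)
import Data.Word (Word32, Word64)

-- | Split a 64-bit value into (higher 32 bits, lower 32 bits).
-- fromIntegral to Word32 truncates, which keeps exactly the low half.
splitI64 :: Word64 -> (Word32, Word32)
splitI64 x = (fromIntegral (x `shiftR` 32), fromIntegral x)

-- | Re-assemble the two halves on the other side of the boundary.
joinI64 :: Word32 -> Word32 -> Word64
joinI64 hi lo = (fromIntegral hi `shiftL` 32) .|. fromIntegral lo

-- | Signed values go through the same path: converting between Int64 and
-- Word64 with fromIntegral preserves the bit pattern.
splitSigned :: Int64 -> (Word32, Word32)
splitSigned = splitI64 . fromIntegral

joinSigned :: Word32 -> Word32 -> Int64
joinSigned hi lo = fromIntegral (joinI64 hi lo)
```

Both halves fit in an i32, so they can cross the boundary as ordinary JavaScript numbers without losing precision.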
Previously, multiple different workarounds co-existed in asterius:
i64 from wasm to js. See Asterius.JSFFI
Asterius.Builtins)
i64 from wasm to js, we pass the higher/lower 32-bits as two parameters, and re-assemble at the other end. See Asterius.Builtins.cutI64.
For more reliable code generation, i64 FFI should be handled in a transparent & uniform manner:
__asterius_*_wrapper function exist here.
i64 via BigInt proposal, we can remove the wrappers and behavior of generated code shouldn't change.
Right now, when a runtime error is encountered, we simply invoke unreachable, which makes V8 panic and output a dummy stack trace that is not helpful at all.
The runtime error messages should be improved incrementally:
panic interface which outputs an error code via console.error before crashing. Upon a crash, we'll get more clues about what's going wrong. This is being worked on.
barf, so we get string-formed error messages.
Note: this is currently a low-priority issue, and won't be worked on before the next public announcement.
There is a monadic EDSL to construct WebAssembly code in Haskell and it's used extensively in Asterius.Builtins to write a significant part of our custom runtime. While writing WebAssembly in that EDSL is far more pleasant than writing tree-formed AST directly, there is still a lot of room for improvement:
Inline functions are really just reusable pieces of EDSL code disguised as Haskell functions. While they allow saving some function call overhead without hurting code reuse, there is one drawback: the debugger only sees inlined function bodies, so it can't emit correct context info (which function we are currently running) when entering an inline function. This is a minor annoyance when debugging a big inline function (e.g. schedule or scheduleHandleThreadFinished), especially when they are nested.
We shall improve inline functions a bit, so for "normal" mode it still doesn't emit standalone wasm functions and call instructions, but for debug mode, we can know when an inline function is entered.
Currently the EDSL doesn't implement "static" variables or buffers which are persisted across function calls. This means it's hard for our hand-written wasm code to maintain internal state. It's still possible to do this by declaring a piece of state outside the function body as an AsteriusStatics, then referring to it by the symbol, but it's troublesome and not modular. We shall implement some EDSL primitives to allocate static variables/buffers.
The EDSL desugars to a tree-formed AST which closely models the binaryen AST. There is a lot of redundant type declaration in binaryen AST nodes so it makes the binaryen authors' life much easier (lots of passes don't have to carry around a huge context). The AST in Asterius.Types and consequently Asterius.EDSL inherited the same redundancy, but we don't really need that, for instance:
someWasmFunc = runEDSL [I64] $ do
  setReturnType I64
  r <- call' "someOtherWasmFunc" [] I64
  emit r
Here, someWasmFunc and someOtherWasmFunc both have type [] -> [i64], yet the type information exists in multiple places. I'd really love to eliminate that redundancy: it means fewer keystrokes, and type errors become less likely when refactoring those functions (it's easy to forget to update one or two places).
The call'/callIndirect' primitives in the EDSL all allocate a temporary local, assign the result to the local and carry on. This is essential to avoid multiple calls when the result is used multiple times, but a common scenario is only using the result once in the remaining function code, in which case we should fuse away the local and embed the call instruction in place directly. Saves some bytes in wasm binary.
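The fusion described above can be sketched on a toy IR. This is not the Asterius EDSL or its AST: Expr, Stmt and every function below are hypothetical and exist only to show the idea. The pass is deliberately conservative. It only fuses when the temporary is read exactly once, that read is in the very next statement, and no other call is evaluated before it, so the order of side effects is preserved.

```haskell
-- Toy IR, NOT the Asterius AST.
data Expr
  = Const Integer
  | GetLocal Int
  | Call String [Expr]
  | Add Expr Expr
  deriving (Eq, Show)

data Stmt
  = SetLocal Int Expr
  | Emit Expr
  deriving (Eq, Show)

-- | Effects met while evaluating an expression, in evaluation order
-- (arguments left to right, then the call itself).
data Event = UseTmp | OtherCall
  deriving (Eq, Show)

events :: Int -> Expr -> [Event]
events i (GetLocal j) = [UseTmp | i == j]
events i (Call _ as)  = concatMap (events i) as ++ [OtherCall]
events i (Add a b)    = events i a ++ events i b
events _ (Const _)    = []

stmtExpr :: Stmt -> Expr
stmtExpr (SetLocal _ e) = e
stmtExpr (Emit e)       = e

mapStmt :: (Expr -> Expr) -> Stmt -> Stmt
mapStmt f (SetLocal j e) = SetLocal j (f e)
mapStmt f (Emit e)       = Emit (f e)

-- | Replace reads of local i with the expression v.
subst :: Int -> Expr -> Expr -> Expr
subst i v e = case e of
  GetLocal j | i == j -> v
  Call f as           -> Call f (map (subst i v) as)
  Add a b             -> Add (subst i v a) (subst i v b)
  _                   -> e

-- | Local i is read exactly once, before any other call runs.
fusible :: Int -> Expr -> Bool
fusible i e = case events i e of
  (UseTmp : more) -> UseTmp `notElem` more
  _               -> False

unused :: Int -> Stmt -> Bool
unused i s = UseTmp `notElem` events i (stmtExpr s)

-- | Fuse single-use call results into the statement that consumes them.
fuse :: [Stmt] -> [Stmt]
fuse (SetLocal i c@(Call _ _) : next : rest)
  | fusible i (stmtExpr next), all (unused i) rest =
      fuse (mapStmt (subst i c) next : rest)
fuse (s : rest) = s : fuse rest
fuse []         = []
```

When the result is read twice, or another call would run before the read, the temporary local is kept, which matches the reason the locals are allocated in the first place.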
SRTs are not used for executing vanilla Haskell code in a no-gc setting, and they tend to introduce a lot of redundant dependencies (see Sole_srt in the compiled artifact of GHC.List for example). With a bit of work in the codegen, SRTs shall be eliminated in the compilation phase.
Any non-trivial program compiled by asterius involves creating & updating thunks, which brings in dependency to functions like updateWithIndirection, updateThunk, checkBlockingQueues, messageBlackhole, etc.
We need to patch the headers/cmm sources in the bundled rts of ghc-toolkit to simplify the handling of blackholes. Notable differences include:
StgRun, so a Haskell thread cannot be suspended and resumed, any logic involving interruption of threads must be removed in a surgery.