webassembly / exception-handling

Proposal to add exception handling to WebAssembly

Home Page: https://webassembly.github.io/exception-handling/

License: Other

exception-handling's Issues

The else branch of `if_except` should not be optional

@sunfishcode pointed out that the fact that our current if instruction has an optional else branch complicates decoding because we do not know at the time we see the if instruction whether we are going to parse one block of instructions or two. We should avoid getting into that case here.

One option is to always require an else branch for if_except, which can be empty if it's not needed. Another would be to have separate opcodes for the two-armed and one-armed if_except blocks.

What do people think?

Toolchain support for exception handling

This issue tracks toolchain support for WebAssembly exception handling.

Remove "terminate" / describe uncaught exceptions

In the Overview, Throws and Debugging sections, an uncaught exception is described to terminate the application/execution/thread. Thus far, there is not a "terminate" concept in webassembly and, e.g., after a trap, the semantics explicitly allow the host environment to call exports in the future. E.g., wasm has no problem with this JS:

var code = text2binary(`(module (func (export "yay")) (func (export "boo") unreachable))`);
var i = new WebAssembly.Instance(new WebAssembly.Module(code));
i.exports.yay();
try { i.exports.boo() } catch(e) {}
i.exports.yay();

I think we should have the same semantics for uncaught exceptions. Moreover, in a host environment (like JS/Web) which allows interleaved activations JS -> wasm -> JS -> wasm, an exception thrown by the inner wasm activation can be caught by the outer wasm activation.

It'd be good to spell this all out explicitly and remove any use of "terminate".
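
As a self-contained check of the trap behaviour described above, here is a minimal sketch that hand-assembles the same two-function module as raw bytes. The byte array is an assumption made here to avoid the text2binary helper, which is not defined in this issue; it should run in any JS engine with WebAssembly support.

```javascript
// Hand-assembled binary for:
//   (module (func (export "yay")) (func (export "boo") unreachable))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: one type, [] -> []
  0x03, 0x03, 0x02, 0x00, 0x00,                   // function section: two funcs of type 0
  0x07, 0x0d, 0x02,                               // export section: two exports
  0x03, 0x79, 0x61, 0x79, 0x00, 0x00,             //   "yay" -> func 0
  0x03, 0x62, 0x6f, 0x6f, 0x00, 0x01,             //   "boo" -> func 1
  0x0a, 0x08, 0x02,                               // code section: two bodies
  0x02, 0x00, 0x0b,                               //   func 0: no locals, end
  0x03, 0x00, 0x00, 0x0b,                         //   func 1: no locals, unreachable, end
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));

instance.exports.yay();       // runs normally
let trapped = false;
try {
  instance.exports.boo();     // traps on `unreachable`
} catch (e) {
  trapped = e instanceof WebAssembly.RuntimeError;
}
instance.exports.yay();       // the instance is still callable after the trap
console.log(trapped);
```

The second call to yay is the point of the example: the trap surfaces as a JS exception, and nothing about the instance is "terminated".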

rethrow's immediate arguments

In the current spec, rethrow does not take any immediate argument, and only pops except_ref from the top of stack and rethrows it.

I'm planning to add two immediate arguments to rethrow. These two are not new; they have been discussed previously.

Depths to break out of

Branches can break out of an arbitrary number of blocks and loops thanks to their immediate depth argument, but rethrow currently does not have one. So I am planning to add a depth immediate argument to rethrow, which was first proposed in #29 (comment).

One thing to consider is that the depth calculation has to differ slightly from that of branches. I'd like to make the depth calculation for rethrow the same as for branches, but for branches the depth is increased at block start instructions (block, loop, and try) and decreased at the block end instruction (end), whereas for rethrow the depth should be decreased not at end but at the catch instruction. For example,

try
  try
    br 0       // branches to (2)
    rethrow 0  // rethrows to (1)
  catch        <-- (1)
    br 0       // branches to (2)
    rethrow 0  // rethrows to (3)
  end
  ...          <-- (2)
catch          <-- (3)
end

Here two br instructions are within the same try~end scope, so they branch to the same place. But two rethrow instructions rethrow to different places, because the level is decreased not at end but at catch.

To resolve this, the current LLVM toolchain implementation maintains a separate EH stack for rethrow depth calculation, in which the EH stack depth is only incremented at try and decremented at catch, and other blocks and loops are not counted. As in the example below, rethrow does not count plain blocks when computing depth.

try
  block
    block
      block
        block
          try
            rethrow 0  // rethrows to (1)
            rethrow 1  // rethrows to (2)
          catch        <-- (1)
          end
        end
      end
    end
  end
catch                  <-- (2)
end
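
To make the separate EH stack concrete, here is a minimal sketch of the depth resolution described above. It is not the LLVM implementation; the instruction strings and the function name resolveRethrows are hypothetical, and the only rule it encodes is the one stated in the text: the EH depth grows at try and shrinks at catch, while block, loop and end are ignored.

```javascript
// Resolve each `rethrow N` to the id of the try whose catch receives it.
// Try blocks are numbered 0, 1, 2, ... in the order they are opened.
function resolveRethrows(instructions) {
  const eh = [];        // ids of try blocks whose catch has not been reached yet
  const targets = [];
  let nextTryId = 0;

  for (const instr of instructions) {
    const [op, arg] = instr.split(" ");
    if (op === "try") {
      eh.push(nextTryId++);
    } else if (op === "catch") {
      // Entering the catch body: this try can no longer catch anything.
      eh.pop();
    } else if (op === "rethrow") {
      targets.push(eh[eh.length - 1 - Number(arg)]);
    }
    // block, loop, end and every other instruction leave the EH stack alone.
  }
  return targets;
}

// The nested-block example: both rethrows ignore the four plain blocks.
const nested = resolveRethrows([
  "try", "block", "block", "block", "block",
  "try", "rethrow 0", "rethrow 1", "catch", "end",
  "end", "end", "end", "end", "catch", "end",
]);
console.log(nested);
```

With try ids assigned in opening order, the outer try is 0 and the inner try is 1, so the result lists the inner try for rethrow 0 and the outer try for rethrow 1.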

@mstarzinger and I briefly discussed how to compute rethrow's depth argument over email, but I'm not sure we settled on a conclusion.

Which exception to rethrow

The first version of the EH proposal had an immediate argument to specify which exception on the stack to rethrow. For example:

try
  ...
catch 1
  ...
  block
    ...
    try
      ...
    catch 2
      ...
      try
        ...
      catch 3
        ...
        rethrow N
      end
    end
  end
  ...
end

In this example, N is used to disambiguate which caught exception is being rethrown. It could rethrow any of the three caught exceptions. Hence, rethrow 0 corresponds to the exception caught by catch 3, rethrow 1 corresponds to the exception caught by catch 2, and rethrow 2 corresponds to the exception caught by catch 1.
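
The indexing can be sketched as a lookup into a stack of caught exceptions, counting from the innermost catch. This is only an illustration; selectRethrown and the string placeholders are hypothetical names, not part of the proposal.

```javascript
// Each `catch` pushes the exception it caught; `rethrow N` picks the N-th
// entry counting from the innermost (most recently entered) catch.
function selectRethrown(caughtStack, n) {
  if (n < 0 || n >= caughtStack.length) {
    throw new RangeError(`rethrow ${n}: only ${caughtStack.length} exceptions in scope`);
  }
  return caughtStack[caughtStack.length - 1 - n];
}

// Exceptions caught by `catch 1`, `catch 2` and `catch 3`, outermost first.
const caught = ["caught by catch 1", "caught by catch 2", "caught by catch 3"];
console.log(selectRethrown(caught, 0));
console.log(selectRethrown(caught, caught.length - 1));
```

With three exceptions in scope the valid indices run from 0 (innermost) to 2 (outermost).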

I don't see any use cases of this in C++, but I recall @rossberg suggested to keep this for other languages. Is that right?

Rethrow should take a label immediate

Try blocks can nest, so something like this can occur:

try
  ...
catch
  ...
  try
    ...
  catch
    ...
    rethrow  ;; which?
  end
end

There can be cases where you need to rethrow the outer exception. We should avoid the popular mistake of only providing one built-in (here, implicit) name for something for which several bindings can be in scope.

Thus, I propose that rethrow should have a label immediate referencing the desired try.

Clarify catch block syntax

The current version of V8 seems to require an index on the catch instruction. I receive:

CompileError: AsyncCompilation: Compiling wasm function "simple2" failed: Invalid exception index: 0 @+38

Here the 0 is the varuint32 immediate that follows the catch instruction in the binary encoding of this module:

(module
  (func $simple2 (result i32) (local i32) (local anyref)
    try
      i32.const 5
      i32.const 0
      i32.div_s
      set_local 0
    catch 0
      set_local 1
      i32.const 2
      set_local 0
    end
    get_local 0
    return
  )
)

The original code looks like:

    static int simple2() {
        int r;
        try {
            r = 5 / 0;
        } catch(Exception ex ) {
            r = 2;
        }
        return r;
    }

What does this index after the catch operation mean? Or is this simply a bug in the early implementation in V8 7.3.0-node.5 (candidate)?

event section ID

The event section is currently specified as id 12. The bulk memory proposal is also going to add a new section (see WebAssembly/bulk-memory-operations#42), and will likely ship first. Should it have 12 instead so we don't have a gap in the known section ids? Or does it even matter?

Exception section position and identifier

The proposal does not list a section id for the exception section. Based on this change to V8, I assume this should be 13?

It looks like the exception section must come before the code section, so sections no longer come strictly in order (modulo custom sections). How precisely should we specify the location of the exception section?

Type of Wasm exceptions in JS

What should be the type of Wasm exceptions caught in JS? Should they be an instance of WebAssembly.RuntimeError with some extra fields for the identity of the exception and its arguments? Or does it make sense to have a separate type like WebAssembly.ProgramException to separate user-defined errors from those defined by the standard?

Undo pull request 45 from exception handling.

I screwed up and accidentally committed changes that weren't reviewed. How do I fix this?

Clarify `if_except` text format

The if_except format is described like this:

if_except block_type except_index
  Instruction*
end

But this doesn't include the optional label, which all other block types have. Assuming that comes first, as with others, it should be:

if_except label_opt block_type except_index
  Instruction*
end

This introduces a slightly tricky parse:

(except $e i32)
...
if_except $e
end

In this case, $e is the exception index, but that's not known until end, as it could be the label name.

It's not too hard to make this parse, but I wonder if it would be better to put the exception_index first instead to simplify things.
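
One way the label-first order could be parsed is with a single token of lookahead. The sketch below assumes a hypothetical pre-tokenized input in which identifiers start with "$", numeric indices are digit strings, and a block type arrives as one token starting with "(result"; none of this is taken from an actual parser.

```javascript
// Split the operands of a flat `if_except` into label, block type and
// exception index, for the order: label_opt block_type except_index.
function parseIfExceptOperands(tokens) {
  const isId = (t) => typeof t === "string" && t.startsWith("$");
  const isIndex = (t) => isId(t) || (typeof t === "string" && /^\d+$/.test(t));
  const isBlockType = (t) => typeof t === "string" && t.startsWith("(result");

  let pos = 0;
  let label = null;
  let blockType = null;

  // A leading identifier is a label only if more operands follow it;
  // otherwise it has to be the mandatory exception index.
  if (isId(tokens[pos]) && (isIndex(tokens[pos + 1]) || isBlockType(tokens[pos + 1]))) {
    label = tokens[pos++];
  }
  if (isBlockType(tokens[pos])) {
    blockType = tokens[pos++];
  }
  if (!isIndex(tokens[pos])) {
    throw new SyntaxError("if_except: missing exception index");
  }
  return { label, blockType, exceptIndex: tokens[pos] };
}

console.log(parseIfExceptOperands(["$e", "end"]));
console.log(parseIfExceptOperands(["$l", "(result i32)", "$e", "end"]));
```

Putting the exception index first, as suggested above, would remove the lookahead entirely, since the first operand would then always be the index.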

What counts as an exception?

Restoring lost issue:

@wibblymat

Should things like unreachable, out-of-bounds memory access, call_indirect with an invalid table index, etc., become catchable exceptions?

Proposal: using only catch_all to catch all exceptions

TL;DR

The current form of separate catch instructions (catch i, catch j, ...,
catch_all) is very hard to generate from the compiler's side and possibly
detrimental for code size and performance. So I propose to merge all catch
blocks into one catch_all block that handles all tags (meaning both C++
exceptions and foreign exceptions).

Problem

Suppose we have this original C++ code.

try {
  ...
} catch (int e) {
  action 1;
} catch (...) {
  action 2;
}

action 1 and action 2 can be arbitrary user code.

The Itanium C++ ABI
specifies that a catch (...) clause should be able to catch not only C++
exceptions but also foreign exceptions, which are not thrown by C++ code.
Even if we do not have to follow the Itanium ABI spec strictly, it makes
most sense for catch (...) to also handle foreign exceptions, because that's
the only way for a C++ programmer to specify some action when a foreign
exception is caught. That means, when there is
catch (...), we should generate a catch_all instruction. Then the generated
Wasm code will look like, in pseudocode,

block $label0
  try
    ...
  catch i
    if (int e)
      action 1
    else
      br $label0
  catch_all
    br $label0
  end
end
action 2

Here action 2 part is factored out so that it can be shared between catch i
and catch_all in order to prevent code duplications. But whether we duplicate
action 2 part of the code or factor it out, the requirement is that we should
be able to know which part of the code corresponds to the catch (...) so we
can generate correct code by factoring out or duplication.

Let's see another case. When the original C++ code is like

MyClass obj;
try {
  ...
} catch (int e) {
  action 1;
}

There is no catch (...) in this code, but that means we should generate
cleanup code to call the destructor for obj. And that cleanup code should
run regardless of whether we catch a C++ exception or a foreign exception. So,
it should be either duplicated or factored out as well:

block $label0
  try
    ...
  catch <c++>
    if (int e)
      action 1
    else
      br $label0
  catch_all
    br $label0
  end
end
delete obj
rethrow

Now we face the same problem: we should know which part of the code corresponds
to the cleanup code.

Separating code within catch (...) by examining and pattern matching LLVM IR
(or any other compiler's IR) is not always possible because code can be
transformed or optimized in many different ways. Windows EH developers once
tried it and failed. Windows EH
requires identifying not only catch (...) but also all the catch clauses, but
the problem here is inherently the same. Pattern matching cleanup code is also
not simple, because from the IR's point of view, they are just function
(destructor) calls.

Can this be done if we demarcate these parts in clang (or more generally, in the
frontend code generation phase)? The answer looks like maybe yes, but it will be
much harder, and I'm not sure if it's worth it. Basically what we need is a way
to demarcate some code parts, and prevent any code from entering or escaping
those regions in all of the IR-level passes and backend-level passes. Code
hoisting or sinking across the boundaries should not occur in any pass, and
instruction scheduling in the backend should treat these boundaries as fences
for not only memory instructions but all instructions. Windows EH developers
faced similar problems and came up with new specialized
instructions,
but their objectives were different (they did this because they didn't use the
Itanium ABI and had to satisfy MSVC's spec), so it does not look possible to
reuse their approaches. Also, there will be more work to do, such
as matching each landing pad to its parent scope's cleanup code.

Even if it is possible by creating new instructions and doing more work on clang
side, it will also prevent code optimization opportunities, because it basically
separates certain parts of code and does not allow any optimization across their
border. For example, shared expressions may not be able to be merged.

Proposal

Considering the amount of work that needs to be done to satisfy the current spec
and the expected downside of code size and performance degradation, I think
having one catch_all instruction that handles all exceptions is the best way
to go. Actually we can do this even with the current spec by only using
catch_all and not using other catch tag instructions, but that brings
another point: is catch tag instruction ever useful?

To use only catch_all, there should be a way to tell if the current exception
is a C++ exception or not within a catch_all clause. While I think it can be
done by setting some variable within some libcxxabi functions (such as
__cxa_throw and __cxa_begin_catch), it would still be better if there is an
easy way to access the currently caught exception's tag within a catch_all
block. Maybe catch_all block can return the caught exception's tag.

Even if catch_all instruction does not put an exception object on top of Wasm
stack, there are ways we can relay an exception object from throw to
catch_all: one possible way is to use Wasm global. throw instruction sets a
Wasm global with the pointer to an exception object so within a catch_all
block we can retrieve it.
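
The idea of relaying the exception object through a global can be sketched as follows. This is a JS model rather than generated Wasm, and every name in it (CPP_TAG, currentException, wasmThrow, runWithCatchAll) is hypothetical: a module-level variable stands in for the Wasm global, and a JS catch stands in for catch_all.

```javascript
const CPP_TAG = "c++";

// Stand-in for the Wasm global that `throw` fills in before unwinding.
let currentException = null;

function wasmThrow(tag, payload) {
  currentException = { tag, payload };
  throw currentException;
}

// One handler for everything: C++ int exceptions are handled, and anything
// else (including foreign exceptions) only runs the cleanup and is rethrown.
function runWithCatchAll(body, onCppInt, cleanup) {
  try {
    return body();
  } catch (e) {
    const exn = currentException;
    // `exn === e` guards against a stale global left by an earlier throw.
    if (exn !== null && exn === e && exn.tag === CPP_TAG && Number.isInteger(exn.payload)) {
      return onCppInt(exn.payload);   // corresponds to `catch (int e)`
    }
    cleanup();                        // destructor calls shared by all other paths
    throw e;                          // corresponds to `rethrow`
  }
}

let cleanups = 0;
const handled = runWithCatchAll(() => wasmThrow(CPP_TAG, 7), (v) => v * 2, () => { cleanups++; });
console.log(handled, cleanups);
```

The tag check happens inside the single handler, which is the part the proposal would like the engine to make cheap by exposing the caught exception's tag directly.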

The only possible downside of this scheme is that when a foreign exception is
thrown and there is no cleanup code to run for a certain stack frame, unwinding
still has to stop at that frame because the exception is caught by the catch_all
instruction. But I can hardly imagine this case being common enough to affect
performance.

Does the current design of `catch` and `if_except` require `pick`?

IIUC, catch pushes the except_ref value on the stack and if_except reads it from the top of the stack. I was thinking that, in general, code generators may need to access the except_ref value when it's not at the top of the stack. Normally, this is achieved by storing values in locals and using get_local. pick could achieve the same for except_ref values on the stack, so should this proposal depend on (or smuggle in) pick?

Is the exception section needed?

The type section currently prefixes each function type by the byte 0x60. It sounds like this was looking to a time when there would be more kinds of types than just functions. Would it make sense to move the exception definitions into the type section rather than defining a whole new section?

Clarify folded `try` text format

The proposal describes the flat format for a try block, but not the folded format. I believe the previous proposal used:

(try
  ...
  (catch
    ...
  )
)

Should we do the same here?

Bikeshed: names of exception handling instructions

See #33 (comment) and #33 (comment)

Use `else` opcode instead of `catch`?

I seem to recall that an older version of this proposal used the else opcode to mean catch_all. Seems like we could do that for catch now, since there's only ever one catch block. What do you think?

Simplify exception dispatch to branches

The current if_except for handling exceptions is a structured instruction and as such quite complex. For example, embodying blocks, it has to deal with block signatures, labels, etc.

In the context of working on the GC proposal, where a similar construct is needed for downcasts, I realised that we probably don't want to introduce a zoo of block constructs. And it turns out that with the multi-value proposal we can simplify all these by replacing them with simple branches.

For the exception proposal, this would mean that instead of the structured if_except construct, we simply have a new branch of the form

br_on_exn <labelidx> <exnidx>

which checks whether the exception package on the top of the stack matches the exception denoted by <exnidx>, and if so, branches to <labelidx>. The trick, however, is that the block signature of the branch target has to be compatible with the exception's tag type -- because it receives the exception arguments as branch operands. If the exception does not match, it remains on the stack.

For example:

(exception $e i32 i64)
...
block $l (result i32 i64)
  ...
  try
    (i32.const 1)
    (i64.const 2)
    (throw $e)
  catch
    (br_on_exn $l $e)  ;; branch to $l with $e's arguments
    (rethrow)  ;; ignore other exceptions
  end
  ...
end
;; (i32.const 1) (i64.const 2) are now on the stack

This can now be used to construct handler switches in the same way br_table is used to construct regular switch:

block $end
  block $l1
    ...
      block $lN
        (br_on_exn $l1 $e1)
        ...
        (br_on_exn $lN $eN)
        (rethrow)
      end $lN
      ;; handler for $eN here
      (br $end)
    ...
  end $l1
  ;; handler for $e1
end $end

I think this is a simpler primitive and makes better reuse of existing instructions. (The try construct remains unchanged.) WDYT?
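
To illustrate the intended behaviour, here is a small JS model of br_on_exn and of the handler switch built from it. The { tag, args } shape for an exception package and the function names are assumptions made for this sketch only.

```javascript
// Model of `br_on_exn <labelidx> <exnidx>` applied to one exception package.
function brOnExn(exn, label, tag) {
  if (exn.tag === tag) {
    // Match: branch to `label`, passing the exception arguments as operands.
    return { branch: label, operands: exn.args };
  }
  // No match: the exception stays on the stack and execution falls through.
  return null;
}

// Handler switch: try each (label, tag) pair in order, rethrow otherwise.
function dispatch(exn, table) {
  for (const [label, tag] of table) {
    const taken = brOnExn(exn, label, tag);
    if (taken !== null) return taken;
  }
  throw exn; // corresponds to the trailing `rethrow`
}

const result = dispatch({ tag: "$e", args: [1, 2n] }, [["$l1", "$e1"], ["$l", "$e"]]);
console.log(result.branch, result.operands);
```

The operands returned on a match are what the target block's result type has to agree with, which is the typing constraint mentioned above.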

Should there be a separate exception type section.

The proposal introduces exception types. Should there be a separate type section, or should they be merged into the type section?

Interaction with host environments

It doesn't look like the current spec describes the interaction with host environments. We should, at least for JS, specify how the exception handling spec is intended to interface with a host's native exceptions, if the host has any. In the JS case, what object is given to the JS catch block?

Also, what do exported exceptions look like to the host environment? Presumably for JS they must be some kind of object, with some internal data associated with them.

One idea would be for each exception to "be" an ES6-like subclass of WebAssembly.RuntimeError. Then if you catch one of these in JS, its indexed properties would be the payload. It also seems like it should have a signature and length as well.

Although, there might be some encapsulation concerns with the above proposal w.r.t non-exported exceptions. Perhaps you should only be able to access the payload if you have the exported exception, which we could do by hiding the data under a Symbol only exposed on the export object.
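
A rough sketch of the symbol-keyed variant is shown below. Nothing here is part of any spec: PAYLOAD, makeExceptionClass and DivideError are hypothetical names, and the payload is stored under a symbol rather than as indexed properties to follow the encapsulation idea.

```javascript
// The symbol that an embedding would expose only on the export object.
const PAYLOAD = Symbol("wasm exception payload");

function makeExceptionClass(name) {
  // Each Wasm exception "is" its own subclass of WebAssembly.RuntimeError.
  return class extends WebAssembly.RuntimeError {
    constructor(...args) {
      super(`wasm exception ${name}`);
      // Non-enumerable, symbol-keyed slot holding the exception arguments.
      Object.defineProperty(this, PAYLOAD, { value: args, enumerable: false });
    }
  };
}

const DivideError = makeExceptionClass("$divide");

let payload = null;
let isRuntimeError = false;
try {
  throw new DivideError(5, 0);
} catch (e) {
  isRuntimeError = e instanceof WebAssembly.RuntimeError && e instanceof DivideError;
  payload = e[PAYLOAD];
}
console.log(isRuntimeError, payload);
```

Note that a plain symbol is only a convention in user code, since Object.getOwnPropertySymbols can still discover it; real encapsulation would need an internal slot provided by the embedding.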

Inconsistency between except_ref and anyref/anyfunc

This proposal spells the type except_ref with _, but the official spec and the Reference Type proposal write the reference types anyref and anyfunc without _. I think the spelling of these types should be consistent.

Change `try ... catch` + `br_on_exn` to `br_on_catch{,_else}`?

Edit: Redid a couple things.

After reading #58, which eventually got merged in #71, I feel you could remove try ... catch ... end altogether and simplify the branching generation a lot by just unifying try ... catch and br_on_exn with a generic br_on_catch len ($lbl $id)+ and br_on_catch_else len ($lbl $id)* $default_lbl that operate more like a br_table for exceptions. The label of the corresponding block for $default_lbl must be a (result except_ref), but the rest just need to match the corresponding exception. And except_ref can only be plugged into a rethrow, manipulated as an opaque parameter, local, or result, or dropped.

The br_on_catch is sugar for br_on_catch_else, just with the default branch always rethrowing. (This is very commonly the case, so I felt it was worth including.)

This would make the corresponding grammar look like this:

br_on_catch len (label except_index)+ |
br_on_catch_else len (label except_index)* label |
throw except_index |
rethrow

And of course, this makes the code a lot smaller with no loss in power. (You can always organize your blocks to have shared logic as appropriate - it's roughly the same amount of code, and you'd likely need to do it anyways in the case of C++ exception handling.)

Factors behind using `else` for default catch block

I have a bit of a two-fold issue with using else as the keyword for the default catch block.

First, it's a bit of an awkward word choice for the operation conceptually. For example, with just a try block with no special catches:

try
  instruction
else
  instruction
end

Conceptually, it's "try to do something, else do something". Compared to the catch keyword, it doesn't seem to quite fit. (I fully accept that this is probably a subjective opinion.)

My second problem is that its use conflicts with the current implementation of else as a companion to if. The compiler would have to be able to tell the difference between an else block attached to an if block or one attached to a try/catch block. While that's certainly possible to do, I can't help but wonder if using a different keyword would reduce the necessary complexity. Even using catch without an immediate seems like it would be simpler (not to mention easier to conceptualize).

So with my above two issues laid out, I would propose to either use the catch keyword without an immediate for the default catch block, or to use a different keyword like default that more accurately conveys the meaning of that block.

Measure size increase for enabling EH on modern C++ codebases

This is a request from people working on the producer side to get some real-world measurements of the size increase of enabling exception handling in large C++ codebases, particularly those that heavily use the common RAII style which will end up giving a large percentage of functions one or more catch_all blocks.

I'm not sure what the criteria are for an acceptable increase, but if it's significant then we should probably reconsider some of the strategies that would allow sharing code between the normal and unwinding exit paths. It won't feel great if we do all this work to add EH to wasm and the general advice immediately becomes "don't enable EH".

Discuss relationship with other proposals

Exception handling is related to some of the other proposals. For example, we've taken inspiration from host bindings, and may use multi-valued blocks. We should mention these in the proposal and discuss the relationship.

Should the index of rethrow count all blocks (like branch) or just try blocks?

For nested rethrow my initial expectation was that rethrow's label would only index try blocks (effectively "filtering" the block stack for try blocks) rather than simply indexing the block stack. This seems marginally more convenient for producers and it doesn't seem like any trouble for anyone. I don't have a strong opinion here, though.

Should `if_except` pop the exception off the stack?

For the behavior of if_except, the current proposal states:

The conditional query of an exception succeeds when the exception on the top of
the stack is an instance of the corresponding tagged exception type (defined by
except_index).

If the query succeeds, the data values (associated with the type signature of
the exception class) are extracted and pushed onto the stack, and control
transfers to the instructions in the then block.

If the query fails, it either enters the else block, or transfer control to the
end of the if_except block if there is no else block.

It does not explicitly state what happens to the exception object on the stack. I would interpret this to mean that the exception object remains on the stack in both the then and the else branches. Is this the behavior we want?

There are a couple of other options that I think could be reasonable choices and are worth discussing.

Option 0: Status Quo

This is the option that matches my read of the current proposal. The exception stays on the stack in both the then and else branches of if_except. I'm calling it Option 0 to make it easy to refer to in discussions.

Having the exception available in the then branch is possibly convenient for rethrowing the exception. However, it will be buried by the exception arguments and the producer will have to pop these off before getting access to the exception, meaning it may not actually be all that useful.

Having the exception available in the else branch makes it more compact to chain multiple tests, such as:

if_except 0:
  ... then 0 ...
else:
  if_except 1:
    ... then 1 ...
  end
end

Option 1: `if_except` Always Pops the Exception

In this case, the producer would have to explicitly save the exception to a local if it wants to refer to it multiple times. For example:

tee_local 0
if_except 0:
  ... then 0 ...
else:
  get_local 0
  if_except 1:
    ... then 1 ...
  end
end

This seems more consistent with the rest of the Wasm instructions, which generally remove any arguments used from the stack.

The downside is that every chained if_except will need a get_local, although chained if_excepts are likely to be rare, I expect.

Option 2: Pop in Then Branch, Leave in Else Branch

This feels natural to me. The if_except instruction attempts to unpack the exception, and if it succeeds then it consumes the exception and leaves the contents of the exception on the stack. In the else case, things are left unchanged since it failed to unpack the exception.

Care will be needed to make sure the types balance on the two branches, and the one armed if_except case seems weird under this strategy. This option does allow chaining like Option 0 does.

Option 3: Multiple Matches

The main advantage for leaving the exception on the stack is to allow chaining. Another option would be to make if_except act more like a case statement. For example:

if_except 0:
  ... then 0 ...
else_except 1:
  ... then 1 ...
else_except 2:
  ... then 2 ...
else:
  ... else ...
end

If we had this kind of format then it would make the most sense to me to always have if_except pop the exception, and if multiple copies are needed then locals can be used.

What does everyone think? My choice would be for Option 3, with Option 2 as a second choice. Once we seem to have some consensus here, I can write up a PR to clarify which option is used.
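
For comparison, the stack effect of Options 0, 1 and 2 can be modelled in a few lines. This is a sketch under assumed representations (the stack as a JS array with the exception { tag, args } on top, and a hypothetical ifExcept function), not a statement of what the proposal specifies.

```javascript
// Returns which branch is entered and what the stack looks like on entry.
function ifExcept(stack, tag, option) {
  const exn = stack[stack.length - 1];
  const rest = stack.slice(0, -1);

  if (exn.tag === tag) {
    // Option 0 keeps the exception underneath its extracted arguments;
    // Options 1 and 2 consume it.
    const base = option === 0 ? [...rest, exn] : rest;
    return { branch: "then", stack: [...base, ...exn.args] };
  }
  // Only Option 1 consumes the exception when the test fails.
  return { branch: "else", stack: option === 1 ? rest : [...rest, exn] };
}

const exn = { tag: 0, args: [42] };
for (const option of [0, 1, 2]) {
  console.log(option, ifExcept([exn], 0, option).stack.length, ifExcept([exn], 1, option).stack.length);
}
```

The else-branch column is what decides whether chained tests need a get_local: only Option 1 leaves the stack without the exception there.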

Should catch blocks be nested inside try blocks.

The current proposal defines catch blocks as nested "catch" ... "end" blocks within the corresponding "try" ... "end" block.

Nesting the blocks makes several parts of the proposal a bit clumsy.

Zero value for except_ref type?

The Local Variables section of Semantics in the design repo says

Local variables have value types and are initialized to the appropriate zero value for their type (0 for integers, +0. for floating-point) at the beginning of the function, except parameters which are initialized to the values of the arguments passed to the function.

Because values of type except_ref can be stored in locals, we need a zero value for the except_ref type, such as nullref. Do we have one?

Throwing/catching with an optional value

Most common languages with exception handling have some sort of way to throw with a value, including OCaml try ... with Foo ... -> and Java try { ... } catch (Exception e) { ... }. I noticed this proposal only has tags, but not values, so I feel this may have been overlooked.

`if_except` with block params (multi-value proposal)

The if_except true branch extracts the values from the exception and pushes them on to the stack. The multi-value proposal also allows you to have block params. How do these two features interact?

e.g.

(except $e i64)
...
(func
  try
    ...
  catch
    if_except (param i32) $e
      ;; what is on the stack here?
      ...

An alternate model for exceptions

This exceptions proposal seems quite complex and overly oriented to the needs of one situation. This is a proposal for a (hopefully) simpler exception handling framework.

I propose partitioning the exception handling into two separable pieces: the modeling of exceptions and the modeling of the control flow.

Exceptions should not be special. I propose that exception values simply be any value that can be passed to a function. I.e., there would be no special 'marking' of certain values to be exception values.
Control flow. There are three 'interesting events' in the life of an exception: when the exception is first thrown, when it is caught, and when it propagates out of a function.

The most interesting case is when an exception propagates out. In this proposal, instead of having a global unwind mechanism that can unwind an arbitrary number of stack frames, I suggest having a special 'invoke' instruction that combines a normal function call with the possibility of throwing:

invoke

together with a return_throw instruction.

When an invoked function returns normally, it is as though nothing abnormal happens. But, if a return_throw instruction is invoked, then the corresponding invoke instruction also fails.

I.e., an invoke instruction behaves as though it were one of two instructions: function_call or throw, depending on whether the called function exited with a return or a return_throw instruction.
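
A minimal JS model of this two-way exit may help. It assumes a hypothetical encoding in which every callee reports how it exited, so that propagation across frames is written out explicitly instead of relying on a global unwinder; the function names are illustrative only.

```javascript
// The two ways a function can exit in this model.
const ret = (value) => ({ kind: "return", value });
const returnThrow = (value) => ({ kind: "return_throw", value });

function checkedDivide(a, b) {
  return b === 0 ? returnThrow("division by zero") : ret(a / b);
}

// `invoke` with no enclosing try: a failed call makes the caller exit with
// return_throw as well, so the exception moves up exactly one frame.
function average(total, count) {
  const outcome = checkedDivide(total, count);          // invoke checkedDivide
  if (outcome.kind === "return_throw") return outcome;
  return ret(Math.round(outcome.value));
}

// `invoke` inside try ... catch: the catch block decodes the exception value.
function averageOrZero(total, count) {
  const outcome = average(total, count);                // invoke average
  if (outcome.kind === "return_throw") return ret(0);   // catch block
  return outcome;
}

console.log(averageOrZero(9, 3).value, averageOrZero(9, 0).value);
```

Each check after a call is the branch that a per-callsite strategy pays for in the non-throwing case.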

Other than that, some of the existing proposal would stay the same. In particular, the basic control flow form:

try resulttype instruction* catch instruction* end

would be essentially the same; although, IMO, the type of the thrown exception should also be included:

try resulttype exceptiontype instruction* catch instruction* end

The instructions in the so-called catch block would be responsible for decoding the exception value; which is one reason for including the type of that value in the try-block itself.

Similarly, functions that can throw should also have that reflected in their signature.

I am aware that so-called checked exceptions are a controversial topic. This proposal is oriented towards checked exceptions but some small adjustments would allow for unchecked exceptions too.

There would be no intrinsic support for distinguishing between exceptions thrown in one language and caught in another. This is deliberate. Such interlanguage issues can be addressed using the forthcoming proposal for xxx-IDL bindings.

There would be no intrinsic support for features such as stacktrace. This is deliberate.

As far as I am aware, this proposal also represents a 'zero cost' exception handling proposal. One architectural difference is that multi-frame unwinding of exceptions is represented explicitly in the code rather than being implicit.

Should producers/consumers assume throwing is "rare" and, if so, can the spec note this?

This is certainly the basic assumption for C++ but not for many other languages. The underlying question is whether engines can assume they can use the classic "zero-cost" EH strategy without tanking on performance of some language that throws all the time. AFAICS, if one has to assume throwing/catching can be a hot path, it'd be preferable to use either an extra 'throwing' return value (with branches on every callsite) or a setjmp/longjmp-like strategy, which produce a general slowdown in the non-throwing case.

Given that:

- the reason to add EH to wasm is to allow compilers/runtimes to achieve better performance than they could otherwise in wasm-without-EH,
- the zero-cost strategy is something only the engine can do (at least in the short- to medium-term),
- with multi-return, a compiler/toolchain could implement a pretty-fast non-zero-cost EH strategy in wasm-without-EH,

I think we should non-normatively state this assumption in the spec so engines can predictably provide the performance of the zero-cost strategy.

JS Embedding: When to capture stack trace

The current Layer 1 proposal splits creating the exception object from throwing it. JavaScript exceptions usually have a stack trace associated with them. Should the embedding capture the stack trace at the point the exception is created, or when it is thrown?

See #33 (comment)

Document MVP requirements in Layer1.md

See comment at #33 (comment)

How to handle system exceptions?

If I understand the spec correctly, it is only possible to handle self-defined/thrown exceptions via br_on_exn.

What about exceptions from the system/runtime itself? For example OutOfMemory, UnreachableCode, DivisionByZero, ArrayOutOfRange, NullPointer, ...

Is it possible to distinguish between these different system exceptions? If so, how?

Consider removing tags from proposal

Recent discussions have revealed that the tag system we are currently proposing is not very useful for implementing exceptions in C++ via LLVM. While tools are certainly welcome to ignore tags entirely and use only catch_all, as #31 proposes, it seems that we should consider removing the tags portion of the proposal entirely.

As far as I know, we currently have no languages that are planning to use tags. This means we could be in the unfortunate position of requiring more complexity on the part of WebAssembly implementations for a feature that is unused. It seems better not to require this in the first place.

From a design standpoint, I like tags, and I can definitely see how they could be useful to some languages. If there are languages actively planning to use these features, we should pull the implementors into the design process to make sure we come up with something that meets their needs.

If we do not have current users for tags, I suggest removing them from the proposal. Ideally we would do this in a way that leaves the door open for them if they are needed in the future.

Should there be support for a finally clause?

Some languages such as Java and Python support a finally clause with a definition like that from the Python spec:

A finally clause is always executed before leaving the try statement, whether an exception has occurred or not.

Can I save an exception to a local variable?

Does an exception value of type except_ref exist only on the stack, or can I also save it in a local variable? Which type should the local variable have?

I tried the following code with wabt:

(module
  (func $foo (result i32) (local $ex anyref)
    try
      i32.const 5
      i32.const 0
      i32.div_s
      return
    catch
      local.set 0
      i32.const 2
      return
    end
  )
)

and receive:
error: type mismatch in local.set, expected [anyref] but got [except_ref]

Using a local of type except_ref does not work either.

Some test code would be very helpful.

Add 'end' at the end of 'catch' as well

#52 discussed what the folded format for try and catch should look like, and we settled on

(try
   ...
  (catch
     ...
  )
)

But in real instructions, currently this sequence is

try
  ...
catch
  ...
end

I'm wondering, can we add end at the end of catch as well, to match the text format, like this?

try
  ...
  catch
  ...
  end
end

This would increase the code size slightly, so I'm not pushing hard for this, but just would like to hear people's opinions. I'm not proposing to promote catch to a full block that can take a signature. It will not take a signature, but it will be counted as a block boundary when we compute relative depth for br, br_if (and other branch instructions that will be added in future), and rethrow.

Backstory:
It's not currently in the spec, and I'm planning to add this to the spec text soon, but the plan is to add a 'depth' argument to the rethrow instruction so that it can rethrow not only to the innermost enclosing catch but also to any outer catches. This was first suggested in #29 (comment). Without this, code generation for exceptions becomes very difficult, because we can't structure control flow for EH the way we do for branches with depth arguments.

So the advantage of adding end at the end of catch is, we can make rethrow's new 'depth' argument consistent. I'd like to compute rethrow's depth argument in the same way as we compute branch's depth argument, but in the current spec, it can't be done. When computing depths,

For branches,
block / loop / try: stack depth +1
end_block / end_loop / end_try : stack depth -1

For rethrows,
block / loop / try: stack depth +1
end_block / end_loop / catch: stack depth -1 (not end_try!)

try                    +1
  try                  +1
  catch                -1 only for rethrows
    br_if N
    rethrow N
  end                  -1 only for branches
catch                  -1 only for rethrows
end                    -1 only for branches

To avoid this problem, the current LLVM toolchain implementation maintains a separate EH stack for rethrow depth calculation: the EH stack depth is incremented only by try and decremented only by catch, and it does not count other blocks or loops. We can do it this way, but if the code size increase from adding end at the end of catch is negligible, it would make computing the depth argument the same for branches and rethrows, which is more consistent.
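The two counting rules listed above can be transcribed into a small simulation to see where they diverge. This is only a sketch: the token names and the `depth_counters` helper are assumptions made for illustration, and it is not how the spec or the LLVM toolchain implements depth calculation.

```python
def depth_counters(instrs):
    """Trace (opcode, branch_depth, rethrow_depth) after each instruction.

    Follows the rules listed above literally:
      branches: block/loop/try +1, every end -1
      rethrows: block/loop/try +1, end of block/loop -1, catch -1
                (the end that closes a try does not change it)
    """
    open_kinds = []          # kinds of the currently open constructs
    branch = rethrow = 0
    trace = []
    for op in instrs:
        if op in ("block", "loop", "try"):
            open_kinds.append(op)
            branch += 1
            rethrow += 1
        elif op == "catch":
            rethrow -= 1
        elif op == "end":
            kind = open_kinds.pop()
            branch -= 1
            if kind != "try":
                rethrow -= 1
        trace.append((op, branch, rethrow))
    return trace


# The nested try/catch sequence shown above.
example = ["try", "try", "catch", "br_if", "rethrow", "end", "catch", "end"]
for row in depth_counters(example):
    print(row)
```

At the `br_if` / `rethrow` position the two counters differ (2 versus 1), which is the inconsistency an extra `end` after `catch` is meant to remove.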

@mstarzinger and I briefly discussed how to compute rethrow's depth argument over email, and I'm not sure we settled on a conclusion.

Should we model exception constructors using existing functions?

After reading a lot of the feedback, I am beginning to think that the notion of an "exception section" is unnecessary to implement exception handling.

What I propose to do instead is to use "throw" and "catch" opcodes that immediately precede a call instruction, creating new constructs.

The throw construct calls the call instruction to build the exception, which is an internal record with
the following fields:

(1) The function called to construct the exception.
(2) The value(s) returned by the called function.

A catch prefix doesn't call the function. Rather, it is used to match field (1) of the exception. If it matches, the catch block is run. Before it runs, the value(s) of field (2) are pushed onto the stack for the block.

Note that the default catch (an else clause in a try block) would apply if none of the other catch constructs match.
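As a rough illustration of the record described above, the following sketch models field (1) and field (2) and matches catches by function identity. All names here (`ExceptionRecord`, `throw_with`, `dispatch`, and the two sample constructors) are hypothetical and exist only for this example; nothing in it is part of any wasm proposal.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ExceptionRecord:
    constructor: Callable[..., tuple]   # field (1): the function that was called
    values: Tuple                       # field (2): the value(s) it returned


def throw_with(constructor, *args):
    """Model a 'throw' prefix: call the function and wrap its results."""
    return ExceptionRecord(constructor, tuple(constructor(*args)))


def catch_matches(record, constructor):
    """Model a 'catch' prefix: compare against field (1) without calling it."""
    return record.constructor is constructor


def dispatch(record, handlers, default=None):
    """Run the first matching handler, else the default (the 'else' clause)."""
    for constructor, handler in handlers:
        if catch_matches(record, constructor):
            return handler(*record.values)   # field (2) becomes the handler's inputs
    if default is not None:
        return default()
    raise LookupError("no catch matched and no default catch was given")


# Hypothetical exception constructors: ordinary functions returning values.
def out_of_range(index, limit):
    return (index, limit)


def parse_error(code):
    return (code,)


record = throw_with(out_of_range, 7, 5)
print(dispatch(record, [(parse_error, lambda code: "parse"),
                        (out_of_range, lambda i, n: f"index {i} >= {n}")]))
```

Matching on the function's identity rather than on its name or signature is the assumption doing the work here; two constructors with identical signatures still produce distinguishable exceptions.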

This solution has the advantage that we don't introduce new sections to the module. The existing framework already handles all the important parts, other than modeling a WASM exception.

Further, all exceptions, including those not produced by WASM, can be represented this way. One just needs to build a record that has an appropriate function pointer for field (1), and any accessible values in field (2).

Is this a viable way to implement exceptions?

Proposal on the spec changes

Proposal on the Spec Changes

I would like to propose some changes to the current proposal.

Proposed Changes

Try with Relative Depth Argument

try now can have a relative depth argument as in the case of branches. The
'normal' try - a try in which calls unwind to a catch next to the try -
has a depth of 0.

Here are examples. For brevity, only one catch instruction is shown for each
try instruction.

# Example 1
try 0
  throw
catch i         # Control goes here
  ...
end

# Example 2
try 0
  try 1
    throw
  catch i
    ...
  end
catch i         # Control goes here
  ...
end

# Example 3
try 0
  try 0
    try 2
      throw
    catch i
      ...
    end
  catch i
    ...
  end
catch i         # Control goes here
  ...
end

Catchless Try Block

When an argument (relative depth) of a try instruction is greater than 0, its
matching catch block does not have any uses. For example,

try 0
  try 1
    throw
  catch i       # Not used!
    ...
  end
catch i         # Control goes here
  ...
end

In this case, when an exception occurs within the try 1 block, program control
is transferred to the outer catch block, so the inner catch block is never
used. Not generating such catch blocks helps reduce the code size. Effectively,
a catchless try block is the same as a try block whose catch does an immediate
rethrow. So this code

try 0
  try 1
    throw
  end
catch i         # Control goes here
  ...
end

has the same effect as

try 0
  try 1
    throw
  catch i
    rethrow 0
  end
catch i         # Control goes here
  ...
end

In practice, try 1 has no real use here: code inside try 1 unwinds to the
catch one level out, so we can simply omit try 1 and place its contents
directly inside try 0.

The relative depth argument of try instruction only counts the number of try
nests: it does not count block or loop nests. For example,

try 0
  block
    try 1
      block
        throw
      end
    end
  end
catch i         # Control goes here
  ...
end

In this case, when the throw instruction throws, the control is still
transferred to the outer catch i block, even though now there are two block
nests in the code.
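The depth rule from the examples above can be sketched as a small lookup. The `unwind_target` helper and the tuple encoding of enclosing constructs are assumptions made for this illustration only.

```python
def unwind_target(enclosing):
    """Return the name of the try whose catch receives a throw.

    `enclosing` lists the constructs around the throw, outermost first.
    Each entry is ("block",), ("loop",) or ("try", name, depth).
    Only try nests are counted; blocks and loops are ignored.
    Returns None when the depth reaches past every listed try, i.e. the
    exception leaves the code that was described.
    """
    trys = [c for c in enclosing if c[0] == "try"]
    if not trys:
        return None
    depth = trys[-1][2]                  # depth argument of the innermost try
    index = len(trys) - 1 - depth        # walk outwards by `depth` try nests
    return trys[index][1] if index >= 0 else None


# Example 2 above: the inner "try 1" unwinds to the outer catch.
print(unwind_target([("try", "outer", 0), ("try", "inner", 1)]))

# The block example above: blocks do not affect the count.
print(unwind_target([("try", "outer", 0), ("block",),
                     ("try", "inner", 1), ("block",)]))
```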

Motivation

Background

In LLVM IR, when a function call can throw, it is represented as an invoke
instruction, which has two successors in the CFG: its 'normal' destination BB
and its 'unwind' destination BB. When an exception does not occur, control goes
to the 'normal' BB, and when it does, control goes to the 'unwind' BB. Here are
a couple of LLVM-IR level CFG examples:

C++ code:

try {
  foo();
} catch (...) {
}

LLVM IR-like pseudocode:

entry:
  invoke @foo to label %try.cont unwind label %lpad

lpad:
  %0 = landingpad ...
  ...

try.cont:
  ...

C++ code:

try {
  foo();
  foo();
} catch (int n) {
}

LLVM IR-like pseudocode:

entry:
  invoke @foo to label %invoke.cont unwind label %lpad

invoke.cont:
  invoke @foo to label %try.cont unwind label %lpad

lpad:
  %0 = landingpad ...
  ...
  if caught type is int br label %catch else br label %eh.resume

catch:
  ...
  br label %try.cont

try.cont:
  ...

eh.resume:
  resume ...

invoke instructions are lowered to calls in the backend, but they still have
a landing pad BB as their successor. landingpad instructions disappear in the
lowering phase, and the compiler inserts a catch instruction in the beginning
of each landing pad BB.

In terms of control flow, an invoke, or a call lowered from it, is similar
to a conditional branch br_if. When a branch is taken, br_if jumps
out of the current enclosing block(s) by the relative depth specified
as an argument. When an exception is thrown within a function call, the control
flow jumps out of the current enclosing try block. The difference, in
the current EH proposal, is that it can only break out of a single depth, because
call does not take a relative depth as an argument and the VM transfers the
control flow to the nearest matching catch instruction.

Structured Control Flow

To make a control flow structured, there should not be an incoming edge from
outside of a block-like context (block, loop, or try), to the middle of
it. So it is required that the first BB of a block-like context should dominate
the rest of the BBs within it (otherwise there can be an incoming edge to the
middle of the context).

In the CFGStackify pass, here is roughly how block markers are placed.

For each BB that has a non-fallthrough branch to it (this BB will be where the
end marker will be):

1. Compute the nearest common dominator of all forward non-fallthrough
   predecessors.
2. If the nearest common dominator computed is inside a more deeply nested
   context, walk out to the nearest scope which isn't more deeply nested. For
   example,
```
A
block
  B    <- nearest common dom. is inside this block!
end
BB     <- we are processing this BB. end marker will be here
```
   In this case, we can't place a block marker in B. So we walk out of the
   scope to reach A.
3. Place a block marker in the discovered block (the nearest common
   dominator of branches or some block found by the process in 2) and place an
   end marker in BB.

For loops, a loop header by definition dominates all the BBs within the loop,
so we just place a loop marker there and an end marker in the latch.
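The "nearest common dominator" step can be sketched as follows, assuming the immediate-dominator relation is already available as a plain dictionary. The `idom` map, the block names, and both helpers are hypothetical; this is not the CFGStackify implementation, only the underlying idea.

```python
def dominator_chain(idom, node):
    """List `node` and all of its dominators, nearest first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = idom[node]
    return chain


def nearest_common_dominator(idom, nodes):
    """Return the deepest block that dominates every block in `nodes`.

    `idom` maps each block to its immediate dominator (None for the entry).
    A block is treated as dominating itself.
    """
    nodes = list(nodes)
    common = dominator_chain(idom, nodes[0])
    for node in nodes[1:]:
        others = set(dominator_chain(idom, node))
        common = [c for c in common if c in others]   # keeps nearest-first order
    return common[0]


# Hypothetical dominator tree: entry -> a -> {b -> d, c}
idom = {"entry": None, "a": "entry", "b": "a", "c": "a", "d": "b"}
print(nearest_common_dominator(idom, ["d", "c"]))
```

For a branch target whose forward predecessors are `d` and `c`, the result is `a`, which is where the scheme above would start looking for a place to put the block marker.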

Problems with the Current Proposal

A try/catch block is divided into two parts: a try part and a catch part.
What we should do for grouping a try part is similar to grouping a block,
because we also want try to be structured.

For each landing pad, where the catch instruction is:

1. Compute the nearest common dominator of all call instructions that have this
   landing pad as their successor.
2. If the nearest common dominator is inside a more deeply nested context,
   walk out to the nearest scope that isn't more deeply nested.
3. Place a try marker in the discovered block.

(Grouping the catch part is not covered here because it is not relevant.)

The problem is that, unlike branches, call instructions do not have a relative
depth argument, so they cannot break out of multiple contexts. But between the
nearest common dominator and the landing pad, it is possible that some call
instructions that might throw unwind to outer landing pads (landing pads outside
of the nearest common dominator of throwing calls ~ current landingpad scope) or
do not unwind to any landing pad, which means that when they throw, the
exception should be propagated out to the caller. For example,

try
  try
    call @foo()    # if it throws, unwinds to landing pad 1
    ...
    call @bar()    # if it throws, unwinds to landing pad 2
    ...
    call @baz()    # if it throws, propagates to the caller
  catch i          # landing pad 1
    ...
  end
  ...
catch i            # landing pad 2
  ...
end

Because a call instruction that might throw cannot specify a relative depth,
or in other words cannot specify which landing pad to go to, this does not work
in the current EH proposal.

Why the New Scheme is Better

The only way to make the current scheme work is to split landing pads
until all the possibly-throwing calls within a try block unwind to a
single landing pad or to landing pads in the nested context of the try
block. Minimizing the number of split landing pads will require nontrivial CFG
analysis, and even then it is expected to increase code size compared to the
new scheme proposed above.

Code Size

For a simple example, suppose we have a call that unwinds to an outer landing
pad in case it throws.

try
  call @foo    # unwinds to the current landing pad
  call @bar    # unwinds to outer landing pad
  call @baz    # unwinds to the current landing pad
catch i        # current landing pad
  some code
end

If we split this landing pad, the code will look like the below. Here we assume
that the some code part of the original catch has been factored out to reduce
code size.

block
  try
    call @foo
  catch i
    br 1
  end
  call @bar
  try
    call @baz
  catch i
    br 1
  end
end
some code

So roughly, when we split a landing pad into n landing pads, there will be n
trys + n catchs + n brs + n ends that have to be added.

If we use our new scheme:

try 0
  call @foo    # unwinds to the current landing pad
  try 2
    call @bar  # unwinds to outer landing pad
  end
  call @baz    # unwinds to the current landing pad
catch i        # current landing pad
  some code
end

In the same case, where we would split a landing pad into n, the new scheme
needs roughly (n-1) trys and (n-1) ends to be added. (trys now
take an argument, so it may take a bit more space though.)
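The rough counts above can be compared directly. This is only back-of-the-envelope arithmetic under the same simplifications as the text: it counts added instructions, not bytes, and ignores the wrapping block and the extra immediate on try. The function names are made up for this sketch.

```python
def added_by_splitting(n):
    """Current scheme: n trys + n catches + n brs + n ends are added."""
    return 4 * n


def added_by_try_depth(n):
    """Proposed scheme: (n - 1) trys + (n - 1) ends are added."""
    return 2 * (n - 1)


for n in (2, 3, 5):
    print(n, added_by_splitting(n), added_by_try_depth(n))
```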

Easier Code Generation

Generating Wasm code is considerably easier for the new scheme. For our current
scheme, the code generation wouldn't be very hard if we attach a catch
instruction to every call that might throw, which boils down to a try/catch
block for every call. But it is clear that we shouldn't do this and if we want
to optimize the number of split landing pads, we would need a nontrivial CFG
analysis to begin with.

And there are many cases that need separate ad-hoc handling. For example,
there can be a loop that has two calls that unwind to different landing pads
outside of the loop:

loop
  call @foo   # unwinds to landing pad 1
  call @bar   # unwinds to landing pad 2
end
landing pad 1
...
landing pad 2
...

It is not clear how to solve this case, because part of a try is already
inside an existing loop while the catch part is outside of the loop, and there
is even another call that jumps to a different landing pad that's also outside
of the loop.

There can be ways to solve this, but there are many more tricky cases. Here, the
point is, the code generation algorithm for the new scheme will be a lot easier
and very straightforward. Code generation for the new scheme can be very similar
to that of block marker placement in CFGStackify. We place try markers in
a similar way to placing block markers, and if there is a need to break out of
multiple contexts at once, we can wrap those calls in a nested try N context
with an appropriate depth N. Optimizing the number of newly added try N
markers will also be straightforward, because we can linearly scan the code to
check if any adjacent try blocks can be merged together.
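The linear scan mentioned above might look like the following sketch. It assumes the nested try N blocks are catchless (so merging cannot change which handler runs) and uses a made-up tuple encoding; `merge_adjacent_trys` is a hypothetical helper, not an LLVM pass.

```python
def merge_adjacent_trys(items):
    """Merge neighbouring catchless try blocks that share the same depth.

    Each item is either ("call", name) or ("try", depth, [body items]).
    Two try blocks are merged only when they are directly adjacent and
    have equal depth arguments.
    """
    merged = []
    for item in items:
        prev = merged[-1] if merged else None
        if (item[0] == "try" and prev is not None
                and prev[0] == "try" and prev[1] == item[1]):
            # Same unwind destination: fold the bodies into one try block.
            merged[-1] = ("try", prev[1], prev[2] + item[2])
        else:
            merged.append(item)
    return merged


code = [
    ("call", "foo"),
    ("try", 2, ["bar"]),
    ("try", 2, ["qux"]),
    ("call", "baz"),
    ("try", 1, ["quux"]),
]
print(merge_adjacent_trys(code))
```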

Add exception specifier to function signature

Exception specifiers are a common-enough feature in strongly-typed languages.

- C++ has noexcept
- Java has throws
- Dlang has nothrow

These specifiers have a few advantages that warrant integrating them into WebAsm:

- If a language uses a monadic or state-machine error model (eg Rust's Result<T, E> type), exception specifiers would allow it to interface with functions that may throw exceptions, by automatically transforming int fooBar() @mayThrow into Result<int, GenericException> fooBar().
- If an interpreter decides to implement a "branch at every call site" strategy for functions that will frequently throw (see also #19), it's very important to be able to tell the interpreter which functions won't ever throw to avoid unnecessary overhead.
- Especially in C++, noexcept specifiers can enable both compiler optimizations (better control flow analysis) and user optimizations (eg STL move optimizations).
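To make the first two points concrete, here is a sketch of how a specifier could drive an automatic "exception to result value" transformation, and how its absence lets the wrapper skip the check entirely. The `may_throw` marker, the `Result` class, and `as_result` are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Result:
    ok: bool
    value: Any = None
    error: Optional[BaseException] = None


def may_throw(fn):
    """Hypothetical specifier: mark a function as possibly throwing."""
    fn.may_throw = True
    return fn


def as_result(fn):
    """Wrap `fn` so that callers receive a Result instead of an exception."""
    if not getattr(fn, "may_throw", False):
        # Declared non-throwing: no handler is needed around the call.
        return lambda *args, **kwargs: Result(True, fn(*args, **kwargs))

    def wrapper(*args, **kwargs):
        try:
            return Result(True, fn(*args, **kwargs))
        except Exception as exc:
            return Result(False, error=exc)
    return wrapper


@may_throw
def foo_bar(x):
    if x < 0:
        raise ValueError("negative input")
    return x * 2


safe_foo_bar = as_result(foo_bar)
print(safe_foo_bar(3))
print(safe_foo_bar(-1).ok)
```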

More generally, there's an argument to be made that whether or not a function can interrupt the control flow of your program should be a part of its API, and therefore its signature.

Is `rethrow` necessary?

I'm not sure what separates a rethrow from just a throw. In an implementation, the instruction would have to do work to find out which contextual exception a rethrow refers to, whereas rethrowing the exception with a plain throw ex seems perfectly acceptable as well as simpler to implement.

Should catch blocks have type signatures?

The current proposal adds a type signature that must be the same as the try block. Should we be doing this?

Could the proposal expand/clarify the notion of exception tag identity?

Right now the concept of exception tag when introduced is a bit fuzzy: the first use says "Within the module, exceptions are identified by an index into the exception index space. This index is referred to as the exception tag."

I think tags need to be introduced first by saying that they are runtime objects that have (1) a unique identity (2) a type signature and that exceptions are "branded" with a tag on creation and catch blocks match exceptions by specifying a tag's identity to match. This is all independent of index; indices are just the way that throw/catch within a module refer to a defined/imported tag identity. It'd also be good to call out the symmetry to memory/table definitions and that sharing or the lack thereof between instances in a single app is controlled by the user by which identities are used to instantiate.

Lastly, while the core wasm spec doesn't say how a host environment can create multiple instances, it should speak to multiple instances being able to import the same exception tag identity and therefore throw and catch each other's exceptions; this would be the basis for, e.g., C++ dynamic linking.
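The distinction between a tag's identity and the index used to refer to it can be modelled in a few lines. The classes below are a toy model written for this illustration (the names `Tag`, `WasmException`, and `Instance` are assumptions, not spec terms or an engine API).

```python
class Tag:
    """A runtime tag: its identity is the object itself, plus a type signature."""
    def __init__(self, signature):
        self.signature = tuple(signature)


class WasmException:
    def __init__(self, tag, values):
        self.tag = tag                 # the exception is "branded" on creation
        self.values = tuple(values)


class Instance:
    """Each instance has its own exception index space over tag identities."""
    def __init__(self, tags):
        self.tags = list(tags)

    def throw(self, index, *values):
        # Returns the exception instead of unwinding, to keep the model small.
        return WasmException(self.tags[index], values)

    def catches(self, index, exc):
        # Matching compares tag identity, not index and not signature.
        return exc.tag is self.tags[index]


shared = Tag(["i32"])
private = Tag(["i32"])                 # same signature, different identity

a = Instance([private, shared])        # shared tag is index 1 in instance a
b = Instance([shared])                 # and index 0 in instance b

exc = a.throw(1, 42)
print(b.catches(0, exc))               # same identity, different indices
print(b.catches(0, a.throw(0, 42)))    # same signature, different identity
```

Which instances can catch each other's exceptions is then decided entirely by which tag objects are passed in at instantiation, mirroring how memories and tables are shared.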

Problem of rethrow instruction

Problem

The current rethrow instruction has some problems that make its use very
limited. For example, suppose there are two try/catch blocks and both catch
blocks jump to some common code, which is followed by a rethrow instruction.

block $label$0
  try
    ...
  catch i
    br $label$0
  end
  ...
  try
    ...
  catch i
    br $label$0
  end
end

some common code
rethrow

This cannot be supported in the current spec because rethrow can occur only in
the scope of a catch block. But this code pattern is very common especially
when there is some cleanup code (calling destructors) to run before rethrowing
an exception up to a caller. If you compile the code below,

MyClass obj;
try {
  foo(); // might throw
} catch (int n) {
  ...
}
...
try {
  foo(); // might throw
} catch (int n) {
  ...
}

The generated code will look like this:

block $label$0
  try
    call $foo
  catch i
    br $label$0
  end
  ...
  try
    ...
  catch i
    br $label$0
  end
end

call MyClass's destructor to destruct obj
rethrow

In this case, cleanup action consists of only a single destructor call, but it
can take more code space if there are many objects to destroy, so I think
duplicating this cleanup code into possibly n catch blocks is not a viable
idea. This is a classic case in which we need a rethrow, but it cannot be
executed at the end of this code because it's not in the scope of a catch.

This is one example of code sharing that's very common, but code sharing between
catch blocks can occur in other cases as well. You can use goto within catch
clauses to jump to a common block. Or any middle-IR-level compiler optimization
pass can factor out some common code in catch blocks.

While this problem can be worked around by using a normal throw instead of a
rethrow, throw is treated as throwing a completely new exception, so the VM
wouldn't be able to carry an attached backtrace with it, which could be useful
once we support backtrace debugging. More importantly, this problem effectively
makes rethrow unusable, because its most common use case is, as illustrated
above, after some common cleanup code that is shared between many catch blocks.
It can also occur when there is shared cleanup code between catch i and
catch_all clauses, which will be a very common case as well, but I'm actually
planning on proposing something else for catch clauses.

Idea?

This is a rough idea and not a complete spec yet. I'm not proposing this as a
single concrete alternative, and I'd appreciate comments and suggestions.

I think it is necessary to make it possible to access some kind of handle to an
exception object outside a catch block. (The reason it is a handle rather than
an i32 value is that it can be opaque if it is for a foreign exception.) There
can be multiple ways to do it. We can make the catch instruction return a
handle, save it to a local or something, and then use it after we exit the catch
block. In this case, rethrow should now take a handle as an argument.

try
  ...
catch i
  set_local 0, handle
end

get_local 0
use handle

In this case, I think it is unclear when the VM can destroy an exception object,
which may be an issue. Maybe the VM should maintain a map from handles to
exception objects until the program ends.

Or, so that the VM can destroy exception objects when they are no longer
needed, we can add a reference count to exception objects and provide a way to
capture an exception handle within a catch block. Suppose a capture_exception
instruction captures the current exception handle.

try
  ...
catch_all
  set_local 0, capture_exception
end

get_local 0
use handle

The reference count for an exception starts as 1 when any catch block is
entered.

It will be decremented when
- It hits a try_end instruction.
- The function execution ends.

It will be incremented when
- The capture_exception instruction is executed within a catch block.
- The rethrow instruction is executed on the handle.

This approach is in a way similar to the library functions newly added in the
C++11 spec: std::current_exception and std::rethrow_exception.
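The counting rules above leave some details open, so the following is one possible reading, written as a toy model. The `ExceptionObject` class and its method names are assumptions for this sketch; in particular, ignoring releases after destruction is a choice made here, not something the rules state.

```python
class ExceptionObject:
    """Toy model of the reference-count rules listed above."""

    def __init__(self):
        self.refcount = 0
        self.destroyed = False

    def _release(self):
        if self.destroyed:
            return                     # assumption: later releases are no-ops
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True

    def enter_catch(self):
        self.refcount = 1              # count starts at 1 on entering a catch

    def capture(self):
        self.refcount += 1             # capture_exception inside the catch

    def rethrow(self):
        self.refcount += 1             # rethrow executed on the handle

    def try_end(self):
        self._release()

    def function_end(self):
        self._release()


# Captured handle: survives the end of the try, dies with the function.
exc = ExceptionObject()
exc.enter_catch()
exc.capture()
exc.try_end()
print(exc.destroyed)                   # still alive after the catch block
exc.function_end()
print(exc.destroyed)
```

Under this reading, a rethrow before the function ends keeps the count above zero, so the object is still alive when the caller's catch receives it.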

Identity of exception objects across frames

Should exceptions remain strictly equal across call frames?

let wasm_module = /* instantiate a module with appropriate functions */

var saved_exception = undefined

function called_by_wasm() {
  try {
    wasm_module.exports.function_that_throws_exception();
  } catch(e) {
    saved_exception = e
    throw e
  }
}

function call_wasm() {
  try {
    wasm_module.exports.function_that_calls_called_by_wasm();
  } catch (e) {
    console.info(e === saved_exception)
  }
}

Let's assume that wasm_module.exports.function_that_calls_called_by_wasm either does not contain a catch block or catches and rethrows the exception.

Should the console.info line be guaranteed to print true?

One could imagine an implementation strategy for rethrow that reconstructs an equivalent exception rather than throwing the exact same one. Do we want to require rethrow to reuse the same exception object?

I think this constraint would only apply to the JavaScript embedding. I imagine there exist platforms where "reuse the same exception object" doesn't mean much.

Do `try` blocks push a label on the label stack?

I assume that a try block introduces a new label, but I couldn't find it mentioned in the current overview.

e.g.

block
  try
    br 0 ;; does this go to A or B?
  else
    nop
  end
  ;; A
end
;; B
