etk's People

Contributors

guanqun, gzanitti, iczc, knarz, lightclient, samwilsn, saw-mon-and-natalie, sheikh-a

etk's Issues

Nice error for mismatched push size

$ cat main.etk
push2 0x00

$ eas main.etk
thread 'main' panicked at 'source slice length (1) does not match destination slice length (2)', /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/slice/mod.rs:3058:13

This should be nicer.
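
One possible shape for a nicer error is to check the immediate's length before copying it, and to report both sizes. This is only a sketch: `PushSizeMismatch` and `encode_push` are hypothetical names, not etk's actual types, and the assembler could just as reasonably choose to left-pad the immediate instead of rejecting it.

```rust
use std::fmt;

/// Hypothetical error type; etk's real error enum may look different.
#[derive(Debug, PartialEq)]
pub struct PushSizeMismatch {
    pub expected: usize, // bytes implied by the mnemonic, e.g. 2 for push2
    pub actual: usize,   // bytes actually written in the source
}

impl fmt::Display for PushSizeMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "push{} expects {} byte(s) of immediate data, but {} were given",
            self.expected, self.expected, self.actual
        )
    }
}

impl std::error::Error for PushSizeMismatch {}

/// Encode `push<size> <immediate>`, checking the length up front
/// instead of letting a slice copy panic.
pub fn encode_push(size: usize, immediate: &[u8]) -> Result<Vec<u8>, PushSizeMismatch> {
    assert!((1..=32).contains(&size), "push size must be 1..=32");
    if immediate.len() != size {
        return Err(PushSizeMismatch {
            expected: size,
            actual: immediate.len(),
        });
    }
    let mut out = Vec::with_capacity(size + 1);
    out.push(0x5f + size as u8); // push1 is 0x60, push32 is 0x7f
    out.extend_from_slice(immediate);
    Ok(out)
}
```

With this, `push2 0x00` would surface as an `Err` carrying `expected: 2, actual: 1` rather than a panic from inside the standard library.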

Better errors

I'll probably just start a list of things I run into that could have custom errors:

  • space in selector

Add "builtin" macros to the assembly

Add a couple constant functions to the assembly to make it easier to write. For example (signatures to be bike-shedded):

  • selector(...)
  • include(...)

selector(...)

Description

selector(...) is replaced by the function selector of the given Solidity function signature.

Example

push4 selector("updatePool(address)")    ; actually pushes 0x7b46c54f
eq
push :label 
jumpi

include(...)

include(...) is replaced by the opcodes (and constants, etc) from the given path. Only valid outside of an opcode.

Example

main.evm

push :hello
include("other.evm")

other.evm

jumpdest :hello
stop

Output

push :hello
jumpdest :hello
stop

Not possible to specify an opcode directly in hex

I would like to be able to test extension opcodes, that are otherwise invalid, using the assembler.
Unfortunately the parser does not seem to provide any escape mechanism for directly embedding opcodes in the program.

For example, say that I am defining in my EVM impl semantics for the 0xb0 opcode,
I would like to be able to just assemble a program that says 0xb0 or ins(0xb0), but the parser fails me.

Conditional compilation

Sometimes it would be nice to compile things in based on a flag. For example, maybe you want overflow detection to be optional. Here are a couple of potential approaches:

%import_if(foo, "safemath.etk")
%import_if(!foo, "unsafemath.etk")

or

#if safe_overflow
...
#endif

I think I prefer the latter because I expect inline usage to be common for a couple quick ops.

Add `PREVRANDAO`

EIP-4399 replaced DIFFICULTY with PREVRANDAO at opcode 0x44. For etk, though, I'd probably keep difficulty and add prevrandao as an alias for it.
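
As a rough illustration of the alias idea, both spellings can simply map to the same byte in the mnemonic lookup. The `opcode` function below is a hypothetical stand-in with only a handful of entries; it is not etk's real opcode table.

```rust
/// Map a lowercase mnemonic to its opcode byte. Only a few instructions
/// are listed; this is a sketch, not etk's real opcode table.
pub fn opcode(mnemonic: &str) -> Option<u8> {
    match mnemonic {
        "stop" => Some(0x00),
        "caller" => Some(0x33),
        // EIP-4399 renamed 0x44; accept both spellings.
        "difficulty" | "prevrandao" => Some(0x44),
        "jumpdest" => Some(0x5b),
        _ => None,
    }
}
```

A disassembler would still need to pick one of the two names when printing 0x44, which this sketch leaves open.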

ETK Website

Hi @SamWilsn and @lightclient .
This is not really an issue but it is the easiest way I found to be in contact with both of you at the same time.
I took the liberty to start a project to have a small ETK web site (including the playground that Sam asked for).

Many links are broken, or point to temporary places (install for example leads directly to github and should be replaced by a real guide) and even the design is quite "simple" (you can notice it in the playground that is just two painted textarea haha), but I thought it's good that you are aware of it and to have a place where we can discuss ideas.

By the way, I also made a logo for ETK :).
The temporary link to the site is this: https://etk-web.vercel.app/

I hope you like it and do not hesitate to ask for all the features you consider necessary.

P.S: If you have more resources (like https://github.com/quilt/vim-etk) or examples that you have written using ETK and can be added to the resources section, feel free to mention them here and I will add them as soon as I write that section.

Proper label scoping in macro

In #69 we added a random postfix to label names to create a scope. It would be nice to have better guarantees about the scope and not rely on the small possibility of label collisions.

A few ideas we had:

  • Expand the capabilities of Ingest to support expanding macros. This is not trivial because the Independent scope is truly independent and needs no external data to fully assemble its ops. This is not the same assumption that instruction macros make, because they can use labels from the outer scope and when concretized they need to have an understanding of where they've been invoked in terms of concrete ops.
  • Use a sub-assembler, like with the push macro. Unfortunately it likely falls victim to the same issues as above.
  • Create a new Scope enum in the assembler module and expand the Label type to be a name and scope. This seems messy and it's annoying to have another Scope type.

Lookup function signatures

Would be nice to have a flag for the disassembler that cross references PUSH4 immediates against 4byte.directory.

eas: Provide assembly code as argument

Currently eas expects a file path as input. Is there any reason why we cannot pass the assembly directly? This would be helpful when writing small examples:

$ eas --asm "caller push1 00 sstore stop"
3360005500
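
To make the request concrete, here is a toy assembler for exactly the one-liner above. It is a sketch under heavy assumptions: `assemble_inline` is a made-up name, only the four mnemonics from the example are supported, and `--asm` is the flag proposed in this issue, not an existing eas option.

```rust
/// Assemble a whitespace-separated program that uses only the four
/// mnemonics from the example. Returns the bytecode as lowercase hex.
pub fn assemble_inline(src: &str) -> Result<String, String> {
    let mut out: Vec<u8> = Vec::new();
    let mut tokens = src.split_whitespace();
    while let Some(tok) = tokens.next() {
        match tok {
            "stop" => out.push(0x00),
            "caller" => out.push(0x33),
            "sstore" => out.push(0x55),
            "push1" => {
                let imm = tokens.next().ok_or("push1 needs an immediate")?;
                // Accept both `00` and `0x00`.
                let digits = imm.strip_prefix("0x").unwrap_or(imm);
                let byte = u8::from_str_radix(digits, 16)
                    .map_err(|e| format!("bad immediate {imm:?}: {e}"))?;
                out.push(0x60);
                out.push(byte);
            }
            other => return Err(format!("unknown mnemonic {other:?}")),
        }
    }
    Ok(out.iter().map(|b| format!("{b:02x}")).collect())
}
```

Calling `assemble_inline("caller push1 00 sstore stop")` yields the same hex string as the example.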

Add release binaries

Adding release binaries, and also nightly releases built with GitHub Actions, would be great.

These binaries can then be used to create a GitHub Action that builds repos with an ETK code base.

Create built-in instruction macros `%hex("...")` and `%string("...")`

%hex("...")

An instruction macro that inserts the argument literally into the assembled output:

jumpdest
%hex("7FAB")
invalid

Would become:

5b7fabfe

We already have %include_hex, but this would be useful for smaller snippets.

%string("...")

An instruction macro that inserts the argument literally into the assembled output, encoded as UTF-8:

jumpdest
%string("hello world")
invalid

Would become:

5b68656c6c6f20776f726c64fe

This macro is useful for revert reasons.
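
A minimal sketch of what the two expansions could compute, assuming the argument has already been parsed out of the macro invocation. `expand_hex`, `expand_string`, and `to_hex` are illustrative helper names, not part of etk.

```rust
/// Decode the argument of a hypothetical `%hex("...")` into raw bytes.
pub fn expand_hex(arg: &str) -> Result<Vec<u8>, String> {
    if arg.len() % 2 != 0 || !arg.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("expected an even number of hex digits, got {arg:?}"));
    }
    (0..arg.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&arg[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

/// Expand a hypothetical `%string("...")` into its UTF-8 bytes.
pub fn expand_string(arg: &str) -> Vec<u8> {
    arg.as_bytes().to_vec()
}

/// Render bytes as lowercase hex, matching the listings above.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}
```

Placing either expansion between a `jumpdest` (0x5b) and an `invalid` (0xfe) byte reproduces the two outputs shown above.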

Macro for writing expressions

The most simplified form of a complex expression currently looks like this:

// ((1+2)*3)-(4/2) = 7
let expr = Expression::Minus(
    Expression::Times(
        Expression::Plus(
            Terminal::Number(1.into()).into(),
            Terminal::Number(2.into()).into(),
        ).into(),
        Terminal::Number(3.into()).into(),
    ).into(),
    Expression::Divide(
        Terminal::Number(4.into()).into(),
        Terminal::Number(2.into()).into(),
    ).into(),
);

It would be great to write a macro so you can write it like this:

// ((1+2)*3)-(4/2) = 7
let expr = expr! {
    Minus(
        Times(
            Plus(Number(1), Number(2)),
            Number(3)
        ),
        Divide(Number(4), Number(2))
    )
};
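
For what it's worth, a `macro_rules!` version of this seems feasible. The sketch below uses a simplified stand-in `Expr` enum (the real `Expression` and `Terminal` types differ, and the `.into()` conversions are replaced with `Box::new`), plus an `eval` helper that exists only to check the result.

```rust
/// Simplified stand-in for etk's expression tree.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Number(i64),
    Plus(Box<Expr>, Box<Expr>),
    Minus(Box<Expr>, Box<Expr>),
    Times(Box<Expr>, Box<Expr>),
    Divide(Box<Expr>, Box<Expr>),
}

/// Build an `Expr` from prefix notation such as `Plus(Number(1), Number(2))`.
macro_rules! expr {
    // Leaf: a literal number.
    (Number($n:expr)) => {
        Expr::Number($n)
    };
    // Binary node: each operand is an identifier followed by a
    // parenthesized group, so it can be captured as `ident tt`
    // and handed back to the macro recursively.
    ($op:ident($lhs:ident $largs:tt, $rhs:ident $rargs:tt $(,)?)) => {
        Expr::$op(
            Box::new(expr!($lhs $largs)),
            Box::new(expr!($rhs $rargs)),
        )
    };
}

/// Evaluate the tree; only here to sanity-check the macro.
pub fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Number(n) => *n,
        Expr::Plus(a, b) => eval(a) + eval(b),
        Expr::Minus(a, b) => eval(a) - eval(b),
        Expr::Times(a, b) => eval(a) * eval(b),
        Expr::Divide(a, b) => eval(a) / eval(b), // panics on division by zero
    }
}
```

The `expr! { Minus(Times(Plus(Number(1), Number(2)), Number(3)), Divide(Number(4), Number(2))) }` form from above expands to the nested constructor calls and evaluates to 7 under `eval`.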

Assertion fails when assembling with two delayed definition instruction macros

I think these two test cases (in etk-asm/src/asm.rs) should pass:

    #[test]
    fn assemble_instruction_macro_two_delayed_definitions_mirrored() -> Result<(), Error> {
        let ops = vec![
            AbstractOp::new(GetPc),
            AbstractOp::Macro(InstructionMacroInvocation {
                name: "macro1".into(),
                parameters: vec![],
            }),
            AbstractOp::Macro(InstructionMacroInvocation {
                name: "macro0".into(),
                parameters: vec![],
            }),
            InstructionMacroDefinition {
                name: "macro0".into(),
                parameters: vec![],
                contents: vec![
                    AbstractOp::new(JumpDest),
                ],
            }
            .into(),
            InstructionMacroDefinition {
                name: "macro1".into(),
                parameters: vec![],
                contents: vec![
                    AbstractOp::new(Caller),
                ],
            }
            .into(),
        ];

        let mut asm = Assembler::new();
        let sz = asm.push_all(ops)?;
        assert_eq!(sz, 3);
        let out = asm.take();
        assert_eq!(out, hex!("58335b"));

        Ok(())
    }

    #[test]
    fn assemble_instruction_macro_two_delayed_definitions() -> Result<(), Error> {
        let ops = vec![
            AbstractOp::new(GetPc),
            AbstractOp::Macro(InstructionMacroInvocation {
                name: "macro0".into(),
                parameters: vec![],
            }),
            AbstractOp::Macro(InstructionMacroInvocation {
                name: "macro1".into(),
                parameters: vec![],
            }),
            InstructionMacroDefinition {
                name: "macro0".into(),
                parameters: vec![],
                contents: vec![
                    AbstractOp::new(JumpDest),
                ],
            }
            .into(),
            InstructionMacroDefinition {
                name: "macro1".into(),
                parameters: vec![],
                contents: vec![
                    AbstractOp::new(Caller),
                ],
            }
            .into(),
        ];

        let mut asm = Assembler::new();
        let sz = asm.push_all(ops)?;
        assert_eq!(sz, 3);
        let out = asm.take();
        assert_eq!(out, hex!("585b33"));

        Ok(())
    }

Parsing error should output original file location

Currently, if parsing fails in a child %import the error only gives the line number.

foo.etk

%import("bar.etk")

bar.etk

push1 0x42
asdf

$ eas foo.etk
Error: parsing failed
Caused by: lexing failed
Caused by:  --> 2:1
  |
2 | asdf␊
  | ^---
  |
  = expected EOI, op, push, local_macro, builtin, or label_definition

Use `;` as statement delimiter

I think etk will benefit from the ability to write simple one-liners, particularly in retesteth. They would look something like this:

push1 42; push1 13; sstore;

Since ; is currently the comment marker, we'll begin using # for comments.
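
A sketch of the splitting step under the proposed rules, where `#` starts a comment and `;` separates statements. `split_statements` is a hypothetical helper, and it deliberately ignores string literals, which a real lexer would have to handle.

```rust
/// Split one line of source into statements. `#` starts a comment that
/// runs to the end of the line; `;` separates statements.
/// Caveat: `#` or `;` inside a string literal is not handled.
pub fn split_statements(line: &str) -> Vec<&str> {
    let code = line.split('#').next().unwrap_or("");
    code.split(';')
        .map(str::trim)
        .filter(|stmt| !stmt.is_empty())
        .collect()
}
```

The one-liner above splits into `push1 42`, `push1 13`, and `sstore`; the trailing `;` produces no empty statement.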

Brainstorm how to insert constants into code while constructing a contract

It'd be nice to make it easier to insert constants into code while constructing a contract.

These are pretty much off the top of my head, so bear with me.

I think I'm partial to Solution 1 because it seems the most general / least opinionated, but Solution 2 probably has fewer footguns.

Solution 1: Exported Labels

We modify %include to expose the labels of the included file.

main.etk

foo:
push32 0x0000000000000000000000000000000000000000000000000000000000000000

bar:
push32 0x0000000000000000000000000000000000000000000000000000000000000000

ctor.etk

# Copy the runtime code.
%push(end-start)
dup1
%push(start)
%push(0)
codecopy

# Set the constants.
caller
%push(start.foo - start + 1)
mstore

caller
%push(start.bar - start + 1)
mstore

# Return the adjusted code.
%push(0)
return
start:
    %include("main.etk")
end:

Solution 2: %const(...)

We add a new built-in instruction macro %const(n). It errors if compiled directly, but expands to a push32 with some sentinel value (0xdeadc0de...) when included.

Then we modify %include to take extra parameters. Each parameter is the name of a label pointing to where the constant should be written.

Then your initcode can reference those labels to write the constants:

main.etk

%const(0)
%const(1)

ctor.etk

# Copy the runtime code.
%push(end-start)
dup1
%push(start)
%push(0)
codecopy

# Set the constants.
caller
%push(foo - start)
mstore

caller
%push(bar - start)
mstore

# Return the adjusted code.
%push(0)
return
start:
    %include("main.etk", foo, bar)
end:

Expand path elimination of ecfg to consider multiple blocks

ecfg currently only considers each basic block in isolation when determining possible jump targets. It can handle constants, arithmetic, and cases where z3 can prove only certain addresses are possible (ex. pop() * 0 will always be zero.) In all other cases, ecfg is forced to assume that all jump targets are reachable.

For small handwritten programs, this naive approach is sufficient, but even the simplest solidity contract turns into an unreadable mess of paths.

The next step for ecfg is to consider the program flow as a whole, and trace execution paths starting from offset zero, building a more complete picture. In other words, to further improve the control flow graph, the inputs to each block have to come from the preceding blocks.

Nested/local labels

Example

I think we could support something like:

root:
    push1 .nested
    jump
    .nested:
        jumpdest

other:
    .nested:
        push1 root.nested

This example would define the following labels:

  • root at 0
  • root.nested at 3
  • other at 4
  • other.nested at 4
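
The label table above can be reproduced with a small sketch that qualifies every name starting with `.` with the most recent top-level label. `Item` and `label_offsets` are made-up names for illustration, instruction sizes are passed in directly instead of being parsed, and parent references or deeper nesting are not covered.

```rust
use std::collections::BTreeMap;

/// One source item: a label definition or an instruction of known size.
pub enum Item<'a> {
    Label(&'a str),
    Op(usize), // encoded size in bytes
}

/// Compute label offsets, qualifying names that start with `.` with
/// the most recent top-level label.
pub fn label_offsets(items: &[Item<'_>]) -> Result<BTreeMap<String, usize>, String> {
    let mut offsets = BTreeMap::new();
    let mut parent: Option<&str> = None;
    let mut pc = 0usize;
    for item in items {
        match item {
            Item::Op(size) => pc += size,
            Item::Label(name) => {
                let full = match (name.strip_prefix('.'), parent) {
                    (Some(local), Some(p)) => format!("{p}.{local}"),
                    (Some(_), None) => {
                        return Err(format!("local label {name} has no parent"));
                    }
                    (None, _) => {
                        // A top-level label opens a new scope.
                        parent = Some(*name);
                        name.to_string()
                    }
                };
                if offsets.insert(full.clone(), pc).is_some() {
                    return Err(format!("duplicate label {full}"));
                }
            }
        }
    }
    Ok(offsets)
}
```

Feeding it the example (push1 is two bytes, jump and jumpdest one byte each) gives root at 0, root.nested at 3, and both other and other.nested at 4.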

Necessary Considerations

  • Probably should pay special attention to how this interacts with user defined macros (#18)
  • Deeply nested labels root.nested.super_nested
  • Parent references (maybe ..nested or super.nested or something else entirely)

Further Reading

https://www.tortall.net/projects/yasm/manual/html/nasm-local-label.html

Syntax for specifying exact bytecode offset

The output from disease is not immediately importable into eas, which is pretty annoying. I also have a weird function selector jump table implementation that requires precisely positioning instructions. I propose the following syntax to exactly specify an instruction's offset:

        invalid

<a>     jumpdest
        pc

# Leading zeros are ignored.
<0d>    gas

# Leading and trailing whitespace inside the brackets is ignored.
< f >   caller

Gaps between instructions are filled with zeros, and instructions must still be written in increasing order. Labels refer to the next written instruction, and not the fill bytes. A label after a gap refers to the location after the last fill byte. A label before a gap without a subsequent instruction is an error.

The example above would assemble to:

fe0000000000000000005b58005a0033
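
The placement rule itself is simple enough to sketch. Below, each instruction carries an optional absolute offset (`None` meaning "directly after the previous one"); gaps are zero-filled and going backwards is an error. `place` is a hypothetical helper that works on already-encoded bytes, and labels are left out.

```rust
/// Place each `(offset, bytes)` pair at its absolute position, filling
/// gaps with zeros. `None` means "immediately after the previous
/// instruction".
pub fn place(instrs: &[(Option<usize>, &[u8])]) -> Result<Vec<u8>, String> {
    let mut out: Vec<u8> = Vec::new();
    for (offset, bytes) in instrs {
        if let Some(target) = *offset {
            if target < out.len() {
                return Err(format!(
                    "offset {target:#x} is behind the current position {:#x}",
                    out.len()
                ));
            }
            out.resize(target, 0x00); // zero-fill the gap
        }
        out.extend_from_slice(bytes);
    }
    Ok(out)
}
```

For the listing above this puts `jumpdest` at 0x0a, `pc` at 0x0b, `gas` at 0x0d, and `caller` at 0x0f, with zeros everywhere in between.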

Strings!

Would be nice to be able to push a string, for revert messages and such.

Arithmetic for immediate arguments

Would be pretty sweet to be able to write code that looks like this:

push4 (0x88 + 1) * 0o5

Would be even sweeter if we can write code like:

push :label - 33

%push and %jump macro builtins

Two new builtin macros:

  • %push(addr) -> replaced by the smallest push opcode that can represent addr.
  • %jump(addr) -> replaced by a push (as %push(addr)) and a jump opcode.

These macros are somewhat difficult because the value of a label, and therefore the size of the push needed to hold it, depends on the instructions preceding it.
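
Once the address is known, picking the opcode is the easy part. The sketch below assumes the value is already concrete and fits in a `u128` (real pushes go up to 32 bytes), and it encodes zero as `push1 0x00` rather than using PUSH0. `smallest_push` and `jump_to` are illustrative names only. The hard part, resolving labels whose addresses shift as the pushes before them grow or shrink, is not addressed here.

```rust
/// Encode `value` with the smallest `pushN` that can hold it.
/// Zero is encoded as `push1 0x00` (this sketch does not use PUSH0).
pub fn smallest_push(value: u128) -> Vec<u8> {
    let bytes = value.to_be_bytes();
    let leading_zeros = bytes.iter().take_while(|b| **b == 0).count();
    // Keep at least one byte so that zero still has an immediate.
    let start = leading_zeros.min(bytes.len() - 1);
    let immediate = &bytes[start..];
    let mut out = vec![0x5f + immediate.len() as u8]; // push1 is 0x60
    out.extend_from_slice(immediate);
    out
}

/// `%jump(addr)`: the smallest push of `addr` followed by `jump` (0x56).
pub fn jump_to(addr: u128) -> Vec<u8> {
    let mut out = smallest_push(addr);
    out.push(0x56);
    out
}
```

For instance, an address of 0xff fits in a `push1`, while 0x100 needs a `push2`, so a single extra byte earlier in the program can change the size of every later `%push` of a label.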

Issue assembling file in same folder

$ ls
contract.etk
$ eas contract.etk out.hex
Error: Io { source: Os { code: 2, kind: NotFound, message: "No such file or directory" }, backtrace: Backtrace(()), path: Some("") }
$ eas ./contract.etk out.hex
$ ls
contract.etk    out.hex

I don't think I should need to prepend the target file with ./.

Assembler panic when instruction macro has a push macro inside

%macro bad()
        %push(0x01)
%end
%bad()

$ eas test.etk
thread 'main' panicked at 'internal error: entered unreachable code', etk-asm/src/parse/mod.rs:53:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
   2: core::panicking::panic
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:48:5
   3: etk_asm::parse::parse_abstract_op
   4: etk_asm::parse::macros::parse
   5: etk_asm::parse::parse_abstract_op
   6: etk_asm::parse::parse_asm
   7: etk_asm::ingest::Ingest<W>::ingest_file
   8: eas::main

eas assembles broken contracts when the body is split in multiple includes

I am trying to compose a contract as follows:

%push(body_end - body_begin)
dup1
%push(body_begin)
push1 0x00
codecopy
push1 0x00
return

body_begin:
%include("A.eas")
%include("B.eas")
...
body_end:

Unfortunately the assembly is broken, as eas generates illegal jumps in the body.
If I lump all the includes together and do a single file, everything works fine.
So this is a bug in the assembler.

Rust-style labels?

I like how we've been using the rust-style github labels for the spec work - should we also use them here? Is there a more correct term for that style of organization?

Define constants

Pretty self explanatory: have some sort of syntax to declare a constant value, and then be able to refer to that value elsewhere in the source.

If trying to use an undefined label and an undefined macro, etk panics

Min example:

%macro revert_if_neq() 
        push1 revert    # [revert, a != b]
%end

%revert_if_neq()
%revert()

Note: revert is used as macro and as label but is not defined.

Panic:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Some(2)`,
 right: `Some(0)`', /home/jochem/.cargo/registry/src/github.com-1ecc6299db9ec823/etk-asm-0.2.1/src/asm.rs:691:17
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
   4: etk_asm::asm::Assembler::expand_macro
   5: etk_asm::asm::Assembler::push
   6: etk_asm::ingest::SourceStack<W>::write
   7: etk_asm::ingest::Ingest<W>::ingest_file
   8: eas::main

Stack manipulation macro

A macro, name to be bikeshedded, that inserts swap, dup, and pop to rearrange the stack:

%stack(
    "a b c",
    "b b c a"
)

ecfg panics on certain contracts

I noticed that ecfg fails on most of the nontrivial contracts that I tried.

Here is an example contract. AFAIR it is just a trivial "get uint from slot - put uint to slot" contract compiled with Solidity (optimizer enabled).

Bytecode

0x6080604052348015600f57600080fd5b506004361060325760003560e01c8063b2010978146037578063cfae3217146049575b600080fd5b60476042366004605e565b600055565b005b60005460405190815260200160405180910390f35b600060208284031215606e578081fd5b503591905056fea2646970667358221220158feba571c05db2dfbfcf6d4bfd06d8ff6d697ef52c8e1fbba805a33a17720764736f6c63430008040033

Result of ecfg run

$ RUST_BACKTRACE=1 ./target/release/ecfg -x code.txt 
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `4`,
 right: `0`', etk-dasm/src/blocks/annotated.rs:172:13
stack backtrace:
   0: rust_begin_unwind
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
   4: etk_dasm::blocks::annotated::AnnotatedBlock::annotate
   5: etk_analyze::cfg::ControlFlowGraph::new
   6: ecfg::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Naive cfg picture

Maybe this could be useful for anyone debugging this. Here is a "naive cfg" of this contract. Each block is symbolically executed on its own. If the jump location is static, the nodes are connected on the graph. If the jump location comes from the stack, I draw (borrowed n), which means that the block ends with a jump to the value that was at stack[-n] at the beginning of the block, where stack[-1] is the top.

I wonder if the panic occurs precisely because of these cfg nodes that jump to non-static locations, namely nodes "6e" and "42".

[image: naive cfg of the contract]

Support WASM as a build target

With the eventual goal of writing a Remix plugin, we should look into how much effort it would be to support WASM, at least in etk-asm.
