
poggle's People

Contributors

iximeow

poggle's Issues

Set operations on value ranges

Ex:

root := 0x07-0xff | 0x01
root := 0x01 | 0x03 | 0x06 | 0x10-0x20 ! 0x1f

To consider: operator precedence.
| and & should probably have the same precedence as !, with all three applied left to right.

Also consider that negations can have ranges as well:

root := 0x00-0x0f ! 0x04-0x0e

TL;DR: & == intersection, | == union, A ! <set> == A & (negation of <set>)
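
A minimal sketch of the left-to-right evaluation described above, written in Python for illustration only (it is not poggle's implementation). The function names parse_term and evaluate are hypothetical, and the sketch assumes tokens are separated by whitespace.

```python
def parse_term(term):
    """Turn '0x10-0x20' or '0x01' into a set of byte values."""
    if "-" in term:
        low, high = term.split("-")
        return set(range(int(low, 16), int(high, 16) + 1))
    return {int(term, 16)}


def evaluate(expression):
    """Apply |, & and ! strictly left to right, all with equal precedence."""
    tokens = expression.split()
    result = parse_term(tokens[0])
    for operator, term in zip(tokens[1::2], tokens[2::2]):
        values = parse_term(term)
        if operator == "|":
            result |= values      # union
        elif operator == "&":
            result &= values      # intersection
        elif operator == "!":
            result -= values      # intersection with the negation
        else:
            raise ValueError(f"unknown operator {operator!r}")
    return result


allowed = evaluate("0x01 | 0x03 | 0x06 | 0x10-0x20 ! 0x1f")
```

With equal precedence, the trailing ! 0x1f removes 0x1f from everything accumulated so far, not just from the 0x10-0x20 range.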

Match backtracking is improperly handled

with the grammar

:components:
A := 0x04
B := 0x30
C := 0x24
D := (A|B):C
E := (A:B)|C

root := D:E

on the input
0x30 0x24 0x04 0x30

poggle inconsistently marks where to backtrack to, which results in an operation like @bytes[nil].

error looks like

/toy/poggle/parser_parts/matcher.rb:11:in `[]': no implicit conversion from nil to integer (TypeError)
    from /toy/poggle/parser_parts/matcher.rb:11:in `next'
    from /toy/poggle/parser_parts/primitives/byte_body.rb:15:in `match'
    from /toy/poggle/parser_parts/body_proxy.rb:11:in `match'
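
One way to avoid ever indexing with a missing position is to make each matcher take an explicit start position and yield every end position it can reach, so backtracking is just trying the next yielded value. The Python sketch below illustrates that idea on the grammar above; it is an assumption about a possible design, not a description of poggle's matcher.rb, and byte, seq and alt are hypothetical names.

```python
def byte(value):
    """Match a single byte equal to `value`."""
    def match(data, pos):
        if pos < len(data) and data[pos] == value:
            yield pos + 1
    return match


def seq(first, second):
    """Match `first` then `second`, retrying `second` for each end of `first`."""
    def match(data, pos):
        for middle in first(data, pos):
            yield from second(data, middle)
    return match


def alt(left, right):
    """Try `left`, then `right`, both from the same start position."""
    def match(data, pos):
        yield from left(data, pos)
        yield from right(data, pos)
    return match


A, B, C = byte(0x04), byte(0x30), byte(0x24)
D = seq(alt(A, B), C)
E = alt(seq(A, B), C)
root = seq(D, E)

data = bytes([0x30, 0x24, 0x04, 0x30])
ends = list(root(data, 0))  # every position at which root can finish
```

Because the start position is always passed in explicitly, there is no stored backtrack marker that could end up nil.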

Allow explicit alignment requirements

Something like

A: byte{2} aligned 4 := 0x06:0x50

that would only allow A to match if at a 4-byte alignment. Slightly more concrete example:

Data: byte{4} aligned 4
IndexingInfo: byte{3} := 0x01:_
root := (IndexingInfo:_:Data){_}

so that future alterations of the grammar keep the guarantee that any Data lies on the alignment boundary. Poggle can then raise an error if that condition is not met. This matters more with computed offsets (another issue for the future).

There may be merit in adding a simple form to express padding as well, but I can't think of an example.
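
A rough Python sketch of what the alignment check could look like at match time. The names match_aligned and padding_needed are hypothetical, and raising an error on misalignment (rather than silently failing the match) is one of the two behaviours mentioned above, chosen here only for illustration.

```python
def padding_needed(pos, alignment):
    """Number of padding bytes needed to reach the next aligned offset."""
    return -pos % alignment


def match_aligned(data, pos, pattern, alignment):
    """Match `pattern` at `pos`, insisting that `pos` is aligned.

    Returns the end offset on success, None if the bytes differ,
    and raises ValueError when `pos` is not on the boundary.
    """
    if pos % alignment != 0:
        raise ValueError(f"offset {pos} is not {alignment}-byte aligned")
    end = pos + len(pattern)
    if data[pos:end] != pattern:
        return None
    return end
```

For example, matching 0x06:0x50 at offset 4 of a buffer succeeds, while attempting it at offset 1 raises; padding_needed(5, 4) reports that 3 bytes of padding would be required.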

Support data at computed offsets

This one is hard.

It should be possible to write that an instance of a referenced rule must be found at a particular offset into the data stream.

Example:

A := 0x05:0x06:0x07:size\ASize:data\Data{size}:0x00
ASize: byte{2}
Data: byte

Offset: byte{2}
root := offset\Offset:A@offset

Things to think about:

  • How to handle the data stream when doing this?
    • 00 01 02 03 04 05 06 07 08
    • If 04 05 are read because of an offset, which of the following should the data stream look like afterwards?
      • 00 01 02 03 06 07 08
      • 00 01 02 03 04 05 06 07 08
  • Will have to permit arithmetic on the offset
  • How to handle data between $curr_loc and the offset?
  • If there's a sequence of entries, should there be some nesting behavior?
  • In pathological cases of sequential data/offset information, especially if the mapping isn't linear, this will definitely involve calling external functions.
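
To make the questions above concrete, here is a Python sketch of the simplest answer: treat the offset as absolute, parse A by indexing into the buffer, and leave the main cursor and the data stream untouched (the second of the two stream options). Big-endian two-byte integers, the absolute-offset interpretation, and the names parse_a and parse_root are all assumptions made for the example.

```python
def parse_a(data, start):
    """Parse A := 0x05:0x06:0x07:size:data{size}:0x00 at `start`.

    Returns (payload, end_offset), or None if A does not match there.
    """
    header = bytes([0x05, 0x06, 0x07])
    if data[start:start + 3] != header:
        return None
    size_at = start + 3
    if size_at + 2 > len(data):
        return None
    size = int.from_bytes(data[size_at:size_at + 2], "big")  # assumed big-endian
    payload_at = size_at + 2
    end = payload_at + size
    if end >= len(data) or data[end] != 0x00:
        return None
    return data[payload_at:end], end + 1


def parse_root(data):
    """Parse root := offset\\Offset:A@offset without consuming A's bytes."""
    if len(data) < 2:
        return None
    offset = int.from_bytes(data[0:2], "big")  # assumed big-endian, absolute
    cursor = 2  # the main cursor only advances past the Offset field
    parsed = parse_a(data, offset)
    if parsed is None:
        return None
    payload, _ = parsed
    return {"offset": offset, "payload": payload, "cursor": cursor}


example = bytes([0x00, 0x04, 0xAA, 0xBB,
                 0x05, 0x06, 0x07, 0x00, 0x02, 0x11, 0x22, 0x00])
result = parse_root(example)
```

In this sketch the bytes between the cursor and the offset (0xAA 0xBB) are simply skipped, which is only one possible answer to the question of what to do with them.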

Real string support

cString, raw string, and unicode string support doesn't really exist. Need to make that a thing.

Recursion causes stack overflow on parser creation

Example:

a := b:"1"
b := a:"0"|"2"

Should be able to parse "2101010101010...." but instead overflows the stack by greedily duplicating the mutually recursive rules.

a := a:1|0 also demonstrates this issue.
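
A small Python sketch of one way to keep parser creation from recursing forever: store references to other rules by name and resolve them only on demand, instead of copying the referenced rule's body into place. The classes below (Literal, Sequence, Choice, Ref) are hypothetical and do not reflect poggle's internals. Note this only addresses construction; actually matching a left-recursive grammar like this one needs further handling.

```python
class Literal:
    def __init__(self, text):
        self.text = text


class Sequence:
    def __init__(self, *parts):
        self.parts = parts


class Choice:
    def __init__(self, *options):
        self.options = options


class Ref:
    """A by-name reference to a rule, looked up only when resolve() is called."""

    def __init__(self, name, rules):
        self.name = name
        self.rules = rules

    def resolve(self):
        return self.rules[self.name]


# a := b:"1"
# b := a:"0"|"2"
rules = {}
rules["a"] = Sequence(Ref("b", rules), Literal("1"))
rules["b"] = Choice(Sequence(Ref("a", rules), Literal("0")), Literal("2"))
```

Building the two rules terminates immediately because neither rule body is expanded while the other is being constructed.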

Rewriting input streams

Ran into a major headache working with the instruction set for the MSP430 microcontroller. First, notice the instruction set layout on this page.

The poggle grammar for instruction forms with no operands is simple and rather pleasant:

byteOpFlag: bit{1}
pcOffset:bit{10}

JNE := b000
JEQ := b001
JNC := b010
JC  := b011
JN  := b100
JGE := b101
JL  := b110
JMP := b111

jumpCondition: bit{3} :=
  JNE | JEQ | JNC | JC | JN | JGE | JL | JMP

noOp := b001:jumpCondition:pcOffset

where this correctly parses any jump-like instruction.
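
For comparison, the same field split can be written as plain bit arithmetic. This Python sketch assumes the instruction arrives as a single 16-bit integer and returns the raw 10-bit pcOffset without sign extension; decode_jump is a hypothetical name.

```python
# Index i corresponds to the 3-bit jumpCondition value i, as in the grammar.
CONDITIONS = ["JNE", "JEQ", "JNC", "JC", "JN", "JGE", "JL", "JMP"]


def decode_jump(word):
    """Split a 16-bit word as b001:jumpCondition:pcOffset, or return None."""
    if word >> 13 != 0b001:
        return None                      # not a jump-like instruction
    condition = CONDITIONS[(word >> 10) & 0b111]
    pc_offset = word & 0x3FF             # raw 10 bits, not sign-extended
    return condition, pc_offset
```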

One operand instructions are a little trickier, building from the above:

RRC := b000
SWPB:= b001
RRA := b010
SXT := b011
PUSH:= b100
CALL:= b101
RETI:= b110

oneOpCode: bit{3} :=
  RRC | SWPB | RRA | SXT | PUSH | CALL | RETI

oneOpDestMode: bit{2}

destReg: bit{4}

oneOp := b000100:oneOpCode:byteOpFlag:oneOpDestMode:destReg

which parses one-op instructions correctly. It would be nice to parse out the destination addressing mode as the register-aware values Register direct, Indexed, Register indirect, Indirect auto-increment, Symbolic, Immediate, Absolute, or one of the various constants, but it's an acceptable loss for now. It can even be done with a few dozen extra matching rules.

But for trying to parse out addressing types directly on two operand instructions, the problem becomes very clear just from the structure:
twoOp := twoOpCode:sourceReg:twoOpDestMode:byteOpFlag:twoOpSourceMode:destReg

Because source, destMode, sourceMode, and dest are all interwoven, parsing out the addressing mode on two operand instructions needs a number of matching rules that grows with the product of the number of values each interleaved component ranges over, which is exponential in the number of bits they span. So in this case, there need to be roughly sourceReg*twoOpDestMode*byteOpFlag*twoOpSourceMode*destReg matching rules, which comes out to about 1024 rules!

Alternatively, we can rewrite the input stream to look like
twoOpCode:byteOpFlag:twoOpSourceMode:sourceReg:sourceReg:twoOpDestMode:destReg:destReg, consuming the addressing mode/register pairs at once to read out an "enriched" addressing mode. This brings the number of total matching rules back down to around 15 or so, roughly linear on the number of matching rules the gap spans over.

Proposed syntax:

twoOp := twoOpCode:sourceReg:twoOpDestMode:byteOpFlag:twoOpSourceMode:destReg
twoOpRewrite $= twoOp =>
  twoOpCode:byteOpFlag:twoOpSourceMode:sourceReg:sourceReg:twoOpDestMode:destReg:destReg
twoOpPrime := twoOpCode:byteOpFlag:twoOpSourceModePrime:sourceReg:twoOpDestModePrime:destReg

with usage like

root := twoOpRewrite : twoOpPrime

where twoOpRewrite transparently rewrites the input it matches on.
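
As an illustration of what the rewrite would do to a single instruction, the Python sketch below splits a 16-bit word into named bit fields and emits them in the rewritten order, duplicating the register fields. The widths for twoOpCode, sourceReg, twoOpDestMode and twoOpSourceMode are not given in this issue; they are assumed here from the MSP430 two-operand encoding (4, 4, 1, 1, 2, 4 bits in order). FIELDS, REWRITTEN_ORDER, split_fields and rewrite are hypothetical names.

```python
# (name, width in bits), in the order the fields appear in the instruction.
FIELDS = [
    ("twoOpCode", 4),        # assumed width
    ("sourceReg", 4),        # assumed width
    ("twoOpDestMode", 1),    # assumed width
    ("byteOpFlag", 1),
    ("twoOpSourceMode", 2),  # assumed width
    ("destReg", 4),
]

# Order produced by twoOpRewrite; each register appears twice on purpose.
REWRITTEN_ORDER = [
    "twoOpCode", "byteOpFlag",
    "twoOpSourceMode", "sourceReg", "sourceReg",
    "twoOpDestMode", "destReg", "destReg",
]


def split_fields(word):
    """Cut a 16-bit word into bit strings keyed by field name."""
    bits = format(word, "016b")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = bits[pos:pos + width]
        pos += width
    return fields


def rewrite(word):
    """Return the rewritten bit stream for one two-operand instruction."""
    fields = split_fields(word)
    return "".join(fields[name] for name in REWRITTEN_ORDER)
```

The rewritten stream is 24 bits long for a 16-bit input, since both 4-bit register fields are emitted twice.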

Size inference fails when using bit-values

A rule like
X := b00100100
works on its own, but when mixed with other rules like

X := b00101101
Y := 0x05
Z := X:Y

the size inference step fails on Z with a NoMethodError.

Already fixed, will have PR up in a jiffy.
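
For reference, the expected outcome of size inference on this example can be sketched in Python by counting everything in bits, so that bit literals and hex literals mix freely. This is an assumption about the intended behaviour, not the actual fix; size_in_bits and the dictionary representation of rules are hypothetical.

```python
def size_in_bits(name, rules):
    """Sum the sizes of a rule's parts, measured in bits."""
    total = 0
    for part in rules[name]:
        if part in rules:
            total += size_in_bits(part, rules)   # reference to another rule
        elif part.startswith("0x"):
            total += 4 * (len(part) - 2)         # 4 bits per hex digit
        elif part.startswith("b"):
            total += len(part) - 1               # 1 bit per binary digit
        else:
            raise ValueError(f"cannot infer size of {part!r}")
    return total


rules = {
    "X": ["b00101101"],
    "Y": ["0x05"],
    "Z": ["X", "Y"],
}
z_size = size_in_bits("Z", rules)
```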

Unbounded-size expressions cause errors

When trying to parse

:functions:
buildShort(byte{2}): byte{2}
foo(byte{_}): buildShort

:components:
B: byte{_}
root := b\B

Poggle ends up calling .force on an UnboundedSize in rule_body:14 while trying to compute the size of the argument to foo. Even past that point, Poggle would build an AnyBytes out of this value, and AnyBytes' matching has no termination condition. As a result, an expression like byte{_}:0x50 on input like 0x00 0x50 would consume all input for the bytes and then fail to match, when the correct behavior is to return a result like [[0x00], 0x50].

Future notes: Poggle should be able to handle {_} in a greedy and non-greedy manner, possibly requiring a change to the syntax?
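
The difference between the two modes can be sketched in Python as follows: try every possible length for the wildcard and stop at the first length where the next byte is the terminator, scanning from the shortest length (non-greedy) or the longest (greedy). The function name match_any_then and the greedy flag are hypothetical, and the result shape follows the [[0x00], 0x50] example above.

```python
def match_any_then(data, terminator, greedy=False):
    """Match byte{_}:terminator and return [wildcard_bytes, terminator].

    Returns None when the terminator never appears.
    """
    lengths = range(len(data))
    if greedy:
        lengths = reversed(lengths)      # prefer the longest wildcard
    for length in lengths:
        if data[length] == terminator:
            return [list(data[:length]), terminator]
    return None
```

On the input 0x00 0x50 0x01 0x50, the non-greedy form stops at the first 0x50 while the greedy form takes everything up to the last one.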

Friendly reminder to the maintainers to come up with a consistent way to express match results!

Scoping of variables is incorrect

A := v\byte{2}
B := v\byte{2}

root := A:B

This raises an error because poggle thinks v is being set in two different places, when A and B should each have their own scope.
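
A minimal Python sketch of the intended behaviour, assuming each rule gets a fresh scope so that a duplicate binding is only an error within one rule. The names match_two_bytes and match_root are hypothetical.

```python
def match_two_bytes(data, pos, scope):
    """Bind v to the next two bytes inside `scope`; return the new position."""
    if pos + 2 > len(data):
        return None
    if "v" in scope:
        raise NameError("v is already bound in this scope")
    scope["v"] = data[pos:pos + 2]
    return pos + 2


def match_root(data):
    """Match root := A:B, giving A and B separate variable scopes."""
    scopes = {}
    pos = 0
    for rule in ("A", "B"):
        scopes[rule] = {}                # fresh scope for each rule
        pos = match_two_bytes(data, pos, scopes[rule])
        if pos is None:
            return None
    return scopes
```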

Permit comments in grammars

Consider:

# this is where the magic happens
root := 0x50:0x60:0x70:0x80{4}

Currently # lines are just... parsed like anything else.
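
One simple approach, sketched in Python, is to drop comment lines before the grammar reaches the parser. This version only handles whole-line comments (a line whose first non-blank character is #), which is an assumption; trailing comments would need more care. strip_comments is a hypothetical name.

```python
def strip_comments(grammar_text):
    """Remove lines whose first non-blank character is '#'."""
    kept = [
        line for line in grammar_text.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)


grammar = "# this is where the magic happens\nroot := 0x50:0x60:0x70:0x80{4}"
cleaned = strip_comments(grammar)
```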
