binary-reader's People

Contributors

themodmaker

binary-reader's Issues

Add assert statements

Asserts validate values read from a file. An assertion must hold for parsing to continue. Custom assertion handling (e.g. ignoring failures) will be tracked in another issue; for now, a failed assertion is fatal. The expression must be "truthy"; see #4.

type Example {
  int32 a;
  assert a != 0;
  assert a;  // Same as above

  if (a > 10) {
    int32 b;
  }
  assert b;  // must have been parsed (i.e. a > 10) and not equal to 0
}
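
A minimal Python sketch of the fatal-assert behaviour described above. The `AssertionFailed` exception and the `check_assert` helper are hypothetical names, and treating null (`None`) and zero as falsy is an assumption, since the actual truthiness rules are defined in #4.

```python
class AssertionFailed(Exception):
    """Hypothetical fatal error raised when an assert statement fails."""


def check_assert(value, description):
    # Assumption: a field that was never parsed (None) and the number 0
    # are both falsy; the real rules live in issue #4.
    if value is None or value == 0:
        raise AssertionFailed(f"assert failed: {description}")


# a was parsed as 5, so the "if (a > 10)" block was skipped and b is null.
fields = {"a": 5, "b": None}
check_assert(fields["a"], "a")  # passes: a is non-zero

try:
    check_assert(fields["b"], "b")
    b_failed = False
except AssertionFailed:
    b_failed = True  # b was never parsed, so the assert is fatal
```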

Add alias statements

alias statements define a shorter name for a type expression. This allows options to be defined in a single location. Only explicitly set options are stored in the alias; defaults are resolved only when the type is used.

type Example {
  option byte_order = big;
  alias my_int = integer<21, signed>;

  option byte_order = little;
  my_int a;  // signed, little-endian
}
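
The late resolution of defaults can be sketched in Python as a merge performed at the point of use. This is only an illustration: the `resolve` helper, the dictionary representation, and the `DEFAULT_SIGNEDNESS` value are assumptions, not the project's API.

```python
DEFAULT_SIGNEDNESS = "signed"  # assumption: hypothetical system default


def resolve(alias_options, current_options):
    """Merge options when a field is declared.

    Precedence (lowest to highest): system default, options in effect
    at the point of use, options explicitly set in the alias.
    """
    resolved = {"signedness": DEFAULT_SIGNEDNESS}
    resolved.update(current_options)
    resolved.update(alias_options)
    return resolved


# alias my_int = integer<21, signed>;  (byte_order is NOT captured here)
my_int = {"size": 21, "signedness": "signed"}

# The field "a" is declared after "option byte_order = little;".
field_a = resolve(my_int, {"byte_order": "little"})
```

Because the alias never stored `byte_order`, the value in effect where the field is declared (little) is the one that applies.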

Add custom integer sizes

Currently, only the predefined integer sizes can be used. We should add support for custom integer sizes. This issue covers only static sizes, not dynamic values. It will also track registering the generic integer type (only aliases like int32 are available now).

Integers are defined as having the following options: size, signedness, and byte_order. Size has no default value and must be set for the type to be valid. Size must be larger than 0 and less than or equal to 64 bits.

type Example {
  int32 x;
  integer<24, signed> y;
  integer<5, unsigned> z;
}
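
A small Python sketch of the size rule and of signed decoding for an arbitrary bit width. The `decode_integer` helper is hypothetical, and it assumes the bits have already been read from the file into an unsigned value, so byte order is out of scope here.

```python
def decode_integer(raw, size, signed):
    """Interpret the low `size` bits of `raw` as an integer.

    `raw` is the unsigned value of the bits already read from the file.
    """
    if not 0 < size <= 64:
        raise ValueError("size must be in the range 1..64 bits")
    raw &= (1 << size) - 1
    if signed and raw & (1 << (size - 1)):
        raw -= 1 << size  # top bit set: two's-complement negative value
    return raw


y = decode_integer(0xFFFFFE, 24, signed=True)   # integer<24, signed>
z = decode_integer(0b11111, 5, signed=False)    # integer<5, unsigned>
```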

Add support for numeric expressions

Expressions are evaluated against values parsed from the file and are used to control conditionals, loops, asserts, and more. They are evaluated dynamically during file parsing, in the order they appear in the definition file, and have access to already parsed fields and variables.

Expressions will be similar to C++ expressions and will follow C++ operator precedence (though many operators will not be applicable). Values are dynamically typed; operands must be of compatible types, and no coercion is performed.

Numbers are considered compatible with each other and evaluated by value. This means that signed and unsigned numbers interact according to their value and not their native types. Numbers are expanded to larger types if needed, but cannot be larger than native types (i.e. long double, int64_t, or uint64_t).

This only covers numeric expressions; it doesn't cover strings, objects, or array access.

type Example {
  int32 a;
  int32 b;
  assert a + b == 0;
  assert a > 0;
}
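
To illustrate evaluation "by value", here is a hedged Python sketch. In C++ with a 32-bit int, comparing an int holding -1 with an unsigned int holding 4294967295 converts the signed operand and the comparison `<` is false; by value it is true. The `checked` helper is hypothetical, and raising an error when a result fits in neither int64_t nor uint64_t is an assumption about how the size limit would be enforced.

```python
INT64_MIN = -(2 ** 63)
UINT64_MAX = 2 ** 64 - 1


def checked(value):
    """Reject integer results that fit in neither int64_t nor uint64_t."""
    if not INT64_MIN <= value <= UINT64_MAX:
        raise OverflowError(f"{value} does not fit in a native integer type")
    return value


signed_a = -1             # e.g. parsed from a signed 32-bit field
unsigned_b = 4294967295   # e.g. parsed from an unsigned 32-bit field

less = signed_a < unsigned_b             # compared by value, not by native type
total = checked(signed_a + unsigned_b)   # widened as needed
```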

Add ICU library to handle character encoding

Currently, the only supported encodings are UTF-8 and UTF-16 (native). We should use ICU to handle character encoding; the Codecs API can easily be used to abstract the encoding support. The big problem is that the ICU library doesn't have native CMake support. In addition, its only Windows support is through Visual Studio, and that doesn't support static builds, so we cannot easily create a standalone static build of ICU.

I started working on a patch that could be applied after checking out the code, but ran into problems getting it to build correctly.

Add support for static arrays

In addition to single fields, we should add array support. An array is defined as a series of fields of the same type, read in order. This issue only tracks explicit arrays, not the implicit arrays created by loops, and only statically sized arrays, not dynamically sized ones.

type Example {
  int32 a;
  int32[8] array;
}
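
As a rough Python sketch of what reading this layout involves, the snippet below uses the standard `struct` module. The `read_int32_array` helper is hypothetical, and little-endian byte order is assumed purely for the example.

```python
import struct


def read_int32_array(data, offset, count, byte_order="<"):
    """Read `count` consecutive int32 values starting at byte `offset`.

    Returns the values and the offset just past the array.
    """
    fmt = f"{byte_order}{count}i"
    values = struct.unpack_from(fmt, data, offset)
    return list(values), offset + struct.calcsize(fmt)


# Sample file contents: "int32 a" followed by "int32[8] array".
data = struct.pack("<9i", 7, 1, 2, 3, 4, 5, 6, 7, 8)
(a,) = struct.unpack_from("<i", data, 0)
array, end = read_int32_array(data, 4, 8)
```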

Add loop statements

Loop statements allow reading a block of fields multiple times. Each field within a loop is converted to an array with one element per iteration, indexed by the loop index. Nested loops create nested arrays. Within the loop, the field can be accessed directly, which refers to the current element; outside the loop, the field needs to be indexed. If a conditional is placed in a loop, the elements for iterations where the condition isn't taken will be set to null.

In a for loop, the init and the step must each either be omitted or be a (possibly compound) assignment.

type Example {
  int32 a;
  for (i = 0; i < a; i+=1) {
    int64 b;
    if (b > 0) {
      int32 c;
    }
  }
  assert b == [1, 0, 2];
  assert c == [9, null, 10];
  assert b[0] == 1;

  set x = 9;
  while (x > 0) {
    int32 d;
    set x -= 1;
  }

  do {
    int32 e;
  } while (e > 0);
}
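
The array-per-field behaviour of the for loop can be sketched in Python as follows. This is an illustration only: `parse_example` is a hypothetical hand-written parser for the first loop, little-endian byte order is assumed, and null is modelled as `None`.

```python
import struct


def parse_example(data):
    """Hand-written equivalent of the for loop in the Example type."""
    offset = 0
    (a,) = struct.unpack_from("<i", data, offset)
    offset += 4
    b, c = [], []
    for _ in range(a):
        (value,) = struct.unpack_from("<q", data, offset)  # int64 b
        offset += 8
        b.append(value)
        if value > 0:
            (inner,) = struct.unpack_from("<i", data, offset)  # int32 c
            offset += 4
            c.append(inner)
        else:
            c.append(None)  # condition not taken: element is null
    return {"a": a, "b": b, "c": c}


# Sample data: a = 3, then (b=1, c=9), (b=0), (b=2, c=10).
data = (struct.pack("<i", 3)
        + struct.pack("<qi", 1, 9)
        + struct.pack("<q", 0)
        + struct.pack("<qi", 2, 10))
result = parse_example(data)
```

Keeping a null placeholder for skipped iterations means `b` and `c` always have the same length, so an index into one identifies the same iteration in the other.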

Add strings

Builtin definition:

enum string_type {
  // terminated by a null byte
  cstring,
  // terminated by a null character
  null_char,
  // length (in bytes) prefixed with given number of bytes; uses byte_order to determine endianness
  len1, len2, len4, len8,
}

type string<size, string_type, encoding, byte_order>;

alias cstring = string<cstring>;
alias string1 = string<len1>;
alias string2 = string<len2>;
alias string4 = string<len4>;
alias string8 = string<len8>;

Strings represent a series of characters. Each string has an encoding, which determines how its bytes are parsed, and must specify either size or string_type to determine the length of the string. After a string is parsed, it is stored internally as a UtfString. Strings are compared by value, by comparing Unicode code points; they are not normalized before comparison.

Example:

// Default can be controlled with "option" statement
option encoding = 'utf8';

type Example {
  cstring str;  // null byte terminated with current "default" codec (in this case utf8)
  string4 str2;  // 4 byte length prefixed string

  string<23, ascii> str3;  // 23 *byte* length string with given encoding
}
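
A minimal Python sketch of two of the length strategies (null-byte terminated and length prefixed). The helper names are hypothetical, little-endian is assumed as the default byte order for the prefix, and Python's built-in codecs stand in for the project's own encoding support.

```python
def read_cstring(data, offset, encoding="utf-8"):
    """Read bytes up to (not including) a null byte; return (text, new offset)."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode(encoding), end + 1


def read_prefixed(data, offset, prefix_size, byte_order="little", encoding="utf-8"):
    """Read a string whose length in bytes is stored in a `prefix_size`-byte prefix."""
    start = offset + prefix_size
    length = int.from_bytes(data[offset:start], byte_order)
    return data[start:start + length].decode(encoding), start + length


data = b"hi\x00" + (3).to_bytes(4, "little") + b"abc"
s1, pos1 = read_cstring(data, 0)        # like "cstring str;"
s2, pos2 = read_prefixed(data, pos1, 4)  # like "string4 str2;"

# Comparison by code point without normalization: the precomposed and
# decomposed forms of "e with acute accent" are different strings.
same = "\u00e9" == "e\u0301"
```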

Add option statements

Options are used to control parsing. They can apply to a specific type or to parsing in general (e.g. error handling). The option statement is used to set the current option values or their defaults. Options are stored in an Options object, and the system defaults can be passed in through the API or set with command-line arguments.

option statements are lexically scoped, unlike fields or variables. An option's value is evaluated while parsing and can be dynamic. An enum-valued option can be set using either a string value or a contextual keyword.

option encoding = 'utf8';  // may appear at top-level

type Example {
  byte is_sign;

  option byte_order = big;  // contextual keyword
  option signed = is_sign ? 'signed' : 'unsigned';  // dynamic value with string

  option signed = 'foo';  // error: invalid signedness value
  option signed = is_sign ? signed : unsigned;  // error: cannot use contextual keyword in expression

  // aliases use the value they were created with
  int32 x;  // always signed
  integer<32, signed> y;  // also always signed
  integer<32> z;  // based on option statement
}
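
Lexical scoping of options can be sketched in Python with `collections.ChainMap`, where each block gets a child scope layered over its parent. This is an assumption about one possible representation, not the project's Options object, and the `system_defaults` values are hypothetical.

```python
from collections import ChainMap

# Hypothetical system defaults (e.g. from the API or the command line).
system_defaults = {"encoding": "utf8", "byte_order": "little"}

top_level = ChainMap({}, system_defaults)
top_level["encoding"] = "utf8"          # option encoding = 'utf8';

example_scope = top_level.new_child()   # entering "type Example { ... }"
example_scope["byte_order"] = "big"     # option byte_order = big;

inside = example_scope["byte_order"]    # value seen inside the block
outside = top_level["byte_order"]       # value seen after the block ends
```

Writes go to the innermost scope only, so leaving the block restores the enclosing value without any explicit undo step.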

Add if statements

if statements provide conditional fields. When the condition is false, the fields within the block are skipped and their values are set to null.

type Example {
  int32 a;
  if (a > 0) {
    int32 b;
  }
  assert a > 0 || b == null;
}

Add alignment option

Alignment controls where in the file a type can appear. An aligned field must start at an offset that is a multiple of its alignment. Alignment is a universal option, so any type can have it. Since the alignment is a number (and a bare number sets the size by default), the option must be given by name. Alignment is given in bits.

type Example {
  alias aligned_int = int32<align=8>;

  aligned_int a;
  aligned_int b;
  integer<3> extra;
  aligned_int c;  // error: unaligned field
}
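
The alignment check can be sketched in Python by tracking the current offset in bits. The `check_alignment` helper is hypothetical; the sketch assumes the type starts at bit offset 0 and that a field without an align option has an alignment of 1 bit.

```python
def check_alignment(bit_offset, align):
    """Return True if a field starting at `bit_offset` satisfies `align` (in bits)."""
    return bit_offset % align == 0


# (name, size in bits, alignment in bits) for the fields of Example.
layout = [("a", 32, 8), ("b", 32, 8), ("extra", 3, 1), ("c", 32, 8)]

offset = 0
results = []
for name, size, align in layout:
    results.append((name, offset, check_alignment(offset, align)))
    offset += size
```

Under these assumptions `c` would start at bit 67, which is not a multiple of 8, matching the error in the example.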
