themodmaker / binary-reader
A tool to read and update binary files.
License: Apache License 2.0
Asserts validate values parsed from a file. An assertion must hold for parsing to continue. Custom handling of failed assertions (e.g. ignoring them) will be tracked in another issue; for now, failures are fatal. The expression must be "truthy", see #4.
type Example {
int32 a;
assert a != 0;
assert a; // Same as above
if (a > 10) {
int32 b;
}
assert b; // must have been parsed (i.e. a > 10) and not equal to 0
}
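The fatal behaviour described above could be modelled roughly as follows. This is a minimal Python sketch, not the project's implementation: `AssertionFailed`, `check_assert`, and the use of `None` for a field that was never parsed are all assumptions made for illustration, and the precise definition of "truthy" is the one in #4.

```python
class AssertionFailed(Exception):
    """Raised when an assert expression is not truthy (hypothetical name)."""


def check_assert(value, description):
    # Assumption: an unparsed field is represented as None, and both None
    # and 0 count as "not truthy", matching the comments in the example.
    if value is None or value == 0:
        raise AssertionFailed(f"assert failed: {description}")
    return value
```

With this model, `assert b;` fails both when `b` was skipped and when it was parsed as 0.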
alias statements define a shorter form for a type expression. This allows defining options in a single location. Only explicitly set options are stored in the alias; defaults are only resolved when the type is used.
type Example {
option byte_order = big;
alias my_int = integer<21, signed>;
option byte_order = little;
my_int a; // signed, little-endian
}
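The late resolution of defaults can be illustrated with a small Python sketch. The names `define_alias` and `resolve`, and the dictionary representation of options, are hypothetical and only meant to show the idea: the alias remembers what was written explicitly, and everything else comes from the options in effect where the alias is used.

```python
def define_alias(base, **explicit):
    # Only the explicitly given options are stored with the alias.
    return {"base": base, "options": dict(explicit)}


def resolve(alias, current_options):
    # Options in effect at the point of use fill in anything the alias
    # did not set explicitly; explicit alias options win.
    resolved = dict(current_options)
    resolved.update(alias["options"])
    return resolved


my_int = define_alias("integer", size=21, signedness="signed")

# byte_order was "big" when the alias was defined, "little" when it is used.
options_at_use = {"byte_order": "little", "signedness": "unsigned"}
field_a = resolve(my_int, options_at_use)
```

Here `field_a` ends up signed (from the alias) and little-endian (from the option active at the point of use).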
Currently, only existing integer sizes can be defined. We should add support for custom integer sizes. This issue is just for static sizes, not dynamic values. This issue will also track registering the generic integer type (only aliases like int32 are available now).
Integers are defined as having the following options: size, signedness, and byte_order. Size has no default value and must be set for the type to be valid. Size must be larger than 0 and less than or equal to 64 bits.
type Example {
int32 x;
integer<24, signed> y;
integer<5, unsigned> z;
}
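As a rough illustration of what decoding such a field involves, here is a Python sketch. `decode_integer` is a hypothetical helper, not part of the tool, and how `byte_order` applies to sizes that are not a whole number of bytes is not stated above, so the sketch assumes those are read most-significant-bit first.

```python
def decode_integer(data, size, signed=False, byte_order="big"):
    """Decode the first `size` bits of `data` as an integer."""
    if not 0 < size <= 64:
        raise ValueError("size must be in the range 1..64 bits")
    n_bytes = (size + 7) // 8
    if len(data) < n_bytes:
        raise ValueError("not enough data")
    chunk = data[:n_bytes]
    if size % 8 == 0:
        value = int.from_bytes(chunk, byte_order)
    else:
        # Assumption: sizes that are not whole bytes are read
        # most-significant-bit first and ignore byte_order.
        value = int.from_bytes(chunk, "big") >> (n_bytes * 8 - size)
    if signed and value >= 1 << (size - 1):
        value -= 1 << size  # two's-complement sign extension
    return value
```

For example, `decode_integer(b"\xff\xff\xff", 24, signed=True)` yields -1, while the same bytes read as unsigned yield 16777215.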
Expressions are evaluated against values parsed from the file and are used to manage conditionals, loops, asserts, and more. They are evaluated dynamically during file parsing, in the order they appear in the definition file, and have access to already parsed fields and variables.
Expressions will be similar to C++ and will follow its operator precedence (though many operators will not be applicable). Values will have a dynamic type; operands must be of compatible types, because coercion will not be performed.
Numbers are considered compatible with each other and evaluated by value. This means that signed and unsigned numbers interact according to their value and not their native types. Numbers are expanded to larger types if needed, but cannot be larger than native types (i.e. long double, int64_t, or uint64_t).
This only covers numeric expressions; this doesn't cover strings, object, or array access.
type Example {
int32 a;
int32 b;
assert a + b == 0;
assert a > 0;
}
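The "by value" rule can be sketched in Python, whose integers already behave this way. The helper names below are hypothetical, and treating a result that fits neither int64_t nor uint64_t as an error is an assumption; the text above only says numbers cannot be larger than the native types.

```python
INT64_MIN = -(1 << 63)
UINT64_MAX = (1 << 64) - 1


def check_range(value):
    # Assumption: a result that fits neither int64_t nor uint64_t is an error.
    if not INT64_MIN <= value <= UINT64_MAX:
        raise OverflowError(f"{value} does not fit in a native integer type")
    return value


def add(a, b):
    return check_range(a + b)


def less_than(a, b):
    # Python integers compare by mathematical value, so a signed -1 is
    # less than an unsigned 1 (unlike C++'s usual arithmetic conversions).
    return a < b
```

Under this model, an int32 field holding -1 compares as less than an unsigned field holding 1, which is not what a direct C++ comparison of `int` and `unsigned` would give.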
Currently, the only supported encodings are UTF-8 and UTF-16 (native). We should use ICU to handle character encoding. The Codecs API can easily be used to abstract the encoding support. The big problem is that the ICU library doesn't have native CMake support. Plus, the only Windows support is with Visual Studio, and it doesn't support static builds, so we cannot easily create a standalone static build of ICU.
I started work with a patch that could be applied after checking out the code, but ran into problems with getting it to build correctly.
In addition to single fields, we should add array support. Arrays will be defined as a series of fields in order. This only tracks explicit arrays, not the implicit arrays added by loops. This also only tracks static sized arrays, not dynamically sized arrays.
type Example {
int32 a;
int32[8] array;
}
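A static array is just its element type repeated in order, which a short Python sketch can show. `parse_example` is a hypothetical function, and little-endian byte order is assumed for the example.

```python
import struct


def parse_example(data, byte_order="<"):
    # `int32 a;` followed by `int32[8] array;` is nine consecutive int32s.
    values = struct.unpack_from(f"{byte_order}9i", data, 0)
    return {"a": values[0], "array": list(values[1:])}
```

The result keeps `a` as a scalar and exposes the eight following values as a list, mirroring the two fields in the definition.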
Loop statements allow reading a block of fields multiple times. Fields within a loop are converted to an array with one element per loop iteration. Nested loops create nested arrays. Within the loop, a field can be accessed directly, referring to the current element; outside the loop, the field needs to be indexed. If a conditional is placed in a loop, the elements for iterations where the condition isn't taken will be set to null.
For for loops, both the init and step must either be missing or a (possibly compound) assignment.
type Example {
int32 a;
for (i = 0; i < a; i+=1) {
int64 b;
if (b > 0) {
int32 c;
}
}
assert b == [1, 0, 2];
assert c == [9, null, 10];
assert b[0] == 1;
set x = 9;
while (x > 0) {
int32 d;
set x -= 1;
}
do {
int32 e;
} while (e > 0);
}
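To make the array conversion concrete, the for loop from the example could be hand-written in Python roughly like this. It is a sketch under assumptions: `parse_for_loop` is a hypothetical name, fields are taken to be little-endian, and `None` stands in for null.

```python
import struct


def parse_for_loop(data):
    offset = 0
    (a,) = struct.unpack_from("<i", data, offset)
    offset += 4
    b, c = [], []
    for _ in range(a):
        (b_i,) = struct.unpack_from("<q", data, offset)  # int64 b
        offset += 8
        b.append(b_i)
        if b_i > 0:
            (c_i,) = struct.unpack_from("<i", data, offset)  # int32 c
            offset += 4
            c.append(c_i)
        else:
            # Condition not taken: keep the arrays the same length.
            c.append(None)
    return {"a": a, "b": b, "c": c}
```

Padding `c` with `None` keeps `b[i]` and `c[i]` aligned to the same iteration, which is what lets the asserts in the example compare whole arrays.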
Builtin definition:
enum string_type {
// terminated by a null byte
cstring,
// terminated by a null character
null_char,
// length (in bytes) prefixed with given number of bytes; uses byte_order to determine endianness
len1, len2, len4, len8,
}
type string<size, string_type, encoding, byte_order>;
alias cstring = string<cstring>;
alias string1 = string<len1>;
alias string2 = string<len2>;
alias string4 = string<len4>;
alias string8 = string<len8>;
Strings represent a series of characters. Strings have a set encoding which determines how to parse the bytes. They also need to specify either size or string_type to determine the length of the string. After a string is parsed, it is stored internally as a UtfString. Strings are compared by value, by comparing Unicode code points. They are not normalized first.
Example:
// Default can be controlled with "option" statement
option encoding = 'utf8';
type Example {
cstring str; // null byte terminated with current "default" codec (in this case utf8)
string4 str2; // 4 byte length prefixed string
string<23, ascii> str3; // 23 *byte* length string with given encoding
}
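The two main ways of finding a string's length can be sketched in Python as follows. `read_cstring` and `read_prefixed` are hypothetical helpers; the little-endian default for the length prefix is an assumption, since the real value comes from the byte_order option.

```python
def read_cstring(data, offset=0, encoding="utf-8"):
    # Terminated by a null byte; index() raises ValueError if none is found.
    end = data.index(b"\x00", offset)
    return data[offset:end].decode(encoding), end + 1


def read_prefixed(data, offset=0, prefix_bytes=4, byte_order="little",
                  encoding="utf-8"):
    # The length (in bytes) is stored in the first `prefix_bytes` bytes.
    length = int.from_bytes(data[offset:offset + prefix_bytes], byte_order)
    start = offset + prefix_bytes
    end = start + length
    if end > len(data):
        raise ValueError("string length exceeds available data")
    return data[start:end].decode(encoding), end
```

Both helpers return the decoded text and the offset of the next field. Python strings also compare by code point without normalization, so a precomposed "é" (U+00E9) and "e" followed by U+0301 compare as different, matching the rule described above.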
Options are used to control parsing. They can be applied to a specific type or to parsing in general (e.g. error handling). The option statement will be used to control the current option values or their defaults. Options are stored in an Options object, and the system default can be passed in the API or controlled with command line arguments.
option statements are lexically scoped, unlike fields or variables. An option's value is evaluated while parsing and can be dynamic. Enum values can be set using a string value or a contextual keyword.
option encoding = 'utf8'; // may appear at top-level
type Example {
byte is_sign;
option byte_order = big; // contextual keyword
option signed = is_sign ? 'signed' : 'unsigned'; // dynamic value with string
option signed = 'foo'; // error invalid signedness value
option signed = is_sign ? signed : unsigned; // error cannot use contextual keyword in expression
// aliases use the value they were created with
int32 x; // always signed
integer<32, signed> y; // also always signed
integer<32> z; // based on option statement
}
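Lexical scoping of options can be modelled with layered dictionaries. The Python sketch below uses `collections.ChainMap` for that; `set_option`, the option values, and the little-endian top-level default are assumptions for illustration and say nothing about how the real Options object is built.

```python
from collections import ChainMap

VALID_SIGNEDNESS = {"signed", "unsigned"}


def set_option(scope, name, value):
    # Enum-valued options reject unknown strings such as 'foo'.
    if name == "signed" and value not in VALID_SIGNEDNESS:
        raise ValueError(f"invalid signedness value: {value!r}")
    scope[name] = value  # ChainMap writes only touch the innermost scope


top_level = ChainMap({"encoding": "utf8", "byte_order": "little"})

# Entering `type Example { ... }` opens a new scope layered on the outer one.
example_scope = top_level.new_child()
set_option(example_scope, "byte_order", "big")

is_sign = 1  # stands in for the parsed `byte is_sign;` field
set_option(example_scope, "signed", "signed" if is_sign else "unsigned")

inside = (example_scope["encoding"], example_scope["byte_order"])
outside = top_level["byte_order"]
```

Settings made inside the type shadow the outer values without changing them, so `outside` is still "little" once the block ends.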
if statements can provide conditional fields. When the condition is false, the fields within the block are skipped and their values are set to null.
type Example {
int32 a;
if (a > 0) {
int32 b;
}
assert a > 0 || b == null;
}
Alignment controls where in the file a type can appear. Aligned fields must appear at an offset that is a multiple of the alignment. Alignment is a universal option and any type can have it. Since the alignment is a number (and numbers default to setting the size), the option needs to have its name given. Alignment is given in bits.
type Example {
alias aligned_int = int32<align=8>;
aligned_int a;
aligned_int b;
integer<3> extra;
aligned_int c; // error unaligned field
}
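The check behind the "unaligned field" error amounts to a modulo test on the running bit offset. The Python sketch below walks the example's layout; `layout` and its tuple format are hypothetical, and it assumes fields are packed back to back with no automatic padding.

```python
def layout(fields):
    """fields: list of (name, size_bits, align_bits or None).

    Returns a dict mapping each field name to its bit offset.
    """
    offset = 0
    offsets = {}
    for name, size, align in fields:
        if align and offset % align != 0:
            raise ValueError(
                f"unaligned field {name!r} at bit offset {offset}")
        offsets[name] = offset
        offset += size
    return offsets
```

For the example, `a` and `b` land at bit offsets 0 and 32, `extra` occupies bits 64 to 66, and `c` would start at bit 67, which is not a multiple of 8.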