dessser's Introduction

A code generator tailored for data manipulation.

Can generate (de)serializers, converters, filters...

Supports several back-ends and various external data formats.

As of today:

Backends:

  • OCaml
  • C++

External formats:

  • Clickhouse's row-binary
  • Ramen's ringbuffers
  • CSV
  • S-Expressions
  • JSON

Suggested reading order

  1. DessserTypes.ml: defines the supported data types and operators to manipulate them

Dessser supports most compound data types, up to and including sum and product types (aka tagged unions and tuples). Recursive types are supported to some extent. There is also limited support for user-defined types. There is no support for type parameters, though. In other words, users cannot define polymorphic types.

Types are organized in two abstraction layers:

  • the types that can store user-manipulable values, which belong to the type named typ. Most of those types can be (de)serialized and manipulated in many ways (the exceptions being the types used to implement the serializers themselves, such as the pointer types, etc.).

  • often, values (of some value type) are optional (aka null or unknown), so the maybe_nullable type extends the typ type with a boolean indicating whether these values can be null.

Notice that NULL in dessser behaves like SQL's NULL rather than like ML-style option types (Haskell's Maybe or OCaml's option), in that any combination of NULLs collapses into one; for instance, NotNull (NULL) is not a valid value.
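The two layers and the collapsing of NULLs can be pictured with a minimal sketch. The definitions below are a hypothetical, simplified model and not the actual ones from DessserTypes.ml; the constructor names and the to_nullable helper are assumptions made for illustration.

```ocaml
(* Hypothetical, simplified model; NOT the real DessserTypes.ml definitions. *)
type typ =
  | U8
  | String
  | Tup of maybe_nullable array             (* product type *)
  | Sum of (string * maybe_nullable) array  (* sum type *)

(* The second layer: a typ plus a flag telling whether it can be NULL. *)
and maybe_nullable = { typ : typ ; nullable : bool }

(* Nullability is a flag, not a wrapper, so setting it twice changes nothing:
   there is no way to express "a nullable nullable U8". *)
let to_nullable mn = { mn with nullable = true }

let () =
  let t = { typ = U8 ; nullable = false } in
  assert (to_nullable (to_nullable t) = to_nullable t)
```

With an option-like wrapper, nesting would produce a distinct type each time; with a boolean flag, it cannot.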

  2. DessserExpressions.ml: although for technical reasons the type of expressions, expr, is defined in DessserTypes.ml, most functions on expressions are defined in DessserExpressions.

The level of abstraction offered by the expression language tries to maintain a good balance between simplicity for the user and for the back-end.

  3. DessserBackEndOCaml.ml implements the OCaml back-end (the simplest)

  4. Dessser.ml implements the (de)serializers and converters (parameterized with encodings)

Given a data type and an encoding, Dessser.ml can generate a serializer, a deserializer, a converter between two encodings, etc.

Note that converters generated by Dessser do not store intermediary values in memory: instead of deserializing the whole value into the heap and then serializing it, a converter performs the conversion piecewise, so that the full value is never actually materialized, which saves time and memory. See DessserHeapValue to build such a fully-fledged value in memory.
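As an illustration of the piecewise idea only (this is hand-written code, not something Dessser generates), here is a toy converter from a CSV line to an S-expression. The function name csv_to_sexpr and the exact output syntax are assumptions for this sketch.

```ocaml
(* Toy converter: each CSV field is copied to the output as soon as it is
   delimited; no list or record of fields is ever built in memory. *)
let csv_to_sexpr (src : string) (dst : Buffer.t) : unit =
  let len = String.length src in
  Buffer.add_char dst '(' ;
  let rec loop start first =
    (* Find the end of the current field. *)
    let stop =
      match String.index_from_opt src start ',' with
      | Some i -> i
      | None -> len in
    if not first then Buffer.add_char dst ' ' ;
    (* Copy the field straight from the input to the output. *)
    Buffer.add_substring dst src start (stop - start) ;
    if stop < len then loop (stop + 1) false in
  loop 0 true ;
  Buffer.add_char dst ')'

let () =
  let out = Buffer.create 16 in
  csv_to_sexpr "1,2,3" out ;
  assert (Buffer.contents out = "(1 2 3)")
```

The only state carried between fields is the current offset, which is what keeps memory use independent of the size of the value.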

  5. DessserSExpr.ml implements the simplest encoding: s-expressions

  6. DessserHeapValue.ml implements the special encoding for values stored in memory

Deserializing a value consists of converting a serialized buffer into a "reified" in-memory value. DessserHeapValue constructs an expression that will build that value (in any chosen back-end).

Likewise, serializing a value consists of building an expression that iterates through an in-memory value and writes it into a buffer.

Additionally, DessserHeapValue can build an expression that computes the size of the serialized value without serializing it (which comes in handy when the buffer must be preallocated).
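The relationship between the size computation and the serializer can be sketched on a toy binary encoding. Everything below is hypothetical (the value type, the encoding, and the names sersize and serialize); DessserHeapValue builds expressions for a back-end rather than running OCaml functions like these. Buffer.add_int32_le requires OCaml 4.08 or later.

```ocaml
(* Hypothetical in-memory value and toy encoding:
   32-bit integers take 4 bytes, strings are a 4-byte length then the bytes,
   tuples are the concatenation of their items. *)
type value =
  | VI32 of int32
  | VStr of string
  | VTup of value list

(* Size of the serialized form, computed without writing anything. *)
let rec sersize = function
  | VI32 _ -> 4
  | VStr s -> 4 + String.length s
  | VTup vs -> List.fold_left (fun acc v -> acc + sersize v) 0 vs

(* The serializer itself, walking the in-memory value. *)
let rec serialize buf = function
  | VI32 i -> Buffer.add_int32_le buf i
  | VStr s ->
      Buffer.add_int32_le buf (Int32.of_int (String.length s)) ;
      Buffer.add_string buf s
  | VTup vs -> List.iter (serialize buf) vs

let () =
  let v = VTup [ VI32 42l ; VStr "hello" ] in
  (* Preallocate exactly what is needed, then serialize. *)
  let buf = Buffer.create (sersize v) in
  serialize buf v ;
  assert (Buffer.length buf = sersize v)
```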

  7. DessserStdLib.ml implements various meta-functions, generating expressions from expressions and acting like a library for Dessser's intermediary language.

dessser's People

Contributors

darlentar, rixed

Forkers

darlentar, axiles

dessser's Issues

Add default values to types and codecs

So that we can save space on the wire (for the encodings that allow it).
This also makes it nicer for manually entered values (such as in JSON).

Actually, not all encoding formats allow this.
RamenRingBuf, for instance, has no way to skip a value: every non-null value is expected to appear in a specific order. To skip values we would need to adapt the nullbit mask to have one bit per nullable or default-able value. That would effectively be a presence mask, and nullable values would just have an implicit default of null. Alternatively, we could have an implicit default value for every type, and have one bit per field.
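A minimal sketch of the presence-mask idea, assuming a flat record of integer fields that each have a default. This is not the RamenRingBuf format; the names presence_mask, encode and decode are hypothetical, and the mask is limited to the number of bits in an OCaml int.

```ocaml
(* Bit i of the mask is set when field i differs from its default
   and therefore has to be written. *)
let presence_mask (defaults : int array) (values : int array) : int =
  let mask = ref 0 in
  Array.iteri (fun i v ->
    if v <> defaults.(i) then mask := !mask lor (1 lsl i)) values ;
  !mask

(* Only the fields flagged in the mask are emitted, in field order. *)
let encode defaults values =
  let mask = presence_mask defaults values in
  let present = ref [] in
  for i = Array.length values - 1 downto 0 do
    if mask land (1 lsl i) <> 0 then present := values.(i) :: !present
  done ;
  mask, !present

(* Absent fields fall back to their default. *)
let decode defaults (mask, present) =
  let out = Array.copy defaults in
  let rest = ref present in
  Array.iteri (fun i _ ->
    if mask land (1 lsl i) <> 0 then
      match !rest with
      | v :: tl -> out.(i) <- v ; rest := tl
      | [] -> invalid_arg "decode: truncated input") defaults ;
  out

let () =
  let defaults = [| 0 ; 10 ; 0 |] and values = [| 0 ; 7 ; 3 |] in
  assert (decode defaults (encode defaults values) = values)
```

In this model a nullable field is simply one whose default is null, which matches the remark above that the nullbit mask becomes a special case of the presence mask.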

Some introspection for generated types

For instance, a constant revealing the type, then a function providing the label name for enums (or erroring out for non-enum-like types), etc.
The idea is that it should be possible to write static code that dynamically adapts to any type.
Required in the C++ backend first, but the API would benefit from being the same for other backends.

Detect code duplication

One of the most challenging issues when using dessser is forgetting to bind largish chunks of code to a stage-1 variable, which leads to very large generated code. This is not always easy to spot when looking at the DIL code, since it is large and repetitive by nature.

There could be a function, called explicitly before transpiling to the backend, that tries to detect that this occurred and warns.
I.e. detection of common subtrees in a large tree.
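One possible shape for such a check, sketched on a hypothetical miniature expression type (real DIL expressions are much richer, and the names size and duplicated_subtrees are assumptions): count every subtree above a size threshold in a hash table and report those seen more than once.

```ocaml
(* Hypothetical miniature expression tree standing in for DIL expressions. *)
type expr =
  | Const of int
  | Var of string
  | Op of string * expr list

(* Number of nodes in an expression. *)
let rec size = function
  | Const _ | Var _ -> 1
  | Op (_, args) -> List.fold_left (fun acc e -> acc + size e) 1 args

(* Return every subtree of at least [min_size] nodes that occurs more than
   once, with its number of occurrences. Subtrees are compared structurally. *)
let duplicated_subtrees ~min_size (root : expr) : (expr * int) list =
  let counts = Hashtbl.create 64 in
  let rec visit e =
    if size e >= min_size then begin
      let n = try Hashtbl.find counts e with Not_found -> 0 in
      Hashtbl.replace counts e (n + 1)
    end ;
    match e with
    | Op (_, args) -> List.iter visit args
    | Const _ | Var _ -> () in
  visit root ;
  Hashtbl.fold (fun e n acc -> if n > 1 then (e, n) :: acc else acc) counts []

let () =
  let big = Op ("add", [ Var "x" ; Op ("mul", [ Var "y" ; Const 2 ]) ]) in
  let e = Op ("pair", [ big ; big ]) in
  assert (duplicated_subtrees ~min_size:5 e = [ (big, 2) ])
```

This sketch recomputes size at every node and relies on the default structural hash, so it is quadratic in the worst case; a real implementation would probably compute sizes and hashes bottom-up in a single pass.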

Random value generator for a given type

Can be used to test actual serializers/deserializers by checking that a given value can go through ser/deser without loss, for every backend and format.

The simplest way is to generate values as S-expressions that can then be deserialized using the S-expression deserializer.
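A rough sketch of a type-directed generator that emits S-expression text. The type language below is a hypothetical miniature, and the concrete syntax (T/F for booleans, parenthesised lists for vectors and tuples) is an assumption that may not match what DessserSExpr.ml actually accepts.

```ocaml
(* Hypothetical miniature type language, not the one from DessserTypes.ml. *)
type ty =
  | Bool
  | U8
  | Vec of int * ty   (* fixed-size vector *)
  | Tup of ty list    (* product type *)

(* Generate a random value of the given type, printed as an S-expression.
   The random state is passed explicitly so that runs are reproducible. *)
let rec random_sexpr (st : Random.State.t) : ty -> string = function
  | Bool -> if Random.State.bool st then "T" else "F"
  | U8 -> string_of_int (Random.State.int st 256)
  | Vec (n, t) ->
      "(" ^ String.concat " " (List.init n (fun _ -> random_sexpr st t)) ^ ")"
  | Tup ts ->
      "(" ^ String.concat " " (List.map (random_sexpr st) ts) ^ ")"

let () =
  let st = Random.State.make [| 42 |] in
  print_endline (random_sexpr st (Tup [ Bool ; Vec (3, U8) ]))
```

Each generated string could then be fed to a deserializer and re-serialized, and the two texts compared.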

DIL type notation should accept a name hint for types

For instance, [ A | B ] AS a_or_b should make the backend code generator use that name for that type.
This, of course, does not make a_or_b an external type ($a_or_b is invalid).

Could also come in handy for improving support for recursive types.

Another type error involving external types

Fatal error: exception Type Error: In expression
    (apply (ext-identifier test_ext of-ringbuf) (identifier "with_nullbit_done2_fst_126"))
expression
    (identifier "with_nullbit_done2_fst_126")
should be a (Ptr; (Ptr; Size)[[]]) but is a Ptr
Raised at file "src/DessserExpressions.ml", line 1068, characters 10-60
Called from file "src/batList.mlv", line 1023, characters 6-11
Called from file "src/DessserTypeCheck.ml" (inlined), line 111, characters 25-44
Called from file "src/DessserTypeCheck.ml", line 192, characters 12-33
Called from file "src/DessserExpressions.ml", line 1241, characters 15-29
Called from file "src/DessserExpressions.ml", line 1252, characters 15-30
Called from file "src/DessserExpressions.ml", line 1241, characters 15-29
Called from file "src/DessserExpressions.ml", line 1252, characters 15-30
Called from file "src/DessserExpressions.ml", line 1244, characters 15-30
Called from file "src/DessserExpressions.ml", line 1252, characters 15-30
Called from file "src/DessserExpressions.ml", line 1244, characters 15-30
Called from file "src/DessserExpressions.ml", line 1231, characters 15-30
Called from file "src/DessserTypeCheck.ml" (inlined), line 15, characters 2-1023
Called from file "src/DessserCompilationUnit.ml", line 123, characters 13-33
Called from file "src/DessserHeapValue.ml", line 331, characters 29-56
Called from file "src/dessserc.ml", line 123, characters 27-59
Called from file "list.ml", line 117, characters 24-34
Called from file "src/dessserc.ml", line 156, characters 17-65
Called from file "src/dessserc.ml", line 708, characters 13-17

Can't use "this" and "this subtype" in the same type definition

Fatal error: exception Unknown type "t". Only known types are: subtype
Raised at file "src/DessserTypes.ml", line 1083, characters 4-26
Called from file "src/DessserTypes.ml", line 1166, characters 15-26
Called from file "src/batArray.mlv", line 483, characters 12-49
Called from file "src/DessserTools.ml", line 134, characters 6-26
Called from file "list.ml", line 221, characters 17-20
Called from file "src/DessserTypes.ml", line 1122, characters 10-21
Called from file "src/DessserTypes.ml", line 1107, characters 20-33
Called from file "src/DessserTypes.ml", line 1118, characters 55-63
Called from file "array.ml", line 103, characters 21-40
Called from file "src/DessserTypes.ml", line 1118, characters 25-69
Called from file "src/DessserTypes.ml", line 1145, characters 12-20
Called from file "list.ml", line 106, characters 12-15
Called from file "src/dessserc.ml", line 90, characters 2-32
Called from file "src/dessserc.ml", line 708, characters 13-17

Type error when generating code for ringbuf or JSON encoder

Testing dessserc
Testing I/O encoding ringbuf
...on tests/test_shrink_bug.type
Fatal error: exception Type Error: In expression
    (apply (myself "(THIS; Ptr)") (if (eq (unsafe-nth (u8 0) (identifier "repeat_n_326")) (i32 0)) (identifier "dlist2_snd_329") (identifier "dlist2_snd_329")))
expression
    (if (eq (unsafe-nth (u8 0) (identifier "repeat_n_326")) (i32 0)) (identifier "dlist2_snd_329") (identifier "dlist2_snd_329"))
should be a Ptr but is a (Ptr; (Ptr; Size)[[]])
Raised at file "src/DessserExpressions.ml", line 1062, characters 10-60
Called from file "src/batList.mlv", line 1023, characters 6-11
Called from file "src/DessserTypeCheck.ml" (inlined), line 111, characters 25-44
Called from file "src/DessserTypeCheck.ml", line 191, characters 12-33
Called from file "src/DessserExpressions.ml", line 1235, characters 15-29
Called from file "src/DessserExpressions.ml", line 1267, characters 15-29
Called from file "src/DessserExpressions.ml", line 1246, characters 15-30
Called from file "src/DessserExpressions.ml", line 1238, characters 15-30
Called from file "src/batList.mlv", line 239, characters 23-28
Called from file "src/DessserExpressions.ml", line 1218, characters 15-40
Called from file "src/DessserExpressions.ml", line 1262, characters 15-29
Called from file "src/DessserExpressions.ml", line 1238, characters 15-30
Called from file "src/batList.mlv", line 239, characters 23-28
Called from file "src/DessserExpressions.ml", line 1218, characters 15-40
Called from file "src/DessserExpressions.ml", line 1238, characters 15-30
Called from file "src/DessserExpressions.ml", line 1246, characters 15-30
Called from file "src/DessserExpressions.ml", line 1238, characters 15-30
Called from file "src/DessserExpressions.ml", line 1235, characters 15-29
Called from file "src/DessserExpressions.ml", line 1235, characters 15-29
Called from file "src/DessserExpressions.ml", line 1266, characters 15-29
Called from file "src/DessserExpressions.ml", line 1246, characters 15-30
Called from file "src/DessserExpressions.ml", line 1238, characters 15-30
Called from file "src/DessserExpressions.ml", line 1235, characters 15-29
Called from file "src/DessserExpressions.ml", line 1246, characters 15-30
Called from file "src/DessserExpressions.ml", line 1238, characters 15-30
Called from file "src/DessserExpressions.ml", line 1235, characters 15-29
Called from file "src/DessserExpressions.ml", line 1225, characters 15-30
Called from file "src/DessserTypeCheck.ml" (inlined), line 15, characters 2-1023
Called from file "src/DessserCompilationUnit.ml", line 123, characters 13-33
Called from file "src/dessserc.ml", line 123, characters 27-59
Called from file "list.ml", line 117, characters 24-34
Called from file "src/dessserc.ml", line 156, characters 17-65
Called from file "src/dessserc.ml", line 708, characters 13-17
make: *** [Makefile:456: dessserc-check] Error 2

Clearly this is because those 2 encoding formats use a custom pointer instead of just Ptr.
Some function, somewhere, is not abstract enough and assumes a Ptr.

OsX: C++ compilation error

 [48 1! / 111] >check_heapvalue>src/DessserQCheck.ml:1125                      /var/folders/x6/4gb9ck5d4kb6s97zqqh8vvmh0000gn/T/dessser_converter_b0b81a.cc:150:69: error: no matching constructor for initialization of
      '::dessser::gen::test::t3c0e2f6ceb2032915f19660622878e72'
  ...id_20113 { .v_96a026eaf0977b4366de8e19d25b7792_ejgvx = drec_fst_4864, .v_96a026eaf0977b4366de8e19d25b7792_kngke = drec_fst_4867 };
     ^        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Random type and expression generator

To use with quickcheck, testing that:

  • every expression that type-checks can be compiled in all backends
  • every type can be printed and also parsed back into the same type
  • every expression can be printed and parsed back (requires a thorough overhaul of expression printing)
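The print-then-parse property from the second bullet can be sketched on a hypothetical three-constructor type language. The printer, parser and generator below are toys written for this example, and the loop uses the standard Random module rather than an actual quickcheck library.

```ocaml
(* Hypothetical miniature type language with a postfix "[]" list notation. *)
type ty = U8 | Str | Lst of ty

let rec print = function
  | U8 -> "u8"
  | Str -> "string"
  | Lst t -> print t ^ "[]"

let parse (s : string) : ty option =
  (* Strip trailing "[]" suffixes, counting them, then read the base type. *)
  let rec strip s depth =
    let n = String.length s in
    if n >= 2 && String.sub s (n - 2) 2 = "[]"
    then strip (String.sub s 0 (n - 2)) (depth + 1)
    else (s, depth) in
  let base, depth = strip s 0 in
  let rec wrap t d = if d = 0 then t else wrap (Lst t) (d - 1) in
  match base with
  | "u8" -> Some (wrap U8 depth)
  | "string" -> Some (wrap Str depth)
  | _ -> None

(* Random type of bounded depth. *)
let rec random_ty st depth =
  if depth = 0 then (if Random.State.bool st then U8 else Str)
  else match Random.State.int st 3 with
    | 0 -> U8
    | 1 -> Str
    | _ -> Lst (random_ty st (depth - 1))

(* The property under test: printing then parsing gives back the same type. *)
let roundtrips t = parse (print t) = Some t

let () =
  let st = Random.State.make [| 1 |] in
  for _ = 1 to 100 do
    assert (roundtrips (random_ty st 5))
  done
```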

Very slow JSON parsing for some invalid input

With that JSON (out of ramen alerting configuration):

{"via": {"Kafka": { "options": [("topic.message.timeout.ms","5000")], "topic": "alert-events", "partition": 0, "text": "yada" }}}

Note: the input should have been ... "options": [["topic.message.timeout.ms","5000"]],...; the parser is apparently confused by the parentheses.
