Gencode

Gencode is a code-generation-based data serialization/deserialization system. It aims to encode and decode quickly while keeping the encoded data small.

Code is generated from a schema whose syntax resembles native Go, with a few differences and additions.

For example:

struct Person {
  Name string
  Age uint8
}

Run through using gencode go -schema test.schema -package test

Yields:

package test

import (
	"io"
	"time"
	"unsafe"
)

var (
	_ = unsafe.Sizeof(0)
	_ = io.ReadFull
	_ = time.Now()
)

type Person struct {
	Name string
	Age  uint8
}

func (d *Person) Size() (s uint64) {
 ...
}
func (d *Person) Marshal(buf []byte) ([]byte, error) {
 ...
}
func (d *Person) Unmarshal(buf []byte) (uint64, error) {
 ...
}

(bulk removed for size reasons)

Data Types

Struct

Structs are built, as in native Go, from fields of various types. The format differs slightly: the struct keyword comes before the struct's name and the type keyword is dropped, to differentiate Gencode schemas from Go code. Structs may optionally be "framed", which adds Serialize and Deserialize functions taking an io.Writer or io.Reader respectively. Framed structs are prefixed with a vuint64 holding the length of the whole struct, excluding the prefix itself. This allows efficient reading from network sockets and other streams.

Int

Integer data types consist of both signed and unsigned ints, in 8, 16, 32, and 64 bit lengths. In addition, any of these types can be varint-encoded by prefixing it with the letter v. Some examples:

  • uint16
  • vuint32
  • vint64
  • int32
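
To illustrate why a varint pays off for small values, the sketch below measures encoded lengths with Go's encoding/binary package. It assumes the v-prefixed types behave like Go's own varints (base-128 for unsigned values, zigzag for signed ones); that is an assumption about gencode's wire format, and uvarintLen and varintLen are hypothetical helper names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintLen reports how many bytes v occupies as an unsigned varint.
func uvarintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

// varintLen reports how many bytes v occupies as a zigzag signed varint.
func varintLen(v int64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutVarint(buf[:], v)
}

func main() {
	for _, v := range []uint64{1, 127, 128, 300, 1 << 32} {
		fmt.Printf("uvarint %d -> %d byte(s)\n", v, uvarintLen(v))
	}
	for _, v := range []int64{-1, 63, -64, 64} {
		fmt.Printf("varint %d -> %d byte(s)\n", v, varintLen(v))
	}
}
```

Under this encoding, values below 128 fit in one byte, while a value that needs the full 64 bits takes up to ten, so fixed-width types remain the better choice for fields that are usually large.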

Float

Float types are allowed in either 32 or 64 bit lengths.

String

Strings are encoded with a prefixed vuint64 for length, so short strings only require a 1 or 2 byte prefix, but strings of practically any length can be used.

Byte

Bytes are essentially an alias for uint8, though there is an optimization for slices of bytes, i.e. []byte.

Bool

Bools are stored as 0 for false or 1 for true.

Fixed Length Arrays

Fixed Length Arrays are encoded as the designated number of elements, with no length prefix. Note that the number of elements is fixed, but the elements themselves may take a variable number of bytes to encode. Examples:

  • [5]vuint64
  • [16]float64

Slices

Slices, as in Go, are variable-length sequences that can be made out of any other valid Gencode type. Slices are also prefixed with a vuint64 for length. Examples:

  • []byte
  • [][]int64

Pointers

Pointers translate directly into pointers on the Go struct as well, and are also used to allow potentially empty fields. A pointer field has a "prefix" of 1 byte, though if that byte is 0, the field will be set to nil, and there will be no more data for that field in the marshalled data.
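
The presence-byte idea can be sketched for a single *uint8 field as follows. The text above only says that a 0 byte means nil; using 1 for "present" is an assumption here, and marshalOptionalUint8 and unmarshalOptionalUint8 are hypothetical names rather than generated code.

```go
package main

import (
	"errors"
	"fmt"
)

// marshalOptionalUint8 appends a presence byte, then the value if p is non-nil.
// Assumption: 1 marks "present"; the source only specifies that 0 means nil.
func marshalOptionalUint8(buf []byte, p *uint8) []byte {
	if p == nil {
		return append(buf, 0)
	}
	return append(buf, 1, *p)
}

// unmarshalOptionalUint8 returns the decoded pointer and the bytes consumed.
func unmarshalOptionalUint8(buf []byte) (*uint8, int, error) {
	if len(buf) < 1 {
		return nil, 0, errors.New("missing presence byte")
	}
	if buf[0] == 0 {
		return nil, 1, nil // nil field: nothing follows the prefix
	}
	if len(buf) < 2 {
		return nil, 0, errors.New("missing value after presence byte")
	}
	v := buf[1]
	return &v, 2, nil
}

func main() {
	age := uint8(42)
	fmt.Println(marshalOptionalUint8(nil, nil))  // [0]
	fmt.Println(marshalOptionalUint8(nil, &age)) // [1 42]
}
```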

Tagged Unions

Tagged Unions are one of the high points of the Gencode format and system. There is no direct match in the Go language itself, so tagged unions are handled on the Go side using interfaces. Tagged unions have a prefix vuint64 specifying the actual type of the field, and that field's serialization semantics then take over. This allows widely disjoint data types to be stored in the same field. While tagged unions can use all other types in the gencode system, the standard use is to use structs defined in the schema. Example:

struct Subscribe {
  Topic string
}

struct Unsubscribe {
  Topic string
}

struct Message {
  Request union {
    Subscribe
    Unsubscribe
  }
}

Message.Request can contain either a Subscribe or an Unsubscribe. The field itself is declared as an interface{}, and you can type switch on it. Alternatively, you can give an interface name to use:

struct Message {
  Request union Command {
    Subscribe
    Unsubscribe
  }
}

The Request field will be declared of type Command, which must be an interface that all the types in that union implement.
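
On the Go side, consuming the unnamed union might look like the sketch below. The struct definitions are hand-written stand-ins for what gencode would generate, and the text does not say whether the union field holds values or pointers, so the type switch covers both; describe is a hypothetical helper.

```go
package main

import "fmt"

// Stand-ins for the structs gencode would generate from the schema above.
type Subscribe struct{ Topic string }
type Unsubscribe struct{ Topic string }

// Message holds the union as an empty interface, as described for the
// unnamed form of the union.
type Message struct {
	Request interface{}
}

// describe type-switches on the union field. Both value and pointer cases
// are listed because the stored form is an assumption either way.
func describe(m Message) string {
	switch r := m.Request.(type) {
	case Subscribe:
		return "subscribe to " + r.Topic
	case *Subscribe:
		return "subscribe to " + r.Topic
	case Unsubscribe:
		return "unsubscribe from " + r.Topic
	case *Unsubscribe:
		return "unsubscribe from " + r.Topic
	default:
		return fmt.Sprintf("unexpected request type %T", r)
	}
}

func main() {
	fmt.Println(describe(Message{Request: Subscribe{Topic: "news"}}))
	fmt.Println(describe(Message{Request: &Unsubscribe{Topic: "news"}}))
	fmt.Println(describe(Message{Request: 42}))
}
```

With the named form (union Command), the default branch becomes less likely to fire, since only types implementing Command can be assigned to the field.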

Speed

Gencode produces compact output and encodes quickly. In the benchmarks below (using schemas and test files located in the bench folder), the Gencode encoding is 48 bytes, against 62 to 182 bytes for the other formats:

Gencode encoded size: 48
GOB encoded size: 182
GOB Stream encoded size: 62
JSON encoded size: 138
MSGP encoded size: 115
PASS
BenchmarkFixedBinarySerialize-8          2000000               894 ns/op
BenchmarkFixedBinaryDeserialize-8        3000000               539 ns/op
BenchmarkGencodeSerialize-8             10000000               174 ns/op
BenchmarkGencodeDeserialize-8           10000000               219 ns/op
BenchmarkFixedGencodeSerialize-8        20000000                75.7 ns/op
BenchmarkFixedGencodeDeserialize-8      100000000               20.7 ns/op
BenchmarkGobSerialize-8                   200000              9370 ns/op
BenchmarkGobDeserialize-8                  30000             40337 ns/op
BenchmarkGobStreamSerialize-8            1000000              1694 ns/op
BenchmarkGobStreamDeserialize-8          1000000              2125 ns/op
BenchmarkJSONSerialize-8                  500000              2780 ns/op
BenchmarkJSONDeserialize-8                300000              5263 ns/op
BenchmarkMSGPSerialize-8                 5000000               277 ns/op
BenchmarkMSGPDeserialize-8               2000000               608 ns/op

gencode's People

Contributors

andyleap, hyangah, knadh, wblakecaldwell

gencode's Issues

package support

Is there a way to make gencode use a package name other than main?
By default the package seems to be main.

How to use gencode?

Is there an installation guide or tutorial for gencode, or something similar to work from? Thanks.

[Enhancement] Support struct tags in schema

Firstly, thanks for the library!

In a reasonably large codebase, structs generated by gencode from a schema may be shared by various parts. A simple example would be first populating a struct with json.Unmarshal and then marshalling it with struct.EncodeMsg. Here, the first step may require json tags.

As things stand, there are two approaches, both highly impractical.

  1. Generate struct using gencode and manually edit the generated struct and add tags.
  2. In addition to the schema, define Go structs in the codebase with tags that match the schema and copy values between these and gencode's generated structs.

Proposed solution
Add tag support to schema definitions, where they are simply copied over to the generated structs. They needn't change any existing logic in gencode, nor do they have to affect gencode's encoding or decoding logic.

eg:

struct Person {
	Name    string   `json:"name" redis:"name"`
	Age     int
}

Generated struct in x.gen.go

type Person struct {
	Name    string   `json:"name" redis:"name"`
	Age     int
}

I can see that this is possible by tweaking the grammar in parser.go, but I'm not really able to figure out how to do it with andyleap/parser as it has no documentation.

Thank you.

How to use its executable

The Readme.md does not describe where to find the gencode executable. It's not available as a Linux package either.

Generated code corrupts the data

Try

// fails, UserId and SessionID will get corrupted
struct GenSession {
    UserId  int64
    SessionId int64
    IpAddr [16]byte
    Start int64
    End int64
}

VS
// works
struct GenSession {
    IpAddr [16]byte
    UserId  int64
    SessionId int64
    Start int64
    End int64
}

Allow custom marshallers

Allow for custom types that are able to marshal themselves.

For example, the following should assume that the type Data in package A has the Size, Marshal and Unmarshal functions, and use those when generating code.

struct Person {
  Name string
  Age A.Data //sizeMarschal
}

It might not be practical for every struct to have a .Size method (for whatever reason), so the following should get the bytes and prefix the length itself.

struct Person {
  Name string
  Age A.Data //Marschal
}

EDIT: when only the Marschal type is specified, don't generate a Size method; alternatively, only support the sizeMarschal type.

How to parse a json array()

I am migrating from Python to Go and am struggling quite a lot. My problem looks like this: gigabytes of lines, each containing a JSON array (speed is very important):

------ file xy.log with lots of lines-------
[{"str1":"hello","value1":1.2,"value2":3.14,"time":12345678912},{"str1":"world","value1":2.1,"value2":4.43,"time":12345678913}]
[{"str1":"hello","value1":1.2,"value2":3.14,"time":12345678912},{"str1":"world","value1":2.1,"value2":4.43,"time":12345678913},{"str1":"dinner","value1":12.1,"value2":24.43,"time":12345678914}]
[{"str1":"hello","value1":1.2,"value2":3.14,"time":12345678912},{"str1":"world","value1":2.1,"value2":4.43,"time":12345678913}]

I wonder if it is possible to iterate over the lines and parse each JSON array? Any help is very much appreciated.

On the optimization of large slice

I see that converting a slice to bytes traverses it element by element. If the slice length were checked here, a different conversion method could be used when the slice is very large.

For example, where arr is a big slice:
arr := make([]float32, 1000)
// Reinterpret the float32 backing array as bytes without copying.
header := *(*reflect.SliceHeader)(unsafe.Pointer(&arr))
header.Len = len(arr) * 4 // 4 bytes per float32
header.Cap = header.Len
bytes := *(*[]byte)(unsafe.Pointer(&header))

Not Working for me

I'm new to this whole code-generated data serialization thingy... and I hope I followed the correct steps

  1. my main() code, named gencoder.go
package main

import "fmt"

func main() {
    entry := Datadoc{
        ID: "45678911452457710", 
        Name: "Test Item", 
        Desc: "This is a test description. Length isn't really a matter here", 
        Numb: 8, 
        AnArray: []string{"as54dasd897asd","as5d456a4sd4756","as8d7as987da7sd8f"}, 
    }
    buf, _ := entry.Marshal(nil)
    fmt.Printf("Gencode'd size: %v\n", len(buf))
    fmt.Printf("Gencode'd data: %s\n", buf)
}
  2. my schema, named new.schema
struct Datadoc {
    ID string
    Name string
    Desc string
    Numb int32
    AnArray []string
}
  3. I run the command gencode go -schema new.schema -package main and I see the newly generated file named new.schema.gen.go in the same folder as the new.schema and gencoder.go

Now when I do go run gencoder.go in my working directory, I get the following error:

# command-line-arguments
.\gencoder.go:13: entry.Marshal undefined (type Datadoc has no field or method Marshal)

Anything I'm missing? BTW: I'm using Windows XP Pro SP3

Panics on truncated input

$ ls
gencode.schema  main.go
$ cat main.go 
package main

//go:generate gencode go -schema gencode.schema

func main() {
	var x Foo
	x.Unmarshal([]byte{})
}
$ cat gencode.schema 
struct Foo {
  Bar byte
}
$ go generate
$ go run main.go gencode.schema.gen.go 
panic: runtime error: index out of range

goroutine 1 [running]:
main.(*Foo).Unmarshal(...)
	/home/tv/go/src/eagain.net/2018/gencode-bug/gencode.schema.gen.go:45
main.main()
	/home/tv/go/src/eagain.net/2018/gencode-bug/main.go:7 +0x11
exit status 2
$ 

Slice of struct pointers

struct Rider {
Name string
}

struct Car {
Name string
Riders []*Rider
}

serial $ gencode go -schema car.schema -package car
2016/05/24 21:09:05 Error generating output: template: marshal:4:39: executing "marshal" at <.Offset>: Offset is not a field of struct type golang.PointerTemp

Can I do this? If I remove the pointer "*", it works.

Question - why all the i+0 ?

One question - in the generated code I see many instances of i+0. I'm pretty sure that the compiler will optimize those out, but I'm curious why they are there.

Unmarshal lost 1 byte

struct A {
B uint16
C uint16
D uint32
E uint32
}

struct F {
G uint16
H uint16
I []A
}

A only 11 bit

Feature Request: Gods containers

I use the generic container library [https://github.com/emirpasic/gods] quite frequently instead of go maps and slices. Would it be incredibly difficult to add a backend for these types?

union type not work ok

I copied the union type example into test.schema:

struct Subscribe {
  Topic string
}
struct Unsubscribe {
  Topic string
}
struct Message {
  Request union{
    Subscribe
    Unsubscribe
  }
}

Then running ./gencode go -schema test.schema -package test gives this error:
2016/03/24 15:19:47 Error generating output: template: size:5:27: executing "size" at <.Structs>: Structs is not a field of struct type golang.UnionTemp

byte array length

There seems to be an issue with byte arrays.

Do something like this (the real code is in a big app; I can create a better example if needed).

Pseudocode:

struct Info {
Raw []byte
}

is1 := &Info{}
is1.Raw = []byte("testbytearray")
raw1, _ := is1.Marshal(nil)
ig1 := &Info{}
ig1.Unmarshal(raw1)

fmt.Printf("Test1 1 %s %s", string(is1.Raw), string(ig1.Raw))
fmt.Printf("Test1 2 %v %v", is1.Raw, ig1.Raw)
fmt.Printf("Test1 3 %s %s", global.ToJSONPretty(is1), global.ToJSONPretty(ig1))

Test1 -- print as string works
testbytearray
testbytearray

Test2 -- print as %v
[116 101 115 116 98 121 116 101 97 114 114 97 121]
[116 101 115 116 98 121 116 101 97 114 114 97 121 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

Test3 -- Convert struct to json print

"Raw": "dGVzdGJ5dGVhcnJheQ=="
"Raw": "dGVzdGJ5dGVhcnJheQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="

If the byte array is large and mixed into a bigger struct, I get a panic (the real use case).

Will this project still be usable in 2023?

I cloned this repository and tried to run go build .
None of the code contains a definition for the Grammar object. Why is that? Is it a conceptual project? How do people use it?

Unsafe option generates faulty code

Please find attached the schema for a struct and the corresponding .go files generated with the following commands: gencode go -schema user.schema -package datatypes and gencode go -schema user.schema -package datatypes -unsafe.

The Marshal method in the unsafe file sets the three int64 fields to zero (see lines 214-224):

{
    d.LastSeen = *(*int64)(unsafe.Pointer(&buf[i+0]))
}
{
    d.Created = *(*int64)(unsafe.Pointer(&buf[i+8]))
}
{
    d.Updated = *(*int64)(unsafe.Pointer(&buf[i+16]))
}
return buf[:i+24], nil

Using time fields (still undocumented?) instead of int64 works fine.

issue.zip
