msgp's Introduction

MessagePack Code Generator

This is a code generation tool and serialization library for MessagePack. You can read more about MessagePack in the wiki, or at msgpack.org.

Why?

Quickstart

In a source file, include the following directive:

//go:generate msgp

The msgp command will generate serialization methods for all exported type declarations in the file.

You can read more about the code generation options here.

Use

Field names can be set in much the same way as the encoding/json package. For example:

type Person struct {
	Name       string `msg:"name"`
	Address    string `msg:"address"`
	Age        int    `msg:"age"`
	Hidden     string `msg:"-"` // this field is ignored
	unexported bool             // this field is also ignored
}

By default, the code generator will satisfy msgp.Sizer, msgp.Encodable, msgp.Decodable, msgp.Marshaler, and msgp.Unmarshaler. Carefully-designed applications can use these methods to do marshalling/unmarshalling with zero heap allocations.

While msgp.Marshaler and msgp.Unmarshaler are quite similar to the standard library's json.Marshaler and json.Unmarshaler, msgp.Encodable and msgp.Decodable are useful for stream serialization. (*msgp.Writer and *msgp.Reader are essentially protocol-aware versions of *bufio.Writer and *bufio.Reader, respectively.)

Features

  • Extremely fast generated code
  • Test and benchmark generation
  • JSON interoperability (see msgp.CopyToJSON() and msgp.UnmarshalAsJSON())
  • Support for complex type declarations
  • Native support for Go's time.Time, complex64, and complex128 types
  • Generation of both []byte-oriented and io.Reader/io.Writer-oriented methods
  • Support for arbitrary type system extensions
  • Preprocessor directives
  • File-based dependency model means fast codegen regardless of source tree size.

Consider the following:

const Eight = 8
type MyInt int
type Data []byte

type Struct struct {
	Which  map[string]*MyInt `msg:"which"`
	Other  Data              `msg:"other"`
	Nums   [Eight]float64    `msg:"nums"`
}

As long as the declarations of MyInt and Data are in the same file as Struct, the parser will determine that the type information for MyInt and Data can be passed into the definition of Struct before its methods are generated.

Extensions

MessagePack supports defining your own types through "extensions," which are just a tuple of the data "type" (int8) and the raw binary. You can see a worked example in the wiki.
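As a rough illustration of that "type plus raw binary" tuple, the sketch below writes the MessagePack "ext 8" wire format by hand: the marker byte 0xc7, a one-byte payload length, the int8 type, then the payload. The name appendExt8 is hypothetical and is not part of the msgp API; real encoders may also choose the shorter fixext forms for 1, 2, 4, 8, or 16 byte payloads.

```go
package main

import (
	"errors"
	"fmt"
)

// appendExt8 appends a MessagePack "ext 8" value to b: marker 0xc7, a
// one-byte payload length, the int8 extension type, and the raw payload.
// Hypothetical helper for illustration; not part of msgp.
func appendExt8(b []byte, typ int8, data []byte) ([]byte, error) {
	if len(data) > 255 {
		return b, errors.New("payload too large for ext 8")
	}
	b = append(b, 0xc7, byte(len(data)), byte(typ))
	return append(b, data...), nil
}

func main() {
	out, err := appendExt8(nil, 5, []byte{0x01, 0x02, 0x03})
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", out) // c7 03 05 01 02 03
}
```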

Status

Mostly stable, in that no breaking changes have been made to the /msgp library in more than a year. Newer versions of the code may generate different code than older versions for performance reasons. I (@philhofer) am aware of a number of stability-critical commercial applications that use this code with good results. But, caveat emptor.

You can read more about how msgp maps MessagePack types onto Go types in the wiki.

Here are some of the known limitations/restrictions:

  • Identifiers from outside the processed source file are assumed (optimistically) to satisfy the generator's interfaces. If this isn't the case, your code will fail to compile.
  • Like most serializers, chan and func fields are ignored, as well as non-exported fields.
  • Encoding of interface{} is limited to built-ins or types that have explicit encoding methods.
  • Maps must have string keys. This is intentional (as it preserves JSON interop.) Although non-string map keys are not forbidden by the MessagePack standard, many serializers impose this restriction. (It also means any well-formed struct can be de-serialized into a map[string]interface{}.) The only exception to this rule is that the deserializers will allow you to read map keys encoded as bin types, due to the fact that some legacy encodings permitted this. (However, those values will still be cast to Go strings, and they will be converted to str types when re-encoded. It is the responsibility of the user to ensure that map keys are UTF-8 safe in this case.) The same rules hold true for JSON translation.

If the output compiles, then there's a pretty good chance things are fine. (Plus, we generate tests for you.) Please, please, please file an issue if you think the generator is writing broken code.

Performance

If you like benchmarks, see here and here.

As one might expect, the generated methods that deal with []byte are faster for small objects, but the io.Reader/Writer methods are generally more memory-efficient (and, at some point, faster) for large (> 2KB) objects.

msgp's Issues

multicore

We're using msgp a lot lately, and like it. Thank you!

We are marshalling and unmarshalling 40GB of data every 15-60 minutes, and have been doing a lot of this recently: https://xkcd.com/303/. Speeds are roughly 250MB/s from msgp, which kicks ass. BUT, our disk storage runs at something like 900MB/s and we have a huge RAM cache, which makes me suspect we could get at least a 4x speedup with some extra cores thrown at this. That would be great.

I've been thinking over whether one could automatically goroutinize a large load, and I'm curious for your thoughts about it. We could reinstrument our marshal/unmarshal functions to split data into files and recombine after load, but it would be really great to shard out reading from slices leaving one file which could be read single threaded or multicore. Any thoughts on doing this?
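One possible shape for the sharding idea, as a sketch only: split the slice into contiguous shards, encode each shard in its own goroutine, and concatenate the results in the original order. encodeChunk is a stand-in that writes fixed-width integers; it is not a msgp call, and a real MessagePack array would also need its header written with the total element count.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sync"
)

// encodeChunk is a stand-in for a real per-record marshal call; it writes
// each value as 8 big-endian bytes.
func encodeChunk(vals []uint64) []byte {
	out := make([]byte, 0, 8*len(vals))
	var tmp [8]byte
	for _, v := range vals {
		binary.BigEndian.PutUint64(tmp[:], v)
		out = append(out, tmp[:]...)
	}
	return out
}

// encodeSharded encodes up to n contiguous shards of vals concurrently and
// joins the per-shard output in the original order.
func encodeSharded(vals []uint64, n int) []byte {
	if n < 1 {
		n = 1
	}
	size := (len(vals) + n - 1) / n // shard length, rounded up
	parts := make([][]byte, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		lo := i * size
		if lo >= len(vals) {
			break
		}
		hi := lo + size
		if hi > len(vals) {
			hi = len(vals)
		}
		wg.Add(1)
		go func(i int, shard []uint64) {
			defer wg.Done()
			parts[i] = encodeChunk(shard) // each goroutine owns one slot
		}(i, vals[lo:hi])
	}
	wg.Wait()
	var out []byte
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

func main() {
	vals := []uint64{1, 2, 3, 4, 5, 6, 7}
	same := string(encodeSharded(vals, 4)) == string(encodeChunk(vals))
	fmt.Println(same) // true
}
```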

Make the encode funcs bound to the types, not pointers to the types.

Because most encoding frameworks allow encoding either values or pointers to values (even if they only allow decoding to pointers for obvious reasons), it would ease integration with msgp if the encoder funcs were bound to values instead of pointers.

As in:

func (t Type) MarshalMsg ...

vs

func (t *Type) MarshalMsg ...

Maybe you have good reasons for having pointer receivers, but since I guess most of any type gets accessed in the encode func anyway, the copy shouldn't (without me having benchmarked this at all :) be slower than doing lots of dereferencing in a pointer receiver func.

Also, the value receiver func will work even when you have a *t, but not the reverse, so it's easier on the user :D

Does this make sense, or am I missing something?
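To make the method-set point concrete, here is a small self-contained sketch. The marshaler interface and the ByValue/ByPointer types are made up for the example; only the MarshalMsg signature mirrors the one quoted above.

```go
package main

import "fmt"

// marshaler mirrors the shape of a MarshalMsg-style interface for this sketch.
type marshaler interface {
	MarshalMsg(b []byte) ([]byte, error)
}

// ByValue declares MarshalMsg on the value receiver.
type ByValue struct{ N byte }

func (v ByValue) MarshalMsg(b []byte) ([]byte, error) { return append(b, v.N), nil }

// ByPointer declares MarshalMsg on the pointer receiver.
type ByPointer struct{ N byte }

func (p *ByPointer) MarshalMsg(b []byte) ([]byte, error) { return append(b, p.N), nil }

// implements reports whether the dynamic type of v has MarshalMsg in its
// method set.
func implements(v interface{}) bool {
	_, ok := v.(marshaler)
	return ok
}

func main() {
	fmt.Println(implements(ByValue{}), implements(&ByValue{}))     // true true
	fmt.Println(implements(ByPointer{}), implements(&ByPointer{})) // false true
}
```

Whether the copy made by a value receiver is cheaper than the dereferences is a separate question that this sketch does not measure.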

Mechanism to generate Extension functions

When one simply wants to create a set of custom types to be used as a field of type interface{}, you really just want to register a type number with a normal struct.

Currently, you must add 4 boilerplate functions to satisfy the Extension interface.

Is there a reason for making the extension interface not use the same function names as (Un)Marshaler and add the additional ExtensionType function?

Example code I have copy/pasted (and renamed struct) in my code:

func (i *Identity) ExtensionType() int8 {
    return IdentityT
}
func (i *Identity) Len() int {
    return i.Msgsize()
}
func (i *Identity) MarshalBinaryTo(b []byte) error {
    _, err := i.MarshalMsg(b)
    return err
}
func (i *Identity) UnmarshalBinary(b []byte) error {
    _, err := i.UnmarshalMsg(b)
    return err
}

Generated code whitespace

Even though the generated code is gofmted, it can still look pretty gross, and that's mostly due to inconsistent line spacing.

+1 internets to anyone who can figure out how to make gofmt remove empty lines within function calls.

Issues with fixed size arrays

The generated code for fixed-size byte arrays fails to compile. For example:

$ cat types.go
//go:generate msgp -file types.go 

package foo 

type X struct {
        Items [32]byte `msg:"items"`
}
$ go generate
[...]
$ go build
[...]
./types_gen.go:62: dc.ReadByte undefined (type *msgp.Reader has no field or method ReadByte)
./types_gen.go:133: undefined: msgp.ByteSize

While this error seems to be specific to byte and the generated code for other types compiles, the generated test-suite fails for at least string and the int variants. For example:

$ cat types.go
package foo 

//go:generate msgp -file types.go 

type X struct {
        Items [32]int32 `msg:"items"`
}
$ go generate
[...]
$ go test
--- FAIL: TestXMarshalUnmarshal (0.00s)
        types_gen_test.go:87: 1 bytes left over after Skip(): "\x00"
FAIL
exit status 1
[...]

Keep track of imports

Issue #45 is fixed, but the solution involves re-reading/writing the whole generated file.

A better solution (from an I/O standpoint) is to keep track of required imports in the generated file and insert them as necessary. However, this may end up being a lot of extra code to write/maintain, so a good solution should be fairly pithy.

Test/example not in repository

As I am rather new to golang, it would be nice to have the complete wiki example as compilable code in the repository, including the msgpack data and the JSON output.

Kind regards,
Jerry

Option to read string as []byte

Firstly, great package. Thanks very much.

I've seen in the README:

Maps must have string keys

Would you be open to making this configurable in some way?

Reason being, the MSGPACK I'm consuming sends all strings as bin because the strings themselves might be encoded in a different format to UTF8.

Generate Reset()

There should probably be a Reset() method, because by default ReadFrom doesn't zero-out fields before reading.
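The hazard can be shown with a stand-in. The sketch below uses encoding/json rather than msgp, on the assumption that the behaviour is analogous: fields absent from the second message keep their old values unless the struct is reset first. The Person type and the one-line Reset body are illustrative, not generated code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// Reset zeroes every field so the value can be reused for another decode.
func (p *Person) Reset() { *p = Person{} }

// decodeTwice decodes two messages into the same value, optionally calling
// Reset in between. The second message omits the age field.
func decodeTwice(reset bool) Person {
	var p Person
	if err := json.Unmarshal([]byte(`{"name":"Ann","age":30}`), &p); err != nil {
		panic(err)
	}
	if reset {
		p.Reset()
	}
	if err := json.Unmarshal([]byte(`{"name":"Bob"}`), &p); err != nil {
		panic(err)
	}
	return p
}

func main() {
	fmt.Println(decodeTwice(false)) // {Bob 30}
	fmt.Println(decodeTwice(true))  // {Bob 0}
}
```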

Document "rules"

Right now there's no documentation on the "rules" for encoding/decoding. They should be explicit.

JSON

The generator tree (gen.Elem) can be used as input for JSON method templates.

There's a bunch of boilerplate that has to be written, but we should be able to use at least ~60% of the existing code.

Allow overriding of the x_gen filename

I haven't tried it, so forgive me if I'm incorrect.

gen (another code generation thingy for Go) also uses the _gen naming convention for generated files. Are these two tools going to clobber each other if, e.g., I want both an Objects struct AND a msgpack serialization of Object?

Process unexported types

At the moment msgp seems to ignore unexported types and does not generate any code for them.
More often than not, I just want to parse some msgpack data and not export the type parsed into. Is there a reason why msgp ignores unexported types?

Dropping the ast.FileExports calls in parse/getast.go enables processing of unexported types and so far I haven't hit any ill effects.
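For reference, the effect of ast.FileExports can be seen in isolation with the standard library alone. The sketch below parses a made-up two-type source string and lists the type declarations with and without the trimming step; it does not reproduce the code in parse/getast.go.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a made-up input with one exported and one unexported type.
const src = `package demo

type Public struct{ A int }

type private struct{ B int }
`

// typeNames returns the names of all type declarations in f.
func typeNames(f *ast.File) []string {
	var names []string
	for _, d := range f.Decls {
		gd, ok := d.(*ast.GenDecl)
		if !ok || gd.Tok != token.TYPE {
			continue
		}
		for _, s := range gd.Specs {
			if ts, ok := s.(*ast.TypeSpec); ok {
				names = append(names, ts.Name.Name)
			}
		}
	}
	return names
}

// parseTypes parses src and, if exportedOnly is set, trims the AST with
// ast.FileExports before collecting type names.
func parseTypes(exportedOnly bool) []string {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		panic(err)
	}
	if exportedOnly {
		ast.FileExports(f) // removes unexported declarations in place
	}
	return typeNames(f)
}

func main() {
	fmt.Println(parseTypes(true))  // [Public]
	fmt.Println(parseTypes(false)) // [Public private]
}
```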

go vet complains on generated code

go vet beeps on the generated code

xxx_gen.go:400: github.com/philhofer/msgp/enc.ArrayError composite literal uses unkeyed fields
exit status 1

if asz != 2 {
	err = enc.ArrayError{asz, 2}
	return
}

Support non-struct types

Things get rather difficult if we don't want to generate code for arbitrary types, but still want to support something like:

type Chunk [128]byte

type Block struct {
    Meta string
    Data Chunk
}

Right now, Chunk is interpreted as IDENT, even though we could conceivably generate the appropriate code for it in-situ. However, generating the appropriate methods for Chunk is simpler than transitively applying its type information across every location in which it is used.

TL;DR generate code for all valid *ast.TypeDecl nodes.

msgp fails when GOPATH has multiple entries

A better way to grab the base path of the tmpl files is to start from the path of the current go file.

runtime.Caller exposes this.

Sending a pull request in a few minutes.
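A minimal sketch of the runtime.Caller approach follows. The "tmpl" directory name and the templatePath helper are assumptions made for the example. Note that runtime.Caller reports the path recorded at compile time, so this only helps when the binary runs on the machine where the sources live.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// sourceDir returns the directory of this source file as recorded by the
// compiler. ok is false if the caller information is unavailable.
func sourceDir() (dir string, ok bool) {
	_, file, _, ok := runtime.Caller(0)
	if !ok {
		return "", false
	}
	return filepath.Dir(file), true
}

// templatePath resolves name relative to a "tmpl" directory next to this
// file. The directory layout is hypothetical.
func templatePath(name string) (string, bool) {
	dir, ok := sourceDir()
	if !ok {
		return "", false
	}
	return filepath.Join(dir, "tmpl", name), true
}

func main() {
	p, ok := templatePath("marshal.tmpl")
	fmt.Println(p, ok)
}
```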

Populate test/bench objects

We can't use testing/quick to populate structs for testing and benchmarking because of go issue #8818.

I'm open to short-term solutions to this problem until the issue is fixed (go1.5).

Ability to specify which methods are generated on a per struct basis

e.g. in pseudo code:

//msgp:shim string as:[]byte using:[]byte/string

//msgp:tuple MyTuple generate:DecodeMsg

type MyTuple struct {
    A, B string
}

would cause only the DecodeMsg method to be generated (but not any of the marshal-related methods or EncodeMsg).

The use case here is again MSGPACK RPC. I want to be able to generate a decoder for one struct (the args to the method) and an encoder for another (the return values). Then combine those two structs into a wrapper struct along the following lines:

//msgp:tuple MethodArgs generate:DecodeMsg

type MethodArgs struct {
  I int
}

//msgp:tuple MethodRetVals generate:EncodeMsg

type MethodRetVals struct {
  S string
}

type MethodWrapper struct {
  *MethodArgs
  *MethodRetVals
}

I can then unambiguously call DecodeMsg on an instance of the MethodWrapper

Thoughts?

File generation if no annotations exist/are valid

This is more of a question about desired functionality. Should msgp create *_gen.go if I don't have any structs annotated for generation or if those structs don't have any exported fields? I feel like it shouldn't because this results in a file with imports but no other code, obviously causing a compilation error for the package.

Support constant expressions

Right now a field like [3]float64 won't work. Since the array length can be a constant, we'll have to support constant expressions along with it, e.g. [LEN]float64.

Option to implement empty structs

Hi!

Would it be possible to add a go generate argument to generate implementation for empty structures?
If yes, how easy is it to implement that feature?

handle multi-field lines

The following declaration breaks the current ast parser:

type Thing struct {
    A, B, C float64
}

We need to break up multi-name lines as multiple fields.
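In go/ast, a line such as A, B, C float64 arrives as a single *ast.Field whose Names slice holds all three identifiers. The sketch below, which uses only the standard library and a made-up input, expands each field into one entry per name; it is an illustration, not the generator's parser.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a made-up input containing a multi-name field line.
const src = `package demo

type Thing struct {
	A, B, C float64
	D       string
}
`

// fieldNames returns one entry per declared struct field name in source,
// splitting multi-name lines into separate entries.
func fieldNames(source string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(f, func(n ast.Node) bool {
		st, ok := n.(*ast.StructType)
		if !ok {
			return true
		}
		for _, field := range st.Fields.List {
			// field.Names is empty for embedded fields and has more
			// than one element for lines like "A, B, C float64".
			for _, id := range field.Names {
				names = append(names, id.Name)
			}
		}
		return true
	})
	return names, nil
}

func main() {
	names, err := fieldNames(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [A B C D]
}
```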

Possibility to start honouring `encoding.BinaryMarshaler` and `encoding/gob.GobEncoder`

Some third party packages (e.g. https://cloud.google.com/appengine/docs/go/datastore/) use proprietary types (e.g. https://cloud.google.com/appengine/docs/go/datastore/reference#Key) in ways that don't allow developers to transparently replace them with other types (see https://cloud.google.com/appengine/docs/go/datastore/reference where *datastore.Key has a fairly prominent role, as opposed to a simple type Key interface {...} which I think would have been preferable).

Sometimes these types (like datastore.Key) don't have any exported fields, and are thus usually impossible to serialize with encoders that use the praxis of only encoding exported fields.

To still allow encoding they usually implement http://golang.org/pkg/encoding/#BinaryMarshaler or (in this case) http://golang.org/pkg/encoding/gob/#GobEncoder.

If msgp honoured the above interfaces and used them to encode these types, it would be possible to use msgp as a drop-in replacement for encoding/gob in scenarios where you want to serialize your App Engine entities (e.g. when implementing https://cloud.google.com/appengine/docs/go/memcache/reference#Codec).

Method-specific directives

Both #71 and #34 can't be well-expressed using the existing abstractions in gen.

Specifically, what I'm proposing (PR forthcoming) is directives along the lines of

//msgp:encode ignore {{type}}
//msgp:encode byvalue {{type}}
// ...etc

Type support for complex objects

Hello.
I'm experiencing a problem while trying to encode a complex object.
The structure is the following:

type MyStructA struct {
  Field interface{}
}

type MyStructB struct {
  Field MyStructC
}

type MyStructC struct {
 Field string
}

I need to encode an object of type MyStructA which contains an object of type MyStructB containing an object of type MyStructC :

package main

func main() {
	myObj := MyStructA{
		Field: MyStructB{
			Field: MyStructC{
				Field: "string",
			},
		},
	}
	_ = myObj // encode myObj here
}

After I generate the file using go generate (which prints a message about inlining methods for MyStructC into MyStructB) and build using go build, with no errors or warnings, I get the following runtime error:
msgp: type "main.MyStructB" not supported
Now, if I change the MyStructB to this

type MyStructB struct {
 Field string // Changed from MyStructC to primitive type string
}

This works. Am I doing it wrong? As far as I can see in the code, it fails when checking the type of Field inside MyStructB.

Revert #7

Using io.WriterTo and io.ReaderFrom was a mistake. Those implementations in the standard library read until EOF; the generated methods do not.
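The read-until-EOF behaviour is easy to observe with bytes.Buffer, one of the standard library's io.ReaderFrom implementations. In the sketch below the two three-byte "messages" are placeholders; a single ReadFrom call consumes both.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// consumedByReadFrom returns how many bytes one bytes.Buffer.ReadFrom call
// takes from a reader holding stream.
func consumedByReadFrom(stream string) int64 {
	var buf bytes.Buffer
	n, err := buf.ReadFrom(strings.NewReader(stream))
	if err != nil {
		panic(err)
	}
	return n
}

func main() {
	// Two back-to-back placeholder messages of three bytes each.
	fmt.Println(consumedByReadFrom("abc" + "def")) // 6
}
```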

Skip unknown fields

We need a way for decoders to skip over fields with no corresponding element in the struct. dc.Skip() would go in the default switch case of the struct decoding template.
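A hand-rolled sketch of that default case is below. It understands only three MessagePack encodings (fixmap, fixstr, and positive fixint), and the helper names readFixStr, skip, and decodePerson are invented for the example; it is not the struct decoding template and does not use dc.Skip() itself.

```go
package main

import (
	"errors"
	"fmt"
)

// readFixStr reads a MessagePack fixstr (0xa0-0xbf) from the front of b and
// returns the string plus the remaining bytes.
func readFixStr(b []byte) (string, []byte, error) {
	if len(b) == 0 || b[0]&0xe0 != 0xa0 {
		return "", b, errors.New("expected fixstr")
	}
	n := int(b[0] & 0x1f)
	if len(b) < 1+n {
		return "", b, errors.New("short buffer")
	}
	return string(b[1 : 1+n]), b[1+n:], nil
}

// skip discards one value. This sketch only handles positive fixint and
// fixstr; a real implementation would cover every MessagePack type.
func skip(b []byte) ([]byte, error) {
	if len(b) == 0 {
		return b, errors.New("short buffer")
	}
	switch {
	case b[0] <= 0x7f: // positive fixint
		return b[1:], nil
	case b[0]&0xe0 == 0xa0: // fixstr
		_, rest, err := readFixStr(b)
		return rest, err
	}
	return b, fmt.Errorf("unsupported type byte 0x%x", b[0])
}

type Person struct{ Name string }

// decodePerson decodes a fixmap into Person, skipping keys it does not know.
func decodePerson(b []byte) (Person, error) {
	var p Person
	if len(b) == 0 || b[0]&0xf0 != 0x80 {
		return p, errors.New("expected fixmap")
	}
	n := int(b[0] & 0x0f)
	b = b[1:]
	for i := 0; i < n; i++ {
		key, rest, err := readFixStr(b)
		if err != nil {
			return p, err
		}
		b = rest
		switch key {
		case "name":
			p.Name, b, err = readFixStr(b)
		default:
			b, err = skip(b) // unknown field: discard its value
		}
		if err != nil {
			return p, err
		}
	}
	return p, nil
}

func main() {
	// {"age": 30, "name": "Bob"}; "age" has no field in Person.
	msg := []byte("\x82\xa3age\x1e\xa4name\xa3Bob")
	p, err := decodePerson(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name) // Bob
}
```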

better map support

Currently only map[string]string and map[string]interface{} are supported.

We can, in theory, support any map[string]IDENT type, where IDENT is any gen.BaseElem. The code will have to be generated explicitly, as opposed to the existing static implementations.

Struct with *time.Time field doesn't import time

If I have a struct such as:

type A struct {
    C *time.Time
}

msgp does not add a time import into the generated code. This raises a compilation error as in Unmarshal/Decode such a field has the following code generated for it:

if identifier == nil {
    identifier = new(qualified.Type)
}

I don't currently have the time to provide a patch for this, but I will when I can find some.

better handling of IDENT

Take the following declaration:

type MyType struct {
    Blah *Thing
}

The current implementation assumes that *Thing implements the io.WriterTo and io.ReaderFrom interfaces. In order to determine the underlying type of the identifier, we may have to do full-package identifier resolution.

Shims generate functions with the shimmed type as receiver

In #31 a nice functionality was implemented to generate shims for generic types.

Unfortunately, when I run the code now, it seems the generator generates functions with the shimmed type as receiver.

The entire point of the shim (for me, at least) was to be able to encode third party types that I can't create new functions for.

Example:

//go:generate msgp
//msgp:shim datastore.Key as:string using:utils.EncodeKey/utils.DecodeKey
package user

import "appengine/datastore"

type User struct {
  Id *datastore.Key
  Name string
}

When I run go generate with this code, something like

// MarshalMsg implements msgp.Marshaler
func (z *datastore.Key) MarshalMsg(b []byte) (o []byte, err error) {
        o = msgp.Require(b, z.Msgsize())
        o = msgp.AppendString(o, common.EncodeKey((*z)))
        return
}

will be generated, but won't compile of course.

[bug] Slices of small structs don't use shim funcs to encode/decode their fields

The shim funcs to encode/decode third party types work beautifully, and to encode/decode slices of structs including these third party types I just have to define explicit slice types.

These slice types usually use the struct type's encode/decode funcs to encode/decode their elements, but if the struct type is small enough (which I guess is the reason), the slice type will just inline the field encoding/decoding.

In these cases, the shim methods are not used for the encoding/decoding, and the code won't compile.

func Locate() depth level

Hi!
I'm trying to recover a slice of a byte array (encoded msgpack).
The encoded msgpack looks good and I'm able to recover a slice at the root depth, but no deeper.
Here is the structure:

type MyStructA struct {
	Field1 interface{}
}

type MyStructB struct {
	Field2 interface{}
}

type MyStructC struct {
	Field3 string
}

data := MyStructA{
	Field1: &MyStructB{
		Field2: &MyStructC{
			Field3: "String",
		},
	},
}

I want to isolate Field2 to decode it into a MyStructC variable to avoid dealing with a map[string]interface{} object since my fields are of type interface{}.
Using the raw content (the entire msgpack from data) I can access Field1 and decode it to a proper structure but not Field2 because msgp.Locate("Field2", raw) returns an empty byte array.

msgp:shim ignored for time.Time

Shims seem to be ignored for time.Time.

package tt

import "time"

//go:generate msgp

//msgp:shim time.Time as:string using:timetostr/strtotime
type T struct {
    T time.Time
}

func timetostr(t time.Time) string {
    return t.Format(time.RFC3339)
}

func strtotime(s string) time.Time {
    t, _ := time.Parse(time.RFC3339, s)
    return t
}

The generated code still calls dc.ReadTime which expects a TimeExtension.

Better time.Time perf

time.Time requires an allocation for encoding, because right now we're using the MarshalBinary() method.

We should probably define a different encoding such that we can do this without allocating.
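One possible fixed-width layout, offered purely as a sketch: 8 bytes of Unix seconds followed by 4 bytes of nanoseconds, appended to a buffer the caller supplies. The layout and the appendTime/readTime names are assumptions for the example, and the sketch drops the time zone; whether it avoids allocation in practice depends on the caller's buffer having spare capacity.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// appendTime appends t to b as 12 bytes: 8 bytes of Unix seconds followed
// by 4 bytes of nanoseconds, both big-endian. The layout is an assumption
// made for this sketch.
func appendTime(b []byte, t time.Time) []byte {
	var tmp [12]byte
	binary.BigEndian.PutUint64(tmp[:8], uint64(t.Unix()))
	binary.BigEndian.PutUint32(tmp[8:], uint32(t.Nanosecond()))
	return append(b, tmp[:]...)
}

// readTime decodes the 12-byte layout written by appendTime.
func readTime(b []byte) (time.Time, bool) {
	if len(b) < 12 {
		return time.Time{}, false
	}
	sec := int64(binary.BigEndian.Uint64(b[:8]))
	nsec := int64(binary.BigEndian.Uint32(b[8:12]))
	return time.Unix(sec, nsec), true
}

// demo round-trips one timestamp through a preallocated buffer and reports
// the encoded length and whether the instant survived.
func demo() (int, bool) {
	buf := make([]byte, 0, 12) // caller-supplied buffer with room for one value
	t := time.Date(2015, 1, 2, 3, 4, 5, 6, time.UTC)
	buf = appendTime(buf, t)
	got, ok := readTime(buf)
	return len(buf), ok && got.Equal(t)
}

func main() {
	fmt.Println(demo()) // 12 true
}
```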

gofmt gen_test code

There are a few minor spacing issues that don't align with code from gofmt. Such as:

[screenshot: spacing]

Unused variable error in Msgsize() for structs containing a slice of arrays

//go:generate msgp -file types.go 

package foo 

type X struct {
    Items [][32]int `msg:"items"`
}

The generated code fails to compile due to an unused variable in Msgsize():

./types_gen.go:167: xvk declared and not used
   165  func (z *X) Msgsize() (s int) {
   166          s = msgp.MapHeaderSize + msgp.StringPrefixSize + 5 + msgp.ArrayHeaderSize
   167          for xvk := range z.Items {
   168                  s += msgp.ArrayHeaderSize + (32 * (msgp.IntSize))
   169          }
   170          return
   171  }

Line 167 should probably read for _ = range z.Items, or (similar to what is done in sizeGen.gMap) _ = xvk could be added to the loop body. Something like 04c6662 would fix this, but it introduces a lot of unnecessary _ = idx statements.
