fxamacker / cbor

CBOR codec (RFC 8949) with CBOR tags, Go struct tags (toarray, keyasint, omitempty), float64/32/16, big.Int, and fuzz tested billions of execs.

License: MIT License

Go 100.00%

cbor rfc-8949 rfc-7049 cose cwt go golang json-alternative serialization cbor-library

cbor's Issues

Add Marshaler and Unmarshaler interfaces

Add Marshaler and Unmarshaler interfaces to let user-defined types implement their own CBOR encoding and decoding.

This is needed by issue #11.

Replace options struct and improve option handling

It would be easier and cleaner to do this in v2.0 instead of v1.4 due to SemVer policy.

RFC 7049bis indicates additional considerations for encoding not mentioned in 7049, so future protocols are likely to use new encoding modes.

One way to handle options would be to use some integer "enums" to specify different aspects of encoding modes. So 6-7+ of these aspects can combine to specify any current or future mode. And each is an integer so they can have new values made available as needed.

Something like the following (with better names than this rough draft):

Sorting: 0=default, unsorted; 1=RFC 7049 Canonical, 2=Bytewise Lexicographic, 3=reserved, ...
SmallestFormFloats: 0=default, unchanged; 1=float16, 2=float32, 3=float64, 4=reserved, ...
SmallestFormIntegers: 0=default, smallest form is byte; 1=reserved, ...
DupMapKeys: 0=default, unchecked, last one used, continue; 1=reserved, ...
Utf8Problems: 0=default, reject as malformed and stop; 1=reserved, ...

And one or more aspects for each of these:

Other error-handling options 7049bis requires protocols to specify (so they can use this library).
Tag options like whether or not to encode tag for time values.

So a combination of these options can handle all existing CBOR modes plus some future modes that don't exist yet.

Relax decoding restriction on CBOR integer to Go float (useful for SenML)

Allow CBOR integer value to be decoded to Go float. This is currently denied by the library.

Also, see kanban regarding optional new feature involving floats. It might be convenient for you to implement everything float-related in one shot if you agree with this change.

Add main image

Improve decoding speed affected by commit 1a29187

Adding Unmarshaler decreased decoding speed in commit 1a29187. This affects all data types.

Make it faster by caching the result of reflect pkg's Type.Implements().

How to serialize a struct that has pointers and embedded maps with cbor

I have two structs like this:

type IdsMapping struct {
UserIdIndexDict map[float64]float64
IndexUserIdDict map[float64]float64
ItemIdIndexDict map[float64]float64
indexItemIdDict map[float64]float64
}

type LFM struct {
classCount int
iterCount int
featureName string
labelName string
lr float64
lam float64
UserItemRatingMatrix *mat.Dense
ModelPfactor *mat.Dense
ModelQfactor *mat.Dense
RatingDf *dataframe.DataFrame
UseridItemidDict map[float64]map[float64]float64
UseridSet *gset.Set
ItemidSet *gset.Set
IdsMapping *IdsMapping
UserIndexItemIndexDict map[float64]map[float64]float64
}

I want to use cbor to serialize the LFM struct. How should I do that, given that the struct is complex? Thanks.

Change CBOR string validation error from SemanticError to SyntaxError

Valid() should return SyntaxError instead of SemanticError when:

indefinite length string chunks are not of the same type, or
indefinite length string chunks are not definite length.

Probably no difference except for logs. So this won't require a bugfix release.

Check if CBOR type can be used as Go map key

When decoding into a Go map with interface{} as key type, reject CBOR data if it can't be used as Go map key.
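
One way such a check could look, sketched with the standard `reflect` package. The helper name `usableAsMapKey` is hypothetical. It relies on the fact that slices, maps, and funcs are not comparable in Go, so a decoded CBOR array or map cannot serve as a key of `map[interface{}]T`.

```go
package main

import (
	"fmt"
	"reflect"
)

// usableAsMapKey reports whether a decoded value has a comparable dynamic
// type and can therefore be considered as a key in a map[interface{}]T.
func usableAsMapKey(v interface{}) bool {
	if v == nil {
		return true // a nil interface value is a valid map key
	}
	return reflect.TypeOf(v).Comparable()
}

func main() {
	m := map[interface{}]string{}
	candidates := []interface{}{
		uint64(1),
		"a",
		[]interface{}{1},               // decoded CBOR array
		map[interface{}]interface{}{}, // decoded CBOR map
	}
	for _, k := range candidates {
		if !usableAsMapKey(k) {
			fmt.Printf("reject %T as map key\n", k)
			continue
		}
		m[k] = "ok"
	}
	fmt.Println(len(m))
}
```

Note that `Comparable` is a type-level check: a struct or array type that contains interface fields is reported as comparable but can still panic at run time if those fields hold slices or maps.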

Support `encoding.[BinaryMarshaler,BinaryUnmarshaler]`?

Is there any interest in supporting special handling for types that know how to serialize themselves into a binary representation? encoding/json supports calling a given object's encoding.TextMarshaler and encoding.TextUnmarshaler implementations if provided, as does the go-codec library.

For CBOR the binary equivalents probably make more sense. I've started poking at this in a local branch, and while the encoding side was straightforward to add, the decoding side is a bit more involved due to parseInterface.

Add to cbor.io

I created a pull request to add your library to cbor.io. There are a few open questions you might be more qualified to answer than me: cbor/cbor.github.io#56 (comment)

Please jump in. :-)

Decoding subnormal float16 numbers produces incorrect values

When float16 numbers are subnormal (exponent 0, significand ≠ 0), decoded values are incorrect.

This problem only affects subnormal float16 numbers during decoding.

Fix this by replacing float16 to float32 conversion function with a new implementation that is verified 100% correct for all 65536 possible conversions.

Encoding struct with null pointer to embedded struct returns error

Encoding struct with null pointer to non-embedded struct works as expected.

Problem only affects null pointer to embedded struct.

What version?
go v1.12.12
cbor v1.2.0 (likely affects older versions too)

What did you do?

	type (
		T1 struct {
			N int
		}
		T2 struct {
			*T1
		}
	)
	v := T2{}
	cborData, err := cbor.Marshal(v, cbor.EncOptions{})

What did you expect to see?

cborData = []byte{0xa0}
err = nil

What did you see instead?

cborData = nil 
err = "cbor: cannot set embedded pointer to unexported struct: *cbor_test.T1"

Safe optimization for COSE (RFC 8152) and WebAuthn

It would be nice to be able to decode CBOR maps with integer keys to Go struct, and vice versa.

This feature would speed up and simplify using this library for COSE. It would also speed up WebAuthn since that uses COSE.

Publish old vs new benchmark deltas for new releases

This would help users of older versions evaluate new releases. And help prevent undetected performance regressions.

benchmark	old ns/op	new ns/op	delta
BenchmarkFoo	523	68.6	-86.88%

benchmark	old allocs	new allocs	delta
BenchmarkFoo	3	1	-66.67%

benchmark	old bytes	new bytes	delta
BenchmarkFoo	80	48	-40.00%
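
The delta column above is the relative change from old to new. A small sketch of that arithmetic follows; the helper name `delta` is hypothetical.

```go
package main

import "fmt"

// delta formats the relative change from oldV to newV as a signed percentage.
func delta(oldV, newV float64) string {
	return fmt.Sprintf("%+.2f%%", (newV-oldV)/oldV*100)
}

func main() {
	fmt.Println(delta(523, 68.6)) // ns/op row
	fmt.Println(delta(3, 1))      // allocs row
	fmt.Println(delta(80, 48))    // bytes row
}
```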

Update CBOR comparison charts for v1.3.3 and simplify them

v1.3.3 looks solid, passed 220+ million execs fuzzing with 1000+ corpus files as starting point.

workers: 2, corpus: 1071 (18h46m ago), crashers: 0, restarts: 1/10000, execs: 227582949 (3368/sec), cover: 2011, uptime: 18h46m

The charts don't need 3 bars for each aspect being compared. Just use default build settings for every library compared for every comparison.

I'll post updated speed comparison and size comparison charts as comments to this issue.

Rename milestone v1.4 to v2.0 (tags and 7049bis options)

Timing is good to bump major version since adding CBOR Tags is a big change and there's a need for extended options based on 7049bis. E.g. newer encoding modes and error handling options.

As discussed, renaming v1.4 to v2.0 allows extended option handling to be done in a cleaner and simpler way without dragging along the old options struct.

Improved option handling can be designed with 7049bis and "generic" CBOR library in mind, so it should be more future-proof than the current design.

Comply with RFC 2026 about RFC 7049bis

RFC 2026 says,

Under no circumstances should an Internet-Draft be referenced by any paper, report, or Request-for-Proposal, nor should a vendor claim compliance with an Internet-Draft.

Update README.md:

Remove 7049bis from statements about standards compliance until it is approved.
When mentioning 7049bis for extras beyond compliance, add the word "latest" in front of it. E.g. "Decoding also checks for all required well-formedness errors described in the latest RFC 7049bis, ..."

Split encoding mode booleans into two integer options in v1.x

As discussed in #62 this can be a small release in v1.x before merging in CBOR tags feature.

deprecate but still support boolean encoding modes in v1.4
add SortMode (int) and ShortestFloat (int) as encoding options to replace encoding mode booleans.

The combination of SortMode and ShortestFloat can be used to specify these modes in v1:

default
Canonical
CTAP2
Core Deterministic Encoding Rule 2 (7049bis)
and modes that don't have a name yet

type SortMode int

const (
	SortNone			SortMode = 0	// no sorting
	SortLengthFirst			SortMode = 1	// RFC 7049 Canonical
	SortBytewiseLexical		SortMode = 2	// RFC 7049bis Bytewise Lexicographic
	SortCanonical			SortMode = SortLengthFirst
	SortCTAP2			SortMode = SortBytewiseLexical	
	SortCoreDeterministic		SortMode = SortBytewiseLexical
)

type ShortestFloat int

const (
	ShortestFloatNone		ShortestFloat = 0	// no change
	ShortestFloat16			ShortestFloat = 1	// float16 as shortest form of float that preserves value
	ShortestFloat32			ShortestFloat = 2	// float32 as shortest form of float that preserves value
	ShortestFloat64			ShortestFloat = 3	// float64 as shortest form of float (this may convert from float32 to float64, etc.)
)

Prevent an inappropriate use of BinaryUnmarshaler

Don't allow CBOR byte string (major type 2) as input to Go's Time.UnmarshalBinary.

Time values should only be encoded/decoded using these CBOR data types: pos or neg integer, float, and text string.

For more info, see RFC 7049 section 2.4.1.

Add 87 tests for CBOR data items that are not well-formed

RFC 7049bis describes three kinds of malformed CBOR data:

kind 1 (too much data)
kind 2 (too little data)
kind 3 (syntax error) and 5 "subkinds" of syntax error

I created 87 unit tests based on kind 2 and kind 3 and v1.3.2 passed all of them. 👍

Kind 1 is only an error when the application (not CBOR library) assumes that the input bytes would span exactly one data item.

RFC 7049bis Appendix G:

Too much data: There are input bytes left that were not consumed. This is only an error if the application assumed that the input bytes would span exactly one data item. Where the application uses the self-delimiting nature of CBOR encoding to permit additional data after the data item, as is for example done in CBOR sequences [I-D.ietf-cbor-sequence], the CBOR decoder can simply indicate what part of the input has not been consumed.

Need to review the tests and commit.

Add a Security Policy to README.md

For example:

Security Policy

For v1, security fixes are provided only for the latest released version since the API won't break compatibility.

To report security vulnerabilities, please email [email protected] and allow time for the problem to be resolved before reporting it to the public.

Prevent stack exhaustion exploits by limiting nested levels for CBOR arrays, maps, and tags.

Please consider resolving the issue discussed before adding new features like CBOR tags, optimizing for speed, or refactoring to rename functions, etc.
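
A simplified sketch of how a nesting limit can be enforced while walking CBOR data. It handles definite-length items only, and the function name `skipItem`, the error values, and the `maxDepth` parameter are assumptions for illustration rather than the library's API. Each array, map, or tag increases the depth by one, and the walk stops as soon as the limit would be exceeded.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errTooDeep     = errors.New("exceeded max nested level")
	errTruncated   = errors.New("unexpected end of data")
	errUnsupported = errors.New("reserved or indefinite length: not handled in this sketch")
)

// skipItem walks one definite-length data item starting at data[off] and
// returns the offset just past it. depth counts enclosing arrays, maps, tags.
func skipItem(data []byte, off, depth, maxDepth int) (int, error) {
	if off >= len(data) {
		return 0, errTruncated
	}
	major, ai := data[off]>>5, data[off]&0x1f
	off++

	// Read the argument that follows the initial byte.
	var arg uint64
	switch {
	case ai < 24:
		arg = uint64(ai)
	case ai <= 27:
		n := 1 << (ai - 24) // 1, 2, 4, or 8 bytes
		if len(data)-off < n {
			return 0, errTruncated
		}
		for _, b := range data[off : off+n] {
			arg = arg<<8 | uint64(b)
		}
		off += n
	default:
		return 0, errUnsupported
	}

	var children uint64
	switch major {
	case 2, 3: // byte string, text string: skip the payload
		if uint64(len(data)-off) < arg {
			return 0, errTruncated
		}
		return off + int(arg), nil
	case 4, 5: // array, map: every child needs at least one byte
		if arg > uint64(len(data)-off) {
			return 0, errTruncated
		}
		children = arg
		if major == 5 {
			children = 2 * arg // key and value per pair
		}
	case 6: // tag wraps exactly one item
		children = 1
	default: // integers, simple values, floats
		return off, nil
	}

	if depth+1 > maxDepth {
		return 0, errTooDeep
	}
	for i := uint64(0); i < children; i++ {
		var err error
		if off, err = skipItem(data, off, depth+1, maxDepth); err != nil {
			return 0, err
		}
	}
	return off, nil
}

func main() {
	nested := []byte{0x81, 0x81, 0x80} // [[[]]]
	_, err := skipItem(nested, 0, 0, 2)
	fmt.Println(err)
}
```

Bounding the depth keeps recursion, and therefore stack use, proportional to the limit rather than to attacker-controlled input.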

Add benchmarks using WebAuthn data

Since WebAuthn requires CBOR, add some benchmarks using typical WebAuthn data.

Decoding to not-nil interface returns error

The following code returns error "cbor: cannot unmarshal array into Go value of type interface {}"

s := "hello"
var v interface{} = s
cbor.Unmarshal(data, &v)

The decoder should handle a not-nil interface in the same way as a nil interface by storing the appropriate Go type in the interface value. This behavior should be made consistent with encoding/json.
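
For reference, the encoding/json behavior mentioned above can be observed with the standard library alone. In this sketch the helper name `decodeOverString` is hypothetical; the interface initially holds a string, and decoding a JSON array into it replaces the held value with a slice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeOverString unmarshals data into an interface{} that already
// holds a non-pointer value, and returns what the interface holds afterward.
func decodeOverString(data []byte) (interface{}, error) {
	s := "hello"
	var v interface{} = s
	err := json.Unmarshal(data, &v)
	return v, err
}

func main() {
	v, err := decodeOverString([]byte(`[1,2]`))
	fmt.Printf("%T %v %v\n", v, v, err)
}
```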

Deprecate bool encoding modes in EncOptions and provide int SortMode

Encoding modes have different aspects, for example:

sorting
if (and how) to shrink floats to smallest form that preserves value

The encoding mode bools should've been one int since there will be more encoding modes in the future than anticipated.

However, providing an integer encoding mode is also inflexible because it would only support known encoding modes.

Simply deprecate the bool encoding modes and add an integer SortMode to EncOptions.

When using encoding modes that do not shrink/expand floats, the sort mode basically determines simple encoding modes like Canonical or CTAP2 Canonical.

This way, future encoding modes not yet known/named today can be supported by setting the required combination of options in EncOptions.

This issue is closed by commit 3b78ee0 and is the first half of existing issue #74.

A separate issue can be opened to add a NewEncOptions or EncOptions.New function (specifying a known encoding mode) that returns EncOptions having proper values for SortMode, ShortestFloatMode, etc.

type SortMode int

const (
	SortNone			SortMode = 0	// no sorting
	SortLengthFirst			SortMode = 1	// RFC 7049 Canonical
	SortBytewiseLexical		SortMode = 2	// RFC 7049bis Bytewise Lexicographic
	SortCanonical			SortMode = SortLengthFirst
	SortCTAP2			SortMode = SortBytewiseLexical	
	SortCoreDeterministic		SortMode = SortBytewiseLexical
)

Include bar chart of binary sizes in README

Add feature to handle JSON struct field tags

Add support for json struct field tags when cbor struct field tags are absent. This helps CBOR be a drop-in replacement for JSON.

Increase test coverage (currently 87-91%, depending on reporting tools)

Current status:

87% displayed on codecov badge
91.8% reported by go test -cover

Most of remaining coverage involves edge-cases and error handling.

Go struct to CBOR array using `cbor:",toarray"`

This allows:

encoding Go struct to CBOR array.
decoding CBOR array to Go struct.

This makes encoded data more compact and structs are easier to use.

	type T struct {
		_ struct{} `cbor:",toarray"` 
		A int      `cbor:",omitempty"`
		B string   `cbor:",omitempty"`
	}

Special field "_" is used to specify struct level options, such as "toarray". Any value of T type is encoded as array of 2 elements. "omitempty" is disabled by "toarray" to ensure that the same number of elements are encoded every time.

If "toarray" is omitted, "omitempty" works just like encoding/json.

I came across this at oasisprotocol/oasis-core@ade6a1b. @Yawning has the best commit messages!

Update benchmarks to include non-canonical CBOR encoding

Encoding benchmarks only use canonical CBOR, which is slower. Add benchmarks for the faster encoding.

Separate CTAP2 "canonical CBOR" and RFC 7049 "Canonical CBOR" encoding modes

CTAP2 "canonical CBOR" != RFC 7049 "Canonical CBOR" when map keys have different data types due to sorting rules. If all map keys have same data type, there's no difference.

When "Canonical CBOR" option is specified, this library has been using sorting rules from "Core Deterministic Encoding Requirements" making it effectively sort like CTAP2.

There should be two canonical modes as options: "Canonical CBOR" and "CTAP2 Canonical CBOR". They only differ in sorting rules involving map keys with mixed types.

NOTE: issue edited based on your comments

Fixes required because both RFC 7049 Appendix A and Wikipedia violate RFC 7049 Section 2.3

Data validation should add this rule:
Reject CBOR primitives (major type 7) with additional info 24 if value (next byte) is < 32.

BTW, always verify with RFC 7049 because there's incorrect info about this item even on reputable websites.
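
The rule can be sketched as a check on the initial byte 0xf8 (major type 7, additional info 24) and the byte that follows it. The function name `checkSimpleValue` and the error values are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errShort         = errors.New("unexpected end of data")
	errInvalidSimple = errors.New("simple value < 32 must not use the two-byte form")
)

// checkSimpleValue rejects a two-byte simple value whose value is below 32.
// Any other initial byte is outside the scope of this check and passes.
func checkSimpleValue(data []byte) error {
	if len(data) == 0 {
		return errShort
	}
	if data[0] != 0xf8 { // 0xf8 = major type 7, additional info 24
		return nil
	}
	if len(data) < 2 {
		return errShort
	}
	if data[1] < 32 {
		return errInvalidSimple
	}
	return nil
}

func main() {
	fmt.Println(checkSimpleValue([]byte{0xf8, 0x1f}))
	fmt.Println(checkSimpleValue([]byte{0xf8, 0x20}))
}
```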

Replace example CBOR program in size comparison chart

Program size comparison should use existing 3rd-party program(s) instead of one you wrote solely for comparison.

If you want, I'll give it a shot and post result(s) here. I'll find IoT and security related projects because they value size, safety and reliability.

Let me know.

Provide info on how to contribute (CONTRIBUTING.md)

Add CONTRIBUTING.md with info on how to contribute to this project.

My favorite C++ JSON library has a nice one: https://github.com/nlohmann/json/blob/master/.github/CONTRIBUTING.md

Improve speed and memory usage

Additional performance improvements for milestone v1.3 that are not already covered by issues #15 and #17.

Improve encoding speed, reduce mem, refactor (commit 8ea465d)
Improve encoding speed (commit d85552b)
Improve encoding speed by caching types (commit 90423eb)
Add fast path to encode fixed length struct (commit 05e6b7c)
Add Fast path to decode to empty interface (commit 23d2052)

And more if time allows.

Skip CBOR array/map elements on incompatible Go type

This was discovered by using a newer cbor-fuzz after adding "toarray" struct tag to fuzzing.

While fixing this, I added an additional type compatibility check when not using "toarray".

Add CBOR encoding/decoding speed comparison chart to README.md

I'll provide a comparison using release versions of CBOR libraries. The input data won't be contrived/biased so it'll be fair and useful.

README says speed isn't a primary design goal and that faster libraries exist, so I think the results will surprise some people.

Unsupported CBOR negative int values should return error

As documented under Limitations section, CBOR negative integers like -18446744073709551616 are unsupported because they cannot fit into Go's int64.

But instead of returning 0 with err=nil, I think cbor.Unmarshal should return an error when trying to decode these values.

Go's encoding/json handles this scenario by returning json.UnmarshalTypeError. So this library should probably return cbor.UnmarshalTypeError.

Improve decoding speed

Spotted several optimizations while working on milestone v1.3.1.

Decoding of COSE and CWT data should be around 4% to 5% faster, depending on other changes in the same release.

Add benchmark using CBOR Web Token (CWT) and SenML data

CWT (RFC 8392) is encoded in CBOR and uses COSE for added app-layer security. It's derived from JSON Web Token.

https://tools.ietf.org/html/rfc8392

Encoding a new type with float32 as its underlying type causes interface conversion panic

New type causes panic if float32 is the underlying type. It causes "panic: interface conversion: interface {} is cbor_test.myFloat32, not float32 [recovered]"

type myFloat32 float32
cbor.Marshal(myFloat32(0.0), cbor.EncOptions{})

Type alias works fine.

type aliasFloat32 = float32
cbor.Marshal(aliasFloat32(0.0), cbor.EncOptions{})
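
The panic message is what Go produces when a type assertion to `float32` is applied to a value whose dynamic type is a distinct named type. The sketch below reproduces that distinction outside the library; the helper names are hypothetical, and whether the library's encoder used exactly this assertion is an assumption. Reading the value through `reflect` by kind works for both the named type and the alias.

```go
package main

import (
	"fmt"
	"reflect"
)

type myFloat32 float32      // new named type: distinct from float32
type aliasFloat32 = float32 // alias: identical to float32

// viaAssertion uses the comma-ok form, so it reports failure instead of
// panicking the way a bare v.(float32) would.
func viaAssertion(v interface{}) (float32, bool) {
	f, ok := v.(float32)
	return f, ok
}

// viaReflect reads any value whose underlying kind is float32.
func viaReflect(v interface{}) (float64, bool) {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Float32 {
		return 0, false
	}
	return rv.Float(), true
}

func main() {
	fmt.Println(viaAssertion(myFloat32(1.5)))
	fmt.Println(viaAssertion(aliasFloat32(1.5)))
	fmt.Println(viaReflect(myFloat32(1.5)))
}
```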

Add support for CBOR Tags (major type 6)

UPDATED on Jan 30, 2020:

I've been working on adding support for CBOR tags and made good progress. I moved this feature from v2.0 to milestone v2.1.

v2.0 will be released on Feb 2, 2020 and v2.1 about 1 week after that.

encode floating point values using the shortest form that preserves that value
sort using streamlined rules that produce same results as "CTAP2 Canonical CBOR" sorting of map keys
no change needed for integers since they're already using shortest form for all modes

For example, try to convert float64 to float32, if that works, try to convert float32 to float16.

This mode makes serialized data more compact when it contains floating point numbers.

Reject indefinite length byte string if chunks are indefinite length

Enforce this in data validation.

RFC 7049 2.2.2

   For indefinite-length byte strings, every data item (chunk) between
   the indefinite-length indicator and the "break" MUST be a definite-
   length byte string item; if the parser sees any item type other than
   a byte string before it sees the "break", it is an error.

This also affects indefinite length text string.
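
A sketch of the chunk rule quoted above, covering both byte strings (0x5f) and text strings (0x7f). The function name `checkIndefiniteString` and the error values are assumptions for illustration. Every chunk before the break byte 0xff must have the same major type as the enclosing string and must itself be definite length. UTF-8 validation of text chunks and trailing data after the break are not checked here.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNotIndefinite   = errors.New("not an indefinite-length string")
	errEOF             = errors.New("unexpected end of data")
	errChunkType       = errors.New("chunk has a different major type")
	errChunkIndefinite = errors.New("chunk is itself indefinite length")
	errReserved        = errors.New("reserved additional information")
)

// checkIndefiniteString validates the chunks of one indefinite-length
// byte or text string that starts at data[0].
func checkIndefiniteString(data []byte) error {
	if len(data) == 0 || (data[0] != 0x5f && data[0] != 0x7f) {
		return errNotIndefinite
	}
	major := data[0] >> 5
	off := 1
	for {
		if off >= len(data) {
			return errEOF // no break byte found
		}
		b := data[off]
		off++
		if b == 0xff { // "break" ends the string
			return nil
		}
		if b>>5 != major {
			return errChunkType
		}
		ai := b & 0x1f
		var n uint64
		switch {
		case ai < 24:
			n = uint64(ai)
		case ai <= 27:
			size := 1 << (ai - 24) // length is stored in 1, 2, 4, or 8 bytes
			if len(data)-off < size {
				return errEOF
			}
			for _, c := range data[off : off+size] {
				n = n<<8 | uint64(c)
			}
			off += size
		case ai == 31:
			return errChunkIndefinite
		default:
			return errReserved
		}
		if uint64(len(data)-off) < n {
			return errEOF
		}
		off += int(n) // skip the chunk payload
	}
}

func main() {
	fmt.Println(checkIndefiniteString([]byte{0x5f, 0x41, 0x01, 0x42, 0x02, 0x03, 0xff}))
	fmt.Println(checkIndefiniteString([]byte{0x5f, 0x5f, 0x41, 0x01, 0xff, 0xff}))
}
```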

// RFC 8428 says, "The data is structured as a single array that 
// contains a series of SenML Records that can each contain fields"

// fxamacker/cbor v1.3 has "keyasint" struct tag (ideal for SenML)
type SenMLRecord struct {
	BaseName    string  `cbor:"-2,keyasint,omitempty"`
	BaseTime    float64 `cbor:"-3,keyasint,omitempty"`
	BaseUnit    string  `cbor:"-4,keyasint,omitempty"`
	BaseValue   float64 `cbor:"-5,keyasint,omitempty"`
	BaseSum     float64 `cbor:"-6,keyasint,omitempty"`
	BaseVersion int     `cbor:"-1,keyasint,omitempty"`
	Name        string  `cbor:"0,keyasint,omitempty"`
	Unit        string  `cbor:"1,keyasint,omitempty"`
	Value       float64 `cbor:"2,keyasint,omitempty"`
	ValueS      string  `cbor:"3,keyasint,omitempty"`
	ValueB      bool    `cbor:"4,keyasint,omitempty"`
	ValueD      []byte  `cbor:"8,keyasint,omitempty"`
	Sum         float64 `cbor:"5,keyasint,omitempty"`
	Time        int     `cbor:"6,keyasint,omitempty"`
	UpdateTime  float64 `cbor:"7,keyasint,omitempty"`
}

// When cborData is a []byte containing SenML, 
// it can easily be decoded into a []SenMLRecord.

var v []SenMLRecord
if err := cbor.Unmarshal(cborData, &v); err != nil {
	t.Fatal("Unmarshal:", err)
}

// That's it!  Decoding speed is fast and v contains easy to use SenML Records.
