Comments (12)

warpfork commented on August 17, 2024

You're holding it wrong! 😀

In refmt, the theory is that object mapping/marshalling/unmarshalling is completely decoupled from the serial format, so we don't have tags of "cbor" or "json". They're all just "refmt".

So a pretty mundane example struct with tags from another project looks like this:

type OutputSpec struct {
	PackType string `refmt:"packtype"`
	Filters  string `refmt:",omitempty"`
}

The format of tags is the same as the Go standard library json tags; they're just named "refmt".

(Additionally, there's technically support for custom tag names using atlas.AutogenerateStructMapEntryUsingTags(rt reflect.Type, tagName string) *AtlasEntry... but this is buried rather deep and I'm not sure it's a sensibly exposed API at the moment.)

Sidenote (if you didn't already notice): you may also want to note that the default field name mapping is slightly different than Go standard lib. refmt maps the first character of the name to lowercase by default. I'm of the opinion "it's probably what you wanted". So, a field called PackType will be named "packType" in serial form by default.
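For anyone who wants to see the naming rule spelled out, here is a minimal sketch of "lowercase the first character". The helper name lowerFirst is made up for illustration and this is not refmt's actual code; it only mirrors the behavior described above.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// lowerFirst is a hypothetical helper (not part of refmt): it lowercases only
// the first rune of a Go field name and leaves the rest untouched.
func lowerFirst(name string) string {
	r, size := utf8.DecodeRuneInString(name)
	if size == 0 {
		return name // empty name: nothing to change
	}
	return string(unicode.ToLower(r)) + name[size:]
}

func main() {
	fmt.Println(lowerFirst("PackType")) // packType
	fmt.Println(lowerFirst("Filters"))  // filters
}
```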

from refmt.

warpfork commented on August 17, 2024

OH. <<...out-of-band epiphany...>> You mean CBOR Tags. Sorry; I got hung up on the name similarity with golang struct tags.

Okay, CBOR tags: Yep, not supported right now.

Tag support will be coming up soon though -- working on it now! I'll post progress updates as they occur. Here's the outline on what I currently expect to support:

  • Reading tags into the token stream format and serializing them from tokens: ✅ This will come out pretty quick (and I'm also adding a Pretty Printer feature in parallel, which will make that visible, if nothing else).
  • Handling recursive tags: ❌ The CBOR spec actually allows recursive tags -- there's a snippet in it about "if tag A is followed by tag B"... I'm not planning to support that. A) I think it's dumb. B) I've never seen nor heard of that feature being used. C) It would make a complete mess of performance: turning every situation with tokens into something that may require an unbounded number of memory allocations, because instead of a single slot, it may require a list; I'd rather not. (If someone really needs this someday, maybe I'll reconsider, but I just don't actually expect that to happen.)
  • Tag-triggered behaviors: ❓ I'll need your feedback on what behaviors you want from the library while handling tags. I don't use them, so I'm flying blind here.

More news soon.

whyrusleeping commented on August 17, 2024

Our primary use case for tags is to denote where an IPLD link is. Basically, when converting from CBOR into an in-memory object, anything with a certain tag needs to become an instance of a certain type, filled with the data the tag is wrapping.

warpfork commented on August 17, 2024

Oh, wow, thanks github, that's totally an auto-behavior I meant for you to have.

warpfork commented on August 17, 2024

CBOR tag support, both deserializing and serializing, is now in on master. Tags should round-trip in cbor<->tokenstream<->cbor losslessly.

It's not yet wired to any marshaller/unmarshaller stuff.

warpfork commented on August 17, 2024

So, the behavior you describe sounds well-defined when deserializing into an &interface{}, map[string]interface{}, or suchlike grabbag:

Let's say we have some cbor message roughly like: {"a": 1, "b":<tag:123>{"c":"d"}}.
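In case the <tag:123> notation is unfamiliar, here is a hand-encoded sketch of that message on the wire, following the CBOR encoding rules (a tag is major type 6 and simply prefixes the item it applies to). The function names are made up for illustration and no CBOR library is involved.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// taggedMessage hand-encodes {"a": 1, "b": 123({"c": "d"})} byte by byte.
// Hypothetical helper for illustration; refmt would produce this for you.
func taggedMessage() []byte {
	return []byte{
		0xa2,      // map with 2 key/value pairs
		0x61, 'a', // text string of length 1: "a"
		0x01,      // unsigned integer 1
		0x61, 'b', // text string of length 1: "b"
		0xd8, 0x7b, // tag header (major type 6), tag number 123 in the next byte
		0xa1,      // the tagged item: a map with 1 pair
		0x61, 'c', // "c"
		0x61, 'd', // "d"
	}
}

// hexOf renders bytes as lowercase hex for easy comparison.
func hexOf(b []byte) string {
	return hex.EncodeToString(b)
}

func main() {
	fmt.Println(hexOf(taggedMessage())) // a26161016162d87ba161636164
}
```

Note that the tag costs two bytes here and does not change the shape of the value it wraps.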

If we set up the unmarshaller with a custom behavior to take tag=123 as a hint to produce TypeFoo, and give it that whole message and an &interface{}, it'll produce:

map[string]interface{}{
    "a": 1,
    "b": TypeFoo{"d"},
}

So far so good.

What should the unmarshaller do if you give it that same message and a handle to a &TypeBaz, where

type TypeBaz struct {
    A string
    B TypeQuux
}

The unmarshaller can't stuff a TypeFoo into a TypeQuux. What's the correct behavior?

Stebalien commented on August 17, 2024

If TypeQuux implements the unmarshal interface, I'd pass it to that and let it deal with it (is that possible in refmt?). If we're deserializing TypeQuux using reflection, I'd drop it. Basically, I'd treat tags as hints unless we're unmarshaling into an interface (in which case we'd look up the tag in a type map).

However, it might be worth making it possible to specify levels of strictness for tags. That is, when registering a tag, one could specify Hint (the above), NoReflect (error when using reflection with a type mismatch), or Force (the tag must use this type). Honestly, it may make sense to just make this a boolean and not have Force (that's a bit much).

(also, sorry for taking so long to get back to you on this, we've been a bit busy)

warpfork commented on August 17, 2024

Ok, a bunch of this stuff should now be in and workable on master! You can see an example combined with some other advanced usage here in the test suite:

atlas.MustBuild(
	atlas.BuildEntry(tObjStr{}).UseTag(50).Transform().
		TransformMarshal(atlas.MakeMarshalTransformFunc(
			func(x tObjStr) (string, error) {
				return x.X, nil
			})).
		TransformUnmarshal(atlas.MakeUnmarshalTransformFunc(
			func(x string) (tObjStr, error) {
				return tObjStr{x}, nil
			})).
		Complete(),
)

This atlas building snippet will give you the power to have...

  • the serial format be a string (tagged)...
  • while the in-memory object format is struct{X string}...
  • and when deserializing into an interface{}, you'll still get the struct type, due to the tag hint.

So this is pretty cool.

There may still be paths where support for tag behavior is spotty, but if you uncover any, open an issue and we'll keep expanding the test fixture coverage and the features to match.


Aside: Despite supporting this, I also feel obliged to mention at least in passing that I think using CBOR tags is Probably A Bad Idea in most applications. You can do it. But I wouldn't do it without spending serious thought on the tradeoffs.

One of the major selling points of CBOR is its isomorphism to JSON -- it's easy to convert CBOR to JSON; and in most cases it's easy to convert the other way as well, and thus it's both simple and correct to consider JSON as an easy way for humans to author raw structures (that can then be canonicalized into CBOR). This breaks down with tags: there's no way to take a CBOR object with tags and convert it to a JSON object losslessly short of doing a giant schema expansion where every object is expanded into a tuple of {"tag":777, "real_obj": {...}} or {"tag": null, "real_obj": {...}}... and no one really wants to do that, because it's ugly as all sin. Similarly, there's no generic way to take a JSON object and somehow intuit which fields should end up with some kind of tag if it was converted into a CBOR object; either that massive unconditional object-depth-doubling approach, or some kind of external schema or other customized behaviors would be needed.

The ./refmt cbor=json converter utility I added this morning will currently silently drop the tags from the json output -- but that should almost certainly be considered a bug; what it should do is emit an error and reject the transformation since it cannot be done losslessly -- there's no way that piping the result back into ./refmt json=cbor will yield the original data again.

Stebalien commented on August 17, 2024

TL;DR: We're only using one tag and we're using a "reserved" key in JSON to represent it. That is, "name": <our-one-tag>"data" in CBOR maps to "name": {"/": "data"} in JSON.

> Despite supporting this, I also feel obliged to mention at least in passing that I think using CBOR tags is Probably A Bad Idea in most applications. You can do it. But I wouldn't do it without spending serious thought on the tradeoffs.

Don't worry, we have. We're only going to use one tag and we're only doing that because we don't want to bend over backwards and make our system worse just to support JSON.

For some background, we're using CBOR as the "default" encoding for a merkle-linked structured data system we call IPLD. IPLD is basically a meta-system for understanding (reading, writing, traversing, querying, etc.) any merkle-linked data structure like, e.g., git and ethereum. We've chosen CBOR as the "default" encoding for new applications built on top of this system as it's flexible, compact, and schema-less (i.e., JSON but better).

However, to do this, we need a way to efficiently represent merkle-links. In JSON, we're reserving the special key "/" and using {"/": "LinkData"}. However, doing that in CBOR would be a waste of space and would force us to reserve that key, even when we don't technically need it. That's why we're using a tag.

> One of the major selling points of CBOR is its isomorphism to JSON -- it's easy to convert CBOR to JSON;

Many of our applications need to efficiently store binary data so the ship has already sailed on that to some extent. We've been talking about adding a special syntax for representing binary data in JSON (e.g. {"/": "b64encoded data", "type": "binary"}) but, honestly, we're not too concerned with being able to represent all data in JSON (our main concern is going from JSON to other formats, not the other way around).

Really, in my view, the selling point isn't JSON compatibility, it's JavaScript/Python/etc. object compatibility. That is, CBOR maps cleanly to the standard "object" structures used in most dynamically typed languages.

Stebalien commented on August 17, 2024

Also, this is awesome and this library is awesome (and will make our lives so much easier). Thanks!

warpfork commented on August 17, 2024

> Many of our applications need to efficiently store binary data so the ship has already sailed on that to some extent.

I see... Yeah, that's a fair cop.

(The default, generalized admonition stands for Anyone Else on the wide internets who ends up here by searching for "cbor tags" though...)


So... I will tentatively close this issue then? :D I think the core path for all the features you need is in place now. If you find stuff missing, more issues welcome!

Stebalien commented on August 17, 2024

Yep. Thanks!
