
Comments (3)

warpfork commented on August 30, 2024

I think there's some good idea here, but also we should be cautious of overgeneralizing this.

Not all IPLD codecs have an "underlying" codec. For those that do: our specs will still be clearest if we specify what the IPLD codec does first; and specify what relationship this has to any other codecs in the wild second. And in all forms of interop: the practical details matter; and what various tools and libraries do in the wild can be just as interesting as what a spec or a reference implementation says, especially if those things diverge.

I think our codecs specs and documentation should be individually clear, for each codec, about any other widely-known systems they expect to be cross-compatible with, and how, and any conditions and limits there may be on that.

(And we need these detailed statements anyway: they're the other half of fully specifying any increasing strictness an IPLD codec might have versus a general understanding of that codec.)

Having the holistic goal of interop in mind for those codecs which do aim to have interoperability with existing systems is... good? But also tautological. If there's some phrasing of this that will help us write detailed interoperability reports per codec, I'm all for it; I'm just not sure what kind of statement that would be and what kind of explicitness it can really have that will be useful.


vmx commented on August 30, 2024

I don't think I made my point clear enough. I don't want to impose additional constraints on codecs; I'd like to document the current state.

I think what I describe is how people currently understand, implement, and use IPLD Codecs. I want to make sure our specs are precise and remove the chance for misinterpretation.

@warpfork would it help if more people comment/emoji here on whether what I describe matches their expectations about IPLD Codecs or not? I ask because I observe a disconnect between the IPLD Team and the outside world, and I obviously also have only a limited view of the outside world. So it might help to get people from various backgrounds to chime in here.


vmx commented on August 30, 2024

Here's a quick update after talking to various folks (thanks everyone!).

A byte-identical copy of the original data is not possible to enforce: spec-compliant implementations won't necessarily be strict enough. Examples:

  • @aschmahmann mentioned that Protocol Buffers might not always serialize the same input data to the same bytes
  • @ribasushi mentioned float encoding in CBOR. The CBOR spec recommends using the smallest representation possible, i.e. if you have a 64-bit IEEE 754 float that can be represented losslessly as a 32-bit IEEE 754 float, use the 32-bit encoding (example of such a float). Implementations are free not to do this (we plan for DAG-CBOR to always require 64-bit floats, which is what the Go implementation already does). So two implementations could encode the same value differently while both being spec compliant.
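
The float-width ambiguity above can be made concrete. In this illustrative Python sketch (the helper names are mine, not from any spec), a value is encoded as a 4-byte CBOR float (initial byte 0xfa) when it survives a round trip through 32 bits, and as an 8-byte float (0xfb) otherwise; both outputs decode to the same value, but the bytes differ:

```python
import struct

def fits_in_float32(x: float) -> bool:
    # True if x survives a round trip through 32-bit IEEE 754 unchanged.
    try:
        return struct.unpack(">f", struct.pack(">f", x))[0] == x
    except OverflowError:  # magnitude out of float32 range
        return False

def encode_cbor_float(x: float, prefer_smallest: bool) -> bytes:
    # CBOR major type 7: 0xfa = single precision, 0xfb = double precision.
    if prefer_smallest and fits_in_float32(x):
        return b"\xfa" + struct.pack(">f", x)
    return b"\xfb" + struct.pack(">d", x)

# 0.5 is exactly representable in 32 bits, so the two policies diverge:
# 5 bytes with the "smallest representation" recommendation, 9 without.
```

Both policies are spec-compliant CBOR, which is exactly why byte-identical output can't be assumed across implementations.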

Hence I propose a less strict version:

In case there is an underlying codec, the data produced by an IPLD Codec MUST be decodable by any spec-compliant/reference implementation of the underlying codec.

"underlying codec" means a existing codec we apply additional constraints on. Examples are DAG-CBOR and DAG-JSON, where the underlying codecs would be CBOR and JSON.


We will likely get into SHOULD vs. MUST discussions. The reasons I'm in favour of "MUST":

So far, all cases I've seen where the condition above wasn't met were bugs. Of course there are bugs that lead to data that doesn't comply with a spec (and hence violates the MUST); that's the nature of bugs. I don't think those bugs should weaken the spec itself. The reason those bugs exist is that the spec isn't precise enough, and now is the chance to add this precision. I don't think it should be a "SHOULD" just because there could be bugs.

This doesn't mean that I want to break all the data that was produced due to those bugs. I think it's totally fair to create libraries that can deal with such data. They are free to do so, and even should if they care about backwards compatibility with pre-existing data. Though they MUST NOT produce such invalid data moving forward.

"producing data" is a bit fuzzy and I'd like that we operate with common sense here and not putting too much efforts on tighten the spec language. So if your library deals with invalid data, it might as well write it again. What I'm after is that we try to make those bugs less likely in the future.

This also (to me) does not mean that you always need to ensure that the data you write is valid. You just need to acknowledge that the data you produce might be invalid and might not work with other implementations. This would be similar to what Google does with Protocol Buffers. In Go you can produce invalid data (strings with arbitrary bytes), but you won't be able to read it with the Python implementation. And if you store arbitrary bytes in strings with the Protocol Buffers JavaScript implementation (without doing any special additional work), they will be serialized differently from how the Go implementation would serialize them (even though the original input bytes were the same).
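The string-validity gap can be demonstrated without Protocol Buffers at all (a minimal Python sketch; strict_read_string is my name for the pattern, not a protobuf API): bytes that a lenient writer happily labels a "string" get rejected by a reader that validates UTF-8 at the boundary:

```python
def strict_read_string(data: bytes) -> str:
    # A strict reader validates UTF-8 at the boundary, as Python does;
    # a lenient writer may have emitted arbitrary bytes as a "string".
    return data.decode("utf-8")

raw = b"foo\xff"  # 0xff can never appear in well-formed UTF-8

try:
    strict_read_string(raw)
    accepted = True
except UnicodeDecodeError:
    accepted = False
```

This is the interop failure mode in miniature: both sides can be internally consistent, yet the data only flows one way.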

