Comments (11)

prataprc commented on August 30, 2024

This is a recurring issue with JSON. I hope JSON gets extended with something like JSON5.

from specs.

rvagg commented on August 30, 2024

I'm trying to get broader engagement on this question from others (we might need to be patient due to the holiday period). So far this seems to me to be the question space to consider:

  1. Do we allow IEEE 754 to bleed into the Data Model, and therefore support the notions of Infinity, -Infinity, and NaN?
    a. we could rule them out and reject them at encode/decode time so they aren't even usable
    b. we could treat them as magic values for a specific codec + language combination that you happen to get free but don't bet on them being transferable (this is implicitly the current approach and probably needs to be explicit if we adopt this)
    c. or we could make affordances in other codecs to support them (DAG-JSON first, as per that linked issue)
  2. If yes to the above, do we treat these things as part of "float", or do they become new kinds, like "null" is its own kind?
    • this might mean either saying that, by "float", we mean IEEE 754 and they roll up into the "float" kind and we bind ourselves by IEEE 754's limitations and affordances
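
As a rough illustration of option 1a (not taken from this thread or from any IPLD codec), rejecting the specials at encode time could look like the sketch below. The helper name `check_finite` is hypothetical, and a real codec would do this check inside its own encoder and decoder.

```python
import math


def check_finite(value):
    """Recursively reject NaN, Infinity and -Infinity (a sketch of option 1a).

    Hypothetical helper: returns the value unchanged if every float in it
    is finite, otherwise raises ValueError.
    """
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError(f"non-finite float not allowed: {value!r}")
    if isinstance(value, dict):
        for item in value.values():
            check_finite(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            check_finite(item)
    return value
```

Calling `check_finite` on a nested structure before handing it to an encoder means a stray `float("nan")` fails loudly instead of ending up in encoded data.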

mvdan commented on August 30, 2024

I admit I don't have a lot of experience with IEEE 754. At least from my point of view, I've never needed to use the special values, so I lean against making them their own kinds in IPLD. It seems to me like IPLD kinds should be very commonly used.

Personally, I think 1a and 1c are our only reasonable choices. 1b is "they might or might not work across codecs/languages", which in my opinion is perhaps even worse than 1a, as it doesn't seem particularly useful given IPLD's goals, and could easily confuse and mislead users into a false sense of security.

Assuming that there are valid use cases for using these special values in IPLD (e.g. scientific data), and that all of our existing codecs and languages can support the special values, I lean towards 1c. If not, then 1a.

rvagg commented on August 30, 2024

I don't know if we've properly addressed whether "float" in the data model maps directly to IEEE 754 and therefore whether it should even include infinities and not-a-number. Instinctively I would say that these are not ideal forms to be encoding anyway, but I don't know if I can justify that when IEEE 754 is so widely supported that these specials already have wide utility.

@ipld/core we should probably resolve that base question first, then whether the proposal above is a good idea. It seems like a reasonable approach to me if we accept the place of IEEE 754 specials in the data model.

(Mostly, though, I'd say to just avoid floats entirely in your encoded data; they're a minefield.)

patrsc commented on August 30, 2024

I agree, and as far as I know IEEE 754 is the most widely used floating point representation: it is heavily used in, for example, scientific computing and engineering. Most programming languages also support it, so in my opinion it should be available in the IPLD data model to make IPLD useful in these domains too.

patrsc commented on August 30, 2024

I would appreciate it if we went for 1c and had these values as part of the float kind, because this is a widely used approach. This would make it possible to directly use outputs of scientific computing applications in the IPLD data model. I don't know if it is necessary to make the "float" kind mean exactly IEEE 754, but allowing these special values as part of "float" at least gives good compatibility with IEEE 754.

warpfork commented on August 30, 2024

I appreciate @rvagg's breakdown there, and also tend to place my chips around 1a.

If I'm being highly opinionated: Attempting to build application logic around the special values in IEEE 754 floats is a bad idea, period, no matter what language and what context you're operating in. Don't do it: you might not regret it; but if I have betting money, I'll bet you'll regret it. If creating sentinel values in an application, do so highly intentionally; don't use interesting corners of IEEE 754 to do it.
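
One way to read "highly intentionally" is to give each special value an explicit, named representation instead of relying on float bit patterns. The sketch below is only an assumption about what that could look like: the `{"special": ...}` shape and both function names are invented for illustration and are not an IPLD convention.

```python
import math

# Hypothetical tagged form: each symbol has exactly one encoded spelling.
_SPECIALS = {"nan": math.nan, "+inf": math.inf, "-inf": -math.inf}


def to_storable(x: float):
    """Map a float to a finite float or an explicit sentinel record."""
    if math.isnan(x):
        return {"special": "nan"}
    if x == math.inf:
        return {"special": "+inf"}
    if x == -math.inf:
        return {"special": "-inf"}
    return x


def from_storable(v):
    """Inverse of to_storable: turn a sentinel record back into a float."""
    if isinstance(v, dict):
        return _SPECIALS[v["special"]]
    return v
```

With this shape the stored data only ever contains finite floats, strings and maps, so it stays transcodable between codecs regardless of how each one treats IEEE 754 specials.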

If I'm being highly highly opinionated: literally don't do floating point math and bother to preserve the results. Floating point math is a mistake. Floating point math is acceptable only for estimates -- and because it's only acceptable for estimates, in all situations where you have used floating point math to derive some values, you should still store the original numbers in non-floating point form, such that you're ready to re-do any math on that data in more precise ways in the future. (This is logic that I would especially apply in scientific compute, personally. Science has enough reproducibility problems before throwing floating point precision issues into the mix!)

If I'm being less opinionated: I still agree with the considerations of wariness about promoting IEEE 754 corner cases into a problem that IPLD has to worry about in every one of our codecs. The problematicness of representing these values in JSON / DAG-JSON alone is cause for pause. We often expect to be able to use JSON / DAG-JSON as a human-readable format -- even in applications that do their de facto data storage and exchange in other formats, because having an isomorphic human-readable format is just so useful for debugging and development -- which means that expanding the IPLD specs in ways that increase the number of documents that are valid in some IPLD codecs, but aren't cleanly transcodable to the JSON / DAG-JSON codecs... doesn't really seem like a move in a desirable direction.
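
The JSON problem is easy to reproduce. The sketch below uses Python's standard `json` module purely as a stand-in (it is not the DAG-JSON codec): by default it emits the non-standard token `Infinity`, and with `allow_nan=False` it refuses to serialize the value at all.

```python
import json

doc = {"reading": float("inf")}

# Default behaviour is lenient and produces output that is not valid JSON.
lenient = json.dumps(doc)

# Strict behaviour: non-finite floats are rejected with ValueError.
strict_error = None
try:
    json.dumps(doc, allow_nan=False)
except ValueError as exc:
    strict_error = exc

print(lenient)       # {"reading": Infinity}
print(strict_error)
```

Either outcome is awkward for a human-readable debugging format: the lenient output will not parse in strict JSON parsers, and the strict mode means the document cannot be transcoded at all.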

mikeal commented on August 30, 2024

1a is my preference as well.

rvagg commented on August 30, 2024

Will be resolved to 1a if #344 is merged.

vmx commented on August 30, 2024

I've created a follow-up issue to think about best practices when encountering non-finite numbers like NaN or Infinity: #346

rvagg commented on August 30, 2024

OK, this got enough agreement that in #344 we've clarified what "float" in the data model means, and it doesn't include NaN and Infinity. There's some further rationale in there, but it should be enough to note that there's a very large number of bit combinations for an IEEE 754 float that will resolve as a NaN (fewer, but still many, for Infinity and -Infinity), and this extends into CBOR too. With ipld/js-dag-cbor#13 we'll be removing them as an option for DAG-CBOR in JS, and Go will probably follow suit. #346 has some additional thoughts on these symbols and how best to deal with them in content-addressed data. The summary: it's perfectly reasonable to want to use these symbols, but doing it using the IEEE 754 specials in content-addressed data is not a good idea; it's best to do it in a way where there's a precise 1:1 mapping between the symbol in memory and what it can be in encoded form.
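
To make the bit-combination point concrete, here is a small sketch using only Python's `struct` module. The three NaN bit patterns are arbitrary examples. For Infinity, the sketch takes "many" to mean the three float widths CBOR can carry (initial bytes 0xf9, 0xfa, 0xfb); that reading is an assumption, since within a single IEEE 754 width each infinity has exactly one bit pattern.

```python
import math
import struct


def double_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as a big-endian IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


# Exponent bits all ones plus any non-zero mantissa is a NaN.
nan_bits = [0x7FF8000000000000, 0x7FF8000000000001, 0xFFF8000000000000]
all_nan = all(math.isnan(double_from_bits(b)) for b in nan_bits)

# Number of 64-bit NaN patterns: 2 signs * (2**52 - 1) non-zero mantissas.
nan_pattern_count = 2 * (2**52 - 1)

# +Infinity as a CBOR half, single and double float: three different byte
# strings for the same in-memory value.
infinity_encodings = [
    b"\xf9" + struct.pack(">e", math.inf),
    b"\xfa" + struct.pack(">f", math.inf),
    b"\xfb" + struct.pack(">d", math.inf),
]

print(all_nan, nan_pattern_count)
print([enc.hex() for enc in infinity_encodings])
```

Different bytes hash to different content addresses, so one logical value with several valid encodings is exactly what content-addressed data wants to avoid.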
