
specs's Introduction

!!!

This document has moved.

You'll now find information like this in the ipld/ipld meta-repo, and published to the web at https://ipld.io/ .

All documentation, fixtures, specifications, and web content is now gathered into that repo. Please update your links, and direct new contributions there.

!!!


IPLD Specifications

The goal of IPLD is to enable decentralized data-structures that are universally addressable and linkable which in turn will enable more decentralized applications. These data-structures allow us to do for data what URLs and links did for HTML web pages. Read more about the principles that are guiding the ongoing development of IPLD in IPLD Foundational Principles.

IPLD is not a single specification; it is a set of specifications. Many of the specifications in IPLD are inter-dependent.

What is IPLD?

IPLD Blocks

The block layer encompasses all content addressed block formats and specifies how blocks are addressed, how they self-describe their codec for encoding/decoding, and how blocks link between each other.

IPLD blocks alone do not define data structures or types. Although many codecs may convert these formats into native types, there are no type requirements or assurances about types at the block layer.
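To illustrate the core idea of content addressing at the block layer, here is a minimal sketch. It is a simplification: a real IPLD address is a CID, which also encodes a version, codec identifier, and multihash prefix rather than a bare digest.

```python
import hashlib

def block_address(data: bytes) -> str:
    """Toy content address: the sha2-256 digest of the raw block bytes.

    Simplified sketch: a real CID additionally carries a version,
    codec, and multihash prefix in front of the digest.
    """
    return hashlib.sha256(data).hexdigest()

# The same bytes always produce the same address...
a = block_address(b"hello ipld")
b = block_address(b"hello ipld")
assert a == b

# ...and any change to the content changes the address.
assert block_address(b"hello ipld!") != a
```

Because the address is derived from the content, a block can link to another block simply by embedding that block's address; the link is then self-verifying.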

Documents:

Concept: Block block-layer/block.md
Concept: Content Addressability concepts/content-addressability.md
Concept: Multihash block-layer/multihash.md
Specification: Content Addressable aRchives (CAR / .car) block-layer/content-addressable-archives.md
Specification: Graphsync block-layer/graphsync/graphsync.md

IPLD Codecs

Codecs serve as an intermediary between raw bytes and the IPLD Data Model. They determine how data is converted to and from the Data Model.

Codecs vary in how completely they can represent the IPLD Data Model. DAG-CBOR and DAG-JSON are native IPLD codecs that currently enable the most complete form of the Data Model. Their base codecs, CBOR and JSON, are also valid IPLD codecs, but they are unable to represent some Data Model kinds on their own, in particular the Link (CID) kind (and Bytes for JSON), so DAG-JSON and DAG-CBOR provide mechanisms to represent these kinds.

IPLD can operate across a broad range of content-addressable codecs, including Git, Ethereum, Bitcoin, and more. DAG-PB is a legacy IPLD format that is still actively used for representing file data for IPFS.

Concept: Serialization and Formats block-layer/serialization-and-formats.md
Specification: CIDs block-layer/CID.md
Specification: DAG-CBOR block-layer/codecs/dag-cbor.md
Specification: DAG-JSON block-layer/codecs/dag-json.md
Specification: DAG-PB block-layer/codecs/dag-pb.md
Specification: DAG-JOSE block-layer/codecs/dag-jose.md
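As a hedged sketch of the mechanism described above, the following shows how a DAG-JSON-style decoder can map the reserved `{"/": "<cid>"}` form onto a Link kind. This is a simplification of the real spec (which, among other things, also reserves `{"/": {"bytes": ...}}` for the Bytes kind and rejects extra keys beside `/`); the `Link` class and CID string are illustrative placeholders.

```python
import json

class Link:
    """Data Model Link kind, wrapping a CID string (toy placeholder)."""
    def __init__(self, cid: str):
        self.cid = cid
    def __eq__(self, other):
        return isinstance(other, Link) and self.cid == other.cid

def decode_dag_json(text: str):
    """Decode JSON, turning {"/": "<cid>"} maps into Link values.

    Simplified: real DAG-JSON also encodes Bytes via {"/": {"bytes": ...}}
    and forbids any other keys alongside "/".
    """
    def hook(obj: dict):
        if set(obj) == {"/"} and isinstance(obj["/"], str):
            return Link(obj["/"])
        return obj
    return json.loads(text, object_hook=hook)

doc = decode_dag_json('{"name": "x", "next": {"/": "bafy...example"}}')
assert isinstance(doc["next"], Link)
```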

The IPLD Data Model

The Data Model describes a set of base required types to be implemented by a subset of IPLD codecs.

With these basic types authors can create various single-block data structures which can be read with predictable paths and selectors.

With just the data model, several data structures can be authored and put into a single block. These data structures can also link to one another, but a single collection (Map or List) cannot be spread across many blocks with only the Data Model.

Since different systems and transports may impose block size limits (often 2 MB or more) in order to control memory usage, larger collections need to be sharded over many blocks at the Schema Layer.
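To make "predictable paths and selectors" concrete, here is a toy recursive selector over Data Model maps and lists. It is only a stand-in: real IPLD Selectors are a declarative, serializable format (e.g. ExploreRecursive combined with ExploreFields), not a hard-coded function like this.

```python
def select_all(node, field: str):
    """Toy selector: yield every value stored under `field` anywhere
    in a nested map/list structure. A much-simplified stand-in for
    IPLD Selectors, which express such walks declaratively."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == field:
                yield value
            yield from select_all(value, field)
    elif isinstance(node, list):
        for item in node:
            yield from select_all(item, field)

doc = {"edge1": {"weight": 100},
       "edge2": {"weight": 200, "meta": {"weight": 5}}}
assert sorted(select_all(doc, "weight")) == [5, 100, 200]
```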

Documents:

Specification: IPLD Data Model data-model-layer/data-model.md
Specification: IPLD Paths data-model-layer/paths.md
Specification: IPLD Selectors selectors/selectors.md

Schemas and Advanced Data Layouts

IPLD Schemas define a mapping from the Data Model to instantiated data structures comprising complex layouts. Schemas add the ability to extend the IPLD Data Model to the wide variety of types required for typical programmatic interaction with a data source without the need to implement custom translation abstractions.

Schemas will also serve as an enabling layer for complex multi-block data structures via Advanced Data Layouts, by providing stability and consistency of data model use within individual blocks, and defined interaction points for the logic required for building and interacting with advanced data layouts such as multi-block Maps, Lists, and Sets.
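The sharding idea behind a multi-block Map can be sketched in a few lines. This is a single-level toy, not the HashMap spec: a real IPLD HashMap (a HAMT) consumes the key's hash a few bits at a time and nests shard blocks recursively, and each shard would be its own content-addressed block.

```python
import hashlib

def shard_for(key: str, width: int = 16) -> int:
    """Pick a shard bucket from the hash of the key (toy HAMT step)."""
    digest = hashlib.sha256(key.encode()).digest()
    return digest[0] % width

def build_sharded_map(entries: dict, width: int = 16):
    """Split one logical map into `width` shard maps, so no single
    block has to hold the whole collection."""
    shards = [{} for _ in range(width)]
    for key, value in entries.items():
        shards[shard_for(key, width)][key] = value
    return shards

shards = build_sharded_map({f"key{i}": i for i in range(100)})
# Every entry landed in exactly one shard, and lookup only needs
# to load the one shard the key hashes to.
assert sum(len(s) for s in shards) == 100
assert shards[shard_for("key7")]["key7"] == 7
```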

Documents:

Concept: IPLD Multi-block Collections data-structures/multiblock-collections.md
Specification: IPLD Schemas schemas/README.md
Specification: HashMap data-structures/hashmap.md
Specification: FlexibleByteLayout data-structures/flexible-byte-layout.md

Specification document status

Specification documents in this repository fit into one of two categories and have one of three possible statuses:

  • Prescriptive
    • Exploratory
    • Draft
    • Final
  • Descriptive
    • Draft
    • Final

Prescriptive specifications are intended to describe future implementations or, in some cases, changes to existing implementations.

Descriptive specifications describe existing behavior. In many cases these specifications are not intended to drive new implementations and are only written to understand existing behaviors.

Documents labelled "Specification" in this repository will also be labelled with a descriptor that indicates the category and status. e.g. "Status: Prescriptive - Draft" or "Status: Descriptive - Final".

Design documentation & Library recommendations

Included in this repository are some documents which chronicle our process in developing these specs, as well as some documents which are advisory to library authors (but not specifications, per se):

These documents may be useful to read for those who want to participate more deeply in the design and specification processes (as well as implementation processes!) of IPLD.

Contributing & Discussion

Suggestions, contributions, and criticisms are welcome.

Discussion of specifications happens in this repository's issues or via pull request. Discussion of IPLD more generally happens in the IPLD repository.

Check out our contributing document for more information on how we work, and about contributing in general. Please be aware that all interactions related to IPLD are subject to the IPFS Code of Conduct.

Governance

All changes to documents must take place via pull request.

Pull requests are governed by different rules depending on the document type and status of that document:

Specifications:

  • Exploratory Stage
    • Authors can merge changes into exploratory specifications at their own discretion
    • Exploratory specifications live in the design directory until they reach the draft stage. Spec names should include an alternative friendly naming convention (-A, -B, etc.) while in this stage.
  • Draft Stage
    • Authors must attempt to reach a consensus between all active participants before a merge
    • If no objections are raised within a 48-hour period, changes can be merged
    • If objections cannot be resolved the change can be voted on by the IPLD Team
  • Final Stage
    • Improvements that have a consensus can be merged
    • Changes to behavior should not be merged unless absolutely necessary and agreed upon by a vote of the IPLD Team

Concepts and other documents (including README.md):

  • Authors must attempt to reach a consensus between all active participants before a merge
  • If no objections are raised within a 48-hour period, changes can be merged
  • If objections cannot be resolved the change can be voted on by the IPLD Team

Glossary

  • DAG: Short for "Directed Acyclic Graph." A graph whose edges have direction and which contains no cycles: two branches can point to the same sub-branch, but no node can ever reach itself, so there is no possibility of recursion.

IPLD Team

The IPLD Team consists of currently active IPLD developers.

License

This repository is only for documents. All of these are licensed under the CC-BY-SA 3.0 license, © 2016 Protocol Labs Inc.


specs's Issues

Captain.log - IPLD v1 spec

31-6 Aug

Fellow explorers, I am jumping on this spaceship, taking control of the pilot cabin, and claiming myself captain. Thank you to all the people that are contributing and have contributed to IPLD, as well as to everyone using and willing to use IPLD. In this repo we will "make IPLD great again" so that it can truly bring interoperability, secure permanent links, and authenticated data structures to the whole web (worldwide and interplanetary).

I will keep this issue updated weekly (or so); I will write about (1) what is happening, and (2) where help is needed the most. Please subscribe to this issue, or watch the repo for updates.

What is happening

  • We are refining the spec; we have a list of issues to be solved and we need to reach agreement on them:
    • #1 : Relative paths: yes or no?
      • yes: (pro) circular relations!, (cons) new errors (parent does not exist!)
      • no: (pro) simple design!, (cons) no circular relations across objects
    • #2 : Properties in links: can link have user properties beyond /?
      • yes: (pro) can have hidden properties, (cons) complex!
      • no: (pro) simple! can use this space for IPLD specific keywords, (cons) NOTHING!
    • #3 : Support links to have pointers that are hash + /path/to/whatever
    • #5 : Can I specify multiple links with different hash functions in the link object?
      • yes: (pro) hash can break, links still exist (cons) need to detect if links' references are equal, complex typing
      • no: (pro) clean link object (cons) no survival through hash break
    • #6 : CBOR should be upgradeable
    • #7 : Namespace for IPLD
    • #8 : Should IPLD object point to IPLD objects only?
    • #9 : How to handle mutable links
    • #11 : Errors to be handled
  • New writing of the spec is happening: here is a draft #10
  • IPLD moved into its own repo (however it is still in progress ipfs/specs#128 - help needed by @dignifiedquire, @jbenet, @diasdavid in moving all the things)
  • New proposal: CID and multicodec-packed ipfs/specs#130

Where is the help needed

  • We are having a call next monday to discuss the above ipfs/team-mgmt#124
  • There is a need to agree on most of the issues above
  • Provide examples of your use cases and needs

This is all for now!

Thank-yous

cc @mildred, @Stebalien, @dignifiedquire

IPLD Type Convention

Motivation 1: I'd like to be able to look at an IPLD object and know, approximately, its intended interpretation (without guessing or using context).

Motivation 2: I'd like to be able to define or extend a type system for my IPLD application without having it completely fail to interop with other IPLD type systems.

Motivation 3: I'd like to buy some time to figure out the perfect type system.


We've been discussing IPLD type systems but these discussions usually boil down to implementing the perfect system. I'd like to propose an alternative: the IPLD type convention.

Proposal: @type: $something denotes a type. What this type means depends on the type's type (if specified).

Why @? It's less likely to conflict but I'm not fixated on it.

Why "the IPLD type convention"? This isn't a specification. Basically, I'm giving in to JSON duck-typing and calling it "good enough".

Why is it good enough? This is a decentralized system so we'll have to check everything anyways. Trying to prescribe structure on users tends to lead to more trouble than it's worth (IMO). If we need more structure, we can always give the type a type to make sure we're operating within the correct type system.

How will this work with existing formats:

  1. CBOR/JSON: Do nothing. For existing objects without a @type, these objects simply don't have types (within this system). Type systems that need to give everything some type can just give these some default.
  2. Git (tree, commit, etc), Eth, etc: I'd like to retroactively add in a type field. Thoughts? I kind of doubt this will break anything.

We've also talked about adding a new format with the structure <CidOf(type)><data>. That is, introduce a new format where we put all the type and schema information in a separate object, prepending the CID of this separate object to the actual object (the value).

After considering this for a bit, I've realized we should treat these as separate concerns: we're conflating types with schemas. There's no reason we can't introduce this new, compressed format at some later date even if we go with the above "type convention" proposal.


Disclaimer: this was not my idea, I've just finally convinced myself that it's probably "good enough".

Thoughts @jonnycrunch (you're the one who told me to look into the JSON-LD spec), @diasdavid, @davidad, @whyrusleeping?


While I'd like to avoid prescribing too much, I'd like to define a set of conventions that users should follow. For example:

  • @type: CID - CID points to the actual type.
  • @type: {}: inline type. This will often be used for type "constructors". For example: {@type: {@type: "generic", constructor: CidOfConstructor, parameters: [...]}}.
  • @type: "string": A human readable string/path. IMO, this should usually be used to specify the type system.
  • @type: 1234: A multicodec. A reasonable type-of function would look this multicodec up in the multicodec table to map it to a CID.
  • @type: [thing1, thing2, thing3]: multiple types.

Random notes -- on ipld and higher layers

Sorry for not formatting these beforehand -- just tracking them here. I was explaining some concepts to @wanderer

{
  edge1: {
    weight: 100,
    link: { / : Qmfoofdkslafjkdlasjfgajfpdosa },
  },
  edge2: {
    weight: 200,
    link: { / : Qmfoofdkslafjkdlasjfgajfpdosa },
  }
}

> $root/edge1/weight
100

> #($root/edge1/weight)
<hash-of-100>

> &($root/edge1/weight)
$hash-of-root/edge1/weight

> $root
{
  edge1: {
    weight: 100,
    link: { / : Qmfoofdkslafjkdlasjfgajfpdosa },
  },
  edge2: {
    weight: 200,
    link: { / : Qmfoofdkslafjkdlasjfgajfpdosa },
  }
}

> $root
{
  edge1: { / : Qmlinkto100object },
  edge2: { / : Qmlinkto200object },
}

> $root/edge1
{
  weight: 100,
  link: { / : Qmfoofdkslafjkdlasjfgajfpdosa },
}

> $root/edge1/weight
100

> $root/edge1/link  
{ / : Qmfoofdkslafjkdlasjfgajfpdosa }

> $root/edge1/link
<node of Qmfoofdkslafjkdlasjfgajfpdosa>

> $rootofprotobuf
{
  Data: <buffer>,
  Links: [
    {
      Hash: <buffer>,
      Name: "string",
      Size: int,
    },
    {
      Hash: <buffer>,
      Name: "string",
      Size: int,
    },
  ]
}


> $fileRaw = {
  Data: "dsjfoasfjidos",
  Subfiles: [
    { / : Qmabc... },
    { / : Qmabc... },
    { / : Qmabc... },
  ]
}

> $fileRaw/Data
"dsjfoasfjidos"

> $file = File($fileRaw)
<File >

> $file.cat()
dsjfoasfjidos
contentsofQmabc...
contentsofQmabc...
contentsofQmabc...

# dir example
> $dirRaw = {
  Entries: {
    "foo1": { / : Qmabc... },
    "foo2": { / : Qmabc... },
    "foo3": { / : Qmabc... },
    "foo4": { / : Qmabc... },
  },
  Union: [
    { / : Qmdir1... },
    { / : Qmdir2... },
    { / : Qmdir3... },
  ]
}

> $dirRaw/Entries
{
  "foo1": { / : Qmabc... },
  "foo2": { / : Qmabc... },
  "foo3": { / : Qmabc... },
  "foo4": { / : Qmabc... },
}

> $dir = Dir($dirRaw)

# Ls lists through the unioned dirs.
> $dir.Ls() 
["foo1", "foo2", "foo3", "foo4", "bar1", "baz1", ... ]

# this should resolve through
> $dir/bar1

Ship CID Spec v1

We need a first iteration of the CID Spec, since a few from the community have raised issues on trying to understand CID. Right now the CID material is sparse across issues and repos.

Spec refining: Merkle-paths to an array's index

Hi,

it is likely that this already came up in some other repository. If so maybe just link there and I'll elaborate there. My question is:

Say I have the following ipld object:

{
  "someKey": "andValue",
  "listOfKeys": [
    {
      "someKey": "aValueIWouldLikeToRetrieve"
   },
   ...
  ]
}

Let's assume this object can be addressed using this hash: QmAAAA...AAA.
How can I address the x-th index of /QmAAAA...AAA/listOfKeys/?
Or, to formulate a general question: how can specific indexes in a list/array be addressed in IPLD using merkle-paths?
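The convention IPLD settled on is that the decimal index itself is the path segment, e.g. /QmAAAA...AAA/listOfKeys/0/someKey for the first element. A hedged sketch of that resolution rule over plain Python maps/lists:

```python
def resolve(node, path: str):
    """Walk a merkle-path: each segment is a map key, or a decimal
    index when the current node is a list. Sketch only; real IPLD
    paths add further rules (link traversal, segment restrictions)."""
    for seg in filter(None, path.split("/")):
        node = node[int(seg)] if isinstance(node, list) else node[seg]
    return node

obj = {"someKey": "andValue",
       "listOfKeys": [{"someKey": "aValueIWouldLikeToRetrieve"}]}
assert resolve(obj, "listOfKeys/0/someKey") == "aValueIWouldLikeToRetrieve"
```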

Spec refining: requiring namespace in IPLD pointers

In #3 I specified that IPLD pointers are the following:

  • hash/object pointer HASH
  • plus an attribute pointer (e.g. /friends/0/name)

can it be implicit, or does it have to be explicit by requirement?

The following are pro and cons of having explicit namespaces before hashes and never implicit names

Pro:

  • no assumption about canonical representation: we can map a particular namespace to a particular canonical representation (see #6)
  • upgradable: we can have ways to version (?) the protocol
  • link types: we can have different type of links (IPFS (maybe?), IPLD)

Cons:

  • Extra characters
  • The only real reason is the canonical assumption, but the canonical representation may never change.

cc @dignifiedquire, @diasdavid, @jbenet, @Stebalien

Spec refining: Do not bind the canonical representation to CBOR

The world might come up with new standards and a CBOR hashed object might point to a CBOR2 hashed object. In other words, there is no way to differentiate across "current" and "canonical" formats. In IPFS jargon, this needs a sort of multicodec for the hashes.

// implicit: bind canonical representation to the namespace
// ipld means CBOR is canonical, ipld2 means CBOR2 is canonical
{
  name: { '/': '/ipld/hash'}
}
{
  name: { '/': '/ipld2/hash'}
}
// explicit: use multicodec
{
  name: {'/': '/cbor/ipld/hash'} // or just /cbor/ipld
}
// hidden: use link properties
// only possible if this issue goes through: https://github.com/ipld/specs/issues/2
{
  name: {'/': 'hash', canonical: 'cbor'} // or any better property name
}

Pro:

  • we don't make a choice that is forever

Cons:

  • links will need to contain encoding (multicodec? multicanonical?) prefixing the linked object, or have a new hidden field (see #2) specifying the encoding

IPLD.md out of date?

Hi, this is the first time I have opened an issue in ipld. Please correct me if I misunderstand something. While trying to understand the merkle dag, I was looking for related tools, and started reading IPLD.md. I found that some examples are out of date, e.g. ipfs object cat --fmt=yaml. There is no cat command in ipfs 0.4.17, and I only find protobuf, json and xml in ipfs object get. Thanks in advance.

Supporting existing mime types.

I wanted to start a conversation about the best way to support existing mime types.

Specifically, I want to talk about data that doesn't have links but is often linked to, like images and video. It would be great not to re-invent the entire mime/content-type system for data without links.

Something along the lines of mime[audio/aac].

We also may want to consider the same for addressing compression of the format mime[audio/aac][gzip].

I looked around for a previous discussion around this but couldn't find anything. If there's another thread please point me at it :)

Non-string keys

Are non-string keys allowed in maps? When I try with current ipfs dag put it indicates no, but wondering if that is current implementation or a design from the spec.

Spec refining: specify valid merklepath segments and encoding

Mission

It's important to specify precisely what is a valid merklepath segment in IPLD.

The spec currently contains a "TODO: list path resolving restrictions" and this could be improved :)


Why

First, a quick clarification: "merklepath segments" are a distinct concept from "IPLD Selectors". Merklepaths are a specific and limited implementation of IPLD Selectors; they can only specify a traversal to a single object; and importantly, we want them to be serializable in a way that's easy for humans to operate. To quote the current spec for motivations:

IPLD paths MUST layer cleanly over UNIX and The Web (use /, have deterministic transforms for ASCII systems).

(Perhaps "ASCII" is a little over-constrained there. The spec also says "IPLD paths MUST be universal and avoid oppressing non-english societies (e.g. use UTF-8, not ASCII)" -- we might want to refine those two lines after we tackle the rest of this issue.)

Second of all, just a list of other issues that are fairly closely related to a need for clarity on this subject:

  • ipfs/kubo#1710 -- "IPFS permits undesirable paths" -- lots of good discussion on encodings, and examples of valid but problematic characters
  • #59 -- "Document restrictions on keys of maps"
  • #58 -- "Non-string keys" -- this one has a particularly interesting detail quoted: "The original intention was to actually be quite restrictive: map keys must be unicode strings with no slashes. However, we've loosened that so that they can contain slashes, it's just that those keys can't be pathed through.". (n.b. this issue here is named "merklepath segments", not "IPLD keys", specifically to note this same distinction.)
  • #55 -- "Spec out DagPB path resolution"
  • #37 -- "Spec refining: make sure that an attribute cannot be named ."
  • #20 -- "Spec refining: Merkle-paths to an array's index"
  • #15 -- "Spec refining: Terminology IPLD vs Merkle" -- basically, am I titling this issue correctly by saying "merklepath"? Perhaps not ;)
  • #1 -- "Spec refining: Relative paths in IPLD" -- may require reserving more character sequences as special
  • ipld/legacy-unixfs-v2#3 -- "Handing of non-utf-8 posix filenames"
  • ipfs/kubo#4292 -- "WebUI should (somehow) indicate a problematic directory entry"
  • perhaps more!

As this list makes evident... we really need to get this nailed down.


Mission, refined

Okay, motivations and intro done. What do we need to do?

(1) Update the spec to be consistently clear about IPLD keys versus valid merklepath segments. This distinction seems to exist already, but it's tricky, so we should hammer it.

(2) Define normal character encoding. (I think it's now well established that this is necessary -- merklepath segments are absolutely for direct human use, so we're certainly speaking of chars rather than bytes; and also unicode is complex and ignoring normalization is not viable.)

(3) Define any blacklisting of any further byte sequences which are valid normalized characters but we nonetheless don't want to see in merklepath segments.

(4) Ensure we're clear about what happens when an IPLD key is valid as a key but not as a merklepath segment (e.g. it's unpathable).

(And one more quick note: a lot of this has been in discussion already as part of sussing out the unixfsv2 spec. In unixfsv2, we've come to the conclusion that some of our path handling rules are quantum-entangled with the IPLD spec for merklepaths. Unixfsv2 may apply more blacklistings of byte sequences which are problematic than IPLD merklepath segments, so we don't have to worry about everything up here; but we do want to spec this first, so we can make sure the Unixfsv2 behavior normalizers are a nice subset of the IPLD merklepath rules.)


Progress

Regarding (1): "just a small matter of writing" once we nail the rest...

Regarding (2): We have an answer and the answer is "NFC". (At least, I think we have an answer with reasonable consensus.) We had a long thread about this in the context of unixfsv2, but entirely applicable here in general. Everyone seems to agree that UTF8 is a sensible place to be and NFC encoding is a sensible, already-well-specified normalization to use. And importantly, in practice, NFC is the encoding seen in practically all documents everywhere, so choosing NFC means we accept the majority of strings unchanged. Whew. dusts off hands

Regarding (3): Lots of example choices in ipfs/kubo#1710 . We need to reify that into a list in the spec.

Regarding (4): Open field?

(I'll update this "progress" section as discussion... progresses.)
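The NFC answer from (2) is easy to demonstrate with the standard library. This sketch assumes a hypothetical `normalize_segment` helper; the point is only that visually identical strings can differ at the byte level until normalized.

```python
import unicodedata

def normalize_segment(segment: str) -> str:
    """Normalize a path segment to NFC, as proposed for merklepaths
    (hypothetical helper name; the normalization itself is standard)."""
    return unicodedata.normalize("NFC", segment)

# "é" precomposed vs. "e" + combining acute accent: different byte
# sequences, same NFC form, so both address the same key after
# normalization.
composed = "caf\u00e9"
decomposed = "cafe\u0301"
assert composed != decomposed
assert normalize_segment(composed) == normalize_segment(decomposed)
```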

Implementation: separate IPLD from CBOR

Taken from (meeting-notes/2016-09-19-ipld.md#layers)

Most of our IPLD implementations from js-ipld to js-ipfs-ipld assume that there is only one way to encode an IPLD object and this is CBOR. However, from the new CID conversations, IPLD+CBOR is what we all have implemented.

  • Generalize our modules
  • Create a module js-ipld-cbor(-parser|-serializer)?

note: All this is of course abstracted from the IPLD modules that by default will choose to use IPLD+CBOR

Primitives Proposal

So, there are two modes of operation with respect to IPLD:

  1. Deserialization DWIM: In this mode, we take an IPLD object and try to decode it into some struct definition. This is the easy case as the struct tells us which types are acceptable.
  2. Introspection: In this mode, we traverse through arbitrary IPLD data. This is the case where we actually need a set of primitive types as we need to be able to look at a field and figure out its type with no additional
    information.

IPLD needs a set of primitives supported by all fully-expressive formats. Not all formats need be fully expressive (i.e., JSON can be a special beast). However, when converting to a non-fully-expressive format, data that can't be expressed without losing type information should be thrown away.

The ones we can all agree on:

  • Map
  • Array
  • Utf8String
  • Bytes
  • Cid
  • Null (undefined and null both map to this)
  • Bool

The big question: what number types do we support?

In Crete, those of us who met up to discuss this agreed on a single magical Decimal type (a superset of all number types we might care about except rationals and irrationals). However, @jbenet (reasonably) objected on the basis that we lose important type information this way. This is especially important for systems that use L0 (a type schema system for IPLD).

Unfortunately:

  1. Users like being able to write "numbers" and have them just work. Without forcing users to define a schema, having multiple number-like types will cause trouble.
  2. CBOR already has some type magic around numbers as the canonical variant dictates that integers must be encoded as small as possible. Therefore, it doesn't really distinguish between, e.g., uint8 and uint32. CBOR really just has an int type.
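Point (2) can be made concrete with a sketch of canonical CBOR unsigned-integer encoding (major type 0, smallest-form rule): the wire size is chosen from the value, not from any declared type, which is why canonical CBOR cannot distinguish uint8 from uint32.

```python
import struct

def cbor_encode_uint(n: int) -> bytes:
    """Encode an unsigned integer in canonical (smallest-form) CBOR,
    major type 0. Sketch of just this one rule, not a full codec."""
    if n < 24:
        return bytes([n])                       # value fits in the initial byte
    if n < 0x100:
        return bytes([24, n])                   # 1 extra byte
    if n < 0x10000:
        return bytes([25]) + struct.pack(">H", n)   # 2 extra bytes
    if n < 0x100000000:
        return bytes([26]) + struct.pack(">I", n)   # 4 extra bytes
    return bytes([27]) + struct.pack(">Q", n)       # 8 extra bytes

# 500 fits in 16 bits, so it encodes identically whether the program
# thought of it as a uint16, uint32, or uint64.
assert cbor_encode_uint(500) == bytes([25, 1, 244])
assert len(cbor_encode_uint(10)) == 1
```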

The only way I can think of properly solving this is by saying that CBOR is not, in fact, a fully-expressive format. Instead, L0 would be the only fully-expressive format and CBOR would also be a subset format (joining the ranks of JSON).

In that case, I'd propose the following number primitives:

  • (u)int{8,16,32,64}
  • float{32,64}
  • BigDecimal (byte string)
  • BigInt (byte string)
  • BigRational (byte string)
  • BigReal (byte string)

Eventually, I'd like to implement these in L1 so we don't have so many.

The L1 language will also have tagged enums but those will be expressible in other formats as:

{
  "/tag": "my tag",
  ... data
}

Note: I'm not happy with this. I'd prefer a minimal set of primitives or a maximal (extensible) set. This set is already missing, e.g., int128 (supported by Rust) but, as most languages don't have that, I'd rather not include it. Really, I'd prefer to have generic ints (parameterized over the size), but no language that I know of except LLVM bytecode supports them.

Spec refining: Properties in links

The current spec defines that links are represented using the keyword property /. However, it doesn't specify what are the constraints for the link object.

// Valid IPLD
hash1
{
  hereIsALink: {'/': HASH},
  size: 500
}
// IPLD with property in a link
hash2
{
  hereIsALink: {
    '/': HASH,
    'size': 500
  }
}

Should we support properties in links or should we not support that?

If we decide to support this, the property size cannot be addressed with an IPLD pointer, however, this property can still be accessed by loading the object hash2 and manually/programmatically traverse the object to get the size value.

Reason to support

  • the difference between ipld data model and json is just reserving the keyword /
  • developers can hide whatever they want

Reason not to support it

  • hidden elements make IPLD a little bit more complex (hidden properties cannot be addressed)
  • makes IPLD not transparent

Idea
A great idea would be to disallow properties that are not specified by our spec, so that we can reuse this space for future reserved keywords.


cc @jbenet, @mildred, @dignifiedquire

Why not just disallow duplicate keys?

Duplicate keys are invalid, so why not just disallow duplicate keys and force parsers/handlers to reject them as invalid instead of saying "duplicate keys magically vanish in a pre-defined and consistent way that everything must take care to implement correctly"?

Sure, you can synthesize data with duplicate keys in CBOR or JSON or whatever, but that just makes it a malformed object. You can synthesize data with no link in it at all, that doesn't mean that is a valid IPLD object.
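The strict behavior argued for here is straightforward to implement in a parser. A sketch using Python's json module (the hook name is illustrative): instead of letting later duplicates silently win, the decoder rejects the document outright.

```python
import json

def _reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently dropping keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate map key: {key!r}")
        seen[key] = value
    return seen

def strict_loads(text: str):
    """Parse JSON, treating duplicate map keys as malformed input."""
    return json.loads(text, object_pairs_hook=_reject_duplicates)

assert strict_loads('{"a": 1, "b": 2}') == {"a": 1, "b": 2}
try:
    strict_loads('{"a": 1, "a": 2}')
except ValueError:
    pass  # rejected, as argued above
else:
    raise AssertionError("duplicate key was not rejected")
```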

Structural type encoding format

For L0, we need a format for encoding structural type definitions.

Requirements:

  1. Encoded in IPLD.
  2. Defined using L0 (where L0 defines a primitive type type).
  3. Allows compact value representation (but not necessarily compact type representation).
  4. Supports reflection, introspection.
  5. Supports code generation.
  6. Upgradable/extensible without consensus (i.e., no having to agree on field tags).

Good to have:

  1. Generics.
  2. Transformations (i.e., type definitions define transformations for upgrading/interpreting other types).
  3. Pointer/value distinction: allow embedding typed objects by value instead of only by pointer.

Preferred attribute names

Is there a list, either as part of the IPLD spec or referenced by it, of preferred attributes? This is going to be important if metadata is to be produced by one service and consumed by another.

Specifically ... should we use
contenttype, content-type, mimetype, or type for the MIME type (presuming the value looks like it does in HTTP)?

And should we use date, time, or datetime for the time, and in which format? The example in https://github.com/ipld/specs/tree/master/ipld looks obviously wrong, i.e. "1435398707 -0700".
(The obvious correct format would be zero-timezone ISO, e.g.
JS: new Date(Date.now()).toISOString() # 2017-09-18T21:43:05.038Z
PY: datetime.utcnow().isoformat() # 2017-09-18T14:42:02.583394)
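For what it's worth, the zero-timezone ISO form suggested above can be produced timezone-aware (rather than via the naive utcnow) like this; `utc_timestamp` is a hypothetical helper name:

```python
from datetime import datetime, timezone

def utc_timestamp() -> str:
    """UTC ISO 8601 timestamp with a trailing 'Z',
    e.g. 2017-09-18T21:43:05Z (seconds precision for brevity)."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

ts = utc_timestamp()
assert ts.endswith("Z") and "T" in ts
# Round-trips through the standard parser:
datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
```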

Implementation: Tracking implementation effort for IPLD v1

According to our conversation (meeting-notes/2016-09-19-ipld.md#layers), here are the following layers we have agreed on having:

  • Layers
    • Blobstore
    • Format-specific Serializers/Deserializers
      • Separate IPLD from CBOR: #25
      • Write serializer/deserializer modules
    • IPLDDagStore (rename needed)
    • Resolver
      • Make it into its own module: #26
    • IPLD Service
  • Interop
    • Understand and discuss with friends at Dat, Ethereum & co on how to make this effort work or be supported in their systems
    • ...

Please check items off as soon as the work is done (also, link issues and I will add them here)


cc @dignifiedquire @flyingzumwalt (also, is this the right place for implementation conversations? should we use ipld/ipld instead?)

Spec refining: Support of IPLD pointers as links

I define the following terms:

IPLD pointers (HASH/path):

  • hash/object pointer HASH
  • plus an attribute pointer (e.g. /friends/0/name)

IPLD links: {'/': HASH}

As far as I remember, current implementations only support links that are hash pointers; however, IPLD links should support full IPLD pointers, so that things like this can happen:

hashNicola
{
  name: {
    'first': 'Nicola',
    'last': 'Greco'
  }
}
hashNicola2
{
  fullname: {'/': 'hashNicola/name' }
}

hashNicola2/fullname/first === 'Nicola'
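To make the intended semantics concrete, here is a rough Python sketch (the in-memory BLOCKS store and the hash names are placeholders) that resolves a pointer such as hashNicola2/fullname/first by transparently dereferencing link objects:

```python
# Hypothetical in-memory block store keyed by hash; names are placeholders.
BLOCKS = {
    "hashNicola": {"name": {"first": "Nicola", "last": "Greco"}},
    "hashNicola2": {"fullname": {"/": "hashNicola/name"}},
}

def resolve(pointer):
    """Resolve an IPLD pointer of the form HASH[/path...], following
    link objects ({'/': pointer}) transparently during traversal."""
    root, _, path = pointer.partition("/")
    node = BLOCKS[root]
    for segment in filter(None, path.split("/")):
        if isinstance(node, dict) and "/" in node:   # link object: dereference
            node = resolve(node["/"])
        node = node[int(segment)] if isinstance(node, list) else node[segment]
    if isinstance(node, dict) and "/" in node:       # pointer ends on a link
        node = resolve(node["/"])
    return node

print(resolve("hashNicola2/fullname/first"))  # Nicola
```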

cc @jbenet, @mildred, @dignifiedquire

Spec refining: Mutable links in IPLD

As soon as we allow mutable links in IPLD we lose some of the functional authenticated data structure properties. In other words, we can no longer reason under the assumption that datasets are immutable.

Proposal 1: assumptions about the namespaces

However, we could allow mutable links as long as a mutable link can be differentiated from an immutable one, say "/ipns/HASH"; we could then specify in the IPLD parser that IPNS is mutable (which adds complexity to the parser as soon as we have other kinds of addresses)

Proposal 2: explicit mutable links

Be explicit: say that / is immutable and ~ is mutable. However, developers can run into a lot of errors by putting immutable links in ~ (which is safe) and vice versa (which is not safe)

{
  test: { '~': '/ipns/hash'}
}

cc @diasdavid, @dignifiedquire, @jbenet, @mildred

Mockups and making clear that they are not working

The Bitcoin block in the examples is not working (Qm0000... is an invalid multihash). It should be made clear that it is only a mockup, not a real-world example.

{
  "parent": {"/": "Qm000000002CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8"},
  "transactions": {"/": "QmTgzctfxxE8ZwBNGn744rL5R826EtZWzKvv2TF2dAcd9n"},
  "nonce": "UJPTFZnR2CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8"
}

Why bother with JSON?

You say "The IPLD Data Model defines a simple JSON-based structure for all merkle-dags, and identifies a set of formats to encode the structure into."

But the canonical representation is CBOR, it appears. And you go through some work to try to get around JSON's gotchas. So why not define it in terms of CBOR's data types and structure rather than JSON's? That already handles some of your constraints since, for instance, CBOR is specified to always use UTF-8 strings, CBOR has real integer types, and so on.

Then you have the specification "IPLD MUST be able to import and export to JSON trivially" which can remain, you just need a single lossless CBOR-to-JSON mapping, which already exists.

Spec refining: Relative paths in IPLD

(reposted from ipfs/specs#112)

Originally in interplanetary paths I included the possibility of having relative paths.

Here I am proposing to add in the specs (and in the js-ipld implementation), the ability to resolve relative paths.
Having relative paths saves space (../, instead of a full hash), helps avoiding splitting things in multiple pieces (../ instead of creating a new object) and finally, allow circular relationships: merkle graph!

{
  nicola: { '/': hashNicola },
  david: { '/': hashDavid }
}

{
  name: "Nicola",
  friends: [{'/':'../david' }]
}

{
  name: "David",
  friends: [{'/':'../nicola' }]
}
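One possible reading of these semantics, sketched in Python (objects are plain dicts standing in for blocks, and deref is a hypothetical helper): a relative link '../x' is resolved against the parent of the object containing the link, which is what makes the circular friendship above representable without embedding:

```python
# A minimal sketch: relative links resolve against the enclosing
# object's parent in the merkle graph. Names here are placeholders.
david = {"name": "David"}
nicola = {"name": "Nicola", "friends": [{"/": "../david"}]}
root = {"nicola": nicola, "david": david}

def deref(link, parent):
    """Dereference a link object. '../x' means: the sibling x of the
    object that contains this link, looked up in `parent`."""
    target = link["/"]
    if target.startswith("../"):
        return parent[target[len("../"):]]
    raise NotImplementedError("absolute hash links need a block store")

friend = deref(nicola["friends"][0], parent=root)
print(friend["name"])  # David
```

Note that the cycle costs nothing here: the link is just a path, not an embedded copy of the target object.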

cc @jbenet, @mildred, @dignifiedquire

Document restrictions on keys of maps

When you store data serialized as CBOR in IPLD there are restrictions on the keys that can be used in maps. This needs to be documented in the spec.

The restrictions are:

  • Keys must be strings
  • If they should be "pathable" (traversable with a path), they must not contain a slash

This information is based on the discussions at #58.
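A rough sketch of how an implementation might enforce these restrictions (the function name is mine, and treating slash-containing keys as merely "unpathable" rather than outright invalid is an assumption, per the second bullet):

```python
def validate_map_keys(obj):
    """Recursively check that every map key is a string, and collect the
    paths of keys that contain '/' (and so cannot appear in a path)."""
    unpathable = []
    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                if not isinstance(key, str):
                    raise TypeError("non-string map key at %s: %r" % (path, key))
                if "/" in key:
                    unpathable.append(path + (key,))
                walk(value, path + (key,))
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, path + (str(i),))
    walk(obj, ())
    return unpathable

print(validate_map_keys({"a": {"b/c": 1}, "xs": [{"ok": 2}]}))
# [('a', 'b/c')]
```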

Implicit traversal through links in dictionaries

I see an inconsistency in the way links within dictionaries are resolved. I may be misinterpreting the spec, but if so, it should be clarified (I would be happy to file a PR in that case).

I refer to https://github.com/ipld/specs/tree/master/ipld#link-properties-convention. The command ipld cat --json QmCCC...CCC/cat.jpg yields the content of cat.jpg. However, I would have expected the following output:

{
    "link": {"/": "QmCCC...222"},
    "mode": "0644",
    "owner": "jbenet"
}

The IPLD ROADMAP

C&P from ipfs/specs#115 //cc @nicola

Let's collect the IPLD bigger picture here

cc (@jbenet, @mildred, @dignifiedquire, @Stebalien)

  • new org - (ipfs/specs#128): move implementations and specs under https://github.com/ipld
  • spec refining:
    • cycles/relative links (#1) - If we add relative links in merkle links, then we can have mutual relationships.
    • attributes in / (#2)
    • Find a name space: '/ipld' or '/ipfs' or just hash
    • Use of IPLD pointers as links (#3): right now implementation only support hashes
  • query - way to find attributes (or value) that is easy to express (instead of programmatically traversing a tree)
    • selectors (ipfs-inactive/2016-IPFS-Workshop-Lisbon#5 and ipfs/notes#12)
      • design/find a nice language
      • research on relational db in directed graph
      • must be simple, fast, reliable
      • must create virtual graphs
    • full query language (later 2017) - more complex query than simple selectors
      • could be unreliable (unless specified)
      • design a query language
      • should create virtual graphs
  • programmable data & transformations (ipfs/specs#102)
    • IPC (inter-planetary computation) - decentralized compilable computation
    • Transformation - subclass of IPC (but pure, functional and return a graph/value)
    • IPLD objects can be input to IPC or can define transformations in the data itself
    • talk to the functional programming language research folks
    • could be writing contracts & smart signature (?)
      • infinite computation should be possible
      • possible languages to consider
        • web assembly
        • haskell
        • javascript
        • morte
        • LLVM IR
          • maybe there is a need for multipl (multi-programming or multi-language)
  • verifiable computation on IPLD
    • consider simplifying lambda auth as a lib in haskell
    • consider extending IPRS, IPC(IPLD)
  • abstract paths (ipfs/specs#91)
    • should be able to express abstract paths via transformations
    • could do sharding (consequence of programmable data)
      • is sharing application specific?

I will clean this up in different issues

Is this thin?

I think this is a good format, and I love the idea that the thin waist of IPFS gets its own name and spec, but I don't think this format works well as a thin waist.

It's great to use from the application's perspective, but from the perspective of an implementation (storage, networking, etc.) it imposes extra obligations. The implementation needs to, e.g.:

  • understand/verify some Unicode encoding (probably UTF-8)
  • with #9 understand IPNS
  • with #1 traverse other IPLD objects when dereferencing

But suppose that a bunch of implementations just used "bare-bones" Merkle nodes made up of a) a list of hashes of other nodes, and b) a list of bytes. It's straightforward to implement IPLD on top of that: replace hashes in links with indices into the list of hashes, pack into canonical form, and use that as the list of bytes. All of the bullet points I listed now become the responsibility of the bare-bones Merkle DAG <-> IPLD shim, and not the underlying system. If this is indeed a valid implementation strategy, then it would be nice to specify that bare-bones format so the shim can indeed be reused.

Now, it sometimes makes sense to give lower layers more information than they strictly need for the sake of optimization. But I don't see how a storage / networking layer could leverage the knowledge that nodes have the rich structure of IPLD rather than the dumb structure of that bare-bones example. If there indeed is none, then all layers should use the bare-bones format with a shim, and thus the bare-bones format (by its simplicity) becomes the thin waist.
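To illustrate the shim idea, a toy Python sketch (the canonical-JSON payload and the function name are my assumptions, not a proposed wire format): links become indices into a hash list, and the underlying layer only ever sees a (hashes, bytes) pair:

```python
import hashlib
import json

def ipld_to_barebones(obj):
    """Lower an IPLD-ish object (dicts/lists/strings/ints, links as
    {'/': hash}) to a bare-bones node: (list-of-child-hashes, bytes).
    Links become indices into the hash list; the payload is canonical
    JSON (sorted keys) purely for illustration."""
    hashes = []
    def strip(node):
        if isinstance(node, dict):
            if set(node) == {"/"}:
                hashes.append(node["/"])
                return {"/": len(hashes) - 1}   # index instead of hash
            return {k: strip(v) for k, v in node.items()}
        if isinstance(node, list):
            return [strip(v) for v in node]
        return node
    payload = json.dumps(strip(obj), sort_keys=True).encode()
    return hashes, payload

hashes, data = ipld_to_barebones({"name": "doc", "prev": {"/": "QmXXX"}})
print(hashes)  # ['QmXXX']
print(data)    # b'{"name": "doc", "prev": {"/": 0}}'
print(hashlib.sha256(data).hexdigest())  # the node's content address
```

The storage/networking layer can traverse and pin children from the hash list alone, without understanding the payload at all.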

Spec refining: Multiple links in the link object

In my original design I had the possibility of having a set of pointers that can be used in the link object.
The reason for this is, for example, linking using multiple hash functions (in case one breaks).

// one hash function
{
  name: {'/': 'SHA2-Hash/test' }
}

// multiple using arrays
{
  name: {'/': ['SHA2-Hash/test', 'SHA3-Hash/test']}
}
// in this case, the parser will know we are talking about a link
// since the property `/` will define the type (in CBOR as in JSON)

Pro:

  • Have a way to specify multiple hashes from different hash functions

Cons:

  • the user has no guarantee that the SHA3 hash points to the same object as the SHA2 hash (unless they use algebraic hash functions (?))
  • the user must specify the order of priorities of the hash functions

cc @dignifiedquire, @mildred, @Stebalien

Finalise IPLD spec

The IPLD spec document has the following warning:

UPDATE: we re-drafted this spec to deal with links. We hope to re-finalize it shortly. Sorry for any inconvenience. This was an important change to do before implementations shipped.

The last edit to the spec was 9 months ago. Is the spec finalised and can the warning (which prevents me from taking the spec too seriously) be removed?

Spec refining: Terminology IPLD vs Merkle

I see a lot of different names being used: IPLD pointers/path, (mostly used by me), Merkle-links, Merkle-pointers & so on.

Prefixing things with merkle

I am personally not a big fan of calling all these things merkle-*, since they have less and less to do with what Merkle originally meant with his Merkle trees. However, I do see a general trend of calling these Merkle DAGs and so on, even in different projects. So it may be a good idea to explain IPLD as a standard for Merkle DAGs (however, this may be very narrow; there is a lot more! authenticated data structures!). On the other hand, understanding "Merkle DAG" can be more difficult than understanding "replacing links with hashes".

Prefixing things with IPLD

I would not recommend renaming everything with an IPLD prefix, except if it is IPLD specific - or unless we want the IPLD name to be used more than the Merkle name.

Here is roughly what I suggest (which is very similar to the existing spec):

  • $hash_pointers/path: $cid_hash + $path (better than merkle path)
  • $path: /something/but/not/hashes/here.png (already an IETF spec)
  • hash-link: when using the pointer to link different objects - the hash-link is really the link object {'/': $hash_pointer}
  • merkle-dag: graph of objects linked with hashes

We could use IPLD pointers/path and IPLD links to imply a specific way to implement Merkle/Hash links.

Others that will keep the IPLD name:

  • IPLD data model
  • IPLD serialization

Map the entire Ethereum State into IPFS with ipld

It would be super cool if we could map the entire Ethereum state into ipfs with the help of IPLD.
Here is the proposed layout.

  eth\-block hash-\
                   \-previous block
                   |
                   |-txs-\
                   |     |-tx 0
                   |     |-tx 1
                   |     \-tx N
                   |
                   |-uncles-\
                   |        |- uncle 0
                   |        |- uncle 1
                   |        \ -uncle N
                   |
                   |-header-\
                   |        |-number
                   |        |-mix
                   |        |-nonce
                   |        |-difficulty
                   |        |-bloom
                   |        |-coinbase
                   |        |-gasLimit
                   |        |-timestamp
                   |        \-extraData
                   |
                   \--accounts-\
                                \--address N--\
                                               \--balance
                                               |--nonce
                                               |--code
                                               \--storage--\
                                                            \-key 0
                                                            |-key 1
                                                            |-key N

some initial work has begun by adding RLP to multicodec

I think the next step is to be able to serialize the Ethereum merkle patricia nodes correctly. Where do we start? Where exactly can we plug in the logic to serialize and deserialize Ethereum merkle patricia nodes? :)

Idea for permanent mutable links

(Was going to be a comment on #9 but got a bit off topic. Also, it might be a good idea to move this to the ipfs/specs repo).

A problem I haven't yet seen addressed is mutable links that never break. Currently, you can do one of two things:

  1. Use an IPFS/IPLD link and it will never break (assuming you recursively pin the object).
  2. Use a relative link and the target document can be updated, amended, corrected, etc.

From a web perspective, 2 is preferable in the short run but 1 is preferable in the long run. That is, if you, e.g., link to a news article using a permanent link, someone will yell at you for linking to an outdated/incorrect version of the article. However, if you go with 2, someone will eventually yell at you 2+ years down the road when the link fails to resolve (although by then they may not be able to contact you, so you may be safe...). I've already seen IPFS users in the wild write down two links (literally "here is my ()") to avoid this problem.

So, it would be really nice to have a simple way to do both at once. That is, link to a mutable object but include a link to a snapshot.

Solution 1:

Embed IPLD links in mutable links. That is, /ipns/XXX+YYY/path/to/object where XXX is the multihash of the IPNS key and YYY is the multihash of the IPLD object.

  • Pro: It's "just text". That is, if a user copies and emails or bookmarks the link, they'll get both the IPLD and IPNS link at the same time.
  • Pro: It's very simple and easy to read (and easy to remove/update the IPLD hash).
  • Pro: It also works with http/https: /http(s)/origin+YYY/path/to/object
  • Con: The IPLD link is not canonical (short form). That means I'd need to pin all objects along the path to make it permanent.
  • Con: I'm not sure this plays well with more general multiaddrs but I don't know enough about the spec to know.
  • Con: It doesn't actually mention that YYY is an IPLD hash.
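For illustration, a tiny Python sketch of how Solution 1's combined links might be parsed (the '+' separator and the /ipns/ prefix follow the hypothetical syntax proposed above; none of this is an implemented IPFS format):

```python
def parse_combined_link(link):
    """Split a hypothetical combined link of the form
    /ipns/XXX+YYY/path/to/object into (ipns key, snapshot hash, path).
    The snapshot part is optional; a plain /ipns/XXX/path still parses."""
    assert link.startswith("/ipns/")
    rest = link[len("/ipns/"):]
    head, _, path = rest.partition("/")
    ipns_key, _, snapshot = head.partition("+")
    return ipns_key, snapshot or None, path

print(parse_combined_link("/ipns/XXX+YYY/path/to/object"))
# ('XXX', 'YYY', 'path/to/object')
```

Because the snapshot is just another path component, copying the link as text carries both halves, which is the whole point of Solution 1.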

Solution 2:

Do it at the application level. That is, punt.

  • Pro: Less work now.
  • Pro: Flexible.
  • Con: Less "standardized".
  • Con: Not "just text" (personally, this is why I really want to do this at the link level).

Other

There have to be other solutions and are probably ways to improve on solution 1. Thoughts? If this doesn't really make any sense, say so and I'll review it when I'm actually awake.

Making eth an application rather than a format

This is the argument on why we should have eth-blocks not as a format and a multicodec, but a namespace and an application.

Note: Personally, I am fine either way! I just want to make sure we make the right choice.

I tend to think that namespaces work best in this particular case, but of course, if we decide that the destiny of CID is to abstract over this, so that everything is an IPLD object, I am fine with that too (but the case for unixfs that I make below is still valid)

This answers #27 (and https://github.com/unixfs/notes/issues/173) in a different way than the current proposal: application vs data format

Current state: Eth-block as a data format

Eth-block as a data format for IPLD:

  • eth-blocks will resolve in /ipld
  • eth-blocks will need to reserve a multicodec number that will be prefixed to their hash
  • eth-blocks will need ipld-parser-eth

Process to transform eth-block into IPLD:

  • read eth-block hash
  • spot multicodec
  • decode binary into IPLD

By making eth-block a data format, we are overspecializing a format to only work with a particular application. What if there are 100 new cryptocurrencies? Will we create new formats for each?

Proposed: Eth-block as an application

Parallelism with unixfs

Let me start with this unixfs on IPLD parallelism

tl;dr:

  • unixfs takes IPLD objects and turns them into IPFS blocks (= IPLD binary)
  • ethfs takes Eth blocks (IPLD binary) and turns them into IPLD objects

unixfs as a data format

Say that we treat unixfs as a data format for IPLD, then:

  • unixfs would resolve under /ipld
  • unixfs will need to reserve a multicodec number that will be prefixed..
  • unixfs needs a ipld-parser-unixfs

Process to transform unixfs into IPLD

  • read hash, spot multicodec, transform IPLD objects into IPLD binary (= unixfs objects)

    unixfs multicodec = 0xIP
    
    // /ipld/HASH
    // say that sharding was actually done this way
    {
    shard1:
    {blocks: [{ '/': h1}, { '/': h2}, { '/': h3}]},
    shard2: ...
    }
    
    // /ipld/0xIPHASH
    Hello how is it going this is my long content...
    

unixfs as an application

For simplicity, instead of doing that, we made unixfs an application on top of IPLD, not a data format.

unixfs as an application:

  • will need a namespace /unixfs
  • /unixfs will transform IPLD objects into IPLD binary (= unixfs objects) as shown before
  • /unixfs will serve IPLD binary

Process to transform IPLD to unixfs

  • read path, spot namespace, find application mapping namespace, transform IPLD objects into IPLD binary (= unixfs objects)

Eth-block as an application

Eth-block as an application:

  • will need /eth namespace
  • /eth will transform IPLD binary (which is Eth binary block) into IPLD objects
  • /eth will serve IPLD objects (traversable & so on)

Process to transform Eth-block (= IPLD binary) into IPLD object

  • read path, spot namespace, find eth application, use eth application to transform a binary into an IPLD object

End of the story

At the end of the day, if you look at the process, it is essentially the same

Differences

  • addressing: multicodec vs namespace?
  • make binary into IPLD: parser level vs application level?

Other questions

  • Isn't Eth-block too application-specific to be a data format?
  • Are we going to create data formats for every cryptocurrency and every future application-specific format?
  • Can a CID point to different application-specific objects?
  • Does multicodec cover application-specific formats beyond data formats?

cc @diasdavid @jbenet @dignifiedquire @Stebalien

Spec refining: listing down the possible errors to be handled

Admittedly, we should do as many other similar specifications do:

This specification does not define how errors are handled. An application SHOULD specify the impact and handling of each type of error (json pointers)

The spec should list the different possible errors that spec implementors and application developers will need to take into account. Let's use this issue to list all the possible errors. I am listing a few:

  • CID has bad syntax (?)
  • hash function not known (and all the others inherited from the multihash spec)
  • Path referencing a nonexistent value
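To seed the discussion, a Python sketch of what such an error taxonomy might look like (all class names are placeholders, not proposed spec terms):

```python
class IPLDError(Exception):
    """Base class: a sketch of the error taxonomy listed above."""

class BadCIDSyntax(IPLDError):
    """The CID string could not be parsed."""

class UnknownHashFunction(IPLDError):
    """The multihash names a hash function this implementation lacks."""

class PathNotFound(IPLDError):
    """A path segment references a nonexistent value."""

def traverse(obj, path):
    """Tiny traversal that maps missing keys/indices onto PathNotFound,
    so applications can decide how to handle that error class."""
    node = obj
    for seg in filter(None, path.split("/")):
        try:
            node = node[int(seg)] if isinstance(node, list) else node[seg]
        except (KeyError, IndexError, ValueError):
            raise PathNotFound(path)
    return node

print(traverse({"a": {"b": 1}}, "a/b"))  # 1
```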

Ship IPLD Spec v1

This must be in the form of guidelines/detailed explanations that should be enough to understand and implement IPLD in different languages.

IPC / Transformations

Dear All,

I have been trying to find information on IPLD transformations. If anyone has documentation or code on the matter perhaps link it here. It may help to have a place where all the existing work temporarily resides.

I know I am an outsider, so please consider this a show of interest/curiosity. If this issue is nonconstructive, please let me know and close the issue.

Spec refining: IPLD pointers must point to IPLD objects only

Can pointers be in the IPFS space? Or should these not be linked with {/: '/ipfs/hash'}?

The following is an argument for why we should only have IPLD pointers to IPLD objects (so never to IPFS). The idea is that we can still link to IPFS objects, just not using the IPLD link object ({'/': hash}).

Pro:

  • very clean: all IPLD link to IPLD object only

  • links to other objects that are not traversable with the IPLD pointer/data model should not be supported (this creates very awkward pathing); instead they should be handled in userland (so no IPLD pointers)

    // /ipld/hash1
    {
    pictures: { '/': '/ipfs/folder' }
    }
    
    // /ipld/hash1/pictures (this will be an ipfs object)
    // hence one could do the following: /ipld/hash1/pictures/picture1.jpg
    // is this ipld or ipfs -> namespace conflict!!!
    
  • we could make /ipld implicit (but maybe not #7)

Cons:

  • we need to move IPFS links to the application layer
  • IPFS cannot be traversed using ipld pathing from an IPLD object (see example above)

However: if IPFS is just a transformation on top of IPLD, then IPFS objects (e.g. folders) are still IPLD objects, hence it might actually make sense to keep IPFS links

cc @dignifiedquire, @diasdavid, @jbenet, @Stebalien

Designing IPLD Selectors

IPLD Selector Plan

It has been quite a long time of exciting conversations about IPLD Selectors; in this document we try to understand how we can do them and what we need. Please post your use cases here.

Design goals

The IPLD selector is a simple language to select a subgraph of an IPLD object, similarly to how CSS selectors select a subtree of the HTML DOM tree to apply styles, and Unix globs select a subtree of folders and files in a file system.

The IPLD Selector must be:

  • simple: a very simple language
  • cheap: should have a very small runtime (otherwise IPLD selectors can be used to overload nodes in the network)
  • provable: every subtree should be presented with a proof (unless the request is performed locally and the computer is trusted)

The IPLD Selector should be:

  • path-compatible: the selector language should be compatible with the unix-path, this would allow us to reuse the way we address content for this purpose

The IPLD Selector must not be:

  • full-query: a complex query language that performs transformation of the graph (SUM, JOIN)
  • high-complexity: beyond regular language

Use cases

Here is a list of use cases that I collected with different members of the community, I would love for @wanderer to input his own here too.

  1. Pointing: Getting known nodes
    1. Root node (trivial)
      It can be done via the current path scheme /ipld/hash/
    2. Leaf node with known path (trivial)
      It can be done via the current path scheme /ipld/hash/node
  2. Selecting: Getting unknown nodes
    1. One subgraph
      1. expand attribute's children
      • Glob: /ipld/hash/attribute/*/
      • CSS: #hash > .attribute > *
      2. expand the entire graph
      • Glob: /ipld/hash/attribute/**/*
      • CSS: #hash > .attribute *
    2. Multiple subgraphs
      1. with known properties
        Glob: /ipld/hash/{attribute1,attribute2}
        CSS: #hash > .attribute1, #hash .attribute2
  3. Filtering: Get known attributes of unknown nodes
    1. find matching property in attribute's children
      Glob: /ipld/hash/attribute/*/property
      CSS: #hash > .attribute > * > .property
    2. find matching property in the entire graph
      Glob: /ipld/hash/attribute/**/property
      CSS: #hash > .attribute .property
    3. find graph of a matching property
      Glob: /ipld/hash/attribute/**/property/**
      CSS: #hash > .attribute .property *
    4. filter by regex on property
      Glob: /ipld/hash/attribute[0-9]/name
      CSS: #hash > .attribute[id^='s'] (actually not the same!!)
  4. Filtering/Searching: Get known relations of attributes
    1. find all the nodes that have attribute1 and parent references to it with attribute2
      Glob: cannot use kleene property across folder
      Regex: /ipld/hash/(.*)/((attribute1/attribute2)/)*
      CSS: #hash .attribute1 > .attribute2

Practical examples

Get the name of all of my friends

root: {
  friends: [h1, h2, h3]
}
h1: {
  name: "Juan",
  ..
}
h2: {
  name: "Jeremy",
  ..
}
h3: {
  name: "David",
  ..
}
// Raw tree: tree filled with missing pieces
// Can be streamed but only as patches (will explain more)
f(root) == {
  friends: [{name: Juan}, {name: Jeremy}, {name: David}]
}

// Transformed tree: only return the selected elements
// Can be streamed
f_transformed(root) == ["Juan", "Jeremy", "David"]
  • Glob: /ipld/root/friends/*/name
  • CSS: #root > .friends > .name
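To show how little machinery the glob flavor needs, here is a toy Python evaluator for the example above (BLOCKS is a placeholder store, and this ignores proofs, streaming, and the raw-tree form — it only produces the "transformed tree"):

```python
import fnmatch

# Placeholder block store for the "friends" example above.
BLOCKS = {
    "root": {"friends": ["h1", "h2", "h3"]},
    "h1": {"name": "Juan"}, "h2": {"name": "Jeremy"}, "h3": {"name": "David"},
}

def select(node, pattern):
    """Evaluate a glob-style selector (a list of segments, with '*' as
    wildcard) against a tree, dereferencing bare hash strings through
    BLOCKS. Returns a flat list of matched values."""
    if isinstance(node, str) and node in BLOCKS:
        node = BLOCKS[node]                       # follow a link
    if not pattern:
        return [node]
    head, *rest = pattern
    results = []
    items = node.items() if isinstance(node, dict) else enumerate(node)
    for key, child in items:
        if fnmatch.fnmatch(str(key), head):
            results.extend(select(child, rest))
    return results

print(select(BLOCKS["root"], ["friends", "*", "name"]))
# ['Juan', 'Jeremy', 'David']
```

The hard parts the real design must add — bounded runtime, proofs for each traversed block, and '**' recursion — are exactly what this sketch leaves out.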

Get every block in a sharded data structure

root: {
  filename.txt: {
    shards: {
      shard1: [h1, h2],
      shard2: [h3, h4]
    }
  }
}

h1: XXX
h2: YYY
h3: ZZZ
h4: 000
f(root) == {
  filename.txt: {
    shards: {
      shard1: [XXX, YYY]
    },
    shards: {
      shard1: [ZZZ, 000]
    }
  }
}
f_transformed(root) == [XXX, YYY, ZZZ, 000]
  • Glob:
    • /ipld/root/filename.txt/shards/shard[0-9]/* (or similar)
    • /ipld/root/filename.txt/shards/**/* (if guaranteed to only have shards)
  • CSS:
    • #root > filename.txt > .shards > * > hash (assuming that hash is a type)
    • #root > filename.txt > .shards > * > *
    • #root > filename.txt .shards * (if guaranteed to only have shards)

Get blockchain's entire block headers (@whyrusleeping need)

root: {
  parent: p1,
  block: b1
}
p1: {
  parent: p2,
  block: b2
}
p2: {
  parent: genesis,
  block: b3
}
genesis: {
  parent: null,
  block: b4
}
f_transformed(root) == [
  {parent: p2, block: b2 },
  {parent: genesis, block: b3 },
  {parent: null, block: b4 }
]
  • CSS: #root .parent > .parent (actually this does not guarantee it, but works in this case)
  • Glob: /ipld/(parent/)* (this does not work in practice)

Existing work

Temporary conclusion

I am experimenting with different already existing selectors; I believe that if we want to make our selector language simple, it would be an advantage if it is a selector language many are already comfortable with (like, in my examples, globbing or CSS selectors).

I have a strong preference for re-purposing globbing for IPLD, since globbing is unix-friendly, meaning that we can select subgraphs in IPLD the same way we use globbing in the unix shell, and hence it might be backwards compatible with existing systems.


I need use cases! @whyrusleeping, @diasdavid, @dignifiedquire, @wanderer, @jbenet

IPLD and compression

Compression has already come up several times. I think there should be some kind of compression for IPLD.

I can think of two different ways which serve different purposes.

On the application level

Add a new container format for compression which embeds the original one. There are several ways of doing that; one could be to just add a new format for each compression method. The format implementation would then extract the data and you could access the underlying one. Hashes would then be those of the compressed data.

Use cases for doing it this way are when you are rather collecting data (e.g. with sensors) and don't look at them too often.

Deep inside IPLD/IPFS

Compression could also happen on a layer below IPLD. Then the hashes would be based on the actual (uncompressed) content. The idea is that you would implement it on two layers: the storage and the transport.

The storage layer (the repo) would have compression enabled and just compress all the data that comes in. When you retrieve the data, it would be uncompressed by default, so things would work as they do today. Though there could also be an option to return the compressed data, which could then be passed on to the transport layer.

The transport layer takes an uncompressed block by default, compresses it for the transport, and uncompresses it again on the other side. Though there could be an option for it to take the compressed data directly from the storage layer and transmit that data. Then it would only need to be decompressed on the receiving side to verify the hashes match.
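A toy Python sketch of the storage-layer half of this (zlib stands in for whatever compression method would actually be used): addresses are hashes of the uncompressed content, so compression stays invisible to IPLD, and the transport can optionally grab the compressed bytes directly:

```python
import hashlib
import zlib

def put(store, content):
    """Store-side compression: the address is the hash of the
    *uncompressed* content, so compression is invisible to IPLD."""
    key = hashlib.sha256(content).hexdigest()
    store[key] = zlib.compress(content)
    return key

def get(store, key, compressed=False):
    """Return uncompressed data by default; optionally hand out the
    compressed form so the transport can forward it as-is."""
    data = store[key]
    return data if compressed else zlib.decompress(data)

store = {}
key = put(store, b"some sensor readings " * 100)
# The receiving side must decompress before it can verify the hash:
assert hashlib.sha256(get(store, key)).hexdigest() == key
wire = get(store, key, compressed=True)   # what the transport would send
assert zlib.decompress(wire) == get(store, key)
```

The last two lines show the trade-off mentioned above: passing compressed bytes straight through saves recompression, but verification always needs the uncompressed content.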

Link information out of date

There are a few examples of link objects in ipld/README.md with an /ipfs/ prefix, but @Kubuxu informed me there shouldn't be.

IRC Logs

[2017-05-05 17:55:47] <joelburget> Having a little bit of trouble with IPLD. I must be doing something wrong here: https://gist.github.com/joelburget/09d4865ef3093465d7eb6dfa78eb9553
[2017-05-05 17:55:48] → fiatjaf joined ([email protected])
[2017-05-05 17:56:56] <joelburget> Is `ipfs dag` the right command? Should I be using `ipfs object` instead?
[2017-05-05 17:58:10] → mahloun joined (~mahloun@2a01:e34:ef8d:2940:f2d5:bfff:fe1c:4cdd)
[2017-05-05 18:00:34] → strykerkkd joined ([email protected])
[2017-05-05 18:02:04] → bwerthmann joined ([email protected])
[2017-05-05 18:02:22] <Kubuxu> joelburget: drop the /ipfs/ refix
[2017-05-05 18:02:58] <joelburget> https://www.irccloud.com/pastebin/YLMtlo0E/
[2017-05-05 18:03:01] <joelburget> that worked
[2017-05-05 18:03:06] <joelburget> thx Kubuxu
[2017-05-05 18:03:40] ⇐ caiogondim quit ([email protected]): Quit: caiogondim
[2017-05-05 18:03:58] <joelburget> are these docs out of date then? https://github.com/ipld/specs/tree/master/ipld (Object with a link at foo/baz)
[2017-05-05 18:05:13] → caiogondim joined ([email protected])
[2017-05-05 18:06:02] <Kubuxu> looks like it 
[2017-05-05 18:06:12] <Kubuxu> want to create issue about this in that repo?
[2017-05-05 18:06:35] <joelburget> yep creating now

Typo in README.md

"transform" misspelled In string:

spec about the language to trasform an IPLD graph into another

Selectors: Use cases (from Q3 Workshop)

Use cases:

  1. Getting a root node
  2. Path of single nodes (chain from root)
  3. Look up in a trie (sharding, route planning)
  4. Ranges on an array (section out of a video)
  5. Predecessor/Successor Query (binary search)
  6. Merkle B-Tree (select a key, peergos)
  7. Prefetch on a VM/Container
  8. Version range selection

Properties:

  • Provable/Authenticated
  • No assumption about data store
  • Graph is immutable
  • (could be) URI-friendly

cc @jbenet, @mikolalysenko, @dignifiedquire, @daviddias, @chriscool, @John-Steidley

IPLD v2: Laying out Future plans

I think it would be great if we could have a place to discuss future plans for IPLD. I propose we use this issue.

Specifically, I have seen transformations, programmable data & light verifiable proofs mentioned.

For example:

  • What are or could be important about proof-of-membership, proof-of-non-membership, proof-of-consistency?
  • How could we use transformations to do sharding, or computing on IPLD objects?
  • What other communities should we look at, to emulate going forward? (Such as the distributed-systems community mentioned, i.e. Chord and Spark.)

If there is a better way to phrase this issue, I am all-ears. Might be good to split it up into a few issues.

Link the same file from different block trees as one solid big raw block.


Problem:
IPLD allows you to divide a file into blocks of different sizes and to link them in different ways. As a result, for the same file there can be many variants of trees and, accordingly, of hashes.

Solution:
If the file is divided into raw blocks, the hash of the entire file, as if it were one large raw block, is written into the root block. This hash is then additionally announced to the network.

The client receiving the root block additionally searches for the source of the entire file by its hash and checks it by hash of the parts in the root block.

If the file has been fully loaded then it is rechecked by its RawLink. If RawLink does not match then the correct RawLink is written to the root block.
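The scheme above can be sketched as follows (hypothetical helper names; SHA-256 stands in for the actual multihash/CID machinery): the publisher records both per-part hashes and the hash of the whole file as one raw block, and a client verifies each received part against the part hashes and the assembled file against the RawLink.

```python
import hashlib

CHUNK = 131072  # publisher-chosen block size

def publish(data: bytes) -> dict:
    """Build a root record: per-chunk hashes plus a whole-file hash (the RawLink)."""
    parts = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return {
        "Links": [hashlib.sha256(p).hexdigest() for p in parts],
        "RawLink": hashlib.sha256(data).hexdigest(),  # file as one large raw block
    }

def verify(root: dict, parts: list) -> bool:
    """Check each received part against Links, then the whole file against RawLink."""
    if [hashlib.sha256(p).hexdigest() for p in parts] != root["Links"]:
        return False
    return hashlib.sha256(b"".join(parts)).hexdigest() == root["RawLink"]

data = bytes(300000)  # a 300000-byte example file (all zero bytes)
root = publish(data)
parts = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
assert verify(root, parts)
```

Any source that can serve the file under its whole-file hash can thus be used to fill in parts, since each part is still checked against the root block's own links.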

Original [RU]:

Проблема:
IPLD позволяет делить файл на блоки разного размера и по разному связывать их. В итоге для одного и того же файла может быть множество вариантов деревьев и соответственно хешей.

Решение:
Если файл делиться на сырые блоки в корневой блок записывается хеш всего файла как если бы это был один большой сырой блок. Соответственно этот хеш дополнительно анонсируется в сеть.

Клиент получив корневой блок дополнительно ищет источники файла целиком по его хешу и проверяет его по хешам частей в корневом блоке.

Если файл был полностью загружен то перепроверяется его RawLink. Если RawLink не совпадает то в корневой блок записывается правильный RawLink.

Protocol Buffers format

message PBLink {
    optional bytes  Hash = 1;
    optional string Name = 2;
    optional uint64 Tsize = 3;
}

message PBNode {
    repeated PBLink Links = 2;
    optional PBLink RawLink = 3;
    optional bytes  Data = 1;
}

GT:

RawLink is the CIDv1 of the whole file as a raw block. It is used to find additional data sources for the file.

Original [RU]:

RawLink это CIDv1 всего файла как сырого блока. Он используется для поиска дополнительных источников данных файла.

GT:

Example:

The first participant publishes the file, choosing a block size of 131072 bytes, and gets the CIDv0 of the root block: QmAAAA...AAAA

Original [RU]:
Пример:

Первый участник публикует файл выбрав размер блока в 131072 байт. И получает CIDv0 корневого блока: QmAAAA...AAAA

{
  "Links": [
    {
      "Name": "",
      "Hash": "zb2rhA2A2A2...A2A2A2",
      "Size": 131072
    },
    {
      "Name": "",
      "Hash": "zb2rhB2B2B2...B2B2B2",
      "Size": 131072
    },
    {
      "Name": "",
      "Hash": "zb2rhC2C2C2...C2C2C2",
      "Size": 131072
    },
    {
      "Name": "",
      "Hash": "zb2rhD2D2D2...D2D2D2",
      "Size": 131072
    },
    {
      "Name": "",
      "Hash": "zb2rhC1C1C1...C1C1C1C1",
      "Size": 59029
    }
  ],
  "RawLink":{
    "Name": "",
    "Hash": "zb2rhR0R0R0R0...R0R0R0R0",
    "Size": 583317
  },
  "Data": "\b\u0002\u0018\ufffd\u0343\f ... \ufffd\ufffd\u0003"
}

GT:

The second participant publishes the file, choosing a block size of 262144 bytes, and gets the CIDv0 of the root block: QmBBBB...BBBB

Original [RU]:

Второй участник публикует файл выбрав размер блока 262144 байт. И получил CIDv0 корневого блока: QmBBBB...BBBB

{
  "Links": [
    {
      "Name": "",
      "Hash": "zb2rhA1A1...A1A1",
      "Size": 262144
    },
    {
      "Name": "",
      "Hash": "zb2rhB1B1B1...B1B1B1",
      "Size": 262144
    },
    {
      "Name": "",
      "Hash": "zb2rhC1C1C1...C1C1C1C1",
      "Size": 59029
    }
  ],
  "RawLink":{
    "Name": "",
    "Hash": "zb2rhR0R0R0R0...R0R0R0R0",
    "Size": 583317
  },
  "Data": "\b\u0002\u0018\ufffd\u0343\f ... \ufffd\ufffd\u0003"
}

GT:

Both have one and the same file, whose CIDv1 is zb2rhR0R0R0R0...R0R0R0R0. This CIDv1 is written in RawLink.

The third participant receives the block QmAAAA...AAAA and additionally searches the network for sources of the block zb2rhR0R0R0R0...R0R0R0R0.

They find the second participant via the CIDv1 zb2rhR0R0R0R0...R0R0R0R0 and request from them the parts of the block zb2rhR0R0R0R0...R0R0R0R0, which are verified against the hashes (the CIDv1 Links) in the block QmAAAA...AAAA.

Original [RU]:

У обоих один и тотже файл который имеет хеш zb2rhR0R0R0R0...R0R0R0R0. Этот хеш записан в RawLink.

Третий участник получил блок QmAAAA...AAAA и дополнительно ищет в сети источники блока zb2rhR0R0R0R0...R0R0R0R0.

Он находит второго участника по хешу zb2rhR0R0R0R0...R0R0R0R0 и запрашивает у него части блока zb2rhR0R0R0R0...R0R0R0R0 которые проверяет хешами(CIDv1 Links) в блоке QmAAAA...AAAA.

Spec out DagPB path resolution

Currently, we're playing it fast-and-loose with DagPB path resolution. For context, the structure of DagPB IPLD objects is currently defined to be:

{
  "Data": "binary data",
  "Links": [
      {
        "Name": "thing", // may not be unique, may be omitted
        "Tsize": 1234, // may be omitted
        "Hash": "QmId"
      }
  ]
}

Given this structure, the correct path to QmId would be /Links/0/Hash. However, this isn't very usable.

  • js-ipfs actually supports this pathing scheme. However, it also supports pathing by name (/Links/thing/Hash). While nice and usable, this is technically transforming the object.

  • go-ipfs uses the pathing scheme /thing. That is, it treats the object as if it were a single map of names to CIDs (for the purposes of pathing). This is obviously problematic from an IPLD standpoint as many of the fields aren't addressable.

So, we need a scheme that's both consistent between implementations and consistent with other IPLD formats.

Given that we're basically the only real consumer of this format, I believe we have a bit of flexibility.
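The three pathing schemes above can be contrasted with a small sketch over the example object (hypothetical resolver names; plain dicts stand in for real DagPB nodes):

```python
# The DagPB example object from above, as a plain structure.
node = {
    "Data": b"binary data",
    "Links": [{"Name": "thing", "Tsize": 1234, "Hash": "QmId"}],
}

def resolve_strict(node, path):
    """Literal structural pathing: /Links/0/Hash."""
    obj = node
    for seg in path.strip("/").split("/"):
        obj = obj[int(seg)] if isinstance(obj, list) else obj[seg]
    return obj

def resolve_by_name(node, path):
    """js-ipfs-style: /Links/thing/Hash — a Name may index the Links list."""
    obj = node
    for seg in path.strip("/").split("/"):
        if isinstance(obj, list):
            obj = next(l for l in obj if l.get("Name") == seg)  # lookup by Name
        else:
            obj = obj[seg]
    return obj

def resolve_flat(node, name):
    """go-ipfs-style: /thing — the node acts as a name-to-CID map."""
    return next(l["Hash"] for l in node["Links"] if l.get("Name") == name)

assert resolve_strict(node, "/Links/0/Hash") == "QmId"
assert resolve_by_name(node, "/Links/thing/Hash") == "QmId"
assert resolve_flat(node, "thing") == "QmId"
```

Note that only the first scheme makes every field (Data, Tsize, Name) addressable; the flat scheme reaches nothing but link targets, which is the inconsistency at issue.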


By the way, @diasdavid, your code organization is 💯. (despite the fact that you're using JavaScript)
