tl;dr: After discussing some of this with @diasdavid and @vmx, they suggested we post an issue here for further conversation. I'm hoping to get some feedback/guidance on our approach in the comments! Thanks in advance to anyone who can help us out 😃
## Nori's IPLD Use Case

### High level
We are creating a commodity token called a "Carbon Removal Credit" (CRC) on the
Ethereum blockchain which represents a certain amount of carbon having been
removed. The data collected during the removal process serves as proof of
removal, so each CRC will be linked to the data that proves it's
legit. Storing all that data in the Ethereum blockchain is cost prohibitive, so
IPFS, IPLD, and eventually FileCoin become interesting mechanisms for
decentralizing access to that public data.
### Example Data

The data we wish to distribute can be represented as a graph of immutable
structured nodes and binary file data (images and such). As a simple example,
let's say we have various node types including `project`, `plot`, `crc`,
`data`, and `validation`. Here are examples of what some of those might look like.
#### project node

This is a node representing a project to remove carbon dioxide from the
atmosphere. We'll call it `project-1`.
```json
{
  "node": {
    "name": "Paul's Farm",
    "location": {
      "city": "Seattle",
      "state": "WA"
    },
    "createdOn": "2018-01-14",
    "url": "http://paulsfarm.com"
  },
  "edges": {
    "owner": ["user-1"]
  }
}
```
#### land plot node

This is a node representing a plot of land belonging to the project owner. We'll
call it `plot-1`.
```json
{
  "node": {
    "name": "East Plot",
    "area": [
      {"lat": 45.2342, "long": -122.4342},
      {"lat": 45.3342, "long": -122.5342},
      {"lat": 45.3342, "long": -122.5342},
      {"lat": 45.2342, "long": -122.4342}
    ]
  },
  "edges": {
    "project": ["project-1"],
    "createdBy": ["user-1"],
    "historicalData": ["data-1", "data-2", "data-3", "data-4"]
  }
}
```
#### CRC Data node

This is a node with all the info, or links to the info, needed to validate a
certain amount of carbon dioxide having been removed.
```json
{
  "node": {
    "generationDate": "2018-02-14",
    "carbonRemoved": 1.23,
    "removalMethod": "soil",
    "measurements": {
      "complicated": "science stuff"
    }
  },
  "edges": {
    "validationRecords": ["validation-1", "validation-2"],
    "project": ["project-1"],
    "createdBy": ["user-1"],
    "plots": ["plot-1", "plot-2"]
  }
}
```
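When these documents are stored as IPLD, the human-readable identifiers in `edges` (`project-1`, `user-1`, etc.) would be replaced by CIDs of the linked nodes; in dag-json form an IPLD link is encoded as an object with a single `"/"` key. A minimal sketch of that substitution, assuming a lookup table from placeholder name to CID (the CID strings below are made up):

```javascript
// Illustrative name → CID table; these are NOT real CIDs for this data.
const cidFor = {
  "project-1": "zdpuFakeCidForProject1",
  "user-1": "zdpuFakeCidForUser1",
};

// Replace placeholder identifiers in an "edges" map with IPLD-style links,
// i.e. objects of the form {"/": "<cid>"} as used by dag-json.
function linkEdges(edges) {
  const linked = {};
  for (const [label, targets] of Object.entries(edges)) {
    linked[label] = targets.map((name) => ({ "/": cidFor[name] }));
  }
  return linked;
}

const linked = linkEdges({ project: ["project-1"], createdBy: ["user-1"] });
console.log(linked.project[0]); // { '/': 'zdpuFakeCidForProject1' }
```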
### How the data will be used

- **Linked from a non-fungible Ethereum token**
  We'll link each newly minted CRC token (just a struct in Ethereum) to one of
  the CRC data nodes, for example `crc-1`. A CID is exactly what we need for this.
- **Fetched from client-side JavaScript code**
  Browser-based Ethereum clients (MetaMask/web3) will need to look up the
  `crc-1` data by the identifier stored with the Ethereum token, so it can
  display that data on a webpage. It will probably need to fetch many nodes
  linked to from `crc-1`, such as `project-1`, `plot-1`, and `user-1`, at
  the same time and in an efficient way.
- **Queried with a wide range of query parameters from client-side JavaScript**
  Some example queries include:
  - The 100 most recently created `crc` nodes linked to `project` nodes with
    `state="WA"`.
  - The sum of all the `carbonRemoved` fields of all `crc` nodes linked to
    `project` nodes with `state="WA"` where `generationDate` is in the year
    2017.
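Until decentralized querying exists, queries like these would run against an index. Purely as a shape reference, here is the first query (and a sum over its results) expressed over an in-memory array; the field names follow the examples above, and the data and join to `projectState` are made up:

```javascript
// Hypothetical denormalized index: crc nodes joined to their project's state.
// In production this would be a Cloud DataStore / GraphQL query instead.
const crcs = [
  { id: "crc-1", createdOn: "2018-02-14", carbonRemoved: 1.23, projectState: "WA" },
  { id: "crc-2", createdOn: "2018-03-01", carbonRemoved: 0.5,  projectState: "WA" },
  { id: "crc-3", createdOn: "2018-01-02", carbonRemoved: 2.0,  projectState: "OR" },
];

// "The 100 most recently created crc nodes linked to project nodes with state=WA"
const recentWA = crcs
  .filter((c) => c.projectState === "WA")
  .sort((a, b) => b.createdOn.localeCompare(a.createdOn)) // newest first
  .slice(0, 100);

// "The sum of all the carbonRemoved fields" of those nodes
const totalWA = recentWA.reduce((sum, c) => sum + c.carbonRemoved, 0);

console.log(recentWA.map((c) => c.id)); // [ 'crc-2', 'crc-1' ]
console.log(totalWA); // 1.73
```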
### Scaling

- Near term: 10k-99k nodes
- Long term: billions of nodes
### How we are doing this now
Centralized cloud database solutions solve all of the above problems except for
the "decentralized" part. So we're just using Google's Cloud DataStore for
everything, but trying to do it in a way that is "compatible" with decentralized
technologies like IPFS:
- We write individual nodes to Cloud DataStore (basically mongoDB).
- We serialize the same nodes using `ipld-dag-cbor` and use the CID as the
  primary key when writing to Cloud DataStore.
- We also store the `dag-cbor` serialized data in a special attribute on the
  Cloud DataStore object.
- We store binary data (images and such) into Google's Cloud Storage service
  (same as s3), also keyed off the CID of that data.
- We make the data stored in Cloud DataStore available to client-side
  JavaScript via a GraphQL API.
This allows us to scale to at least hundreds of millions of nodes (assuming
Cloud DataStore actually works), perform very fast lookups and queries,
easily back up the data into cold storage, and toss it into data warehouses for
more complex analysis.
But of course people will have to rely on us continuing to pay the bills and
operate this centralized service in order to maintain access to this data. And
they will have to trust us to deliver accurate query responses. So we want to
distribute the data and the query indexes to as many people as possible as a way
to secure the data "forever". We could distribute the data by providing links to
gigantic petabyte database backup files which people can theoretically download
and do something with, but IPFS, IPLD, and FileCoin sound like better options :)
### Pathway to decentralization
Ultimately we want people to be able to use the platform we are building without
having to go through us. While that is probably quite a ways off, here are the
steps we could take in that direction:
- Make the IPLD-encoded data in our database available through a public API
  (graphql/rest/whatever) hosted on our servers.
  - Still centralized, but the data is public. CIDs ensure that the
    data we are serving is accurate, so we can't cheat.
- Mirror/pin the IPLD-encoded data in our database to a cluster of IPFS nodes that
  we host.
  - Still centralized, but now it is "easy" for others to pin the data on their
    own IPFS nodes if they want, and make it available to everybody else
    through standard IPFS APIs. Now there is an alternative to our public
    GraphQL API.
  - If our IPFS cluster goes down, we can repopulate it from the database.
  - It would cost us a lot of extra time and money to manage an IPFS cluster on
    our own. Doable, but not fun until someone comes along with IPFS
    cluster-as-a-service. Then it would just cost us money, but not time.
  - Maybe we can write our own IPFS resolver that points to Google Cloud DataStore directly?
- Switch our own client-side JavaScript over to IPFS (instead of our public
  API) for fetching IPLD nodes.
  - If public IPFS latency stays the same as it is today, our code would have
    to fall back to our own public GraphQL API when IPFS doesn't respond fast
    enough.
- Release an open-source JavaScript library for interacting with our data over
  IPFS.
  - Theoretically decentralized, assuming other people choose to run their own
    IPFS-based mirrors.
  - Now other folks can theoretically build things that use data from our
    platform without relying on our GraphQL API being available.
- Release an open-source server application that facilitates the mirroring of
  subsets of CRC data.
  - Presumably the people who own CRCs or supply CRCs would be incentivized to
    mirror at least the data associated with CRCs that they own or supplied.
  - Maybe this is enough to be considered fully decentralized, except you still
    don't have decentralized guarantees of data permanence.
- Use transaction fees or some other decentralized funding mechanism to pay for
  FileCoin to store this data "permanently".
  - Probably about as close as you can get to guaranteed decentralized
    permanence and availability. Sounds like there are still a lot of hard
    problems to solve before this could be a reality.
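The latency fallback mentioned above (falling back to our GraphQL API when IPFS doesn't respond fast enough) could look roughly like this: race the IPFS lookup against a timeout. Both resolvers here are mocks, and the function names and timeout value are illustrative, not a real API.

```javascript
// Mock resolvers: a slow "IPFS" lookup and a fast centralized API.
function fetchFromIpfs(cid) {
  return new Promise((resolve) =>
    setTimeout(() => resolve({ cid, via: "ipfs" }), 500)
  );
}
function fetchFromApi(cid) {
  return Promise.resolve({ cid, via: "graphql-api" });
}

// Try IPFS first, but fall back to our public API if it is too slow.
async function fetchNode(cid, timeoutMs = 100) {
  const timeout = new Promise((resolve) =>
    setTimeout(() => resolve(null), timeoutMs)
  );
  const viaIpfs = await Promise.race([fetchFromIpfs(cid), timeout]);
  return viaIpfs !== null ? viaIpfs : fetchFromApi(cid);
}

fetchNode("crc-1").then((node) => console.log(node.via)); // "graphql-api" (IPFS too slow here)
```

A real client would also want to pin or cache whatever IPFS does return, so the fallback path is exercised less over time.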
These steps don't really include anything about decentralized queryability since
I don't really have a good idea of how that would work in practice. But I sure
would like it to happen!
Also, each of these steps could trigger very long discussions about the various
ways to implement it. We would certainly like solutions that are low
cost, low latency, high performance, etc., and will look to the community
for guidance on achieving that.