dat-ecosystem-archive / book Goto Github PK

Documentation on how to implement the Dat Protocol [ DEPRECATED - see https://github.com/hypercore-protocol/new-website/tree/master/guides for similar functionality. More info on active projects and modules at https://dat-ecosystem.org/ ]

Home Page: https://datprotocol.github.io/book/

License: Apache License 2.0

Shell 100.00%

book's Introduction

See hypercore protocol guides for similar functionality.

More info on active projects and modules at dat-ecosystem.org

The Dat Protocol

This repository contains the source for "The Dat Protocol" - an implementers guide.

Requirements

Building the book requires mdbook.

$ cargo install mdbook

Building

To build the book locally and preview:

$ mdbook serve

License

MIT OR Apache-2.0

book's People

Contributors

Stargazers

Watchers

Forkers

khernyo yerkopalma damons aral decentral1se zhouhansen higgins selesy salespaulo

book's Issues

"Big Picture" Outline of what needs to be implemented

Thanks for this great resource.

For the purpose of getting a first impressions of all the necessary moving parts for completing an implementation of DAT, it would be nice to have a full list in one place on the book somewhere.

That would also be important to see the order in which things should be done so that learning can be more easily directed for getting into this space and doing the implementing.

Explicitly ention root hash is not stored anywhere

In the Merkle Tree & Terminology sections we should emphasize the root hashes aren't persisted anywhere, but only the signatures of the hashes are persisted.

Also emphasize index for the signature is the index of the right-most node in the tree, divided by two. We should update our images to the following:

tree (stays same)

0: #0 = hash(data[0]) ─┐
                       #1 = hash(tree[0] + tree[1])
1: #2 = hash(data[1]) ─┘

2: #4 = hash(data[2])

signatures (added indexes)

0: #0 sig(#0)
1: #1 sig(#1)
2: #2 sig(#1 + #4)

Add optimization for full_roots function

**🙋 feature request

I started my career as an embedded systems engineer which meant that I had a long opportunity to become an expert at bit-wise operations. I've created an algorithm that dramatically improves the performance of finding (or in this case calculating) the full roots for a flat-tree - the reference implementation (in Javascript) iterates over all nodes in the tree with an O(n) performance while the new algorithm is based on the number of bits needed to store the record count and has O(n * log2(1)) performance.

If you'd like a paragraph describing this optimization, feel free to assign this issue to me.

Possible Solution

Additional optimization is possible as the roots for a tree only change when records are appended. It's also possible to cache the root list for a tree of a given size after calculating it and then use that value for any tree of the same size.

Code Sample

Here's my Go code for calculating the list of full roots for a flat-tree. Note that the size of the underlying array is retrieved as len(t.Nodes) but it would be possible to create the same function with a signature of func Roots(size int) []int to match the signature of the Rust example given in the book.

/*
Roots calculates the list of "full roots" for the FlatTree.  
*/
func (t FlatTree) Roots() []int {
	recs := (len(t.Nodes) + 1) >> 1
	roots := []int{}

	// If there are an odd number of leaf nodes (records) then the last
	// record is also a root and can be calculated as the value of the
	// other set bits times two.
	if recs&1 != 0 {
		recs ^= 1
		roots = append(roots, recs<<1)
	}

	// For the rest of the set bits, the root can be calculated as the
	// as the sum of the current bit minus one plus the value of the
	// other set bits times two.  If the current bit is zero, the loop
	// short circuits.
	for bit := 2; recs > 0; bit <<= 1 {
		if recs&bit == 0 {
			continue
		}

		recs ^= bit
		roots = append([]int{(recs << 1) + (bit - 1)}, roots...)
	}

	return roots
}

Bitfield is 512 bytes

The bitfield size was updated from 256 to 512 bytes. See holepunchto/hypercore#184.

Remove Google surveillance included by default by mdbook

After posting a copy of the book on my site, I was alerted that I was unknowingly contributing to giving Google eyes on the web and exposing my visitors to surveillance by Google/Alphabet, Inc. because mdbook apparently includes a bunch of Google surveillance by default (https://github.com/rust-lang-nursery/mdBook/search?utf8=%E2%9C%93&q=google&type=).

We should remove this from our build and not expose people to surveillance while reading the book.

Clarification request: bitfield deserialisation (to tree form) logic

Update 2: I just made a quick animation that we could add to the book or use in presentations to illustrate the flattening of the tree:

flat-tree.key.zip

(Original Keynote file attached under Creative Commons Attribution-ShareAlike 4.0 International. Please feel free to use in your own presentations if you want to.)

Update 1: OK, having now seen the diagram at the top of the Storage section, I now get how the tree is serialised. It would make sense to have that diagram in the Bitfield chapter to make it clear how the serialised version is arrived at.

So my deeper tree:

            01
      01          00
  01     11    00    00
11  00 11 11 00 00  00 00

Would flatten (now that I can see it visually it makes perfect sense, as we are literally squashing the tree down from the top node) to:

110100011111110100000000000000

I’ll prepare a PR to add a note about how the flattening happens to the Bitfield chapter.

Read on only if you want to see my late-night ramblings from last night and an alternative way to serialise a tree :)

A 🔦 question regarding bitfield serialisation logic:

In the Bitfield chapter, we have the following tree structure (top-down)

       01
  01       00
01  11   00  00

That is serialised as:

01 01 11 01 00 00 00

The logic I could work out for that was:

Start at the top (01) then take left (01), left (01), right (11). Having exhausted the left arm of the tree from the root, then proceed (from root) to (I’m assuming, following the previous pattern), right (00), left (00), and right (00) until you’ve exhausted the right arm of the tree.

While I can see how, given the tree structure, you can serialise it as such. I don’t understand how, given the serialised form, you can recreate the tree without having a delimeter for where the left arm of the tree ends.

Is it because the bitfield is always guaranteed to be balanced that you know to delimit after the (n-1)/2 th position?

What if there was another level to the tree?

Would:

            01
      01          00
  01     11    00    00
11  00 11 11 00 00  00 00

Serialise as:

01 01 11 01 00 11 11 11 00 00 00 00 00 00 00

In which case, dividing at (n-1)/2 would give:

01 | 01 11 01 00 11 11 11 | 00 00 00 00 00 00 00 |

And again, recursively:

01 | 01 ( 11 01 00 ) ( 11 11 11) | 00 ( 00 00 00 ) ( 00 00 00 ) |

And, finally:

01 | 01 ( 11 [01 00] ) ( 11 [11 11] ) | 00 ( 00 [00 00] ) ( 00 [00 00] ) |

Which, dropped into tree form, becomes:

            01
      01          00
  11     11    00    00
01  00 11 11 00 00  00 00

Which differs from the original… telling me that my original assumption of the serialisation logic was incorrect. The deserialisation logic makes sense, though, so, to backtrack, the original tree:

            01
      01          00
  01     11    00    00
11  00 11 11 00 00  00 00

Should have been serialised as:

01 | 01 ( 01 [11 00] 11 [11 11] ) | 00 (00 [ 00 00 ] 00 [ 00 00] ) |

Right, that makes sense. And it also makes sense in that the moment we read the 6th bit-pair, we know that we don’t have to look any further down that branch.

I think I just rubber-ducked my own question.

If the conclusion is sound, I think it would benefit the chapter for me to write it up in a concise few paragraphs. Thoughts?

Related: #1

Looking forward to read how to verify a merkle tree

Looking forward to read it soon.

https://datprotocol.github.io/book/ch01-02-merkle-tree.html#verifying-a-merkle-tree

Normalize cross-project terminology

Having explicit terminology is a great help, especially with enabling conceptual understanding.

Cross-project however, one projects idea of a "feed" may be similar but distinctly different than another projects. In hypercore there's a good pinning of what "feed" means in that context; in hyperdb we may refer to "feed", but this may underneath be a collection of feeds that are multiplexed into a single feed. Understanding the constraints and variations of the terminology between projects would help with a more top-to-bottom understanding of how the projects interrelate, where limitations are and what is extensible, etc.

Need more basic explanations and examples for bitfield

These are some 🔦questions:

I red up bitfield, sparse-bitfield though. I've still struggled to figure out the bitfield module for few days. The source of indexed-field and hypercore/bitfield are both hard to read. And sleep.md/bitfiled is similar to this document. So I decide to ask questions now.

Is this module for sparse file, or something similar to it？
In Types of Bitfields
"Data Bitfield: Indicates which data you have, and which data you don't. " What does 'the data you have' means? Does this means data entries I've already downloaded, then when update coming, I don't need to download them again? I don't know. Hypercore is a append-only log. So if there are 3 entries in metadata.data, the Data Bitfield will be 111000000?

I want to figure out this one first.

Link to HyperDB deep dive

https://github.com/lachenmayer/hyperdb-authorization-guide

Publish the book somewhere

This a 🙋 feature request.

It would be nice to make the book available online in built form.

Possible Solution - GitHub Pages

One solution would be to use GitHub Pages. According to docs we could use the approach of having the book built into /docs directory on the master branch.

It can be build like this:

$ mdbook build -d docs

And once docs directory is checked into master it can be made available under https://datrs.github.io/book.