dao-xyz / peerbit
P2P database framework with encryption, sharding and search
Home Page: https://peerbit.org
License: Apache License 2.0
Now, if a node takes on a replicator role (the default) and peers put documents it cannot decrypt, search results collected throughout the network might be incomplete, because there is an incorrect assumption that replicators can decrypt every document.
Should this issue be mitigated by having a canReplicate callback? Or something else?
What other solutions exist that do not impose too many restrictions?
Currently:
indexBy: string
but we want:
indexBy: string[]
so you can index documents based on a subfield.
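A minimal sketch of how an indexBy path could be resolved against a document (function and shapes are hypothetical, not the actual Peerbit API):

```typescript
// Resolve a nested index key like ["metadata", "id"] against a document.
// Returns undefined if any segment along the path is missing.
function resolveIndexKey(doc: Record<string, any>, path: string[]): any {
    let current: any = doc;
    for (const segment of path) {
        if (current == null || typeof current !== "object") {
            return undefined;
        }
        current = current[segment];
    }
    return current;
}

// indexBy: ["metadata", "id"] would pick the subfield as the index key
const doc = { metadata: { id: "abc" }, body: "hello" };
const key = resolveIndexKey(doc, ["metadata", "id"]);
```

The single-string form is then just the one-element path, which keeps backwards compatibility.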
Currently, load does not re-check canAppend, since it is assumed that if someone had the permission to append, the commit should stay, no matter how many times you "load()" your database. However, if canAppend depends on some revocable permission, you might want to update your local log to reflect that as soon as changes occur.
Currently no real work has been performed to test and validate that "replay" attacks cannot be done, i.e. someone could send/spam an existing message multiple times to a store, or perhaps take a message with no "nexts" from one store and post it in another store.
I am wondering if this project would be better off using one of the existing encrypted filesystems for a private dapp's filesystem. The benefit of using one that has already been written is that it has undergone peer review as well as performance and security testing.
https://github.com/MatrixAI/js-encryptedfs
The "ephemeral key" generated by peerbit could be the AES-GCM key used by an encrypted filesystem. Arbitrary access, streaming, and better privacy through random fragmentation.
https://github.com/dao-xyz/peerbit/blob/master/packages/utils/crypto/src/encryption.ts#L97
It is better to use separate keys for separate needs. One KDF can be used for data at rest and another for transport. The libp2p world developed the Noise protocol for their encryption needs because it supports broadcasting and multicasting using a shared key.
https://www.wolfssl.com/tls-1-3-versus-noise-protocol/
Some libp2p clients are already using Noise for encrypted broadcast/multicasting. But there isn't a really good encrypted filesystem for IPFS. One of the best e2e protocols is of course the Signal protocol, which is what Berty is using with orbitdb, and there is a JavaScript port:
https://github.com/signalapp/libsignal-protocol-javascript
https://berty.tech/docs/protocol
E2EE where peers are offline needs a middleman that can provide the latest heads to peers without knowing their content. This is achieved right now by allowing this replicator to decrypt the clock and signature, but not the content (hence it is able to build a log). Out of the box, this is not private, since the signature contains information about the sender.
Possible solutions
The DID in Peerbit is simply the PublicKey prefixed with a number indicating the elliptic-curve signature algorithm. Right now it is prefixed with two additional bytes that are unnecessary.
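A sketch of the single-prefix-byte layout (the algorithm tags and function names here are made up for illustration, not Peerbit's actual values):

```typescript
// Encode a DID as one algorithm byte followed by the raw public key bytes.
// Tags are hypothetical: 0 = Ed25519, 1 = secp256k1.
function encodeDid(algo: number, publicKey: Uint8Array): Uint8Array {
    const out = new Uint8Array(1 + publicKey.length);
    out[0] = algo;
    out.set(publicKey, 1);
    return out;
}

// Split the prefix byte back off to recover the algorithm and key.
function decodeDid(did: Uint8Array): { algo: number; publicKey: Uint8Array } {
    return { algo: did[0], publicKey: did.subarray(1) };
}
```

Dropping the two unnecessary bytes would make every DID exactly key length + 1.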
How does Peerbit make sure that a user can add a data record to the database, and make sure no one else removes or modifies it? If I am not mistaken, Peerbit and orbitdb expect users to be trusted when modifying a shared collection.
A Merkle-CRDT can be used to make it hard to deny that data was ever seen, and provides some source of truth without a full blockchain:
https://research.protocol.ai/publications/merkle-crdts-merkle-dags-meet-crdts/psaras2020.pdf
https://github.com/ipfs/go-ds-crdt
The current implementation of DString is like a canvas: anyone can write anywhere on the canvas by inserting a string at an offset. The current implementation does not support efficient deletion by pruning the DAG; the operation data type itself does not support this out of the box.
Consider instead the xi-editor that uses the rope data structure
Regarding xi-editor, here is some interesting discussion/critique on the implementation that xi-editor brings forward.
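To make the "canvas" model concrete, here is a simplified in-memory sketch of how a single insert-at-offset operation behaves (this is an illustration of the model described above, not the DString implementation):

```typescript
// Apply one canvas-style operation: write `content` at an absolute offset,
// overwriting whatever was there and padding with spaces if the canvas is
// shorter than the offset.
function applyCanvasOp(canvas: string, offset: number, content: string): string {
    const padded = canvas.padEnd(offset, " ");
    return padded.slice(0, offset) + content + padded.slice(offset + content.length);
}

const v1 = applyCanvasOp("", 0, "hello world");
const v2 = applyCanvasOp(v1, 6, "there");
```

A rope-backed model would instead shift subsequent text on insert/delete, which is why pruning and deletion become much more natural there.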
The following test fails in CI if we do
await session.peers[X].services.directstream.stop();
instead of
await session.peers[X].stop();
but succeeds when running locally.
Expected: stop calls to the protocol handler should be treated as a disconnection event.
Blockers/Questions
Start with: the Store class is redundant; more or less all functionality can be absorbed into the Log class. This will simplify documentation work.
For all code that might be running in the browser
On rare occasions, the test suites yield the following error:
Needs to be initialized before loaded
343 | async load() {
344 | if (!this.initialized) {
> 345 | throw new Error("Needs to be initialized before loaded");
| ^
346 | }
347 |
348 | if (this._cache.status !== "open") {
at HeadsCache.load (packages/log/src/heads-cache.ts:345:10)
at HeadsIndex.load (packages/log/src/heads.ts:44:41)
at Log.load (packages/log/src/log.ts:1163:39)
at Log.join (packages/log/src/log.ts:840:14)
at DocumentIndex.sync [as _sync] (packages/programs/data/document/src/document-store.ts:116:14)
at packages/programs/data/document/src/document-index.ts:449:19
at Array.map (<anonymous>)
at initFn (packages/programs/data/document/src/document-index.ts:446:16)
at packages/programs/data/document/src/document-index.ts:487:27
at async Promise.all (index 0)
at DocumentIndex.queryDetailed (packages/programs/data/document/src/document-index.ts:491:20)
at DocumentIndex.query (packages/programs/data/document/src/document-index.ts:514:21)
Currently, sorted and paginated/limited search results are not supported, but this is an important feature when searches could yield thousands of results.
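As a plain in-memory sketch of the desired behaviour (real support would push sorting and limits into the index rather than materializing all results first):

```typescript
// Sort results with a comparator and return one page (offset/limit).
// Copies the input so the caller's array is not mutated.
function paginate<T>(
    results: T[],
    sortBy: (a: T, b: T) => number,
    offset: number,
    limit: number
): T[] {
    return [...results].sort(sortBy).slice(offset, offset + limit);
}

const firstPage = paginate([3, 1, 2], (a, b) => a - b, 0, 2);
```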
It's crazy to me that a product with as much potential as this one only has 49 stars. But there's a reason: you have no docs. People don't have time to wade through the source to find out how to get stuff done.
I think it's worth figuring out what features people would like this product for, and then documenting "how to" get those features into their app. Your stars will go to 4.9 K almost overnight! In any case, great effort so far & good luck!
Something that could be useful is adding Compare.NotEqual to FieldBigIntCompareQuery, to find all documents where the value of the given field is not equal to the specified value. Something analogous could also be useful with FieldStringMatchQuery.
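A sketch of what the extended comparator could look like (enum members and the function are hypothetical; only the NotEqual variant is the proposed addition):

```typescript
// Existing-style comparators plus the proposed NotEqual.
enum Compare {
    Equal,
    Greater,
    Less,
    NotEqual,
}

// Evaluate one bigint field comparison against a query value.
function compareBigInt(value: bigint, query: bigint, op: Compare): boolean {
    switch (op) {
        case Compare.NotEqual:
            return value !== query;
        case Compare.Equal:
            return value === query;
        case Compare.Greater:
            return value > query;
        default:
            return value < query;
    }
}
```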
canWrite validation makes more sense if we have a canRead callback for document stores.
This was a quick fix when moving away from program addresses.
Issue #56 is a symptom of the Results API of the document-store being too complicated. The boilerplate code that shows how to extract an element from the Results object requires an explanation itself (!): https://github.com/dao-xyz/peerbit-getting-started/blob/b198598557e146a509dfc9c028e37ad101e31494/src/index.test.ts#L51
For performance reasons it does not make sense to store hashes as strings on Entries; storing them as Uint8Arrays instead lets us encode and decode data to and from Entry objects more quickly.
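The cost being avoided is the hex (or base-encoded) conversion on every encode/decode. A minimal sketch of the conversions that raw Uint8Array storage would skip (helper names are for illustration):

```typescript
// Parse a hex string into raw bytes: two hex chars per byte.
function hexToBytes(hex: string): Uint8Array {
    const out = new Uint8Array(hex.length / 2);
    for (let i = 0; i < out.length; i++) {
        out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
    }
    return out;
}

// Render raw bytes back into a lowercase hex string.
function bytesToHex(bytes: Uint8Array): string {
    return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}
```

Keeping the bytes on the Entry means these conversions happen only at the edges (display, APIs that require strings), not on every serialization.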
Currently, querying a db will send the query request to all subscribers. This does not scale well.
A better solution is to send query messages to only 1-2 peers in each shard.
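A sketch of the target-selection step (the shard grouping is modelled as a plain map here; names are hypothetical):

```typescript
// Instead of querying every subscriber, pick up to `fanout` peers from each
// shard. With fanout = 2, query traffic grows with the number of shards
// rather than the number of subscribers.
function pickQueryTargets(
    shards: Map<string, string[]>, // shardId -> peer ids holding that shard
    fanout: number = 2
): string[] {
    const targets: string[] = [];
    for (const peers of shards.values()) {
        targets.push(...peers.slice(0, fanout));
    }
    return targets;
}
```

A real implementation would likely also randomize the choice within each shard for load balancing.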
For some reason (CPU load, or a race condition) the direct-block tests fail when running in CI.
One thing to try is different values of maxInboundStreams and maxOutboundStreams
Currently the minReplicas option is at the program level, i.e. peer.open(program, {minReplicas: 123}). It would be ideal if this were at the commit level, so that different content can be stored with different permanence.
Relative min replicas: one should be able to choose a relative min replicas value like "100%" or "50%" of the network, not just an absolute number.
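A sketch of how both forms could be resolved to an absolute count (function name and rounding policy are assumptions, not the Peerbit API):

```typescript
// Accept an absolute count or a relative string like "50%", resolved against
// the current network size. Relative values round up and are clamped to at
// least 1 replica; absolute values are capped at the network size.
function resolveMinReplicas(value: number | string, networkSize: number): number {
    if (typeof value === "number") {
        return Math.min(value, networkSize);
    }
    const percent = Number(value.replace("%", ""));
    if (Number.isNaN(percent) || percent < 0 || percent > 100) {
        throw new Error("Invalid relative minReplicas: " + value);
    }
    return Math.max(1, Math.ceil((percent / 100) * networkSize));
}
```

One open question with relative values is when to re-evaluate them as the network grows or shrinks.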
It would be great if Peerbit supported query operators such as those used in MongoDB, such as:
{$in: ['abc', 'def']}
to find documents, for example, with field values that match any of those in the array.
Here's a list of common ones:
https://www.mongodb.com/docs/manual/reference/operator/query/
(some of these already exist for FieldBigIntCompareQuery)
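As a sketch of the semantics being requested, here is a minimal $in evaluated over plain documents (type and function names are made up for illustration):

```typescript
// MongoDB-style $in: the field value must match any value in the array.
type InQuery = { $in: any[] };

function matchesIn(value: any, query: InQuery): boolean {
    return query.$in.includes(value);
}

// Find documents whose id matches any value in the array.
const docs = [{ id: "abc" }, { id: "xyz" }];
const hits = docs.filter((d) => matchesIn(d.id, { $in: ["abc", "def"] }));
```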
Currently there are two roles
This makes sense for databases but not for other general services that have different peers that participate for certain reasons.
How can this be made future-proof?
There is a unique feature in libp2p: encrypted multiplexing with Noise:
https://github.com/libp2p/specs/blob/master/noise/README.md
https://docs.libp2p.io/concepts/multiplex/overview/
Noise has been vetted and has been around for a while. All other things being equal, the approach above is the most efficient way to do realtime updates, as one peer with a good network connection (like a Skype supernode) could multiplex the database update stream and/or a pub/sub event stream.
Signal and TLS + HTTP/2 do not do this. Noise + multicast appears to be a better fit for a near-realtime Peerbit swarm. If you were doing a realtime application like a video game that still needs to be private and decentralized, then, correct me if I'm wrong, but I think this is currently the best solution from a performance and network-traversal perspective.
Right now the next property is public. It would make sense to optionally be able to encrypt this property so that only trusted peers can access the complete tree.
Questions:
Should maxChainLength also be encrypted?
Should forks also be encrypted?
This issue should be taken into consideration together with #15
Related #126
Provide more information to the canRead (canSearch(?)) function, not just the publicKey.
Currently, when running all tests in parallel you will run into test errors due to side effects from cache and IPFS network traffic. This should be fixed so we can quickly run through all the tests and be confident that the results are correct.
yarn test works as expected and runs tests in parallel.
Related #126
Should canRead be a mandatory fn to provide on open to Documents, to mitigate leaks?
Currently, browser nodes are not assumed to do any replication work. Instead there will be non-browser nodes that are subscribing to some replication topic. This assumption is bad.
Instead of connecting with the full identifier (swarm connect), e.g.
/dns4/xyz123.peerchecker.com/tcp/4002/wss/p2p/12D3KooWQVtriWH37wD9sQBzayHbCvc2nn626maaSzKWta7XbWe8
one should be able to connect to
xyz123.peerchecker.com
alone, and the utility would figure out how to properly resolve the full address.
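A sketch of the first step, expanding a bare hostname into a dialable multiaddr string (the default port and transport are assumptions taken from the example above; a real utility would also have to discover the trailing /p2p/... peer id rather than hardcode anything):

```typescript
// Build a websocket multiaddr string from a bare hostname. The /p2p/<peerId>
// suffix still has to be discovered separately (e.g. by dialing and reading
// the remote identity), which is the part the proposed utility would automate.
function toMultiaddr(host: string, port: number = 4002): string {
    return `/dns4/${host}/tcp/${port}/wss`;
}

const addr = toMultiaddr("xyz123.peerchecker.com");
```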
Can we get this code into a repo to show that Peerbit works easily with Vue and React? Maybe publish bindings to npm to promote adoption?
Samples that can be featured:
https://gist.github.com/djmaze/9e99382f6ad364f0d77830f826c01b55
https://gist.github.com/denzuko/cdf88e39f53d609ee8393fd0296f6273
It will make it easier to have libs like this:
https://www.npmjs.com/package/define-orbit
https://www.npmjs.com/package/react-orbitdb
https://github.com/DeFUCC/gun-vue
Since the order of the entries in the log can be determined by the "nexts", the Lamport clock's purpose is ambiguous. Evaluate whether it is necessary to include it in the log at all.
Credit to @tabcat and Opal for sharing this idea.
The main issue with removing the clock from Peerbit as of now is how to order unrelated documents in a document store if they are not connected to each other in any meaningful way (like through a shared clock). How do we make sure that new entries are submitted with a truthful timestamp?
Allow users using the CLI to set the log level and output path of the Pino log
Currently it is hard to know which server corresponds to which addresses unless you go into the server and manually check the configuration.
Good solutions to #19 and #15 expect that a commit can be signed by multiple parties.
For #19 we want a trusted "clock service" to sign root commits, attesting that they have correct timestamps.
For #15 we can sign messages with 2 identities: one identity that allows the message to be stored on a replicator, and one identity that proves to the end receiver that you are you.
Right now sharding assumes that all peers have more or less the same capacity for storage (RAM, disk, CPU etc.). This can be optimized so that we use powerful peers more than weak ones.
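A sketch of the basic idea: weight each peer's share of the replication load by its declared capacity instead of assuming uniformity (capacity units are arbitrary here, and the names are hypothetical):

```typescript
// Fraction of the total load a peer should take, proportional to its
// declared capacity. A peer with twice the capacity takes twice the share.
function shareOfLoad(capacities: Map<string, number>, peer: string): number {
    let total = 0;
    for (const c of capacities.values()) {
        total += c;
    }
    return total === 0 ? 0 : (capacities.get(peer) ?? 0) / total;
}
```

The hard part is of course getting truthful capacity reports; peers could be benchmarked or scored on observed behaviour rather than self-declared numbers.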
The benchmark
without autoDial
size: 1kb x 1,722 ops/sec ±1.89% (82 runs sampled)
size: 1000kb x 107 ops/sec ±2.02% (85 runs sampled)
with autoDial
size: 1kb x 1,396 ops/sec ±2.86% (81 runs sampled)
size: 1000kb x 57.36 ops/sec ±3.13% (65 runs sampled)
Expected behaviour (?). Directsub performs better than just Directstream and Directblock if autoDial = true.
Peerbit could be a backend for rxdb and could be a great abstract-level database. Writing a backend module for these platforms would generate buzz within these communities and generate interest.
https://rxdb.info/offline-first.html
https://github.com/Level/levelup (abstract-level)
rxdb uses pouchdb as an adapter:
https://rxdb.info/adapters.html
which is supported by orbit:
https://www.npmjs.com/package/pouchdb-orbit
Peerbit could be a backend for pouchdb, which could then be an rxdb adapter, gaining more compatibility.
RPC is started and terminated immediately.
The snippet below demonstrates a scenario where encryption is not upheld. In this example, there are three clients involved. The first client creates a database and inserts a Post entry into it. The payload of the Post entry is encrypted specifically for client3. Subsequently, client2 contacts client1 and synchronizes the post. At this point, one would expect the post to be encrypted. Finally, client3 also contacts client1 and synchronizes the post.
When fetching the data, the expected behavior sometimes functions correctly, while other times it does not:
I don't know what's going on here; is this an internal race condition?
Running the following POC with rm peerbittest; ts-node-esm documentstorelate.ts:
import { field, variant } from "@dao-xyz/borsh";
import { Program } from "@peerbit/program";
import { Peerbit } from "peerbit";
import { DeleteOperation, Documents, Observer, PutOperation, SearchRequest } from "@peerbit/document";
import { X25519Keypair } from "@peerbit/crypto";
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
@variant(0) // version 0
class Post {
@field({ type: "string" })
id: string;
@field({ type: "string" })
message: string;
constructor(id: string, message: string) {
this.id = id;
this.message = message;
}
}
@variant("posts")
class PostsDB extends Program {
@field({ type: Documents })
posts: Documents<Post>;
constructor() {
super();
this.posts = new Documents();
}
async open(): Promise<void> {
await this.posts.open({
type: Post,
index: { key: "id" },
canAppend: async (entry) => {
await entry.verifySignatures();
const payload = await entry.getPayloadValue();
console.log('GOT PAYLOAD')
if (payload instanceof PutOperation) {
const post: Post = payload.getValue(
this.posts.index.valueEncoding
);
console.log('PUT POST', post)
return true;
} else if (payload instanceof DeleteOperation) {
return false;
}
return true
}
});
}
}
const client1 = await Peerbit.create({directory: "./peerbittest/client1"});
const client2 = await Peerbit.create({directory: "./peerbittest/client2"});
const client3 = await Peerbit.create({directory: "./peerbittest/client3"});
const store = await client1.open(new PostsDB());
const post = new Post('ID1', "hello world")
await store.posts.put(post, {
encryption: {
keypair: await X25519Keypair.create(),
reciever: {
// Who can read the log entry metadata (e.g. timestamps)
metadata: [
// client1.identity.publicKey,
// client2.identity.publicKey,
// client3.identity.publicKey
],
// Who can read the references of the entry (next pointers)
next: [
// client1.identity.publicKey,
// client2.identity.publicKey,
// client3.identity.publicKey
],
// Who can read the message?
payload: [
// client1.identity.publicKey,
// client2.identity.publicKey,
client3.identity.publicKey,
],
// Who can read the signature ?
// (In order to validate entries you need to be able to read the signature)
signatures: [
// client1.identity.publicKey,
// client2.identity.publicKey,
// client3.identity.publicKey
],
},
},
});
async function printPosts(store:any) {
const responses: Post[] = await store.posts.index.search(
new SearchRequest({
query: [], // query all
})
);
console.log(responses)
}
console.log('Dialing client2 with client1')
await client2.dial(client1.getMultiaddrs());
console.log('Dialing client3 with client1')
await client3.dial(client1.getMultiaddrs());
//////////////////////
const store2 = await client2.open<PostsDB>(store.address)
// await store2.waitFor(client1.peerId);
//////////////////////
//////////////////////
const store3 = await client3.open<PostsDB>(store.address)
// await store3.waitFor(client1.peerId);
//////////////////////
await sleep(5000)
console.log('Store1:')
await printPosts(store)
console.log('Store2:')
await printPosts(store2)
console.log('Store3:')
await printPosts(store3)
await sleep(5000)
console.log("END")
The automated release from the master branch failed. 🚨
I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.
You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this 💪.
Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.
Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the master
branch. You can also manually restart the failed CI job that runs semantic-release.
If you are not sure how to resolve this, here are some links that can help you:
If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.
semantic-release cannot push the version tag to the branch master on the remote Git repository with URL https://[secure]@github.com/dao-xyz/peerbit.
This can be caused by:
Good luck with your project ✨
Your semantic-release bot 📦🚀
Related #126.
When canRead returns false, a response should nevertheless be returned to notify the user that they cannot read.
Currently there are two types of peers.
Replicators are distinguished from Observers because they will subscribe to an additional topic that lets other peers find them via "pubsub.peers(topic)".
This solution is not beautiful: it requires two topics, and there is undefined behaviour when two different networks use the same topic.
TODO