peerbit's People

Contributors

allberg, erudition, github-actions[bot], marcus-pousette, tabcat

peerbit's Issues

Mitigate ambiguities that arise when replicators can not decrypt

Related #126, #15

Currently, if a node takes on the replicator role (the default) and peers put documents it cannot decrypt, search results collected throughout the network may be incomplete, because there is an incorrect assumption that replicators can decrypt every document.

Should this issue be mitigated by a canReplicate callback? Or something else?
What other solutions exist that do not impose too many restrictions?
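A hypothetical sketch of what a canReplicate callback could look like (the names `ReplicableEntry`, `entriesToReplicate`, and the callback itself are illustrative assumptions, not an existing Peerbit API): a node declines responsibility for entries it cannot decrypt, so that search results are not silently incomplete.

```typescript
// Hypothetical sketch, not the Peerbit API: a canReplicate callback lets a
// node decline the replicator role for entries it cannot decrypt.
type ReplicableEntry = { id: string; canDecrypt: boolean };

type CanReplicate = (entry: ReplicableEntry) => boolean;

function entriesToReplicate(
	entries: ReplicableEntry[],
	canReplicate: CanReplicate
): ReplicableEntry[] {
	// Only keep entries the callback approves for replication.
	return entries.filter(canReplicate);
}

const incoming: ReplicableEntry[] = [
	{ id: "a", canDecrypt: true },
	{ id: "b", canDecrypt: false }, // this node lacks the decryption key
];

// Only take responsibility for entries we can actually decrypt and index.
const replicated = entriesToReplicate(incoming, (e) => e.canDecrypt);
console.log(replicated.map((e) => e.id)); // ["a"]
```

With such a filter, other peers would know this node holds no answer for entry "b" instead of assuming it can serve it.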

canAppend re-check on load (?)

Currently, load() does not re-check canAppend, since it is assumed that if someone had permission to append, the commit should stay, no matter how many times you "load()" your database. However, if canAppend depends on some revokable permission, you might want to update your local log to reflect revocations as soon as changes occur.

  • What is the expected behaviour?
  • Does the current solution provide the expected behaviour?
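An illustrative sketch of the alternative behaviour (all names assumed; the real canAppend callback is async, a synchronous check is used here to keep the sketch minimal): re-run the permission check when loading the local log, so entries written under a since-revoked permission are pruned.

```typescript
// Illustrative sketch, not the Peerbit API: re-validate entries on load so a
// revoked permission is reflected in the local log.
type LogEntry = { hash: string; signer: string };

function loadAndRevalidate(
	entries: LogEntry[],
	canAppend: (e: LogEntry) => boolean
): LogEntry[] {
	// Drop every entry whose signer no longer passes canAppend.
	return entries.filter(canAppend);
}

const revoked = new Set(["mallory"]);
const log: LogEntry[] = [
	{ hash: "h1", signer: "alice" },
	{ hash: "h2", signer: "mallory" }, // permission revoked after the write
];

const valid = loadAndRevalidate(log, (e) => !revoked.has(e.signer));
console.log(valid.map((e) => e.hash)); // ["h1"]
```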

Evaluate and prevent "replay attacks"

Currently, no real work has been performed to test and validate that "replay" attacks cannot be carried out, i.e. someone sending/spamming an existing message multiple times to a store, or taking a message from one store with no "nexts" and posting it to another store.

  • Write test(s) to ensure that obvious replay attacks are not possible (most likely in the ipfs-log package)
  • Implement changes necessary (if any) to prevent replay attacks
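A sketch of two cheap defenses against the attacks described above (the entry shape and `ReplayGuard` are assumptions for illustration, not the ipfs-log API): reject entries already seen by hash, and reject entries whose embedded log id does not match the store they are posted to.

```typescript
// Sketch of replay defenses under an assumed entry shape.
type GuardedEntry = { hash: string; logId: string };

class ReplayGuard {
	private seen = new Set<string>();
	constructor(private logId: string) {}

	accept(entry: GuardedEntry): boolean {
		if (entry.logId !== this.logId) return false; // cross-store replay
		if (this.seen.has(entry.hash)) return false; // duplicate send/spam
		this.seen.add(entry.hash);
		return true;
	}
}

const guard = new ReplayGuard("storeA");
console.log(guard.accept({ hash: "h1", logId: "storeA" })); // true
console.log(guard.accept({ hash: "h1", logId: "storeA" })); // false (replayed)
console.log(guard.accept({ hash: "h2", logId: "storeB" })); // false (wrong store)
```

Binding the log id into the signed entry is what makes the cross-store check meaningful; a bare hash check alone only stops same-store spam.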

Using a fully featured encrypted filesystem

I am wondering if this project would be better off using one of the existing encrypted filesystems for a private dapp's filesystem. The benefit of using one that has already been written is that it has undergone peer review, performance testing, and security testing.

https://github.com/MatrixAI/js-encryptedfs

The "ephemeral key" generated by peerbit could be the AES-GCM key used by an encrypted filesystem. This would provide arbitrary access, streaming, and better privacy through random fragmentation.
https://github.com/dao-xyz/peerbit/blob/master/packages/utils/crypto/src/encryption.ts#L97

It is better to use separate keys for separate needs. One KDF can be used for data at rest and another for transport. The libp2p world developed the Noise protocol for their encryption needs because it supports broadcasting and multicasting using a shared key.
https://www.wolfssl.com/tls-1-3-versus-noise-protocol/

Some libp2p clients are already using Noise for encrypted broadcast/multicasting, but there isn't a really good encrypted filesystem for IPFS. One of the best e2e protocols is of course the Signal protocol, which is what Berty is using with orbitdb, and there is a JavaScript port:

https://github.com/signalapp/libsignal-protocol-javascript
https://berty.tech/docs/protocol

E2EE with a public relay/replicator

E2EE where peers are offline needs a middleman that can provide the latest heads to peers without knowing their content. This is achieved right now by allowing the replicator to decrypt the clock and signature, but not the content (hence it is able to build the log). Out of the box, this is not private, since the signature contains information about the sender.

Possible solutions

  1. Allow multiple senders per message, where one sender is used for the relay's ACL and the other for the end receiver (Cons: the developer experience might be off-putting)
  2. Create a Document store where each document contains the latest head, and the id is the address (though this is problematic since the IPFS blocks still need to be stored somewhere; (1) solves this since the relay will store the full tree).
  3. ...

Trim DIDs

The DID in Peerbit is simply the PublicKey prefixed with a number indicating the elliptic-curve signature algorithm. Right now it is prefixed with two additional bytes that are unnecessary.

  • Remove unnecessary prefix bytes from the identity
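A minimal sketch of the proposed trim, assuming the layout described above (two redundant bytes, then the curve tag, then the raw key bytes; `trimDid` is a hypothetical helper name):

```typescript
// Sketch: drop the two redundant prefix bytes, keeping the curve tag + key.
function trimDid(did: Uint8Array): Uint8Array {
	// did = [b0, b1, curveTag, ...publicKey] -> [curveTag, ...publicKey]
	return did.subarray(2);
}

const did = new Uint8Array([0, 0, 1, 9, 9, 9]); // illustrative bytes
console.log(Array.from(trimDid(did))); // [1, 9, 9, 9]
```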

Query protocol bloaty

  • The AnySearch module can be merged into DQuery, Document, and DString, and then be removed
  • Search results from any DQuery should return context (the address)
  • Search should allow querying any state by "created at" and "last edited at"

Merkle-CRDTs for data consistency

How does peerbit make sure that a user can add a data record to the database, and that no one else removes or modifies it? If I am not mistaken, peerbit and orbitdb expect users to be trusted when modifying a shared collection.

A Merkle-CRDT can be used to make it hard to deny that data was ever seen, and provides a source of truth without a full blockchain:
https://research.protocol.ai/publications/merkle-crdts-merkle-dags-meet-crdts/psaras2020.pdf
https://github.com/ipfs/go-ds-crdt

DString implementation needs rework

The current implementation of DString is like a canvas: anyone can write anywhere on the canvas by inserting a string at an offset with a length. The current implementation does not support efficient deletion by pruning the DAG; the operation data type itself does not support this out of the box.

Consider instead xi-editor, which uses the rope data structure.

Regarding xi-editor, there is some interesting discussion/critique of the implementation that xi-editor brings forward.

Add benchmarks

Start with:

  • Add X documents locally
  • Add X documents and wait for replication.

Merge store with log

The Store class is redundant; more or less all of its functionality can be absorbed into the Log class. This will simplify documentation work.

DocumentStore put fails sometimes in test suite

On rare occasions, the test suite yields the following error:

Needs to be initialized before loaded

  343 |     async load() {
  344 |             if (!this.initialized) {
> 345 |                     throw new Error("Needs to be initialized before loaded");
      |                           ^
  346 |             }
  347 |
  348 |             if (this._cache.status !== "open") {

  at HeadsCache.load (packages/log/src/heads-cache.ts:345:10)
  at HeadsIndex.load (packages/log/src/heads.ts:44:41)
  at Log.load (packages/log/src/log.ts:1163:39)
  at Log.join (packages/log/src/log.ts:840:14)
  at DocumentIndex.sync [as _sync] (packages/programs/data/document/src/document-store.ts:116:14)
  at packages/programs/data/document/src/document-index.ts:449:19
      at Array.map (<anonymous>)
  at initFn (packages/programs/data/document/src/document-index.ts:446:16)
  at packages/programs/data/document/src/document-index.ts:487:27
      at async Promise.all (index 0)
  at DocumentIndex.queryDetailed (packages/programs/data/document/src/document-index.ts:491:20)
  at DocumentIndex.query (packages/programs/data/document/src/document-index.ts:514:21)

Docs

It's crazy to me that a product with as much potential as this one only has 49 stars. But there's a reason: you have no docs. People don't have time to wade through the source to find out how to get stuff done.

I think it's worth figuring out what features people would want this product for, and then documenting how to get those features into their app. Your stars will go to 4.9K almost overnight! In any case, great effort so far & good luck!

Not-equal-to compare queries

Something that could be useful is adding Compare.NotEqual to FieldBigIntCompareQuery, to find all documents where the value of the given field is not equal to the specified value.

Something analogous could also be useful with FieldStringMatchQuery.
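A sketch of how such a NotEqual variant could evaluate (the enum and `matchesBigInt` mirror the names mentioned in this issue but are assumptions, not the actual Peerbit query API):

```typescript
// Sketch of a comparator including the proposed NotEqual variant.
enum Compare {
	Equal,
	NotEqual,
	Greater,
	Less,
}

function matchesBigInt(value: bigint, compare: Compare, other: bigint): boolean {
	switch (compare) {
		case Compare.Equal:
			return value === other;
		case Compare.NotEqual:
			return value !== other;
		case Compare.Greater:
			return value > other;
		case Compare.Less:
			return value < other;
	}
}

const docs = [
	{ id: "a", views: 1n },
	{ id: "b", views: 2n },
];

// Find all documents where `views` is NOT equal to 1.
const notOne = docs.filter((d) => matchesBigInt(d.views, Compare.NotEqual, 1n));
console.log(notOne.map((d) => d.id)); // ["b"]
```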

Re-evaluate RPC topics

  • Currently RPC topic addresses are decided by the document store as: log id + "/" + some string.

This was a quick fix when moving away from program addresses.

  • Evaluate a more neat approach that ensures uniqueness and simplicity in its generation

Simplify Results API for document-store

Issue #56 is a symptom of the Results API of the document-store being too complicated. The boilerplate code that shows how to extract an element from the Results object requires an explanation itself (!) https://github.com/dao-xyz/peerbit-getting-started/blob/b198598557e146a509dfc9c028e37ad101e31494/src/index.test.ts#L51

  • A solution is to simplify the object returned from index.query(...) so that it can be consumed in a single loop, with the results already concatenated and deduplicated.
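A sketch of that flattening step (the `ResultsFrom` shape is an assumption for illustration, not the actual Results type): merge per-peer responses into one deduplicated array the consumer can loop over directly.

```typescript
// Sketch: concatenate and deduplicate per-peer results by document id.
type Doc = { id: string; value: string };
type ResultsFrom = { from: string; results: Doc[] };

function flattenResults(responses: ResultsFrom[]): Doc[] {
	const seen = new Map<string, Doc>();
	for (const response of responses) {
		for (const r of response.results) {
			if (!seen.has(r.id)) seen.set(r.id, r); // first occurrence wins
		}
	}
	return [...seen.values()];
}

const flat = flattenResults([
	{ from: "peerA", results: [{ id: "1", value: "x" }] },
	{ from: "peerB", results: [{ id: "1", value: "x" }, { id: "2", value: "y" }] },
]);
console.log(flat.length); // 2 (the duplicate of id "1" is dropped)
```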

Querying should respect shards

Currently, querying a db will send a query request to all subscribers. This does not scale well.
A better solution is to send query messages to only 1-2 peers in each shard.
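A sketch of that shard-aware fan-out (peer and shard shapes are assumptions for illustration): group subscribers by shard and pick at most two query targets per shard instead of querying everyone.

```typescript
// Sketch: limit query fan-out to a few peers per shard.
function pickQueryTargets(
	peers: { id: string; shard: number }[],
	perShard = 2
): string[] {
	const byShard = new Map<number, string[]>();
	for (const p of peers) {
		const bucket = byShard.get(p.shard) ?? [];
		if (bucket.length < perShard) bucket.push(p.id); // cap per shard
		byShard.set(p.shard, bucket);
	}
	return [...byShard.values()].flat();
}

const targets = pickQueryTargets([
	{ id: "p1", shard: 0 },
	{ id: "p2", shard: 0 },
	{ id: "p3", shard: 0 }, // skipped: shard 0 already has 2 targets
	{ id: "p4", shard: 1 },
]);
console.log(targets); // ["p1", "p2", "p4"]
```

A real implementation would also want to rotate or randomize which peers are picked per shard, to spread load.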

`direct-block` tests fail when running in CI

For some reason (CPU constraints, or a race condition) the direct-block tests fail when running in CI.

One thing to try is different values of maxInboundStreams and maxOutboundStreams

Enhance MinReplicas

  • Currently the MinReplicas option is at the program level, i.e. peer.open(program, {minReplicas: 123}). It would be ideal if this were at the commit level, so that different content can be stored with different permanence.

  • Relative min replicas. One should be able to choose a relative min replicas value like "100%" or "50%" of the network, not just absolutes.
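A sketch of how a relative value could resolve against the current replicator count (the option shape and `resolveMinReplicas` name are assumptions, not the Peerbit API):

```typescript
// Sketch: resolve minReplicas given either an absolute number or a "NN%" string.
function resolveMinReplicas(minReplicas: number | string, networkSize: number): number {
	if (typeof minReplicas === "number") return minReplicas;
	const pct = Number(minReplicas.replace("%", "")) / 100;
	// Round up and never go below one replica.
	return Math.max(1, Math.ceil(networkSize * pct));
}

console.log(resolveMinReplicas(3, 10)); // 3
console.log(resolveMinReplicas("50%", 10)); // 5
console.log(resolveMinReplicas("100%", 4)); // 4
```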

Evaluate 'Roles'

Currently there are two roles

  • Observer
  • Replicator

This makes sense for databases but not for other general services that have different peers that participate for certain reasons.

How can this be made future proof?

Encrypted Stream Multiplexing

There is a unique feature in libp2p - that is encrypted multiplexing with noise:
https://github.com/libp2p/specs/blob/master/noise/README.md
https://docs.libp2p.io/concepts/multiplex/overview/

Noise has been vetted and has been around for a while. All other things being equal, the approach above is the most efficient way to do realtime updates, as one peer with a good network connection (like a Skype supernode) could multiplex the database update stream and/or a pub/sub event stream.

Signal and TLS + HTTP/2 do not do this. Noise + multicast appears to be a better fit for a near-realtime peerbit swarm. If you were building a realtime application like a video game that still needs to be private and decentralized, then, correct me if I'm wrong, but I think this is currently the best solution from a performance and network-traversal perspective.

Encrypt `next` of entries

Right now the next property is public. It would make sense to optionally be able to encrypt this property so that only trusted peers can access the complete tree.

Questions:

  • How would this interfere with IPLD?
  • Should maxChainLength also be encrypted?
  • Should forks also be encrypted?

This issue should be taken into consideration together with #15.

Improve "can read" filtering of Documents

Related #126

  • Introduce a filter function that one can use to control who can search (canSearch (?))
  • Use canRead as a filter that is applied before returning results.
  • Pass the results into the canRead function, not just the publicKey.
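A sketch of the last bullet (the result shape and `filterReadable` are illustrative assumptions): hand the candidate results themselves, together with the requester's key, to a canRead filter before anything is returned.

```typescript
// Sketch: filter results through canRead(result, requester) before returning.
type Result = { id: string; owner: string };
type CanRead = (result: Result, requester: string) => boolean;

function filterReadable(results: Result[], requester: string, canRead: CanRead): Result[] {
	return results.filter((r) => canRead(r, requester));
}

const results = [
	{ id: "1", owner: "alice" },
	{ id: "2", owner: "bob" },
];

// Only return documents the requester owns (one possible policy).
const readable = filterReadable(results, "alice", (r, who) => r.owner === who);
console.log(readable); // [{ id: "1", owner: "alice" }]
```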

Ability to run tests in parallel

Currently, when running all tests in parallel you will run into test errors due to side effects from the cache and IPFS network traffic. This should be fixed so we can quickly run through all the tests and be confident that the results are correct.

  • Make sure no side-effects happen when running tests in parallel
  • Make sure that yarn test works as expected and is running tests in parallel
  • (Extra) when this is working as expected add CI to run tests on all PRs and merges on GitHub

Support for browser nodes to become full replicators

Currently, browser nodes are not assumed to do any replication work. Instead, non-browser nodes subscribe to some replication topic. This assumption is limiting. Instead:

  • Make the replication topic take two forms, e.g. "topic/observe" (for observers only) and "topic/replicate" (for observers and replicators)
  • Redefine the findLeader method to aggregate peers by topic subscribers rather than direct connections
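A sketch of how the two bullets above could fit together (topic suffixes as proposed here; the subscriber map and function names are assumptions): derive both topics from a base name, and treat only "/replicate" subscribers as leader candidates.

```typescript
// Sketch: split a replication topic and select leader candidates from
// "/replicate" subscribers only.
function replicationTopics(base: string) {
	return { observe: `${base}/observe`, replicate: `${base}/replicate` };
}

function findLeaderCandidates(
	subscribers: Record<string, string[]>,
	base: string
): string[] {
	// Observers on "/observe" are never leader candidates.
	return subscribers[replicationTopics(base).replicate] ?? [];
}

const subs = {
	"db1/observe": ["browserPeer"],
	"db1/replicate": ["serverPeer", "browserPeer"], // browser peer opted in
};
console.log(findLeaderCandidates(subs, "db1")); // ["serverPeer", "browserPeer"]
```

Under this scheme a browser node becomes a full replicator simply by subscribing to the "/replicate" topic.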

Add simplified swarm connect utility for Peerbit nodes

Instead of connecting with the full address (swarm connect)

e.g.

/dns4/xyz123.peerchecker.com/tcp/4002/wss/p2p/12D3KooWQVtriWH37wD9sQBzayHbCvc2nn626maaSzKWta7XbWe8

one should be able to connect with

xyz123.peerchecker.com

alone, and the utility would figure out how to resolve the full address
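A sketch of the convenience described above (ports and transports are assumptions; a real implementation would also discover the peer id rather than guess): expand a bare hostname into candidate multiaddrs to try dialing.

```typescript
// Sketch: expand a bare hostname into candidate multiaddr strings to dial.
function candidateAddresses(host: string): string[] {
	return [
		`/dns4/${host}/tcp/4002/wss`, // secure websocket (port assumed)
		`/dns4/${host}/tcp/4001`, // plain tcp (port assumed)
	];
}

console.log(candidateAddresses("xyz123.peerchecker.com"));
// ["/dns4/xyz123.peerchecker.com/tcp/4002/wss",
//  "/dns4/xyz123.peerchecker.com/tcp/4001"]
```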

Reevaluate whether Lamport clock is necessary and how to implement physical time

Since the order of the entries in the log can be determined by the "nexts", the Lamport clock's purpose is ambiguous. Evaluate whether it is necessary to include it in the log at all.

Credit to @tabcat and Opal for sharing this idea.

The main issue with removing the clock from Peerbit as of now is how to order unrelated documents in a document store if they are not connected to each other in any meaningful way (like through a shared clock). How do we make sure that new entries are submitted with a truthful timestamp?

  • Find pros and cons
  • Implement solution if clock is to be removed
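One partial mitigation for the truthful-timestamp question, sketched under assumed names and an assumed drift threshold: a replicator rejects entries whose claimed wall-clock timestamp drifts too far from its own clock, bounding how untruthful a timestamp can be (it does not prove correctness, only limits the lie).

```typescript
// Sketch: bound accepted wall-clock drift for incoming entries.
const MAX_DRIFT_MS = 60_000; // threshold is an assumption

function timestampAcceptable(
	claimedMs: number,
	nowMs: number,
	maxDrift = MAX_DRIFT_MS
): boolean {
	return Math.abs(nowMs - claimedMs) <= maxDrift;
}

const now = 1_700_000_000_000;
console.log(timestampAcceptable(now - 5_000, now)); // true (5s old)
console.log(timestampAcceptable(now - 600_000, now)); // false (10min drift)
```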

Make test domains ip transparent

Currently it is hard to know which server corresponds to which addresses unless you go into the server and manually check the configuration.

  • Make the generated test domain more transparent (make the ip visible)

Multiple signers per commit

Good solutions to #19 and #15 expect that a commit can be signed by multiple parties.

For #19 we want a trusted "clock service" to sign root commits, attesting that they have correct timestamps.

For #15 we can sign messages with two identities: one identity that allows the message to be stored on a replicator, and one identity that proves to the end receiver that you are who you claim to be.

Sharding capacity/resource specific

Right now sharding assumes that all peers have more or less the same capacity for storage (RAM, disk, CPU, etc.). This can be optimized so that we use powerful peers more than weak ones.

  • Make the sharding algorithm respect the resource capabilities of peers in a meaningful way
  • Make sure this does not become a vulnerability for peers when/if they share resource information

`direct-sub` performs better if `autoDial = false`

The benchmark

without autoDial

size: 1kb x 1,722 ops/sec ±1.89% (82 runs sampled)
size: 1000kb x 107 ops/sec ±2.02% (85 runs sampled)

with autodial

size: 1kb x 1,396 ops/sec ±2.86% (81 runs sampled)
size: 1000kb x 57.36 ops/sec ±3.13% (65 runs sampled)

Is this expected behaviour? Directstream and Directblock perform better with autoDial = true, but Directsub does not.

Consider being a backend for RxDB and abstract-level

Peerbit could be a backend for rxdb and could be a great abstract-level database. Writing a backend module for these platforms would generate buzz within those communities and generate interest.

https://rxdb.info/offline-first.html
https://github.com/Level/levelup (abstract-level)

rxdb uses pouchdb as an adapter:
https://rxdb.info/adapters.html
which is supported by orbit:
https://www.npmjs.com/package/pouchdb-orbit

peerbit could be a backend for pouchdb, which could then be an rxdb adapter, gaining more compatibility.

Broken encryption when direct dialing

The snippet below demonstrates a scenario where encryption is not upheld. In this example, there are three clients involved. The first client creates a database and inserts a Post entry into it. The payload of the Post entry is encrypted specifically for client3. Subsequently, client2 contacts client1 and synchronizes the post. At this point, one would expect the post to be encrypted. Finally, client3 also contacts client1 and synchronizes the post.

When fetching the data, the expected behavior sometimes functions correctly, while other times it does not:

I don't know what's going on here; is this an internal race condition?

(screenshot omitted)

Running the following POC with rm peerbittest; ts-node-esm documentstorelate.ts:

import { field, variant } from "@dao-xyz/borsh";
import { Program } from "@peerbit/program";
import { Peerbit } from "peerbit";
import { DeleteOperation, Documents, Observer, PutOperation, SearchRequest } from "@peerbit/document";
import { X25519Keypair } from "@peerbit/crypto";


function sleep(ms: number) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

@variant(0) // version 0
class Post {
	@field({ type: "string" })
	id: string;

	@field({ type: "string" })
	message: string;

	constructor(id: string, message: string) {
		this.id = id;
		this.message = message;
	}
}

@variant("posts")
class PostsDB extends Program {
	@field({ type: Documents })
	posts: Documents<Post>;

	constructor() {
		super();
		this.posts = new Documents();
	}

	async open(): Promise<void> {
		await this.posts.open({
			type: Post,
			index: { key: "id" },
			canAppend: async (entry) => {
				await entry.verifySignatures();
				const payload = await entry.getPayloadValue();
                console.log('GOT PAYLOAD')
				if (payload instanceof PutOperation) {
					const post: Post = payload.getValue(
						this.posts.index.valueEncoding
					);
					console.log('PUT POST', post)
					return true;
				} else if (payload instanceof DeleteOperation) {
					return false;
				}
				return true
			}
		});
	}
}

const client1 = await Peerbit.create({directory: "./peerbittest/client1"});
const client2 = await Peerbit.create({directory: "./peerbittest/client2"});
const client3 = await Peerbit.create({directory: "./peerbittest/client3"});

const store = await client1.open(new PostsDB());

const post = new Post('ID1', "hello world")

await store.posts.put(post, {
	encryption: {
        keypair: await X25519Keypair.create(),
        reciever: {
            // Who can read the log entry metadata (e.g. timestamps)
            metadata: [
				// client1.identity.publicKey,
				// client2.identity.publicKey,
				// client3.identity.publicKey
            ],

            // Who can read the references of the entry (next pointers)
            next: [
				// client1.identity.publicKey,
				// client2.identity.publicKey,
				// client3.identity.publicKey
            ],

            // Who can read the message?
            payload: [
				// client1.identity.publicKey,
				// client2.identity.publicKey,
				client3.identity.publicKey,
			],

            // Who can read the signature ?
            // (In order to validate entries you need to be able to read the signature)
            signatures: [
				// client1.identity.publicKey,
				// client2.identity.publicKey,
				// client3.identity.publicKey
            ],

        },
    },
});

async function printPosts(store:any) {
    const responses: Post[] = await store.posts.index.search(
        new SearchRequest({
            query: [], // query all
        })
    );
    console.log(responses)
}


console.log('Dialing client2 with client1')
await client2.dial(client1.getMultiaddrs());

console.log('Dialing client3 with client1')
await client3.dial(client1.getMultiaddrs());

//////////////////////
const store2 = await client2.open<PostsDB>(store.address)
// await store2.waitFor(client1.peerId);
//////////////////////

//////////////////////
const store3 = await client3.open<PostsDB>(store.address)
// await store3.waitFor(client1.peerId);
//////////////////////

await sleep(5000)

console.log('Store1:')
await printPosts(store)
console.log('Store2:')
await printPosts(store2)
console.log('Store3:')
await printPosts(store3)

await sleep(5000)

console.log("END")

The automated release is failing 🚨

🚨 The automated release from the master branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this 💪.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the master branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


Cannot push to the Git repository.

semantic-release cannot push the version tag to the branch master on the remote Git repository with URL https://[secure]@github.com/dao-xyz/peerbit.

This can be caused by:


Good luck with your project ✨

Your semantic-release bot 📦🚀

Peer type protocol needs rework

Currently there are two types of peers

  • Replicators
  • Observers

Replicators are distinguished from Observers because they subscribe to an additional topic that lets other peers find them via "pubsub.peers(topic)".

This solution is not beautiful: it requires two topics, and it results in undefined behaviour when two different networks use the same topic.

TODO

  • Define a protocol where peers, in the "onPeerConnect" callback, share their intent in subscribing to the topic ("why are you here?")
