dao-xyz / peerbit
P2P database framework with encryption, sharding and search
Home Page: https://peerbit.org
License: Apache License 2.0
Now, if a node takes on a replicator role (the default) and peers put documents it cannot decrypt, search results collected throughout the network might be incomplete, because there is an incorrect assumption that replicators can decrypt every document.
Should this issue be mitigated by having a canReplicate callback? Or something else?
What other solutions exist that do not impose too many restrictions?
Currently:
indexBy: string
but we want:
indexBy: string[]
so you can index documents based on a subfield.
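A minimal sketch of how an indexBy path could be resolved against a document (function and shapes are hypothetical, not the actual Peerbit API):

```typescript
// Resolve a nested index key like ["metadata", "id"] against a document.
// Returns undefined if any segment along the path is missing.
function resolveIndexKey(doc: Record<string, any>, path: string[]): any {
    let current: any = doc;
    for (const segment of path) {
        if (current == null || typeof current !== "object") {
            return undefined;
        }
        current = current[segment];
    }
    return current;
}

// indexBy: ["metadata", "id"] would pick the subfield as the index key
const doc = { metadata: { id: "abc" }, body: "hello" };
const key = resolveIndexKey(doc, ["metadata", "id"]);
```

The single-string form is then just the one-element path, which keeps backwards compatibility.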
Currently, load does not re-check canAppend, since it is assumed that if someone had the permission to append, the commit should stay, no matter how many times you "load()" your database. However, if canAppend depends on some revocable permission, you might want to update your local log to reflect that as soon as changes occur.
Currently no real work has been performed to test and validate that "replay" attacks cannot be done, i.e. someone could send/spam an existing message multiple times to a store, or perhaps take a message with no "nexts" from one store and post it in another store.
I am wondering if this project would be better off using one of the existing encrypted filesystems for a private dapp's filesystem. The benefit of using one that has already been written is that it has undergone peer review as well as performance and security testing.
https://github.com/MatrixAI/js-encryptedfs
The "ephemeral key" generated by peerbit could be the AES-GCM key used by an encrypted filesystem. Arbitrary access, streaming, and better privacy through random fragmentation.
https://github.com/dao-xyz/peerbit/blob/master/packages/utils/crypto/src/encryption.ts#L97
It is better to use separate keys for separate needs. One KDF can be used for data at rest and another for transport. The libp2p world developed the Noise protocol for their encryption needs because it supports broadcasting and multicasting using a shared key.
https://www.wolfssl.com/tls-1-3-versus-noise-protocol/
Some libp2p clients are already using Noise for encrypted broadcast/multicasting. But there isn't a really good encrypted filesystem for IPFS. One of the best e2e protocols is of course the Signal protocol, which is what Berty is using with orbitdb, and there is a JavaScript port:
https://github.com/signalapp/libsignal-protocol-javascript
https://berty.tech/docs/protocol
E2EE where peers are offline needs a middleman that can provide the latest heads to peers without knowing their content. This is achieved right now by allowing this replicator to decrypt the clock and signature, but not the content (hence it is able to build a log). Out of the box, this is not private, since the signature contains information about the sender.
Possible solutions
The DID in Peerbit is simply the PublicKey prefixed with a number indicating the elliptic-curve signature algorithm. Right now it is prefixed with two additional bytes that are unnecessary.
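A sketch of the single-prefix-byte layout (the algorithm tags and function names here are made up for illustration, not Peerbit's actual values):

```typescript
// Encode a DID as one algorithm byte followed by the raw public key bytes.
// Tags are hypothetical: 0 = Ed25519, 1 = secp256k1.
function encodeDid(algo: number, publicKey: Uint8Array): Uint8Array {
    const out = new Uint8Array(1 + publicKey.length);
    out[0] = algo;
    out.set(publicKey, 1);
    return out;
}

// Split the prefix byte back off to recover the algorithm and key.
function decodeDid(did: Uint8Array): { algo: number; publicKey: Uint8Array } {
    return { algo: did[0], publicKey: did.subarray(1) };
}
```

Dropping the two unnecessary bytes would make every DID exactly key length + 1.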
How does Peerbit make sure that a user can add a data record to the database, and make sure no one else removes or modifies it? If I am not mistaken, Peerbit and orbitdb expect users to be trusted when modifying a shared collection.
A Merkle-CRDT can be used to make it hard to deny that data was ever seen, and provides some source of truth without a full blockchain:
https://research.protocol.ai/publications/merkle-crdts-merkle-dags-meet-crdts/psaras2020.pdf
https://github.com/ipfs/go-ds-crdt
The current implementation of DString is like a canvas: anyone can write anywhere on the canvas by inserting a string at an offset. The current implementation does not support efficient deletion by pruning the DAG; the operation data type itself does not support this out of the box.
Consider instead the xi-editor that uses the rope data structure
Regarding xi-editor, here is some interesting discussion/critique on the implementation that xi-editor brings forward.
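To make the "canvas" model concrete, here is a simplified in-memory sketch of how a single insert-at-offset operation behaves (this is an illustration of the model described above, not the DString implementation):

```typescript
// Apply one canvas-style operation: write `content` at an absolute offset,
// overwriting whatever was there and padding with spaces if the canvas is
// shorter than the offset.
function applyCanvasOp(canvas: string, offset: number, content: string): string {
    const padded = canvas.padEnd(offset, " ");
    return padded.slice(0, offset) + content + padded.slice(offset + content.length);
}

const v1 = applyCanvasOp("", 0, "hello world");
const v2 = applyCanvasOp(v1, 6, "there");
```

A rope-backed model would instead shift subsequent text on insert/delete, which is why pruning and deletion become much more natural there.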
The following test fails in CI if we do
await session.peers[X].services.directstream.stop();
instead of
await session.peers[X].stop();
but succeeds when running locally.
Expected: stop calls to the protocol handler should be treated as a disconnection event.
Blockers/Questions
Start with: the Store class is redundant; more or less all functionality can be absorbed into the Log class. This will simplify documentation work.
For all code that might be running in the browser
On rare occasions, the test suites yield the following error:
Needs to be initialized before loaded
343 | async load() {
344 | if (!this.initialized) {
> 345 | throw new Error("Needs to be initialized before loaded");
| ^
346 | }
347 |
348 | if (this._cache.status !== "open") {
at HeadsCache.load (packages/log/src/heads-cache.ts:345:10)
at HeadsIndex.load (packages/log/src/heads.ts:44:41)
at Log.load (packages/log/src/log.ts:1163:39)
at Log.join (packages/log/src/log.ts:840:14)
at DocumentIndex.sync [as _sync] (packages/programs/data/document/src/document-store.ts:116:14)
at packages/programs/data/document/src/document-index.ts:449:19
at Array.map (<anonymous>)
at initFn (packages/programs/data/document/src/document-index.ts:446:16)
at packages/programs/data/document/src/document-index.ts:487:27
at async Promise.all (index 0)
at DocumentIndex.queryDetailed (packages/programs/data/document/src/document-index.ts:491:20)
at DocumentIndex.query (packages/programs/data/document/src/document-index.ts:514:21)
Currently, sorted and paginated/limited search results are not supported, but this is an important feature when searches could yield thousands of results.
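As a plain in-memory sketch of the desired behaviour (real support would push sorting and limits into the index rather than materializing all results first):

```typescript
// Sort results with a comparator and return one page (offset/limit).
// Copies the input so the caller's array is not mutated.
function paginate<T>(
    results: T[],
    sortBy: (a: T, b: T) => number,
    offset: number,
    limit: number
): T[] {
    return [...results].sort(sortBy).slice(offset, offset + limit);
}

const firstPage = paginate([3, 1, 2], (a, b) => a - b, 0, 2);
```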
It's crazy to me that a product with as much potential as this one only has 49 stars. But there's a reason: you have no docs. People don't have time to wade through the source to find out how to get stuff done.
I think it's worth figuring out what features people would like this product for, and then documenting "how to" get those features into their app. Your stars will go to 4.9 K almost overnight! In any case, great effort so far & good luck!
Something that could be useful is adding Compare.NotEqual to FieldBigIntCompareQuery, to find all documents where the value of the given field is not equal to the specified value. Something analogous could also be useful with FieldStringMatchQuery.
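A sketch of what the extended comparator could look like (enum members and the function are hypothetical; only the NotEqual variant is the proposed addition):

```typescript
// Existing-style comparators plus the proposed NotEqual.
enum Compare {
    Equal,
    Greater,
    Less,
    NotEqual,
}

// Evaluate one bigint field comparison against a query value.
function compareBigInt(value: bigint, query: bigint, op: Compare): boolean {
    switch (op) {
        case Compare.NotEqual:
            return value !== query;
        case Compare.Equal:
            return value === query;
        case Compare.Greater:
            return value > query;
        default:
            return value < query;
    }
}
```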
canWrite validation makes more sense if we have a canRead callback for document stores.
This was a quick fix when moving away from program addresses.
Issue #56 is a symptom of the Results API of the document-store being too complicated. The boilerplate code that shows how to extract an element from the Results object requires an explanation itself (!): https://github.com/dao-xyz/peerbit-getting-started/blob/b198598557e146a509dfc9c028e37ad101e31494/src/index.test.ts#L51
For performance reasons it does not make sense to store hashes as strings on Entries; storing them as Uint8Arrays instead lets us encode and decode data to and from Entry objects more quickly.
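The cost being avoided is the hex (or base-encoded) conversion on every encode/decode. A minimal sketch of the conversions that raw Uint8Array storage would skip (helper names are for illustration):

```typescript
// Parse a hex string into raw bytes: two hex chars per byte.
function hexToBytes(hex: string): Uint8Array {
    const out = new Uint8Array(hex.length / 2);
    for (let i = 0; i < out.length; i++) {
        out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
    }
    return out;
}

// Render raw bytes back into a lowercase hex string.
function bytesToHex(bytes: Uint8Array): string {
    return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}
```

Keeping the bytes on the Entry means these conversions happen only at the edges (display, APIs that require strings), not on every serialization.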
Currently, querying a db will send the query request to all subscribers. This does not scale well.
A better solution is to send query messages to only 1-2 peers in each shard.
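A sketch of the target-selection step (the shard grouping is modelled as a plain map here; names are hypothetical):

```typescript
// Instead of querying every subscriber, pick up to `fanout` peers from each
// shard. With fanout = 2, query traffic grows with the number of shards
// rather than the number of subscribers.
function pickQueryTargets(
    shards: Map<string, string[]>, // shardId -> peer ids holding that shard
    fanout: number = 2
): string[] {
    const targets: string[] = [];
    for (const peers of shards.values()) {
        targets.push(...peers.slice(0, fanout));
    }
    return targets;
}
```

A real implementation would likely also randomize the choice within each shard for load balancing.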
For some reason (CPU load, or a race condition) the direct-block tests fail when running in CI.
One thing to try is different values of maxInboundStreams and maxOutboundStreams
Currently the minReplicas option is at the program level, i.e. peer.open(program, {minReplicas: 123}). It would be ideal if this were at the commit level, so that different content can be stored with different permanence.
Relative min replicas: one should be able to choose a relative min replicas value like "100%" or "50%" of the network, not just an absolute number.
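A sketch of how both forms could be resolved to an absolute count (function name and rounding policy are assumptions, not the Peerbit API):

```typescript
// Accept an absolute count or a relative string like "50%", resolved against
// the current network size. Relative values round up and are clamped to at
// least 1 replica; absolute values are capped at the network size.
function resolveMinReplicas(value: number | string, networkSize: number): number {
    if (typeof value === "number") {
        return Math.min(value, networkSize);
    }
    const percent = Number(value.replace("%", ""));
    if (Number.isNaN(percent) || percent < 0 || percent > 100) {
        throw new Error("Invalid relative minReplicas: " + value);
    }
    return Math.max(1, Math.ceil((percent / 100) * networkSize));
}
```

One open question with relative values is when to re-evaluate them as the network grows or shrinks.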
It would be great if Peerbit supported query operators such as those used in MongoDB, such as:
{$in: ['abc', 'def']}
to find documents, for example, with field values that match any of those in the array.
Here's a list of common ones:
https://www.mongodb.com/docs/manual/reference/operator/query/
(some of these already exist for FieldBigIntCompareQuery)
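As a sketch of the semantics being requested, here is a minimal $in evaluated over plain documents (type and function names are made up for illustration):

```typescript
// MongoDB-style $in: the field value must match any value in the array.
type InQuery = { $in: any[] };

function matchesIn(value: any, query: InQuery): boolean {
    return query.$in.includes(value);
}

// Find documents whose id matches any value in the array.
const docs = [{ id: "abc" }, { id: "xyz" }];
const hits = docs.filter((d) => matchesIn(d.id, { $in: ["abc", "def"] }));
```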
Currently there are two roles
This makes sense for databases but not for other general services that have different peers that participate for certain reasons.
How can this be made future-proof?
There is a unique feature in libp2p: encrypted multiplexing with Noise:
https://github.com/libp2p/specs/blob/master/noise/README.md
https://docs.libp2p.io/concepts/multiplex/overview/
Noise has been vetted and has been around for a while. All other things being equal, the approach above is the most efficient way to do realtime updates, as one peer with a good network connection (like a Skype supernode) could multiplex the database update stream and/or a pub/sub event stream.
Signal and TLS + HTTP/2 do not do this. Noise + multicast appears to be a better fit for a near-realtime Peerbit swarm. If you were doing a realtime application like a video game that still needs to be private and decentralized, then, correct me if I'm wrong, but I think this is currently the best solution from a performance and network-traversal perspective.
Right now the next property is public. It would make sense to optionally be able to encrypt this property so that only trusted peers can access the complete tree.
Questions:
Should maxChainLength also be encrypted?
Should forks also be encrypted?
This issue should be taken into consideration together with #15
Related #126
Provide more information to the canRead (canSearch(?)) function, not just the publicKey.
Currently, when running all tests in parallel you will run into test errors due to side effects from cache and IPFS network traffic. This should be fixed so we can quickly run through all the tests and be confident that the results are correct.
yarn test works as expected and runs tests in parallel.
Related #126
Should canRead be a mandatory fn to provide on open to Documents, to mitigate leaks?
Currently, browser nodes are not assumed to do any replication work. Instead there will be non-browser nodes that are subscribing to some replication topic. This assumption is bad.
Instead of connecting with the full identifier (swarm connect), e.g.
/dns4/xyz123.peerchecker.com/tcp/4002/wss/p2p/12D3KooWQVtriWH37wD9sQBzayHbCvc2nn626maaSzKWta7XbWe8
one should be able to connect to
xyz123.peerchecker.com
alone, and the utility would figure out how to properly resolve the full address.
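A sketch of the first step, expanding a bare hostname into a dialable multiaddr string (the default port and transport are assumptions taken from the example above; a real utility would also have to discover the trailing /p2p/... peer id rather than hardcode anything):

```typescript
// Build a websocket multiaddr string from a bare hostname. The /p2p/<peerId>
// suffix still has to be discovered separately (e.g. by dialing and reading
// the remote identity), which is the part the proposed utility would automate.
function toMultiaddr(host: string, port: number = 4002): string {
    return `/dns4/${host}/tcp/${port}/wss`;
}

const addr = toMultiaddr("xyz123.peerchecker.com");
```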
Can we get this code into a repo to show that Peerbit works easily with Vue and React? Maybe publish bindings to npm to promote adoption?
Samples that can be featured:
https://gist.github.com/djmaze/9e99382f6ad364f0d77830f826c01b55
https://gist.github.com/denzuko/cdf88e39f53d609ee8393fd0296f6273
It will make it easier to have libs like this:
https://www.npmjs.com/package/define-orbit
https://www.npmjs.com/package/react-orbitdb
https://github.com/DeFUCC/gun-vue
Since the order of the entries in the log can be determined by the "nexts", the Lamport clock's purpose is ambiguous. Evaluate whether it is necessary to include it in the log at all.
Credit to @tabcat and Opal for sharing this idea.
The main issue with removing the clock from Peerbit as of now is how to order unrelated documents in a document store if they are not connected to each other in any meaningful way (like through a shared clock). How do we make sure that new entries are submitted with a truthful timestamp?
Allow users using the CLI to set the log level and output path of the Pino log
Currently it is hard to know which server corresponds to which addresses unless you go into the server and manually check the configuration.
Good solutions to #19 and #15 expect that a commit can be signed by multiple parties.
For #19 we want a trusted "clock service" to sign root commits, attesting that they have correct timestamps.
For #15 we can sign messages with 2 identities: one identity that allows the message to be stored on a replicator, and one identity that proves to the end receiver that you are you.
Right now sharding assumes that all peers have more or less the same capacity for storage (RAM, disk, CPU etc.). This can be optimized so that we use powerful peers more than weak ones.
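A sketch of the basic idea: weight each peer's share of the replication load by its declared capacity instead of assuming uniformity (capacity units are arbitrary here, and the names are hypothetical):

```typescript
// Fraction of the total load a peer should take, proportional to its
// declared capacity. A peer with twice the capacity takes twice the share.
function shareOfLoad(capacities: Map<string, number>, peer: string): number {
    let total = 0;
    for (const c of capacities.values()) {
        total += c;
    }
    return total === 0 ? 0 : (capacities.get(peer) ?? 0) / total;
}
```

The hard part is of course getting truthful capacity reports; peers could be benchmarked or scored on observed behaviour rather than self-declared numbers.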
The benchmark
without autoDial
size: 1kb x 1,722 ops/sec ±1.89% (82 runs sampled)
size: 1000kb x 107 ops/sec ±2.02% (85 runs sampled)
with autoDial
size: 1kb x 1,396 ops/sec ±2.86% (81 runs sampled)
size: 1000kb x 57.36 ops/sec ±3.13% (65 runs sampled)
Expected behaviour (?). Directsub performs better than just Directstream and Directblock if autoDial = true.
Peerbit could be a backend for rxdb and could be a great abstract-level database. Writing a backend module for these platforms would generate buzz within these communities and generate interest.
https://rxdb.info/offline-first.html
https://github.com/Level/levelup (abstract-level)
rxdb uses pouchdb as an adapter:
https://rxdb.info/adapters.html
which is supported by orbit:
https://www.npmjs.com/package/pouchdb-orbit
Peerbit could be a backend for pouchdb, which could then be an rxdb adapter, gaining more compatibility.
RPC is started and terminated immediately.
The snippet below demonstrates a scenario where encryption is not upheld. In this example, there are three clients involved. The first client creates a database and inserts a Post entry into it. The payload of the Post entry is encrypted specifically for client3. Subsequently, client2 contacts client1 and synchronizes the post. At this point, one would expect the post to be encrypted. Finally, client3 also contacts client1 and synchronizes the post.
When fetching the data, the expected behavior sometimes functions correctly, while other times it does not:
I don't know what's going on here; is this an internal race condition?
Running the following POC with rm peerbittest; ts-node-esm documentstorelate.ts:
import { field, variant } from "@dao-xyz/borsh";
import { Program } from "@peerbit/program";
import { Peerbit } from "peerbit";
import { DeleteOperation, Documents, Observer, PutOperation, SearchRequest } from "@peerbit/document";
import { X25519Keypair } from "@peerbit/crypto";
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
@variant(0) // version 0
class Post {
@field({ type: "string" })
id: string;
@field({ type: "string" })
message: string;
constructor(id: string, message: string) {
this.id = id;
this.message = message;
}
}
@variant("posts")
class PostsDB extends Program {
@field({ type: Documents })
posts: Documents<Post>;
constructor() {
super();
this.posts = new Documents();
}
async open(): Promise<void> {
await this.posts.open({
type: Post,
index: { key: "id" },
canAppend: async (entry) => {
await entry.verifySignatures();
const payload = await entry.getPayloadValue();
console.log('GOT PAYLOAD')
if (payload instanceof PutOperation) {
const post: Post = payload.getValue(
this.posts.index.valueEncoding
);
console.log('PUT POST', post)
return true;
} else if (payload instanceof DeleteOperation) {
return false;
}
return true
}
});
}
}
const client1 = await Peerbit.create({directory: "./peerbittest/client1"});
const client2 = await Peerbit.create({directory: "./peerbittest/client2"});
const client3 = await Peerbit.create({directory: "./peerbittest/client3"});
const store = await client1.open(new PostsDB());
const post = new Post('ID1', "hello world")
await store.posts.put(post, {
encryption: {
keypair: await X25519Keypair.create(),
reciever: {
// Who can read the log entry metadata (e.g. timestamps)
metadata: [
// client1.identity.publicKey,
// client2.identity.publicKey,
// client3.identity.publicKey
],
// Who can read the references of the entry (next pointers)
next: [
// client1.identity.publicKey,
// client2.identity.publicKey,
// client3.identity.publicKey
],
// Who can read the message?
payload: [
// client1.identity.publicKey,
// client2.identity.publicKey,
client3.identity.publicKey,
],
// Who can read the signature ?
// (In order to validate entries you need to be able to read the signature)
signatures: [
// client1.identity.publicKey,
// client2.identity.publicKey,
// client3.identity.publicKey
],
},
},
});
async function printPosts(store:any) {
const responses: Post[] = await store.posts.index.search(
new SearchRequest({
query: [], // query all
})
);
console.log(responses)
}
console.log('Dialing client2 with client1')
await client2.dial(client1.getMultiaddrs());
console.log('Dialing client3 with client1')
await client3.dial(client1.getMultiaddrs());
//////////////////////
const store2 = await client2.open<PostsDB>(store.address)
// await store2.waitFor(client1.peerId);
//////////////////////
//////////////////////
const store3 = await client3.open<PostsDB>(store.address)
// await store3.waitFor(client1.peerId);
//////////////////////
await sleep(5000)
console.log('Store1:')
await printPosts(store)
console.log('Store2:')
await printPosts(store2)
console.log('Store3:')
await printPosts(store3)
await sleep(5000)
console.log("END")
The automated release from the master branch failed. 🚨
I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.
You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this 💪.
Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.
Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the master
branch. You can also manually restart the failed CI job that runs semantic-release.
If you are not sure how to resolve this, here are some links that can help you:
If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.
semantic-release cannot push the version tag to the branch master on the remote Git repository with URL https://[secure]@github.com/dao-xyz/peerbit.
This can be caused by:
Good luck with your project ✨
Your semantic-release bot 📦🚀
Related #126.
When canRead returns false, a response should nevertheless be returned to notify the user that they cannot read.
Currently there are two types of peers.
Replicators are distinguished from Observers because they will subscribe to an additional topic that lets other peers find them via "pubsub.peers(topic)".
This solution is not beautiful: it requires two topics, and there is undefined behaviour when two different networks use the same topic.
TODO