Git Product home page Git Product logo

Comments (6)

marcus-pousette avatar marcus-pousette commented on August 11, 2024 1

Another thing I did not think about before is that the lamport clock can actually help to easier understand read/query/search responses if you aggregate results from multiple nodes (lamport clocks does provide value when you look at both write and read, not only write)

from peerbit.

tabcat avatar tabcat commented on August 11, 2024

Lamport clocks don't add much value IMO. As you said, using nexts they don't provide anymore information. They can also be played with a lot; which shouldn't actually affect ordering that much funnily enough since ordering by nexts always come first. Meaning, whether the clocks are being used correctly or after being poisoned, it should make no difference as long as there are are not completely disjoint and even then the logical clock can be emulated with the merkle one.

I could see in some situations where the logical clock may be more efficient to depend on but I don't think the trade off is worth. It complicates things and opens up area for shenanigans without offering much and less reliable ordering than the merkle clock.

Main issue with removing the clock on Peerbit as of now is how to order unrelated documents in a document store, if they are not connected to each other in any meaningful way (like through a shared clock). How to we make sure that new entries are submitted with a truthful timestamp?

It sounds like in this situation the database is sharded and updates do not link to updates in other shards (makes sense) but does use a share clock. Shards in peerbit sound dynamic so deterministic sharding/partitioning databases doesnt work which is probably why the ordering is needed cross-shard.

It might be worth investigating randomness beacons like drand for either referencing to directly from shard entries or creating a separate beacon log with ordered drand randomness and referencing to that from shards. I havent look at drand too much, and maybe its not needed, but it might provide the ordered randomness needed.

Using a shared logical clock for now doesnt seem that bad either.

from peerbit.

marcus-pousette avatar marcus-pousette commented on August 11, 2024

I agree with you. Good reasoning.

It sounds like in this situation the database is sharded and updates do not link to updates in other shards (makes sense) but does use a share clock. Shards in peerbit sound dynamic so deterministic sharding/partitioning databases doesnt work which is probably why the ordering is needed cross-shard.

Yes exactly.. In a sharded database, the clock is even more ambiguous since the clock is defined (ish) by the length of the logs, and in a sharded setup, a database is represented with multiple logs. If you have multiple shards locally, you could sync them yourself, but that does not help with the synchronization across peers. (I had a "exchange clock" routine implemented before so you could have synchronization cross peers, but it did not play out nicely because you are introducing new problems, like what happens if not all peers are connected, what happens if someone provides the wrong time, etc. So I removed it some time ago.

It might be worth investigating randomness beacons like drand for...

Smart! Will check it out.

My gut feeling as now of now is to remove it from the log entry and rather have it on the database level. For example, if you have a social media Post database, you would have a timestamp perhaps on each post. Then, it would be the access controllers job to approve or reject new posts depending on whether new posts have reasonable timestamps (based on previously observed posts and the current time). This would at least let you order things in time somewhat "okay" for real world applications. In practice, no matter if lamport clock would exist or not, you will still need to have a real world clock on posts anyway because a lamport clock can not easily be translated into a normal date, hence, this kind of timestamp checking has to be implemented no matter what.

(A related note, I previously removed "refs" from the log entry and shared "refs" on the "exchange heads" pubsub level instead, so you are not paying for that storage for that kind of links and keeping the door open for more dynamic behaviour and changes. The reduced complexity from that change makes me keen/interested in the clock problem)

from peerbit.

tabcat avatar tabcat commented on August 11, 2024

it would be the access controllers job to approve or reject new posts depending on whether new posts have reasonable timestamps (based on previously observed posts and the current time).

I don't usually like the idea of an update being valid or invalid based on predecessors because it is not easy to verify its correctness. It's a similar problem to verifying that the logical clock has been set correctly which requires traversal all the way to the root which isn't possible if entries have been lost.

I wonder what the best option is for handling an incorrectly set timestamp and it's successors. You can either add the successors until finding the entry with the invalid timestamp and then removing them; or only adding entries after verifying the correctness of their clocks.

from peerbit.

marcus-pousette avatar marcus-pousette commented on August 11, 2024

I agree with you. You have valid points and have changed my mind. Since yesterday I have been deep diving into the clock issue and actually found that a lot of distributed DBs use a HLC (hybrid logical clock) that keeps track of both physical and logical time in the same clock instance. The implementation seems to be fairly easy. Here is a good article about the subject

I am not sure if you have to traverse to the root to verify the clock for every new entry, I mean since its append only, you only have to verify an entry once. So, if you have previously verified three entries, linked in a chain: 1 <- 2 <- 3, and then entry 4 comes in with next to entry 3. You only have to verify the clock of entry 4 in relation to 3 (and not 3 to 2 again since its has already been verified before)?

However, there is an additional complexity layer when you can not trust peers (that is not present for traditional distributed DBs). And this is mainly a problem for the first entry (I think), of any log since it is not possible to verify it against some previous entries.

Last night I played around with the idea is that you would do a simple Peerbit clock program that is hosted by a cluster of nodes (whom are somewhat blindly trusted) just like a NTP. And the only thing they do is signing clock/time messages as a service so that entries without any nexts can get a somewhat ok physical time in the HLC.
Another solution could perhaps to be to have an ACL for the first entry that compares local time and allows some error margin..

from peerbit.

marcus-pousette avatar marcus-pousette commented on August 11, 2024

Physical time is now implemented with Hybrid logical clock on commits and you can use a ClockService to verify that root (or any) commit have timestamp that is accurate to an error threshold.

For the document store you can now search for documents that have been created or edited on a particular date

    it("modified between", async () => {
                let response: Results<Document> = undefined as any;

                const allDocs = writeStore.docs.store.oplog.values;
                await stores[1].docs.index.query(
                    new DocumentQueryRequest({
                        queries: [
                            new ModifiedAtQuery({
                                modified: [
                                    new U64Compare({
                                        compare: Compare.GreaterOrEqual,
                                        value: allDocs[1].metadata.clock
                                            .timestamp.wallTime,
                                    }),
                                    new U64Compare({
                                        compare: Compare.Less,
                                        value: allDocs[2].metadata.clock
                                            .timestamp.wallTime,
                                    }),
                                ],
                            }),
                        ],
                    }),
                    (r: Results<Document>) => {
                        response = r;
                    },
                    { waitForAmount: 1 }
                );
                expect(
                    response.results.map((x) => x.context.head)
                ).toContainAllValues([allDocs[1].hash]);
            });

Reference

from peerbit.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.