
Comments (13)

Geod24 avatar Geod24 commented on August 19, 2024

Depends on #209

from agora.

AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

- [ ] Validators are distributed among quorums in an unpredictable manner;

I think this sentence is not entirely correct. It should be predictable given another number X, which itself is unpredictable (and based on the preimage). So the algorithm to select the quorum is predictable, but its inputs are not predictable.
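To illustrate the distinction, here is a minimal sketch of a deterministic assignment driven by an unpredictable seed. The function name `assignQuorums`, the string keys, and the round-robin dealing are assumptions made for this example and are not taken from Agora; `seed` stands in for the value X derived from the preimages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: deterministically spread validators over quorums.
// `seed` stands in for the unpredictable value X derived from preimages.
std::vector<std::vector<std::string>> assignQuorums(
    std::vector<std::string> validators, std::uint64_t seed,
    std::size_t quorumCount)
{
    assert(quorumCount > 0);

    // Start from a canonical order so every node feeds the same input.
    std::sort(validators.begin(), validators.end());

    // The output sequence of mt19937_64 is fixed by the C++ standard,
    // unlike std::shuffle or std::uniform_int_distribution, so the same
    // seed gives the same sequence on every platform.
    std::mt19937_64 rng(seed);

    // Hand-written Fisher-Yates shuffle. The modulo introduces a small
    // bias, which is acceptable for a sketch.
    for (std::size_t i = validators.size(); i > 1; --i)
    {
        const std::size_t j = static_cast<std::size_t>(rng() % i);
        std::swap(validators[i - 1], validators[j]);
    }

    // Deal the shuffled validators round-robin into the quorums.
    std::vector<std::vector<std::string>> quorums(quorumCount);
    for (std::size_t i = 0; i < validators.size(); ++i)
        quorums[i % quorumCount].push_back(validators[i]);
    return quorums;
}
```

Anyone who knows the seed can recompute the assignment, but nobody can compute it before the seed is revealed.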


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

Quorums get regularly shuffled (how often?);

According to the yellow paper: "Quorum balancing events happens once every 6 rounds." So, every hour.


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

I would actually add that past performance of a validator should be taken into account when assigning quorums. For example:

The top X (X=5?) stakers are placed in different quorums and used as 'seed';

This could be gamed by a new player that spins up 5 new nodes and stakes more than the next 5 highest staked nodes. If we could incorporate the number N, where N is the number of times a node's public key participated in a finalized (accepted) vote for a block, then we could take into account the past performance of a node too. It should be a sort of "weight" to the algorithm.
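A rough sketch of what such a weight could look like. The formula `stake * (N + 1)` is purely an assumption for illustration, as are the `Validator` struct and the `pickSeeds` helper; the real weighting would need to be designed and reviewed separately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical validator record for this example only.
struct Validator
{
    std::string key;              // public key, shortened for readability
    std::uint64_t stake;          // currently staked amount
    std::uint64_t finalizedVotes; // N: finalized block votes this key took part in
};

// Assumed weight: stake scaled by past participation. The +1 keeps a
// brand-new node at a non-zero weight. Overflow is ignored in this sketch.
std::uint64_t weight(const Validator& v)
{
    return v.stake * (v.finalizedVotes + 1);
}

// Pick the `count` heaviest validators to use as quorum seeds.
std::vector<Validator> pickSeeds(std::vector<Validator> validators,
                                 std::size_t count)
{
    std::sort(validators.begin(), validators.end(),
              [](const Validator& a, const Validator& b) {
                  if (weight(a) != weight(b))
                      return weight(a) > weight(b);
                  return a.key < b.key; // deterministic tie-break
              });
    if (validators.size() > count)
        validators.resize(count);
    return validators;
}
```

With this weight, a newcomer staking 1000 with no history ranks below a node staking 400 that took part in 9 finalized votes (1000 versus 4000), so simply out-staking existing nodes is no longer enough to become a seed.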


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

Some research notes:

This post describes the new quorum config in Stellar, and also has a general overview of how quorums in Stellar work: https://medium.com/stellar-developers-blog/why-quorums-matter-and-how-stellar-approaches-them-547336c1275

The two PRs implementing semi-automatic quorum configuration and an "intersection-checker":
stellar/stellar-core#2125
stellar/stellar-core#2127

Here's a paper linked to from the blog post: https://arxiv.org/pdf/1902.06493.pdf. From the paper:

The Disjoint Quorums Problem answers the question whether a given instance of Federated
Byzantine Agreement System contains two quorums that have no nodes
in common. We show that this problem is NP-complete.
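To make the problem concrete, here is a brute-force sketch that enumerates every subset of a small network and looks for two disjoint quorums. It assumes a simplified, flat quorum set (one threshold over a member list, no nesting), which is not how stellar-core's checker works; the types and function names are made up for this example. The exponential loop over all subsets is the point: it only works for a handful of nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified flat quorum set: node i is satisfied by a set when at least
// `threshold` of the nodes in `members` (a bitmask) are in that set.
struct FlatQuorumSet
{
    std::uint32_t members;
    unsigned threshold;
};

unsigned countBits(std::uint32_t x)
{
    unsigned n = 0;
    for (; x != 0; x &= x - 1)
        ++n;
    return n;
}

// A non-empty node set is a quorum when every member is satisfied by it.
bool isQuorum(std::uint32_t set, const std::vector<FlatQuorumSet>& qsets)
{
    if (set == 0)
        return false;
    for (std::size_t i = 0; i < qsets.size(); ++i)
    {
        const bool inSet = ((set >> i) & 1u) != 0;
        if (inSet && countBits(set & qsets[i].members) < qsets[i].threshold)
            return false;
    }
    return true;
}

// Exhaustive search over all 2^n subsets; only usable for tiny networks.
bool hasDisjointQuorums(const std::vector<FlatQuorumSet>& qsets)
{
    assert(qsets.size() <= 16);
    const std::uint32_t limit = 1u << qsets.size();

    std::vector<std::uint32_t> quorums;
    for (std::uint32_t set = 1; set < limit; ++set)
        if (isQuorum(set, qsets))
            quorums.push_back(set);

    for (std::size_t a = 0; a < quorums.size(); ++a)
        for (std::size_t b = a + 1; b < quorums.size(); ++b)
            if ((quorums[a] & quorums[b]) == 0)
                return true;
    return false;
}
```

Four nodes that each require 3 of the 4 have no disjoint quorums, while two pairs that only trust each other do.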


In our case we wouldn't mark any nodes as belonging to an "organization", and the "quality" of the nodes would not be hardcoded. Instead we would just use the amount of stake as the marker of quality. After all, if a node misbehaves then it will get slashed, and its staked amount becomes lower (and therefore the "quality" of the node drops).

The blog post describes a release we're not using yet (https://github.com/stellar/stellar-core/releases/tag/v11.2.0), which was published on June 28th 2019. We're using a commit from May 20th 2019.

It might be possible to reuse / adapt these Stellar routines (Config::generateQuorumSet / Config::generateQuorumSetHelper), if they don't depend on any Stellar-specific state. But we do have to take into account the random value and the quorum re-shuffling too.


Edit: Another useful article: https://www.stellar.org/developers-blog/intuitive-stellar-consensus-protocol


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

Apparently the quorum intersection checker is so expensive that they've implemented an interrupt mechanism to halt the worker thread while it's calculating the intersection of quorums: stellar/stellar-core#2454

In any case, I will try to use the code in v11.2.0. Although there's still an issue because we're using a slightly older commit, and there seem to be quite a few files which were touched between 324c1bd61b0e9bada63e0d696d799421b00a7950 (commit we're using) and v11.2.0.

So as a first step, I'll see if we can update to v11.2.0.


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

This commit is very sad: stellar/stellar-core@f6a6567


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

It looks like the code already depended on App and Herder, so instead of removing those dependencies they just moved the checker into the Herder module and made it more Stellar-dependent. It's a shame.

But then again, stellar-core was never meant to be a library. Well at least the devs are aware SCP is definitely used outside of stellar: stellar/stellar-core#2152


Geod24 avatar Geod24 commented on August 19, 2024

Ah yeah, I looked at it a while back (as you can see).

From what I remember the dependency is limited to the config struct. If you invert the dependency, a la BanManager, it could work.
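A generic sketch of that kind of inversion: the checker receives a narrow options struct, and the application builds it from its own config. Every name here (`AppConfig`, `CheckerOptions`, `IntersectionChecker`, `makeCheckerOptions`, and the fields) is hypothetical and does not reflect the actual stellar-core or Agora code; it only shows the shape of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Stand-in for a large application config the checker should not know about.
struct AppConfig
{
    std::string networkName;
    std::size_t maxQuorumSize;
    // ... many unrelated settings in a real application
};

// Narrow view containing only what the checker needs (hypothetical fields).
struct CheckerOptions
{
    std::function<std::string(const std::string&)> describeNode;
    std::size_t maxQuorumSize;
};

// The checker depends on CheckerOptions, never on AppConfig.
class IntersectionChecker
{
public:
    explicit IntersectionChecker(CheckerOptions options)
        : mOptions(std::move(options))
    {
    }

    bool accepts(std::size_t quorumSize) const
    {
        return quorumSize <= mOptions.maxQuorumSize;
    }

    std::string label(const std::string& nodeId) const
    {
        return mOptions.describeNode(nodeId);
    }

private:
    CheckerOptions mOptions;
};

// Application-side adapter: the only place that sees both types.
CheckerOptions makeCheckerOptions(const AppConfig& config)
{
    return CheckerOptions{
        [name = config.networkName](const std::string& id) {
            return name + "/" + id;
        },
        config.maxQuorumSize};
}
```

The checker can then be compiled and tested without pulling in the rest of the application.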


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

So far I've managed to almost make it work by removing the Config dependency. But then I ran into this:

source/scpp/build/QuorumIntersectionCheckerImpl.o: In function `std::__detail::_Hash_code_base<stellar::PublicKey, std::pair<stellar::PublicKey const, unsigned long>, std::__detail::_Select1st, std::hash<stellar::PublicKey>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false>::_M_hash_code(stellar::PublicKey const&) const':
/usr/include/c++/9/bits/hashtable_policy.h:1292: undefined reference to `std::hash<stellar::PublicKey>::operator()(stellar::PublicKey const&) const'
source/scpp/build/QuorumIntersectionCheckerImpl.o: In function `std::__detail::_Hash_code_base<stellar::PublicKey, std::pair<stellar::PublicKey const, unsigned long>, std::__detail::_Select1st, std::hash<stellar::PublicKey>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false>::_M_bucket_index(std::__detail::_Hash_node<std::pair<stellar::PublicKey const, unsigned long>, false> const*, unsigned long) const':
/usr/include/c++/9/bits/hashtable_policy.h:1304: undefined reference to `std::hash<stellar::PublicKey>::operator()(stellar::PublicKey const&) const'
source/scpp/build/QuorumTracker.o: In function `std::__detail::_Hash_code_base<stellar::PublicKey, std::pair<stellar::PublicKey const, std::shared_ptr<stellar::SCPQuorumSet> >, std::__detail::_Select1st, std::hash<stellar::PublicKey>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false>::_M_hash_code(stellar::PublicKey const&) const':
/usr/include/c++/9/bits/hashtable_policy.h:1292: undefined reference to `std::hash<stellar::PublicKey>::operator()(stellar::PublicKey const&) const'
source/scpp/build/QuorumTracker.o: In function `std::__detail::_Hash_code_base<stellar::PublicKey, std::pair<stellar::PublicKey const, std::shared_ptr<stellar::SCPQuorumSet> >, std::__detail::_Select1st, std::hash<stellar::PublicKey>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false>::_M_bucket_index(std::__detail::_Hash_node<std::pair<stellar::PublicKey const, std::shared_ptr<stellar::SCPQuorumSet> >, false> const*, unsigned long) const':
/usr/include/c++/9/bits/hashtable_policy.h:1304: undefined reference to `std::hash<stellar::PublicKey>::operator()(stellar::PublicKey const&) const'

The linker can't find a definition of std::hash<stellar::PublicKey>::operator(), which the unordered containers need for hashing. We seem to have removed, or never ported, the hashing code from stellar-core. It might be this one https://github.com/stellar/stellar-core/blob/f3857733a9b67da4528df59bb616ea84ba539a1a/src/crypto/ECDH.cpp#L71, but I'll have to see.


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

Ah I found it: https://github.com/bpfkorea/agora/blob/ab8b39f475a29d584f736fbdd4662edb8f2b87a3/source/scpp/src/crypto/SecretKey.h#L131

It's declared, but never defined.

Edit: yes I should be able to resolve this.
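For reference, a self-contained sketch of the declared-versus-defined situation. `demo::PublicKey` is a hypothetical stand-in for `stellar::PublicKey`, and the FNV-1a hash is just a placeholder, not what stellar-core uses. Removing the out-of-line definition at the bottom reproduces the same kind of "undefined reference" error at link time as soon as an unordered container is keyed by the type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

namespace demo
{
// Hypothetical stand-in for stellar::PublicKey: 32 raw bytes.
struct PublicKey
{
    std::array<std::uint8_t, 32> bytes{};

    bool operator==(const PublicKey& other) const
    {
        return bytes == other.bytes;
    }
};
}

// "Header" part: the specialization only declares operator().
namespace std
{
template <>
struct hash<demo::PublicKey>
{
    size_t operator()(const demo::PublicKey& key) const noexcept;
};
}

// "Source" part: the definition that was missing. Code using
// std::unordered_map<demo::PublicKey, T> compiles without it, but the
// final link step fails.
std::size_t std::hash<demo::PublicKey>::operator()(
    const demo::PublicKey& key) const noexcept
{
    // FNV-1a over the key bytes (placeholder hash for this sketch).
    std::uint64_t h = 14695981039346656037ull;
    for (std::uint8_t b : key.bytes)
    {
        h ^= b;
        h *= 1099511628211ull;
    }
    return static_cast<std::size_t>(h);
}
```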


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

Status update of this feature

What's been done

  • Add ability to provide a custom genesis block.
  • #662 - Add bindings to unordered_map
  • #663 - Add bitset implementation
  • #545 - Workaround for SCPEnvelope memory handling
  • #546 - Support hashing SCPEnvelopes
  • #659 - Add missing hashing support in SCP bindings
  • #655 - Updated SCP bindings to v11.2.0 as it was required to be able to include the quorum intersection checker tool.
  • #645 - Add registerListener() API so nodes not in a node's quorum also receive SCP messages.
  • #643 - Connect to all quorum set validators. This is a safety vs liveness issue. For safety reasons a validator node will attempt to connect to all of its quorum nodes. It resolves some test-suite failures.
  • #637 - Added pretty-printing of SCPEnvelope - This improves the debugging experience a lot.
  • #629 - Reject outdated SCP messages. If a block was externalized, we do not care about nomination / balloting for blocks of that height or older.
  • #618 - Refactor network manager to later allow easier retrieval of quorum set hashes (for #621)
  • #648 - Fixed a broken code path of network discovery
  • #549 - Improved isQuorumSetSane() to make debugging easier
  • #577 - Decoupled Nominator and Ledger classes.
  • #712 - Custom genesis block. Needed in order to add Enrollments that define the preimages, and we need the preimages in order to derive the random seed for the quorum balancing algorithm.
  • #664 - Quorum intersection checker. Needed to verify the generated quorums have good intersection properties.
  • #621 - (needs a rebase) Fix missing quorum set hashes when reaching consensus. Nodes need to be able to look up hash => quorum set, otherwise they may reject messages outright.
  • #737 - Port hashes to 64-byte (support code for #702)
  • #741 - Use our own hashes instead of SCPs (for #702)
  • #889 - Rework NetworkManager to use Tasks
  • #845 - Implement Shell Quorum balancing
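As an illustration of the rule behind #629 above (ignore nomination / balloting for heights that were already externalized), a minimal sketch follows. `EnvelopeHeader` and `OutdatedFilter` are hypothetical names for this example and are not the types used in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down view of an SCP envelope: only the slot
// (block height) it refers to.
struct EnvelopeHeader
{
    std::uint64_t slotIndex;
};

class OutdatedFilter
{
public:
    // Record that a block at `height` was externalized. The stored height
    // never moves backwards.
    void onExternalized(std::uint64_t height)
    {
        if (height > mLastExternalized)
            mLastExternalized = height;
    }

    // Messages for the externalized height or anything older are dropped.
    bool shouldProcess(const EnvelopeHeader& envelope) const
    {
        return envelope.slotIndex > mLastExternalized;
    }

private:
    // Assumes the genesis block (height 0) counts as already externalized.
    std::uint64_t mLastExternalized = 0;
};
```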

What still needs to be worked on

  • SCPEnvelopes are likely not propagated properly. #716 has a failing test-case, and adding proper propagation seems to resolve part of the issue.
  • Fixing SCPEnvelope propagation reveals a secondary issue with timers firing out of order. SCP throws an error when an outdated timer fires: https://github.com/bpfkorea/agora/blob/5ffc4574b99ebafc7817801ad2770ea2253cd126/source/scpp/src/scp/BallotProtocol.cpp#L640-L642. We've had these errors sporadically before, but with the propagation fix they happen frequently. I think it's related to how we implemented timers: we spawn fibers and then call fiber.sleep, which is not an ideal way to write timers.
  • #684 will need more work. Integration tests have been delayed by the issues above. Additionally the algorithm might need to be more fine-tuned as the integration tests are written.
  • #723: we need to examine the usage of toVec(), because it returns pointers to either stack-allocated or GC-allocated memory, which is a problem for memory management across the C++ boundary.
  • Quorums need to be shuffled every N blocks, and there needs to be support code that handles this case. In the current design the NetworkManager only establishes connections at bootup.
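One common way to make outdated timers harmless, sketched below, is to tag each armed timer with a generation number and have the callback check that it is still the current one before running. This is an assumption about a possible fix for the timer issue listed above, not a description of Agora's timer code; `TimerGuard` and its methods are hypothetical, and the scheduler that eventually invokes the wrapped callback (fiber, event loop) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Wraps timer callbacks so that only the most recently armed callback for
// a given timer ID may run. The guard must outlive the wrapped callbacks.
class TimerGuard
{
public:
    // Returns the callback to hand to the scheduler. Arming the same
    // timer ID again invalidates every previously returned callback.
    std::function<void()> arm(int timerId, std::function<void()> callback)
    {
        const std::uint64_t generation = ++mGeneration[timerId];
        return [this, timerId, generation, cb = std::move(callback)]() {
            if (mGeneration[timerId] == generation)
                cb(); // still current: run it
            // otherwise the timer is outdated and silently ignored
        };
    }

    // Invalidate whatever is currently armed for this timer ID.
    void cancel(int timerId)
    {
        ++mGeneration[timerId];
    }

private:
    std::map<int, std::uint64_t> mGeneration;
};
```

If an old callback fires after a newer one was armed, it becomes a no-op instead of reaching SCP with stale state.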

PRs still in progress, but may be reviewed.

#684 - Implement quorum balancing algorithm.


AndrejMitrovic avatar AndrejMitrovic commented on August 19, 2024

Currently the only part that is not fully implemented is:

  • Quorums get regularly shuffled (how often?);

Right now the quorums only get reshuffled when the validator set changes (new enrollment, expired enrollment). But we may want to shuffle quorums regularly, for example every N blocks (N would be defined by the protocol).

However, I think this will require more preimage support. In our unit tests we have tests such as:

  • Set validation cycle to 20
  • Generate genesis
  • Create 19 blocks
  • Start the nodes

The nodes won't have any preimages on startup except the commitment in the enrollment in the genesis block. If we set the "periodic shuffle" N parameter to 10, then on boot-up the nodes would generate the quorums for block height 0 and then on block height 10. But it's not possible to test this right now because the preimages are missing for height 10, so this would lead to an assertion failure.
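The scenario above can be sketched as follows. The helper names and the `highestKnownPreimageHeight` parameter are hypothetical; the sketch assumes that the enrollment commitment in the genesis block is enough for height 0 and that every later shuffle height needs a revealed preimage.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Heights at which quorums would be regenerated under periodic shuffling,
// from genesis up to and including `currentHeight`.
std::vector<std::uint64_t> shuffleHeights(std::uint64_t currentHeight,
                                          std::uint64_t interval)
{
    assert(interval > 0);
    std::vector<std::uint64_t> heights;
    for (std::uint64_t h = 0; h <= currentHeight; h += interval)
        heights.push_back(h);
    return heights;
}

// True when the node holds the data needed for every shuffle height.
// `highestKnownPreimageHeight` is 0 when only the enrollment commitment
// from the genesis block is available (assumption of this sketch).
bool canBuildAllQuorums(std::uint64_t currentHeight, std::uint64_t interval,
                        std::uint64_t highestKnownPreimageHeight)
{
    for (std::uint64_t h : shuffleHeights(currentHeight, interval))
        if (h > highestKnownPreimageHeight)
            return false;
    return true;
}
```

With 19 blocks and N = 10 the shuffle heights are 0 and 10, so a freshly started node that only knows the genesis commitment cannot build the quorums for height 10, which matches the assertion failure described above.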

