
hotstuff's People

Contributors

aeonsw4n, bondeking, dependabot[bot], hanish520, johningve, leandernikolaus, meling, oncethor, ttorgersen


hotstuff's Issues

Implement a better round-robin pacemaker

The current round-robin pacemaker implementation is simple and deterministic, but it does not handle faults well. In particular, the current implementation keeps trying a faulty leader until the view height increases to the term of the next leader. We should write a new round-robin pacemaker that avoids this problem.
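One possible direction (a sketch only, not necessarily the intended design): derive the leader directly from the view number, so that every failed view and its timeout rotates to the next replica instead of retrying the faulty leader. The names below are illustrative, not the repo's API.

// leaderForView picks a leader purely from the view number, so each failed
// view advances to the next replica in round-robin order rather than
// retrying the same (possibly faulty) leader. Replica IDs are assumed 1..n.
func leaderForView(view, numReplicas uint64) uint64 {
	return (view % numReplicas) + 1
}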

Doc updates needed

It would be nice to have better documentation in the main README.md, in the scripts folder, and elsewhere, covering the advanced command lines needed to run things for those not already familiar with tools such as Docker and Ansible. This includes pointers to tutorials or documentation elsewhere, e.g., about the network emulator.

chained hotstuff?

Hi, thanks for your wonderful work. I have a question for you: is this a chained HotStuff implementation? Thanks.

BlockChain module problems

I noticed that there are some problems with the BlockChain module:

First, PruneToHeight only removes entries from the blockAtHeight map, not the blocks map, so blocks are never entirely removed.

Second, the Get method requests a block from any replica. If one replica responds, it will store the requested block in the maps. This could be problematic because the entry in blockAtHeight for height h might be overwritten by a malicious replica. This could for example be done by sending an invalid vote referencing a block that the malicious replica has created. The block will be fetched before the vote is invalidated. If this block specifies view h, it will be stored in blockAtHeight for height h. I don't think this is a threat to consensus, but it interferes with the detection of forked blocks and may cause blocks to remain unpruned.

Third, it might be nice to be able to configure how many blocks should be kept in memory.
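On the first point above, a minimal sketch of a prune that removes entries from both maps. The field names, key types, and locking here are guesses based on the description above; the real module may differ.

// pruneToHeight removes blocks below the given height from both maps,
// so that pruned blocks are actually released from memory. Sketch only.
func (chain *blockChain) pruneToHeight(height hotstuff.View) {
	chain.mut.Lock()
	defer chain.mut.Unlock()
	for h, block := range chain.blockAtHeight {
		if h < height {
			delete(chain.blocks, block.Hash())
			delete(chain.blockAtHeight, h)
		}
	}
}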

Add reconfiguration

It should be possible to add or remove replicas from the running configuration without requiring a restart.

Add checkpoint feature in HotStuff protocol

I am planning to add a checkpoint feature to the HotStuff protocol, and I have come up with two plans for implementing it.

Piggyback on the proposal

  • Upon committing every 'k' requests/views, the leader of the next view adds a checkpoint request to the proposal of the current view.
  • Replicas, after verifying the proposal, update their state from the QC of the previous view.
  • If the proposal contains a checkpoint request, they verify the checkpoint request and take a "snapshot of the application".
  • After checkpointing, they send their vote to the leader of the next view.
  • If a quorum certificate is created and shared in the proposal of the next view, the checkpoint procedure is marked as complete.
  • If a quorum certificate is not created from the previous view, the leader of the current view repeats the procedure until it succeeds.

Pros

  • No additional communication cost in the protocol for checkpointing

Cons

  • Interferes with the protocol execution
  • Implementation would be tightly coupled with the protocol.

Separate Checkpointing service/module

  • After every 'k' commits, the checkpointing service is invoked to send a checkpoint message to all other replicas.
  • A replica waits for 2f+1 responses and then saves the application snapshot.
  • This is repeated on all replicas until all of them have checkpointed. Since this could become a source of DoS attacks, the number of times a replica can invoke the checkpoint service needs to be limited. (A rough sketch of such a service follows after the pros and cons.)

Pros

  • No interference with the protocol, runs as a separate service of a replica.
  • Can be switched on/off based on the requirement.

Cons

  • An additional communication cost of n*n messages.
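As a rough sketch of the second plan only (all names here are hypothetical, not the repo's API): a standalone service that is notified of commits, triggers after every k of them, collects a quorum of acknowledgements, and only then snapshots the application.

// checkpointService is a hypothetical standalone module: after every k
// commits it broadcasts a checkpoint message, waits for 2f+1
// acknowledgements, and then takes an application snapshot.
type checkpointService struct {
	k         uint64 // checkpoint interval, in commits
	committed uint64 // commits seen since the last checkpoint
	quorum    int    // 2f+1
	broadcast func(height uint64) <-chan struct{} // sends the checkpoint message; yields one signal per ack
	snapshot  func(height uint64) error           // application snapshot hook
}

func (c *checkpointService) onCommit(height uint64) error {
	c.committed++
	if c.committed < c.k {
		return nil
	}
	c.committed = 0
	acks := c.broadcast(height)
	for i := 0; i < c.quorum; i++ {
		<-acks // wait for 2f+1 acknowledgements (no timeout or rate limiting in this sketch)
	}
	return c.snapshot(height)
}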

Refactor module system into separate package to reduce coupling between modules

Right now the module system is defined in the consensus package, which requires many interfaces and structs to be defined in that package, resulting in stronger coupling between the different modules than is desirable. Ideally, the module system should not need to know up front about the different module types it can support.

It is not clear to me whether what I'm proposing is possible. That is, I'm creating this issue without having reviewed the code in depth, so I do not have good advice to offer at the moment. However, it might be possible to take advantage of generics in such a refactoring effort.

Perhaps we could prepare a design doc to better understand the requirements, challenges, and possible design alternatives.
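One hypothetical shape such a type-agnostic module system could take, sketched with generics (names are purely illustrative and this has not been validated against the code base; imports reflect): modules are stored behind empty interfaces, and a generic accessor recovers the concrete type, so the registry never has to enumerate the module kinds up front.

// Modules is a hypothetical type-agnostic registry: values are stored under
// their dynamic type, and the registry itself knows nothing about the
// module interfaces it holds.
type Modules struct {
	byType map[reflect.Type]any
}

// Add registers a module under its dynamic type.
func (m *Modules) Add(module any) {
	if m.byType == nil {
		m.byType = make(map[reflect.Type]any)
	}
	m.byType[reflect.TypeOf(module)] = module
}

// Get returns the first registered module that implements T.
func Get[T any](m *Modules) (T, bool) {
	for _, module := range m.byType {
		if t, ok := module.(T); ok {
			return t, true
		}
	}
	var zero T
	return zero, false
}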

Unable to download the metric logs from the nodes in the controller node.

When we run the experiment on the remote nodes with metrics enabled, the controller node cannot download these logs.

This issue occurs because the path stored in the host structure for the "data-dir" value is malformed. An example value is "tmp/hotstuff.BpLnfgDs/data", which is missing the leading "/". The problem is the same for "test-dir".

I modified the line below to src, err := iago.NewPathFromAbs("/"+iago.GetStringVar(host, "data-dir")) and it works.

src, err := iago.NewPathFromAbs(iago.GetStringVar(host, "data-dir")) // assuming the dir variable was set earlier

I guess there is a better way to fix this.
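One possibly cleaner variant (a sketch only, using just the standard path package on top of the calls already shown above): make the path absolute only when it is not already, so both "tmp/..." and "/tmp/..." values are tolerated.

// import "path"
dir := iago.GetStringVar(host, "data-dir")
if !path.IsAbs(dir) {
	dir = "/" + dir // remote paths are unix-style; prepend the missing root
}
src, err := iago.NewPathFromAbs(dir)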

Insecure implementation of BLS threshold signatures

Hello. I was using the program and observed some possible issues in crypto/bls12/bls12.go.

Namely, the methods CreateThresholdSignature() and VerifyThresholdSignature() actually create (and verify) an aggregate signature, not a threshold signature, which is insecure when the partial signatures sign the same message. That is explained in Section 5.1 of the BLS paper: https://link.springer.com/content/pdf/10.1007/s00145-004-0314-9.pdf
What we actually want is described in Section 5.3 of the BLS paper.

The implemented version could be seen as a multi-signature, but it is still susceptible to the rogue-key attack, because it simply adds the public keys and signatures. The correct implementation would be as in Section 3.1 of the BDN paper: https://eprint.iacr.org/2018/483.pdf
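Roughly, instead of plain sums, the BDN scheme weights each signer by a hash-derived coefficient before aggregating (sketched here from memory; see Section 3.1 of the paper for the precise definitions and hash domains):

a_i    = H_1(pk_i, \{pk_1, \dots, pk_n\})    % per-signer coefficient
\sigma = \prod_i \sigma_i^{a_i}              % aggregated signature
apk    = \prod_i pk_i^{a_i}                  % aggregated public key

Verification then proceeds as for an ordinary BLS signature on the message under the aggregated key apk.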

Even if one is only interested in benchmarking (and does not plan to use this in production), I am afraid the results might still be misleading, since the computations behind a threshold signature (or a correct implementation of multi-signature) are different than what the code currently does.

Possible mismatch with the algorithm

Hi Professor Hein Meling,

According to the original HotStuff paper, page 13, Algorithm 5 (code skeleton for a Pacemaker), line 8: upon proposing a new block, the leader updates bLeaf to the newly proposed block.

In your implementation, you do not update bLeaf upon proposing. The place where you update bLeaf is here.

Could you please clarify this mismatch?

Thanks

Issues after running main.go in hotstuffclient

I was following the description in the Running the examples section. However, after I built hotstuffclient.exe and ran it, it gave me the following error:

Failed to read public key file 'keys/r1.key.pub': open keys/r1.key.pub: no such file or directory
exit status 1

Do you have any thoughts on how to fix this?

Sincerely,
Andy

Use of gRPC for one way messages

This repository uses gRPC as the messaging layer. By default, gRPC uses a request-response architecture, whereas protocols such as HotStuff are written in terms of a unicast abstraction.

In this implementation of HotStuff, the gRPC request-response messages are used with google.protobuf.Empty as the return type, so to the best of my understanding the response of each gRPC request is discarded.

While there is nothing wrong with this approach in a theoretical sense, there is a significant performance overhead: the message complexity of the implementation is twice the actual message complexity of HotStuff, because of the Empty response messages that are not required.

I kindly ask you to explain this design choice.

Thanks

Synchronizer problem

I have found a problem in synchronizer/synchronizer.go:

func (s *Synchronizer) AdvanceView(syncInfo consensus.SyncInfo) {
	v := consensus.View(0)
	timeout := false

	// check for a TC
	if tc, ok := syncInfo.TC(); ok {
		if !s.mods.Crypto().VerifyTimeoutCert(tc) {
			s.mods.Logger().Info("Timeout Certificate could not be verified!")
			return
		}
		if v > s.highTC.View() {
			s.highTC = tc
		}

		v = tc.View()
		timeout = true
	}

I think v is always 0 when it is compared to s.highTC.View().
Should we move v = tc.View() before that comparison?
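For reference, a minimal reordering along those lines (only the assignment is moved up; everything else is as in the snippet above):

	// check for a TC
	if tc, ok := syncInfo.TC(); ok {
		if !s.mods.Crypto().VerifyTimeoutCert(tc) {
			s.mods.Logger().Info("Timeout Certificate could not be verified!")
			return
		}
		v = tc.View() // assign first, so the comparison below uses the TC's view
		if v > s.highTC.View() {
			s.highTC = tc
		}
		timeout = true
	}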

Add generation of mock modules to Makefile

Mock modules are generated with GoMock.
I believe they need to be regenerated whenever a module interface changes.
It would be great to be able to regenerate the mock modules through the Makefile.

Recompiling client protobuf definitions leads to errors

I deleted the client_gorums.pb.go and client.pb.go files and then tried to regenerate them with the protocol buffers compiler using the following command:

protoc -I=/Users/raphael/.go/pkg/mod/github.com/relab/[email protected]:. \
                --go_out=paths=source_relative:. \
                --gorums_out=paths=source_relative:. \
                client/client.proto 

Unfortunately, when executing the server, I get the following errors:

generating(async): ExecCommand
# github.com/relab/hotstuff/client
client/client_gorums.pb.go:194:42: msg.metadata.MethodID undefined (type *ordering.Metadata has no field or method MethodID)
client/client_gorums.pb.go:1139:44: req.metadata.MethodID undefined (type *ordering.Metadata has no field or method MethodID)
client/client_gorums.pb.go:1323:2: undefined: ClientClient
client/client_gorums.pb.go:1328:19: undefined: NewClientClient

I've tried compiling with different versions of gorums as well (v0.2.2 and v0.2.1), but with the same result. It's weird, because I didn't have these problems a couple of weeks ago when I did the same thing.

The generated client_gorums.pb.go file also seems to be much larger than the one in the git repo when just cloning (~200 lines vs ~1400 lines).

several problems in hotstuff run

Hi, I recently started learning about HotStuff and tried to run your code, but I ran into a few problems, shown in the attached screenshots. I didn't make any changes when running the code; I just did the following steps:

1. make
2. ./hotstuff run

But the logs seem to indicate something unusual. Do these logs indicate a problem with my setup, or are they normal?
Thank you very much.
[Four screenshots of the log output were attached.]

Add a recovery mechanism

Related to #5 and #3.

A recovery mechanism is needed to allow a replica to recover if it has been unable to receive messages.

Command semantics verification as part of block verification?

I'm not quite sure if this is the correct place to ask, but I was wondering where in consensus.go would be the best place to verify the commands in a block. As I understand it, the OnReceiveProposal() function only checks that the block is placed correctly in the chain; it doesn't look inside the block to check the semantics of the commands. My idea was to place the check at the beginning of that function (as seen below). Is this the right way/place to perform such checks?

// OnReceiveProposal handles a replica's response to the Proposal from the leader
func (hs *HotStuffCore) OnReceiveProposal(block *data.Block) (*data.PartialCert, error) {
	logger.Println("OnReceiveProposal:", block)
	hs.Blocks.Put(block)

	// checks if all commands in a block conform to a specific rule set
	if hs.invalidCommandsInBlock(block) {
		fmt.Println("Proposed block contains invalid command")
		return nil, fmt.Errorf("Invalid command")
	}

Thanks!

Add package documentation

The revive linter (which runs as part of the GitHub workflow) complains because package documentation is missing, resulting in red (non-passing) checks. Some of the packages may only need a single sentence.

% revive ./...
cmd/hotstuff/main.go:1:1: should have a package comment
metrics/types/event.go:1:1: should have a package comment
leaderrotation/carousel.go:1:1: should have a package comment
hotstuff.go:1:1: should have a package comment
internal/proto/hotstuffpb/convert.go:1:1: should have a package comment
synchronizer/synchronizer.go:1:1: should have a package comment
cmd/plot/main.go:1:1: should have a package comment
internal/cli/root.go:1:1: should have a package comment
twins/generator.go:1:1: should have a package comment
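For example, a single sentence above the package clause satisfies the linter; something like the following (wording is only a suggestion):

// Package leaderrotation provides strategies for selecting the leader of a view.
package leaderrotation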

Update go version and dependencies

The go.mod file contains the long-deprecated protobuf module:

github.com/golang/protobuf v1.5.2

This dependency is used in the client and replica packages. The fix is trivial; the import:

"github.com/golang/protobuf/ptypes/empty"

should be replaced with:

empty "google.golang.org/protobuf/types/known/emptypb"

Currently, we are using Go 1.16. I want to convert from a slice to a [32]byte array without using copy; this is supported in Go 1.17.
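For reference, the Go 1.17 feature in question is the slice-to-array-pointer conversion, which avoids calling copy (a small illustration, not code from the repo):

b := []byte("0123456789abcdef0123456789abcdef") // some 32-byte slice
p := (*[32]byte)(b) // Go 1.17+: convert the slice to an array pointer; panics if len(b) < 32
h := *p             // dereference to obtain the [32]byte value without using copy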

Finally, my installed protoc-gen-gorums is newer than the one used to generate the existing pb.go files, which prevents me from generating compatible files. Hence, I propose upgrading both the plugin and relab/gorums to the latest release version. The necessary changes should be trivial.

@Raytar @hanish520 Any reason to hold off from upgrading these dependencies?

[Question]Where is vote method for replica

Hi guys,
After looking at the code, I only see the self-vote; I don't see the code where the other replicas send their vote message to the leader of the next view, i.e., send(getLeader(), voteMsg(generic, bnew, ⊥)) as in the paper. Am I missing anything?

Incompatibility with new version of gonum

Discussed in #38

There is a problem in metrics/plotting/helper.go at line 20: GoLand reports "Assignment count mismatch: 2 = 1". I went into this method (plot.New()) and found that it has only one return value, but two variables are used to receive it.
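For context, newer releases of gonum.org/v1/plot changed plot.New() to return only *plot.Plot (no error), so call sites need to drop the error variable, roughly like this (a sketch, not the exact line in the repo):

// before (older gonum/plot):
// p, err := plot.New()
// if err != nil { ... }

// after (newer gonum/plot):
p := plot.New()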

several problems in hotstuff run

Discussed in #38

Originally posted by SenjiMuramasa February 14, 2022
Hi, I recently started learning about HotStuff and tried to run your code, but I have a question.
Only two files are marked red. One of them is /internal/proto/clientpb/client_gorums.pb.go, where GoLand reports Unresolved type 'Configuration' at line 29 and Unresolved reference 'Size' at line 36. I have downloaded github.com/relab/gorums v0.7.0, but I don't know what I should do.

[Question]Dummy leaf node

Can the dummy node use the high QC instead of a nil QC?

func (p *RoundRobin) startNewViewTimeout(stopContext context.Context) {
	for {
		select {
		case <-p.resetTimer:
		case <-stopContext.Done():
			return
		case <-time.After(p.timeout):
			// add a dummy block to the tree representing this round which failed
			logger.Println("NewViewTimeout triggered")
			p.SetLeaf(hotstuff.CreateLeaf(p.GetLeaf(), nil, nil, p.GetHeight()+1))
			p.SendNewView(p.getLeader(p.GetHeight() + 1))
		}
	}
}

How can I run this code among 4 machines in WAN environment?

It seems that this code can only run on a local machine; I am not sure whether there is a way to run it in a WAN environment.
I have 4 machines with global addresses, but I do not know how to run the replicas on those machines.
Can anyone help me?

Sign hash instead of block

In quorum certificates, we sign the block (encoded). Why do we not just sign the hash of the block?
By just signing the hash, we could verify certificates without having to retrieve the block.
This makes it easier to parallelize such verifications.

func (c crypto) CreatePartialCert(block *hotstuff.Block) (cert hotstuff.PartialCert, err error) {
	sig, err := c.Sign(block.ToBytes())
	if err != nil {
		return hotstuff.PartialCert{}, err
	}
	return hotstuff.NewPartialCert(sig, block.Hash()), nil
}

From here:

func (c crypto) CreatePartialCert(block *hotstuff.Block) (cert hotstuff.PartialCert, err error) {
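A minimal sketch of the proposed change, assuming Sign accepts arbitrary bytes (as it does for the encoded block above) and that Hash is a byte array:

func (c crypto) CreatePartialCert(block *hotstuff.Block) (cert hotstuff.PartialCert, err error) {
	hash := block.Hash()
	sig, err := c.Sign(hash[:]) // sign the fixed-size hash instead of the full encoded block
	if err != nil {
		return hotstuff.PartialCert{}, err
	}
	return hotstuff.NewPartialCert(sig, hash), nil
}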

Branches to PRs: merge or discard

It would be nice to get the various branches merged or discarded. I started to review some of the branches and saw a few typos in critical places, so it would be nice to go via PRs so that these can be fixed. Or I guess I could just fix, commit, and push directly to the branches, if you are OK with that. But it would be nice to know first which branches are worth keeping.

logging the deployment information

While conducting experiments, I found that it would be nice to print the deployment data to the output directory. The deployment information (the mapping of clients and replicas to nodes) may be useful for identifying slow nodes in the configuration.

liveness problem when encountering view changes

Dear authors,

Hi, I think this repository is excellent work. However, I have encountered a liveness problem when using the round-robin mode.

With the default settings, I run a 4-node cluster and a client. Normally, the CPU usage of each hotstuff server is about 200% per instance.
But after running hotstuff for some time, the CPU usage of the hotstuff servers drops below 10%, and the servers stop serving requests.

I think this is because a view change is happening, some blocks cannot get enough certifications, and the blockchain diverges from one server to another.

Setting the view timeout to a lower value (e.g., 10ms) increases the probability of encountering this liveness problem.

This disturbs me a lot, and I am looking forward to hearing from you. Thanks!

What is the logic behind getting the first id in the ID map, when creating a partial certificate?

I found the following code segment in types.go:

signature.Participants().RangeWhile(func(i ID) bool {

Here is how I interpret the code:

Go map iteration is non-deterministic, so a random ID will be chosen as the first ID inside RangeWhile. Then, irrespective of which ID it is, signer is set to that random ID and false is returned (which stops the iteration).

In my understanding, what should be done instead is to set the signer field of the PartialCert to self. Hence, I propose the following modification:


// NewPartialCert returns a new partial certificate.
func NewPartialCert(id ID, signature QuorumSignature, blockHash Hash) PartialCert {
	return PartialCert{id, signature, blockHash}
}


Am I missing something here?

Thanks

Improve performance by sending command hashes in proposals

With higher payload sizes, we should be able to reduce the size of proposals and reduce the time consumed by creating partial signatures if we send a hash of each command instead of the whole command. This optimization only works for systems where clients broadcast commands to all replicas, such that all replicas can locate a command in memory given its hash. Thus, it might be best if we make this optimization optional in order to support systems where this is not the case. We would also need a way for a replica to fetch commands from the other replicas if it is missing some of the commands.
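A rough sketch of the replica-side piece this would require (all names hypothetical; imports crypto/sha256 and sync): a cache that maps a command's hash back to the full command received directly from the client, so a proposal carrying only hashes can be resolved locally, and a miss signals that the command must be fetched from another replica.

// commandCache maps a command's hash to the full command payload that the
// replica received directly from the client. Hypothetical sketch only.
type commandCache struct {
	mut  sync.Mutex
	cmds map[[32]byte][]byte
}

// Add stores a client command and returns its hash, which a leader could
// place in a proposal instead of the command itself.
func (c *commandCache) Add(cmd []byte) [32]byte {
	h := sha256.Sum256(cmd)
	c.mut.Lock()
	defer c.mut.Unlock()
	if c.cmds == nil {
		c.cmds = make(map[[32]byte][]byte)
	}
	c.cmds[h] = cmd
	return h
}

// Get resolves a hash from a proposal back to the full command; a miss means
// the replica must fetch the command from another replica.
func (c *commandCache) Get(h [32]byte) ([]byte, bool) {
	c.mut.Lock()
	defer c.mut.Unlock()
	cmd, ok := c.cmds[h]
	return cmd, ok
}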

Eliminate unnecessary timeouts throughout the project

There are a few different timeouts used in the project:

  1. A dial timeout for the initial connection setup
  2. The nextView timeout
  3. An RPC timeout
  4. A "wait" timeout for expectNodeFor

2 is necessary, and 1 is likely not avoidable. 4 is currently hardcoded to nextViewTimeout/2 in the server. I think it might be possible to eliminate 3 entirely by relying on the nextView timeout instead: we could add a context parameter to the Propose and DoPropose methods that can be canceled by the pacemaker when a timeout happens (see the sketch below). We should also consider whether 4 is needed in expectNodeFor, or whether expectNodeFor can be restricted to waiting for only a single proposal before returning false.
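A small sketch of the context idea (names are illustrative; imports context): the pacemaker derives a per-view context and cancels it when the nextView timeout fires, so any in-flight Propose RPC using that context is abandoned without needing a separate RPC timeout.

// viewContext returns a context that is canceled when viewDone is closed,
// e.g. by the pacemaker when the nextView timeout fires or the view advances.
func viewContext(parent context.Context, viewDone <-chan struct{}) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(parent)
	go func() {
		select {
		case <-viewDone:
			cancel()
		case <-ctx.Done():
		}
	}()
	return ctx, cancel
}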

Package reorganization

Too much depends on the consensus package. This makes it very difficult to refactor anything; it is too easy to get import cycles with the current design. We need to rethink the package structure.
