andreev-io / little-raft
The lightest distributed consensus library. Run your own replicated state machine! ❤️
License: MIT License
What happens when a new replica is added to the cluster? In the current design, other replicas won't learn about it unless they are restarted with the new peer_ids. But upon restart, the replicas and their state machines will lose their state unless the user builds some persistence themselves.
To make this simpler for the user, we should add functionality for the replica to preserve some permanent state. This should be exposed to the Little Raft user via a trait that they can implement as they wish. The Raft paper has a lot of good info on what state needs to be preserved and how, so a good first step would be implementing that, and then adding functionality to snapshot the state of the state machine.
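As a rough illustration of what such a trait could look like, here is a minimal sketch. The names Storage, PersistentState, and MemoryStorage are hypothetical and not part of Little Raft; the fields follow the persistent state listed in Figure 2 of the Raft paper (current term, vote, and log).

```rust
/// Persistent state per Figure 2 of the Raft paper.
/// The struct and its field types are assumptions for this sketch.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct PersistentState {
    pub current_term: usize,
    pub voted_for: Option<usize>,
    /// Log entries as (term, opaque transition bytes).
    pub log: Vec<(usize, Vec<u8>)>,
}

/// Hypothetical trait a Little Raft user could implement however they wish
/// (file, database, and so on).
pub trait Storage {
    /// Must make `state` durable before returning.
    fn save(&mut self, state: &PersistentState);
    /// Returns the last saved state, or None on a fresh start.
    fn load(&self) -> Option<PersistentState>;
}

/// In-memory stand-in, useful only for tests; it survives nothing.
#[derive(Default)]
pub struct MemoryStorage {
    saved: Option<PersistentState>,
}

impl Storage for MemoryStorage {
    fn save(&mut self, state: &PersistentState) {
        self.saved = Some(state.clone());
    }

    fn load(&self) -> Option<PersistentState> {
        self.saved.clone()
    }
}
```

A replica would call save before answering any RPC that changes the term, vote, or log, and load once on startup.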
I am seeing a deadlock where process_append_entry_request_as_follower() is unable to grab cluster.lock() when attempting to send an append entry response.
I am not sure if the scenario matters, but I am seeing this during initial leader election: two nodes first become candidates, then one of them wins, but the other one -- after becoming a follower -- never manages to register the winner as the leader and respond to its heartbeat, because it waits indefinitely for cluster.lock() to be released.
I don't understand why, but reducing the scope of state_machine.lock() seems to cure the issue:
diff --git a/little_raft/src/replica.rs b/little_raft/src/replica.rs
index c13219e..5168743 100644
--- a/little_raft/src/replica.rs
+++ b/little_raft/src/replica.rs
@@ -705,23 +705,28 @@ where
return;
}
- let mut state_machine = self.state_machine.lock().unwrap();
- for entry in entries {
- // Drop local inconsistent logs.
- if entry.index <= self.get_last_log_index()
- && entry.term != self.get_term_at_index(entry.index).unwrap() {
- for i in entry.index..self.log.len() {
- state_machine.register_transition_state(
- self.log[i].transition.get_id(),
- TransitionState::Abandoned(TransitionAbandonedReason::ConflictWithLeader)
- );
- }
- self.log.truncate(entry.index);
- }
+ {
+ let mut state_machine = self.state_machine.lock().unwrap();
+ for entry in entries {
+ // Drop local inconsistent logs.
+ if entry.index <= self.get_last_log_index()
+ && entry.term != self.get_term_at_index(entry.index).unwrap()
+ {
+ for i in entry.index..self.log.len() {
+ state_machine.register_transition_state(
+ self.log[i].transition.get_id(),
+ TransitionState::Abandoned(
+ TransitionAbandonedReason::ConflictWithLeader,
+ ),
+ );
+ }
+ self.log.truncate(entry.index);
+ }
- // Push received logs.
- if entry.index == self.log.len() + self.index_offset {
- self.log.push(entry);
+ // Push received logs.
+ if entry.index == self.log.len() + self.index_offset {
+ self.log.push(entry);
+ }
}
}
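The Rust mechanism the diff relies on can be shown in isolation. This sketch does not diagnose the deadlock; it only demonstrates that a MutexGuard is released at the end of its enclosing block, so a later lock acquisition no longer overlaps with it. The function name and the payload types are made up for the example.

```rust
use std::sync::Mutex;

/// Updates the state machine inside a narrow scope, then takes the cluster
/// lock. Returns whether the state machine lock was free at that point.
pub fn update_then_respond(
    state_machine: &Mutex<Vec<u32>>,
    cluster: &Mutex<Vec<String>>,
) -> bool {
    {
        let mut sm = state_machine.lock().unwrap();
        sm.push(1);
    } // `sm` is dropped here, releasing the state machine lock.

    let mut c = cluster.lock().unwrap();
    c.push("append entry response".to_string());

    // try_lock succeeds only because the guard above is already gone.
    let state_machine_free = state_machine.try_lock().is_ok();
    state_machine_free
}
```

Without the inner block, the guard would live until the end of the function and both locks would be held at once, which is the usual precondition for a lock-ordering deadlock between threads.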
Pull request #18 fixes an issue with out-of-order or stray AppendEntries rejects, which is only triggerable when Little Raft is wired up with networked servers.
Let's improve the tests to simulate network delay and partitioning to attempt to catch bugs like these.
Snapshotting in Raft is optional. However, Little Raft enforces that all users of the StateMachine trait implement get_snapshot, create_snapshot, and set_snapshot; the implementation can be a no-op if the user doesn't want to use snapshotting, but the code doesn't make that clear.
We should figure out how to move snapshot-related methods into a separate trait with default no-op implementations, which Little Raft users can rely on if they choose to avoid snapshotting.
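One possible shape for this, as a sketch only: a separate trait whose methods all have default bodies, so opting out is a single empty impl. The trait name Snapshotting, the Snapshot struct, and the method signatures below are assumptions for illustration and may not match the actual Little Raft API.

```rust
/// Hypothetical snapshot payload.
#[derive(Clone, Debug, PartialEq)]
pub struct Snapshot {
    pub last_included_index: usize,
    pub last_included_term: usize,
    pub data: Vec<u8>,
}

/// Hypothetical opt-in trait: every method defaults to "no snapshotting".
pub trait Snapshotting {
    /// Returns the latest snapshot, if any.
    fn get_snapshot(&mut self) -> Option<Snapshot> {
        None
    }

    /// Asks the state machine to produce a snapshot; None means "skip".
    fn create_snapshot(
        &mut self,
        _last_included_index: usize,
        _last_included_term: usize,
    ) -> Option<Snapshot> {
        None
    }

    /// Installs a snapshot received from the leader; ignored by default.
    fn set_snapshot(&mut self, _snapshot: Snapshot) {}
}

/// Example state machine that does not care about snapshots.
pub struct Counter {
    pub value: u64,
}

// The empty impl inherits all three no-op defaults.
impl Snapshotting for Counter {}
```

Users who do want snapshotting override only the methods they need, and the intent is visible in the code instead of hidden in hand-written no-ops.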
Currently the enum is:
pub enum TransitionAbandonedReason {
    // NotLeader transitions have been abandoned because the replica is not
    // the cluster leader.
    NotLeader,
}
However, transitions can also be dropped by a replica that gets disconnected from the majority of the cluster while still believing it is the leader and accepting transitions; upon reconnecting to the rest of the cluster, such a replica will drop its uncommitted transitions because they conflict with the consensus reached by the cluster majority.
Section 7 of the Raft paper describes a much-needed optimization: log compaction. With it, the cluster will be able to snapshot its state from time to time so that replicas in the future don't have to replicate the entire transition log and so that storage space is preserved.
Implementing this in Little Raft according to the paper is a good way to move Little Raft toward being a production-grade library. This is a good, feasible challenge for people who want to build important functionality of a real distributed system following a clear specification (the Raft paper section 7, in this case).
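The bookkeeping at the heart of compaction can be sketched as follows: discard a prefix of the log and remember an offset so that global indices keep working. CompactableLog and its fields are hypothetical names for this example (index_offset mirrors a field name visible in the replica code, but its exact semantics there are an assumption), and producing the snapshot data itself is out of scope.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub index: usize,
    pub term: usize,
}

pub struct CompactableLog {
    /// Global index of the first entry still held in `entries`.
    pub index_offset: usize,
    pub entries: Vec<Entry>,
    /// (last_included_index, last_included_term) of the latest snapshot.
    pub snapshot_meta: Option<(usize, usize)>,
}

impl CompactableLog {
    /// Discards every entry up to and including `last_included_index`.
    /// Returns false if that index is already compacted or not in the log.
    pub fn compact(&mut self, last_included_index: usize) -> bool {
        if last_included_index < self.index_offset {
            return false;
        }
        let local = last_included_index - self.index_offset;
        if local >= self.entries.len() {
            return false;
        }
        let term = self.entries[local].term;
        self.entries.drain(..=local);
        self.index_offset = last_included_index + 1;
        self.snapshot_meta = Some((last_included_index, term));
        true
    }

    /// Looks up an entry by its global index.
    pub fn get(&self, index: usize) -> Option<&Entry> {
        index
            .checked_sub(self.index_offset)
            .and_then(|i| self.entries.get(i))
    }
}
```

Only entries that have already been applied to the state machine should be compacted, since the snapshot has to cover everything that is thrown away.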
thread 'tokio-runtime-worker' panicked at 'called `Option::unwrap()` on a `None` value', /Users/penberg/.cargo/registry/src/github.com-1ecc6299db9ec823/little_raft-0.1.4/src/replica.rs:416:57
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: panic
This is section 6 of the Raft paper. We need to stop assuming that the cluster configuration is fixed and add support for nodes joining / leaving the cluster. Implementing this implies adding support for joint consensus.
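The key rule of joint consensus is that, while the old and new configurations coexist, agreement (for elections and for commitment) requires separate majorities from both. A minimal sketch of that check, with hypothetical function names and replica IDs modeled as usize:

```rust
use std::collections::HashSet;

/// True if strictly more than half of `config` appears in `acks`.
fn has_majority(acks: &HashSet<usize>, config: &[usize]) -> bool {
    let count = config.iter().filter(|id| acks.contains(*id)).count();
    count * 2 > config.len()
}

/// During joint consensus, a decision needs a majority of the old
/// configuration AND a majority of the new one.
pub fn joint_quorum(acks: &HashSet<usize>, old: &[usize], new: &[usize]) -> bool {
    has_majority(acks, old) && has_majority(acks, new)
}
```

For example, with old members 1, 2, 3 and new members 3, 4, 5, acknowledgements from 1 and 2 alone are not enough, because they form no majority of the new configuration.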
Tests in this project serve two goals: testing functionality and offering usage examples. There's a lot of meaningful work that can be done by adding good tests to Little Raft.
Specifically, you'll want to test Little Raft functionality when replicas go down and up, when messages get lost on the network, and leaders get reelected. You might find it challenging to simulate nodes going up or down or getting connected and disconnected, but remember that in the tests the developer has control over how messages are passing between nodes -- emulating a disconnected node is as simple as not delivering any messages to it; simulating packet loss is as simple as dropping some packets randomly.
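A test harness along these lines could route all messages through a small in-process network object. The sketch below uses hypothetical names (SimulatedNetwork, Message) and is not tied to the Little Raft API. To stay dependency-free and reproducible, it drops every n-th message instead of dropping randomly; a random drop would work the same way.

```rust
use std::collections::{HashSet, VecDeque};

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub from: usize,
    pub to: usize,
    pub payload: String,
}

pub struct SimulatedNetwork {
    /// Nodes currently cut off from everyone else.
    partitioned: HashSet<usize>,
    /// If Some(n), every n-th sent message is lost.
    drop_every: Option<u64>,
    sent: u64,
    in_flight: VecDeque<Message>,
}

impl SimulatedNetwork {
    pub fn new(drop_every: Option<u64>) -> Self {
        SimulatedNetwork {
            partitioned: HashSet::new(),
            drop_every,
            sent: 0,
            in_flight: VecDeque::new(),
        }
    }

    pub fn disconnect(&mut self, node: usize) {
        self.partitioned.insert(node);
    }

    pub fn reconnect(&mut self, node: usize) {
        self.partitioned.remove(&node);
    }

    /// Queues a message for delivery; returns false if it was lost.
    pub fn send(&mut self, msg: Message) -> bool {
        self.sent += 1;
        let cut_off =
            self.partitioned.contains(&msg.from) || self.partitioned.contains(&msg.to);
        let lost = matches!(self.drop_every, Some(n) if n > 0 && self.sent % n == 0);
        if cut_off || lost {
            return false;
        }
        self.in_flight.push_back(msg);
        true
    }

    /// Hands the oldest surviving message to the test driver.
    pub fn deliver_next(&mut self) -> Option<Message> {
        self.in_flight.pop_front()
    }
}
```

Network delay can be simulated on top of this by having the test driver hold messages in the queue for a few ticks before calling deliver_next.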
You could also add integration tests where replicas are actual separate processes communicating over the network. The possibilities are endless.
Raft is an asynchronous protocol, so one of the challenges when using it is getting feedback on whether a particular transition has or hasn't been applied. To solve this problem, Little Raft calls the user-defined register_transition_state hook every time any transition changes its state. This way the library user can keep track of transitions as they move through the Queued -> Committed -> Applied pipeline.
However, a transition could be ignored by the replica it has been submitted to. The most likely reason is that the replica is not the cluster leader. Another possible reason is that the replica used to be the leader, got disconnected from the rest of the cluster, kept accepting transitions for a while, and then had to drop them when it reconnected to a cluster that had meanwhile elected a newer leader (the stale leader dropping uncommitted transitions is the desired behavior in this case).
To let the user know when a transition got dropped, we should add an Abandoned state to the TransitionState enum. We could also have that Abandoned state wrap another type so that we could signal the particular reason why a transition has been abandoned -- NotLeader vs UncommittedUnsynced. Naming could be improved.
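A sketch of how this could look from the user's side. The second reason is named ConflictWithLeader here, which is the name that appears in the replica code; the TransitionTracker type, the usize transition IDs, and the hook signature are simplifications assumed for this example.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TransitionAbandonedReason {
    /// The replica is not the cluster leader.
    NotLeader,
    /// Uncommitted entries conflicted with the log of a newer leader.
    ConflictWithLeader,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TransitionState {
    Queued,
    Committed,
    Applied,
    Abandoned(TransitionAbandonedReason),
}

/// Hypothetical user-side bookkeeping driven by the hook.
#[derive(Default)]
pub struct TransitionTracker {
    states: HashMap<usize, TransitionState>,
}

impl TransitionTracker {
    /// Called whenever a transition changes state.
    pub fn register_transition_state(&mut self, id: usize, state: TransitionState) {
        self.states.insert(id, state);
    }

    /// A transition is finished once it is applied or abandoned.
    pub fn is_finished(&self, id: usize) -> bool {
        matches!(
            self.states.get(&id),
            Some(TransitionState::Applied) | Some(TransitionState::Abandoned(_))
        )
    }
}
```

With the reason attached, a client can decide whether to retry against another replica (NotLeader) or resubmit the transition from scratch (ConflictWithLeader).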
If a cluster has a single node (no peers), that node becomes a candidate when the election triggers. However, as there are no incoming vote response messages (because there are no peers), the node will never perform the logic to become a leader.
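One way to address this, sketched with hypothetical types that do not mirror the actual replica code: check for a quorum right after the candidate votes for itself, rather than only when a vote response arrives.

```rust
#[derive(Debug, PartialEq)]
pub enum Role {
    Follower,
    Candidate,
    Leader,
}

pub struct Node {
    pub role: Role,
    pub peer_ids: Vec<usize>,
    pub votes: usize,
}

impl Node {
    /// Strict majority of the whole cluster (peers plus this node).
    fn has_quorum(&self) -> bool {
        let cluster_size = self.peer_ids.len() + 1;
        self.votes * 2 > cluster_size
    }

    pub fn start_election(&mut self) {
        self.role = Role::Candidate;
        self.votes = 1; // The candidate votes for itself.
        // With no peers, one vote is already a majority.
        if self.has_quorum() {
            self.role = Role::Leader;
        }
    }

    pub fn on_vote_granted(&mut self) {
        if self.role != Role::Candidate {
            return;
        }
        self.votes += 1;
        if self.has_quorum() {
            self.role = Role::Leader;
        }
    }
}
```

In a three-node cluster the same code leaves the node as a candidate after start_election and promotes it on the first granted vote.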