andreev-io / little-raft Goto Github PK

View Code? Open in Web Editor NEW

406.0 9.0 25.0 249 KB

The lightest distributed consensus library. Run your own replicated state machine! ❤️

License: MIT License

Rust 100.00%

rust consensus raft distributed-systems

little-raft's People

Contributors

Stargazers

Watchers

little-raft's Issues

Create ability to adjust replica settings such as list of peer ids

Create a Storage trait

What happens when a new replica is added to the cluster? In the current design, other replicas won't learn about it unless they are restarted with the new peer_ids. But upon restart, the replicas and their state machines will lose their state unless the user builds some persistence themselves.

To make this simpler for the user, we should add functionality for the replica to preserve some permanent state. This should exposed to the Little Raft user via a trait that they can implement as they wish. The Raft paper has a lot of good info on what state needs to be preserved and how, so a good first step would be implementing that, and then adding functionality to snapshot the state of the state machine.

Deadlock when acquiring cluster lock while appending entry as follower

I am seeing a deadlock where process_append_entry_request_as_follower() is unable to grab cluster.lock() when attempting to send an append entry response.

I am not sure if the scenario matters, but I am seeing this during initial leader election where two nodes first become candidates, then the other one wins, but the other one -- after becoming a follower -- never manages to register the other one as a leader and send a response to heartbeat because it just waits indefinitely for cluster.lock() to be released.

I don't understand why but reducing the scope of state_machine.lock() seems to cure the issue:

diff --git a/little_raft/src/replica.rs b/little_raft/src/replica.rs
index c13219e..5168743 100644
--- a/little_raft/src/replica.rs
+++ b/little_raft/src/replica.rs
@@ -705,23 +705,28 @@ where
             return;
@@ -705,23 +705,28 @@ where
             return;
         }

-        let mut state_machine = self.state_machine.lock().unwrap();
-        for entry in entries {
-            // Drop local inconsistent logs.
-            if entry.index <= self.get_last_log_index()
-            && entry.term != self.get_term_at_index(entry.index).unwrap() {
-                for i in entry.index..self.log.len() {
-                    state_machine.register_transition_state(
-                        self.log[i].transition.get_id(),
-                        TransitionState::Abandoned(TransitionAbandonedReason::ConflictWithLeader)
-                    );
-                }
-                self.log.truncate(entry.index);
-            }
+        {
+            let mut state_machine = self.state_machine.lock().unwrap();
+            for entry in entries {
+                // Drop local inconsistent logs.
+                if entry.index <= self.get_last_log_index()
+                    && entry.term != self.get_term_at_index(entry.index).unwrap()
+                {
+                    for i in entry.index..self.log.len() {
+                        state_machine.register_transition_state(
+                            self.log[i].transition.get_id(),
+                            TransitionState::Abandoned(
+                                TransitionAbandonedReason::ConflictWithLeader,
+                            ),
+                        );
+                    }
+                    self.log.truncate(entry.index);
+                }

-            // Push received logs.
-            if entry.index == self.log.len() + self.index_offset {
-                self.log.push(entry);
+                // Push received logs.
+                if entry.index == self.log.len() + self.index_offset {
+                    self.log.push(entry);
+                }
             }
         }

Add capability for replicas to acknowledge actions and report status of past actions

Simulate network delay in tests

Pull request #18 fixes an issue with out-of-order or stray AppendEntries rejects, which is only trigger able when Little Raft is wired up with networked servers.

Let's improve the tests to simulate network delay and partitioning to attempt to catch bugs like these.

Make StateMachine snapshotting methods optional by moving them to separate trait with default implementation

Snapshotting in Raft is optional. However, Little Raft enforces that all users of the StateMachine trait implement get_snapshot, create_snapshot, and set_snapshot; the implementation can be a no-op if the user doesn't want to use snapshotting, but the code doesn't make that clear.

We should figure out how to move snapshot-related methods into a separate trait that has a default no-op implementation that Little Raft users can use if they choose to avoid shapshotting.

Create a crate

Extend TransitionAbandonedReason enum

Currently the enum is

pub enum TransitionAbandonedReason {
    // NotLeader transitions have been abandoned because the replica is not
    // the cluster leader.
    NotLeader,
}

However, transitions can also be dropped by a replica if at some point it gets disconnected from majority of the cluster thinking it is still a leader accepting transitions; upon reconnecting to the rest of the cluster, such a replica will drop uncommitted transitions due to a discrepancy with consensus achieved by the cluster majority.

Add rust docs

Implement log compaction

Section 7 of the Raft paper describes a much-needed optimization: log compaction. With it, the cluster will be able to snapshot its state from time to time so that replicas in the future don't have to replicate the entire transition log and so that storage space is preserved.

Implementing this in Little Raft according to the paper is a good way to make Little Raft a production-grade library. This is a good feasible challenge for people that want to build important functionality of a real distributed system following a clear specification (the Raft paper section 7, in this case).

Panic is observed mismatch_index is None

thread 'tokio-runtime-worker' panicked at 'called Option::unwrap() on a None value', /Users/penberg/.cargo/registry/src/github.com-1ecc6299db9ec823/little_raft-0.1.4/src/replica.rs:416:57
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Error: panic

Add support for cluster membership changes

This is section 6 of the Raft paper. We need to stop assuming that the cluster configuration is fixed and add support for nodes joining / leaving the cluster. Implementing this implies adding support for joint consensus.

Leader waits for heartbeat to broadcast transitions

Add extensive tests with node failures, packet loss, and leader reelection

Tests in this project serve two goals: testing functionality and offering usage examples. There's a lot of meaningful work that can be done by adding good tests to Little Raft.

Specifically, you'll want to test Little Raft functionality when replicas go down and up, when messages get lost on the network, and leaders get reelected. You might find it challenging to simulate nodes going up or down or getting connected and disconnected, but remember that in the tests the developer has control over how messages are passing between nodes -- emulating a disconnected node is as simple as not delivering any messages to it; simulating packet loss is as simple as dropping some packets randomly.

You could also add integration tests where replicas are actual separate processes communicating over the network. The possibilities are endless.

Add Abandoned type to TransitionState enum

Raft is an asynchronous protocol, so one of the challenges when using it is getting feedback on whether a particular transition has or hasn't been applied. To solve this problem, Little Raft calls the user-defined register_transition_state hook every time any transition changes its state. This way the library user can keep track of transitions as they move through the Queued -> Committed -> Applied pipeline.

However, a transition could be ignored by the replica it has been submitted to. The most likely reason for that is because the replica is not the cluster leader. Another possible reason is that the replica used to be the leader, then got disconnected from the rest of the cluster, kept accepting transitions for processing for a while and then had to drop them when connecting back to the cluster that in the meanwhile elected another, newer leader (the stale leader dropping uncommitted transitions is the desired behavior in this case).

To let the user know when a transition got dropped, we should add Abandoned state to the TransitionState enum. We could also have that Abandoned state wrap around another type so that we could signal the particular reason why a transition has been abandoned -- NotLeader vs UncommittedUnsynced. Naming could be improved.

Single node cluster never elects a leader

If a cluster as a single node (no peers), that node becomes a candidate when election triggers. However, as there are no incoming vote response messages (because there are no peers), the node will never perform the logic to become a leader.

andreev-io / little-raft Goto Github PK

little-raft's People

Contributors

Stargazers

Watchers

Forkers

little-raft's Issues

Recommend Projects

Recommend Topics

Recommend Org