raft-kv-store's Introduction

RAFT based Key-Value Store with Transaction Support

SCPD-Project

Description

In this project, we present a highly available, consistent, fault-tolerant distributed key-value store. It applies the RAFT consensus algorithm throughout the system and supports concurrent transactions across shards. Each operation on the store is handled by a set of coordinators that form a RAFT group.

The keys in the store are partitioned across shards, and each shard maintains multiple replicas that form their own RAFT group. Distributed transactions across shards are implemented with the two-phase commit protocol combined with two-phase locking, which guarantees atomicity and serializability.
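
As a rough illustration of how two-phase commit can be combined with per-key locking, here is a minimal in-memory sketch. The `shard` type, its `prepare`/`commit`/`abort` methods, and `runTxn` are hypothetical names chosen for this example and are not the project's API; in the real system each shard is a RAFT group reached over the network.

```go
package main

import (
	"errors"
	"fmt"
)

// shard is a hypothetical in-memory stand-in for one RAFT-replicated shard.
type shard struct {
	data   map[string]int
	locks  map[string]string         // key -> ID of the txn holding its lock
	staged map[string]map[string]int // txn ID -> writes staged during prepare
}

func newShard() *shard {
	return &shard{
		data:   map[string]int{},
		locks:  map[string]string{},
		staged: map[string]map[string]int{},
	}
}

// prepare is phase one: lock every key and stage the writes,
// or fail without taking any lock.
func (s *shard) prepare(txn string, writes map[string]int) error {
	for k := range writes {
		if owner, held := s.locks[k]; held && owner != txn {
			return errors.New("key " + k + " is locked by " + owner)
		}
	}
	for k := range writes {
		s.locks[k] = txn
	}
	s.staged[txn] = writes
	return nil
}

// commit is phase two: apply the staged writes, then release the locks.
func (s *shard) commit(txn string) {
	for k, v := range s.staged[txn] {
		s.data[k] = v
	}
	s.abort(txn) // reuse abort to drop staging state and unlock
}

// abort drops staged writes and releases only the locks held by txn.
func (s *shard) abort(txn string) {
	delete(s.staged, txn)
	for k, owner := range s.locks {
		if owner == txn {
			delete(s.locks, k)
		}
	}
}

// runTxn plays the coordinator: it commits only if every shard
// votes yes during prepare, and aborts everywhere otherwise.
func runTxn(txn string, writes map[*shard]map[string]int) error {
	for s, w := range writes {
		if err := s.prepare(txn, w); err != nil {
			for other := range writes {
				other.abort(txn)
			}
			return err
		}
	}
	for s := range writes {
		s.commit(txn)
	}
	return nil
}

func main() {
	a, b := newShard(), newShard()
	err := runTxn("t1", map[*shard]map[string]int{
		a: {"universe": 42},
		b: {"team": 4},
	})
	fmt.Println(err, a.data["universe"], b.data["team"])
}
```

Locks are taken only during prepare and released only at commit or abort, which is the two-phase locking discipline. The sketch leaves out replication, timeouts, and coordinator recovery.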

Dependencies

The Docker runtime is the only dependency needed to build and run the project.

Build kv container

make build

Bootstrap cluster

This drops into a client container shell for performing KV CRUD operations.

make cluster
 ######################### Starting Client #########################
>
>
>txn
Entering transaction status
>set universe 42
>set team 4
>end
Submitting [method:"set" key:"universe" value:42 method:"set" key:"team" value:4]
OK
>get team
Key=team, Value=4
>get universe
Key=universe, Value=42

Client commands:

  • get [key]: get value of a key from RAFT KV store
    • Examples: get class or get "distributed system"
    • If the [key] does not exist, return message Key=[key] does not exist
  • set [key] [value]: put (key, value) on RAFT KV store
    • Examples: set universe 42 or set "2020 spring class students" 100
  • del [key]: delete key from RAFT KV store
    • Examples: del class or del "distributed system"
  • txn: start a transaction (Only set and del are supported in transaction)
  • end: end a transaction and submit it
    • Example:
    txn 
    set universe 42
    set team 4
    del class
    end
    
  • add [key] [value]: add value to an existing key
    • Example: add "my account" 100
  • sub [key] [value]: subtract value from an existing key
    • Example: sub "my account" 50
  • xfer [from-key] [to-key] [value]: transfer value from one key to another
    • Example: xfer bank-A bank-B 50
    • If either [from-key] or [to-key] does not exist, return message Key=[key] does not exist
    • If [from-key] has a current value less than [value], return message Insufficient funds
  • exit: disconnect the client from the server and exit

Performance test

To run the performance test locally:

go run metric/performance.go

To run the performance in the client container:

make test-performance

License

Copyright [2020] [Chen Chen, Varun Kulkarni, Supriya Premkumar, Renga Srinivasan]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

raft-kv-store's People

Contributors

alex8937, varks, supriya-premkumar, renga86

raft-kv-store's Issues

Transfer client cmd support

The user enters transfer x y 5 on the command line.

The client code converts this command into two transactions:

txn
get x
get y
txn
set x = $prev_value_x - 5 if x == $prev_value_x
set y = $prev_value_y + 5 if y == $prev_value_y
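
The read-then-conditional-write pattern above could be sketched as follows. This is only an illustration against a single in-memory map: the `store` type and the `read`, `conditionalTransfer`, and `transfer` functions are hypothetical, the retry limit is an arbitrary choice, and the insufficient-funds check follows the behaviour described for xfer in the README rather than anything stated in this issue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// store is a hypothetical in-memory stand-in for the KV store.
type store struct {
	mu   sync.Mutex
	data map[string]int
}

// read mirrors the first transaction: fetch the current values of both keys.
func (s *store) read(x, y string) (int, int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	vx, okX := s.data[x]
	vy, okY := s.data[y]
	if !okX || !okY {
		return 0, 0, errors.New("key does not exist")
	}
	return vx, vy, nil
}

// conditionalTransfer mirrors the second transaction: both writes are applied
// only if neither key has changed since it was read.
func (s *store) conditionalTransfer(x, y string, prevX, prevY, amount int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[x] != prevX || s.data[y] != prevY {
		return false
	}
	s.data[x] = prevX - amount
	s.data[y] = prevY + amount
	return true
}

// transfer retries the read/conditional-write pair when a concurrent
// update invalidates the values that were read.
func transfer(s *store, x, y string, amount int) error {
	const maxAttempts = 10 // arbitrary bound for this sketch
	for i := 0; i < maxAttempts; i++ {
		prevX, prevY, err := s.read(x, y)
		if err != nil {
			return err
		}
		if prevX < amount {
			return errors.New("insufficient funds")
		}
		if s.conditionalTransfer(x, y, prevX, prevY, amount) {
			return nil
		}
	}
	return errors.New("too many conflicting updates")
}

func main() {
	s := &store{data: map[string]int{"x": 10, "y": 0}}
	fmt.Println(transfer(s, "x", "y", 5), s.data["x"], s.data["y"])
}
```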

Raft group to shard mapping

When a client sends an operation on a key, there should be a service/co-ordinator to map the key to the right shard.
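
One simple way to do this mapping is to hash the key and take the result modulo the number of shards. The sketch below assumes a fixed shard count and uses FNV-1a from the Go standard library; the `shardFor` name and the shard count of 3 are illustrative assumptions, not the project's actual scheme.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of numShards shards by hashing the key.
// The same key always lands on the same shard as long as numShards is fixed.
func shardFor(key string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	for _, k := range []string{"universe", "team", "class"} {
		fmt.Printf("key %q -> shard %d\n", k, shardFor(k, 3))
	}
}
```

Plain modulo hashing reshuffles most keys when the shard count changes, so a design that adds or removes shards at runtime would need something like consistent hashing instead.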

Transaction NOT ABORT

The transaction in the current design never aborts. Say a transaction contains [cmd1, cmd2, cmd3]. If it fails at cmd2, the entire transaction should ideally be aborted, leaving no effect from cmd1.
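
One way to get all-or-nothing behaviour is to apply the commands to a working copy and publish the copy only if every command succeeds. The sketch below is a single-node illustration under that assumption; the `command` type and `applyTxn` function are hypothetical, and treating del on a missing key as a failure is an assumption made only to have a failing command to demonstrate.

```go
package main

import "fmt"

// command is a hypothetical representation of one transaction step.
type command struct {
	method string
	key    string
	value  int
}

// applyTxn runs cmds against a copy of data. On success it returns the
// updated copy; on any failure it returns the original map untouched.
func applyTxn(data map[string]int, cmds []command) (map[string]int, error) {
	working := make(map[string]int, len(data))
	for k, v := range data {
		working[k] = v
	}
	for i, c := range cmds {
		switch c.method {
		case "set":
			working[c.key] = c.value
		case "del":
			// Assumption for this sketch: deleting a missing key fails.
			if _, ok := working[c.key]; !ok {
				return data, fmt.Errorf("cmd%d: key %q does not exist", i+1, c.key)
			}
			delete(working, c.key)
		default:
			return data, fmt.Errorf("cmd%d: unsupported method %q", i+1, c.method)
		}
	}
	return working, nil
}

func main() {
	data := map[string]int{"team": 4}
	data, err := applyTxn(data, []command{
		{method: "set", key: "universe", value: 42},
		{method: "del", key: "class"}, // fails, so the set above is discarded
		{method: "set", key: "team", value: 5},
	})
	fmt.Println(data, err)
}
```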

Leader redirecting is non-trivial

Leader redirecting is more complicated than we thought. Although raft.Leader returns the RAFT server address, that address differs from the listening address the server uses to receive PUT/DELETE/GET/TXN requests.

Possible solution: maintain another key-value store, replicated through RAFT consensus, that holds the address information of each server.

Modify client to discover coordinator leader

Use the static config of the coordinator cluster and build a dynamic map between the RAFT binding port and the listening port.

Make it reusable so that the same logic can be used to figure out the leader for the shards.
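
A minimal sketch of such a map is shown below. The `addrBook` type and its methods are hypothetical, and the addresses are illustrative values in the style of the -haddr and -raddr flags used elsewhere on this page. Creating one `addrBook` per RAFT group would let the coordinator cluster and each shard reuse the same lookup.

```go
package main

import (
	"fmt"
	"sync"
)

// addrBook maps a node's RAFT bind address to the address on which
// it listens for client requests.
type addrBook struct {
	mu           sync.RWMutex
	listenByRaft map[string]string
}

// newAddrBook initializes the inner map so that register never writes to a nil map.
func newAddrBook() *addrBook {
	return &addrBook{listenByRaft: make(map[string]string)}
}

// register records the listening address for a node's RAFT address.
func (b *addrBook) register(raftAddr, listenAddr string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.listenByRaft[raftAddr] = listenAddr
}

// leaderListenAddr translates the RAFT address of the current leader
// into the address a client should send requests to.
func (b *addrBook) leaderListenAddr(raftLeader string) (string, bool) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	addr, ok := b.listenByRaft[raftLeader]
	return addr, ok
}

func main() {
	coord := newAddrBook()
	coord.register(":12000", ":11000") // illustrative addresses
	coord.register(":12001", ":11001")

	addr, ok := coord.leaderListenAddr(":12001")
	fmt.Println(addr, ok)
}
```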

Test/Correctness of the kv store

Simple Get/Put/Delete on all shards
Transaction on single shard - Abort/Commit - Simple case
Transaction across shards - Abort/Commit - Simple case
Transaction recovery -> mimic timeouts, faulty nodes etc
Concurrent transactions -> test to support multiple transactions
Verify raft correctness -> Assume hashicorp works well, verify fsm states on node failures

Removing nodes does not work properly yet.

Try

./kv -id node0 ./node0
./kv -id node1 -haddr :11001 -raddr :12001 -join :11000 ./node1

Then kill either of the two nodes. The remaining node keeps restarting leader election forever.
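
One likely factor here is majority arithmetic: RAFT needs votes from a majority of the configured voters, and in a two-voter cluster the majority is two, so a lone survivor can never win an election while the dead node is still part of the configuration. The helper names below are hypothetical and the snippet only illustrates the arithmetic.

```go
package main

import "fmt"

// quorum returns the number of votes needed for a majority of voters.
func quorum(voters int) int {
	return voters/2 + 1
}

// canElect reports whether the nodes still alive can form a majority.
func canElect(voters, alive int) bool {
	return alive >= quorum(voters)
}

func main() {
	for _, n := range []int{2, 3, 5} {
		fmt.Printf("voters=%d quorum=%d tolerated failures=%d\n", n, quorum(n), n-quorum(n))
	}
	fmt.Println("2 voters, 1 alive, can elect:", canElect(2, 1))
}
```

With three voters the cluster tolerates one failure, which is why reproducing the problem with three nodes would help separate a quorum effect from a genuine bug in node removal.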

Crash during random testing (not sure about exact scenario)

2020-06-04T15:35:42.938Z [INFO] raft: initial configuration: index=4 servers="[{Suffrage:Voter ID:node0 Address:10.10.10.2:18000} {Suffrage:Voter ID:node1 Address:node1:18000} {Suffrage:Voter ID:node2 Address:node2:18000}]"
2020-06-04T15:35:42.939Z [INFO] raft: entering follower state: follower="Node at 10.10.10.4:18000 [Follower]" leader=
Jun 4 15:35:42.942 [main.go:94][main()] [INFO] [main] raftd started successfully
2020-06-04T15:35:44.026Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-06-04T15:35:44.028Z [INFO] raft: entering candidate state: node="Node at 10.10.10.4:18000 [Candidate]" term=2075
2020-06-04T15:35:44.039Z [INFO] raft: election won: tally=2
2020-06-04T15:35:44.039Z [INFO] raft: entering leader state: leader="Node at 10.10.10.4:18000 [Leader]"
2020-06-04T15:35:44.040Z [INFO] raft: added peer, starting replication: peer=node0
2020-06-04T15:35:44.040Z [INFO] raft: added peer, starting replication: peer=node1
2020-06-04T15:35:44.041Z [INFO] raft: pipelining replication: peer="{Voter node0 10.10.10.2:18000}"
2020-06-04T15:35:44.042Z [INFO] raft: pipelining replication: peer="{Voter node1 node1:18000}"
panic: assignment to entry in nil map

goroutine 38 [running]:
github.com/raft-kv-store/coordinator.(*fsm).Apply(0xc0000943c0, 0xc0000b2f00, 0x0, 0x0)
/go/src/github.com/raft-kv-store/coordinator/fsm.go:35 +0x272
github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc00009b080)
/go/pkg/mod/github.com/hashicorp/[email protected]/fsm.go:90 +0x2c1
github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc0000d0800, 0x1, 0x40)
/go/pkg/mod/github.com/hashicorp/[email protected]/fsm.go:113 +0x75
github.com/hashicorp/raft.(*Raft).runFSM(0xc000220000)
/go/pkg/mod/github.com/hashicorp/[email protected]/fsm.go:219 +0x42f
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc000220000, 0xc00009ae00)
/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:146 +0x55
created by github.com/hashicorp/raft.(*raftState).goFunc
/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:144 +0x66
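
The message "assignment to entry in nil map" means Go code wrote to a map that was declared but never initialized with make or a literal. The code at coordinator/fsm.go:35 is not shown here, so the sketch below uses a hypothetical `fsm` struct with a hypothetical `txns` field purely to show the failure mode and two common guards: a constructor, and a lazy initialization check before writing.

```go
package main

import "fmt"

// fsm is a hypothetical state machine holding a map of transaction states.
type fsm struct {
	txns map[string]string
}

// newFSM initializes the map up front so that writes are always safe.
func newFSM() *fsm {
	return &fsm{txns: make(map[string]string)}
}

// record guards against a zero-value fsm, for example one built with
// &fsm{} or one whose map field was reset to nil.
func (f *fsm) record(id, state string) {
	if f.txns == nil {
		f.txns = make(map[string]string)
	}
	f.txns[id] = state
}

func main() {
	var zero fsm                  // txns is nil here
	zero.record("t1", "prepared") // safe because of the nil check
	fmt.Println(zero.txns["t1"])

	f := newFSM()
	f.record("t2", "committed")
	fmt.Println(f.txns["t2"])
}
```

Reading from a nil map is allowed in Go and returns the zero value, which is why such a bug can stay hidden until the first write is applied.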

Random timeout for each txn

#79 resolves the cohort crash and the locks that could remain held after a txn.

One question remains. Suppose N txns want to set x and y at the same time, and x and y are in different shards. Txn i may grab the lock on x but fail to get y, while txn j grabs the lock on y but fails to get x. All the other txns fail to get either lock. In this case every txn is aborted and none goes through.

A quick fix is to give each txn a random timeout, so that the txn with the longest timeout keeps trying to get the locks until the others finish.

Leader not returning values after docker pause

Run docker pause <leader>. The coordinator returns the leader of the coordinator group, but when the client retries against that leader, no response comes back. I am going to add more debug logging to see whether the shard returns the values.

Send requests:

Jun 4 16:39:52.637 [api.go:18][Get()] [INFO] [coordinator] Processing Get request x
2020-06-04T16:39:53.204Z [ERROR] raft: failed to appendEntries to: peer="{Voter node1 node1:18000}" error="read tcp 10.10.10.4:54972->10.10.10.3:18000: i/o timeout"
2020-06-04T16:39:53.521Z [ERROR] raft: failed to heartbeat to: peer=node1:18000 error="read tcp 10.10.10.4:54976->10.10.10.3:18000: i/o timeout"
Jun 4 16:39:54.640 [service.go:82][ServeHTTP()] [INFO] [http] Serving request for path: /key/x (node1 is paused and so this error makes sense)

Jun 4 16:39:54.642 [api.go:18][Get()] [INFO] [coordinator] Processing Get request x
2020-06-04T16:40:03.310Z [ERROR] raft: failed to appendEntries to: peer="{Voter node1 node1:18000}" error="read tcp 10.10.10.4:54998->10.10.10.3:18000: i/o timeout"
2020-06-04T16:40:03.662Z [ERROR] raft: failed to heartbeat to: peer=node1:18000 error="read tcp 10.10.10.4:55002->10.10.10.3:18000: i/o timeout"

Any idea when these snapshots are invoked?

https://github.com/SCPD-Project/RAFT-KV-Store/blob/7f52932c5dd8418a1c8c6995634034d93db63397/store/fsm.go#L93

@varks I don't see these getting invoked. I stopped and restarted the process, changed leaders, and tried a few other things, but never saw this execute. There are not many comments in the source library. Any ideas on when these are invoked? From reading the code, this should run in the background as part of FSM execution, but I can't find the exact scenario that triggers it.

Context: planning to use these for periodic snapshot persistence and recovery of data.

Investigate shard crash on restore (occurs 10% of the time)

2020-05-24T09:43:27.665-0700 [INFO] raft: entering leader state: leader="Node at :12000 [Leader]"
May 24 09:44:06.570 [cohort.go:76][ProcessCommands()] [INFO] [store] Processing rpc call: commands:<method:"leader" >
panic: unrecognized command: method:"leader"

goroutine 7 [running]:
github.com/RAFT-KV-STORE/store.(*fsm).Apply(0xc000020120, 0xc0002ba0b0, 0x1a41480, 0xbfaac6ebe8c8e718)
/Users/rengasudharsan.srini/Desktop/Raft_KV/RAFT-KV-Store/store/fsm.go:30 +0x219
github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc0002a43f0)
/Users/rengasudharsan.srini/go/pkg/mod/github.com/hashicorp/[email protected]/fsm.go:90 +0x2c1
github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc0002e8000, 0x1, 0x40)
/Users/rengasudharsan.srini/go/pkg/mod/github.com/hashicorp/[email protected]/fsm.go:113 +0x75
github.com/hashicorp/raft.(*Raft).runFSM(0xc00013e000)
/Users/rengasudharsan.srini/go/pkg/mod/github.com/hashicorp/[email protected]/fsm.go:219 +0x42f
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc00013e000, 0xc000064ab0)
/Users/rengasudharsan.srini/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:146 +0x55
created by github.com/hashicorp/raft.(*raftState).goFunc
/Users/rengasudharsan.srini/go/pkg/mod/github.com/hashicorp/[email protected]/state.go:144 +0x66
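
The trace shows the panic being raised from Apply when it meets a command method it does not handle. One defensive option is to return an error for unknown methods so that a stray log entry cannot take the whole process down. The sketch below assumes a simplified `command` struct and an `apply` helper that are not the project's actual types; it does not explain why a "leader" command reached the log in the first place.

```go
package main

import "fmt"

// command is a simplified, hypothetical form of a replicated log entry.
type command struct {
	Method string
	Key    string
	Value  int
}

// apply mutates data for known methods and reports unknown ones
// as errors instead of panicking.
func apply(data map[string]int, c command) error {
	switch c.Method {
	case "set":
		data[c.Key] = c.Value
	case "del":
		delete(data, c.Key)
	default:
		return fmt.Errorf("unrecognized command: method:%q", c.Method)
	}
	return nil
}

func main() {
	data := map[string]int{}
	fmt.Println(apply(data, command{Method: "set", Key: "x", Value: 1}))
	fmt.Println(apply(data, command{Method: "leader"}))
	fmt.Println(data)
}
```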