zheguang / resdb Goto Github PK

View Code? Open in Web Editor NEW

0.0 0.0 1.0 1.52 MB

License: GNU General Public License v3.0

Python 91.88% Dockerfile 0.21% Shell 4.25% Thrift 3.66%

resdb's People

Contributors

Watchers

Forkers

serwarde

resdb's Issues

Concurrency

~~Virtual node -> physical nodes~~
"Preferred list"

Before, for each object, hash ring tells which node (champion im RH/ 1st ring node on CH) to go. Now, we need to extend this result of one node to a list (i.e. ordered set) of N nodes. Each operation can be executed by any one of the N nodes. Call the choice node 'coordinator'.
- One idea is just to take top-N nodes with highest score in RH, and N consecutive nodes clockwise on CH. Then sample the coordinator based on uniform or geometric distribution (say).
- Extra wrinkle presented by CH virtual nodes is that some of the N consecutive virtual nodes may be mapped to the same physical nodes. RH doesn't need virtual node.

Vector clock for data reads and writes

Use the format D([(S, ctr)...]) where D is the object, S is the coordinator node id, ctr is the counter. There is a list of such node-counter pairs.

For now, assume R and W quorum to be 1. So one concurrent example is:

App1 writes on S_1: D_1([S_1,1]).
App2 writes on S_2: D_2([S_2,1]).
App1 reads from S_2: D_2
App1 reconcile D_1 and D_2
App1 updates and writes on S_1: D_3([S_1,2],[S_2,1])

Hash Ring

Two flavors: Rendezvous Hashing and Consistent Hashing.

Given data X, find which node should store it.

Without considering failure and replication, take care of node addition and node departure.

ship data from departing node to neighbor node.
ship data from neighbor node to added node.

Unavailability

Unavailability can be caused by unexpected failure to reach a node, or node taking offline for maintenance.

Hinted handoff for temporary failures:
If a node S just temporarily went offline, replicas that are intended for S is forwarded to S'. S' is chosen from outside the preferred list of N, say the N+1st node. S' then stores a special marker for this S's replica, (S, replica). When it detects that S becomes available again, it sends all S's replicas back to S.

Detection of failures is to use heartbeat ('are you alive?') + gossip ('i just learned that node F has failed').

Another consideration for failure recovery is the efficiency. When a node fails and its data has to be redistributed to other node(s). So one way to characterize efficiency in failure recovery is the load of the receiving node(s). For example, if only one node is to receive the failed node's whole load, then that this receiving node might be overwhelmed, and the requests for objects on this receiving node may experience delay. However, if the load of the failed node is distributed more evenly to more nodes, then we can expect each receiving node only experience a minor delay.

In consistent hashing, Dynamo uses the mapping of a physical node to multiple virtual nodes to balance the load for such network topology changes caused by unavailability.
In rendezvous hashing, it seems such load balancing for topology changes is already baked in.

Understand Dynamo

We will understand Dynamo by running an end-to-end example through, focusing on these questions:

What types of queries are supported?
How is the distributed hash table stored?
How does conflicting writes handled?
How is node failure, permanent or temporary, handled?
How does a node join or leave the network?
How does a node communicate with each other?
What is eventual consistency?
How is resilience in Dynamo different from traditional fail-over?

Future steps in mind:

Incorporate a strongly-consistent, non-resilient relational DB instance
Extend the data model from key-value to relational, or custom data structures (e.g. queue, stack, etc., like Redis)

Background links:

Relational data model

Intro meetup

Group intro
Policy
- Time (May 1 - Sep 30, 180 hours per person, 1.8 hours per day)
- Diversity & Accessibility
- Grading (Reading, Design, Implementation, Evaluation; Peer Review on teamwork)
Project
- Goal
- Timeline & Deliverables
  - May: Reading + Design (Proposal)
  - June - July: Design + Implementation (Code)
  - August - Sep: Evaluation (Report)

Node

Hash Ring (abstract)
Communication.
- Query routing (who should execute on data X)
- Membership gossiping (sync up on hash ring)
Storage (local for now, replication later)
Failure detection (later)

A key step here is the communication layer. I suggest ZeroMQ.

Resilient Relations vs ML

In the context of Resilient DB, I can see two venues to explore:

(1) Resilient relational database
Problem here is to build a lightweight mechanism to make a standard relational database (e.g. MySQL) resilient to failures.
The idea is to run a bunch of secondaries in a p2p network, similar to Dynamo. The challenge here is to extend Dynamo to relational setting, and consider different deployment issues such as varying resources and failing probabilities.
Reading:
Dynamo: Amazon's highly available key-value store. DeCandia et al. SOSP'07.

(2) Resilient ML
Problem here is to build a tool for private-preserving federated learning that can tolerate failures such as failing devices. This tool is called Secure Aggregation. The challenge is to extend it to handle floating points (duh), and make it more efficient. Another challenge is to extend it to handle different types of aggregations, such as affine transform, counts, etc.
Reading:
Practical Secure Aggregation for Privacy-Preserving Machine Learning. Bonawitz et al., CCS'17.

Distributed data structures

Ingredients of Dynamo + First step

Consistent hashing:
The simplest explanation I found is the Voldemort Design, and source code, ConsistentRoutingStrategy
Otherwise, for the theoretically-inclined: https://dl.acm.org/doi/pdf/10.1145/258533.258660
Rendevous hashing
An alternative to consistent hashing. Wiki page has a good explanation: https://en.wikipedia.org/wiki/Rendezvous_hashing
Merkle tree
This one is not needed as a first step implementation, but it's a lingering question we had from our last Dynamo discussion.
Again wiki page has a good overview: https://en.wikipedia.org/wiki/Merkle_tree
Cryptographic hash function provides the collision resistance that Merkle tree relies on: https://en.wikipedia.org/wiki/Cryptographic_hash_function
A common crypto hash function, as you might have heard of, is SHA-256: https://en.wikipedia.org/wiki/SHA-2

Todo:

Skim over the links.
Understand how consistent hashing / rendevous hashing works, because we are going to implement and compare these two first.

Strong consistency simulation

Replication

Sync Replication

Extend the R and W quorum to > 1. Say choose R and W random subset of preferred list of N.
Coordinator waits until operation is (1) executed locally with local vector clock, and (2) replicated to the quorum verbatim, i.e. with coordinator vector clock.

Async Replication

Eventually each object gets replicated to N nodes (which N depends on the object), regardless of R or W.
Coordinator does not have to wait for async replication to return success.
Async rep can be done periodically in background, in batch.
- Merkle tree helps to quickly figure out which replicas for what key ranges and keys are out of sync (i.e. missing some should-have-been-replicated object versions)
  - Example: N = (S_1, S_2, S_3). S_1 coordinates D_1([S_1,1]) and sync-replicates to S_2. Meanwhile, S_3 coordinates D_2([S_3,1]) and sync-replicates to S_2. S_1, S_2 and S_3 need to decide what's missing on each other's data. Using Merkle tree, they quickly find out it's only object D that's different across. Then they sync on D by passing around D_1 and D_2. Say S_1 gets D_2 from S_3, and adds it to its store; S_2 gets D_1 from S_1 and D_2 from S_3 but doesn't need to update its store; S_3 gets (D_1,D_2) from S_2 and only adds D_1 to its store.

We do not consider failures just yet. But we can hopefully already see the effect of replication on reducing version divergence.

Presentation

Slides:

Vision
Contributions
Architecture, Different hashing designs and implementations

Demo:

Failure handling done by RH
Node maintenance (removal) done by CH

Project profile

zheguang / resdb Goto Github PK

resdb's People

Contributors

Watchers

Forkers

resdb's Issues

Recommend Projects

Recommend Topics

Recommend Org