Git Product home page Git Product logo

resdb's People

Contributors

nicola1m avatar serwarde avatar shanlili avatar zheguang avatar

Watchers

 avatar

Forkers

serwarde

resdb's Issues

Concurrency

Virtual node -> physical nodes
"Preferred list"

  • Before, for each object, hash ring tells which node (champion im RH/ 1st ring node on CH) to go. Now, we need to extend this result of one node to a list (i.e. ordered set) of N nodes. Each operation can be executed by any one of the N nodes. Call the choice node 'coordinator'.
    • One idea is just to take top-N nodes with highest score in RH, and N consecutive nodes clockwise on CH. Then sample the coordinator based on uniform or geometric distribution (say).
    • Extra wrinkle presented by CH virtual nodes is that some of the N consecutive virtual nodes may be mapped to the same physical nodes. RH doesn't need virtual node.

Vector clock for data reads and writes

  • Use the format D([(S, ctr)...]) where D is the object, S is the coordinator node id, ctr is the counter. There is a list of such node-counter pairs.

For now, assume R and W quorum to be 1. So one concurrent example is:

  • App1 writes on S_1: D_1([S_1,1]).
  • App2 writes on S_2: D_2([S_2,1]).
  • App1 reads from S_2: D_2
  • App1 reconcile D_1 and D_2
  • App1 updates and writes on S_1: D_3([S_1,2],[S_2,1])

Hash Ring

Two flavors: Rendezvous Hashing and Consistent Hashing.

  • Given data X, find which node should store it.

Without considering failure and replication, take care of node addition and node departure.

  • ship data from departing node to neighbor node.
  • ship data from neighbor node to added node.

Unavailability

Unavailability can be caused by unexpected failure to reach a node, or node taking offline for maintenance.

Hinted handoff for temporary failures:
If a node S just temporarily went offline, replicas that are intended for S is forwarded to S'. S' is chosen from outside the preferred list of N, say the N+1st node. S' then stores a special marker for this S's replica, (S, replica). When it detects that S becomes available again, it sends all S's replicas back to S.

Detection of failures is to use heartbeat ('are you alive?') + gossip ('i just learned that node F has failed').

Another consideration for failure recovery is the efficiency. When a node fails and its data has to be redistributed to other node(s). So one way to characterize efficiency in failure recovery is the load of the receiving node(s). For example, if only one node is to receive the failed node's whole load, then that this receiving node might be overwhelmed, and the requests for objects on this receiving node may experience delay. However, if the load of the failed node is distributed more evenly to more nodes, then we can expect each receiving node only experience a minor delay.

  • In consistent hashing, Dynamo uses the mapping of a physical node to multiple virtual nodes to balance the load for such network topology changes caused by unavailability.
  • In rendezvous hashing, it seems such load balancing for topology changes is already baked in.

Understand Dynamo

We will understand Dynamo by running an end-to-end example through, focusing on these questions:

  • What types of queries are supported?
  • How is the distributed hash table stored?
  • How does conflicting writes handled?
  • How is node failure, permanent or temporary, handled?
  • How does a node join or leave the network?
  • How does a node communicate with each other?
  • What is eventual consistency?
  • How is resilience in Dynamo different from traditional fail-over?

Future steps in mind:

  • Incorporate a strongly-consistent, non-resilient relational DB instance
  • Extend the data model from key-value to relational, or custom data structures (e.g. queue, stack, etc., like Redis)

Background links:

Intro meetup

  • Group intro
  • Policy
    • Time (May 1 - Sep 30, 180 hours per person, 1.8 hours per day)
    • Diversity & Accessibility
    • Grading (Reading, Design, Implementation, Evaluation; Peer Review on teamwork)
  • Project
    • Goal
    • Timeline & Deliverables
      • May: Reading + Design (Proposal)
      • June - July: Design + Implementation (Code)
      • August - Sep: Evaluation (Report)

Node

  • Hash Ring (abstract)
  • Communication.
    • Query routing (who should execute on data X)
    • Membership gossiping (sync up on hash ring)
  • Storage (local for now, replication later)
  • Failure detection (later)

A key step here is the communication layer. I suggest ZeroMQ.

Resilient Relations vs ML

In the context of Resilient DB, I can see two venues to explore:

(1) Resilient relational database
Problem here is to build a lightweight mechanism to make a standard relational database (e.g. MySQL) resilient to failures.
The idea is to run a bunch of secondaries in a p2p network, similar to Dynamo. The challenge here is to extend Dynamo to relational setting, and consider different deployment issues such as varying resources and failing probabilities.
Reading:
Dynamo: Amazon's highly available key-value store. DeCandia et al. SOSP'07.

(2) Resilient ML
Problem here is to build a tool for private-preserving federated learning that can tolerate failures such as failing devices. This tool is called Secure Aggregation. The challenge is to extend it to handle floating points (duh), and make it more efficient. Another challenge is to extend it to handle different types of aggregations, such as affine transform, counts, etc.
Reading:
Practical Secure Aggregation for Privacy-Preserving Machine Learning. Bonawitz et al., CCS'17.

Ingredients of Dynamo + First step

Todo:

  • Skim over the links.
  • Understand how consistent hashing / rendevous hashing works, because we are going to implement and compare these two first.

Replication

Sync Replication

  • Extend the R and W quorum to > 1. Say choose R and W random subset of preferred list of N.
  • Coordinator waits until operation is (1) executed locally with local vector clock, and (2) replicated to the quorum verbatim, i.e. with coordinator vector clock.

Async Replication

  • Eventually each object gets replicated to N nodes (which N depends on the object), regardless of R or W.
  • Coordinator does not have to wait for async replication to return success.
  • Async rep can be done periodically in background, in batch.
    • Merkle tree helps to quickly figure out which replicas for what key ranges and keys are out of sync (i.e. missing some should-have-been-replicated object versions)
      • Example: N = (S_1, S_2, S_3). S_1 coordinates D_1([S_1,1]) and sync-replicates to S_2. Meanwhile, S_3 coordinates D_2([S_3,1]) and sync-replicates to S_2. S_1, S_2 and S_3 need to decide what's missing on each other's data. Using Merkle tree, they quickly find out it's only object D that's different across. Then they sync on D by passing around D_1 and D_2. Say S_1 gets D_2 from S_3, and adds it to its store; S_2 gets D_1 from S_1 and D_2 from S_3 but doesn't need to update its store; S_3 gets (D_1,D_2) from S_2 and only adds D_1 to its store.

We do not consider failures just yet. But we can hopefully already see the effect of replication on reducing version divergence.

Presentation

Slides:

  1. Vision
  2. Contributions
  3. Architecture, Different hashing designs and implementations

Demo:

  1. Failure handling done by RH
  2. Node maintenance (removal) done by CH

Project profile

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.