vhaline

high availability from a chain of processes

summary of vhaline properties:

The 'ha' part of the name is for high-availability.

vhaline is inspired by chain-replication, but is simpler and doesn't depend on a paxos system.

We are cloud and firewall friendly since we only need to go in one direction through the firewall.

We don't offer the same levels of proof or consistency guarantees that chain-replication does. This is just a best-effort primary-backup chain. It is susceptible to split brain situations on network partition, which could result in two processes writing at once. If you need the protection of a quorum, then vhaline is not for you. In my use, I can tolerate having more than one writer under a degraded network. AP systems like Cassandra and Dynamo make similar (configurable) choices.

architecture

Like a linked list, each node has two pointers or remote addresses. The upstream pointer points to our parent, and the downstream pointer points to our child. Either or both pointers can be nil.

The most upstream node is the root, and the root is the only one that does writes.

The root passes regular state checkpoints down the chain to the middle every 10-30 seconds.

The middle then passes checkpoints down to its child (the tail), if present.

clients pointing at servers: the root, started first, is always a server without a parent:

     root <--- middle <--- tail

dataflow of checkpoints in the line:

     root ---> middle ---> tail

In vhaline, there is only ever one parent and one child at each node, so the graph is always a straight line chain of nodes.
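The linked-list shape described above can be sketched in a few lines of Go. This is only an illustration, not vhaline's actual types: the Node struct, its field names, the Role method, and the example addresses are all assumptions, and an empty string stands in for a nil pointer.

```go
package main

import "fmt"

// Node is a hypothetical model of one process in the chain.
// An empty address stands in for a nil pointer.
type Node struct {
	Name       string
	Upstream   string // address of our parent; "" if we have none
	Downstream string // address of our child; "" if we have none
}

// Role derives the node's position from its two pointers.
// A node with no parent is the root, whether or not it has a child.
func (n Node) Role() string {
	switch {
	case n.Upstream == "":
		return "root"
	case n.Downstream == "":
		return "tail"
	default:
		return "middle"
	}
}

func main() {
	chain := []Node{
		{Name: "a", Downstream: "b:9000"},
		{Name: "b", Upstream: "a:9000", Downstream: "c:9000"},
		{Name: "c", Upstream: "b:9000"},
	}
	for _, n := range chain {
		fmt.Printf("%s is %s\n", n.Name, n.Role())
	}
}
```

Because each node holds at most one parent and one child, no branching is representable, which is what keeps the graph a straight line.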

I. Electing ourselves root in the chain:

  • a) If we have no parent, then we are root. We write.

  • b) If we detect parent failure, then we take over as root, and write.

  • c) We regularly check for parent (and child) failure. This is done with pings. We typically require at least 3 failed pings before the TTL expires and we elect ourselves writer.

  • d) If the upstream parent is configured (not nil), then we are a middle or last node. As middle or last nodes, we listen for checkpoints from upstream, persist them, rotate them, and copy them to our child (if we have one). We don't write, but we do stand by to write if our parent fails.
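The election rules above can be sketched as a small state machine. This is a simplification under stated assumptions: ParentMonitor, NewParentMonitor, and Observe are hypothetical names, and the sketch counts consecutive missed pings instead of tracking a wall-clock TTL, with the threshold of 3 taken from rule c).

```go
package main

import "fmt"

// ParentMonitor is a hypothetical failure detector for the upstream parent.
type ParentMonitor struct {
	maxMissed int  // consecutive missed pings tolerated before takeover
	missed    int  // current run of missed pings
	hasParent bool // false means we are (or have become) root
}

// NewParentMonitor builds a monitor. hasParent is false for a node
// started without an upstream address.
func NewParentMonitor(hasParent bool, maxMissed int) *ParentMonitor {
	return &ParentMonitor{maxMissed: maxMissed, hasParent: hasParent}
}

// Observe records one ping result and reports whether we are root afterwards.
func (m *ParentMonitor) Observe(pingOK bool) bool {
	if !m.hasParent {
		return true // rule a): no parent, so we are root and we write
	}
	if pingOK {
		m.missed = 0 // any successful ping resets the run
		return false
	}
	m.missed++
	if m.missed >= m.maxMissed {
		m.hasParent = false // rule b): parent declared failed, take over
	}
	return !m.hasParent
}

func main() {
	m := NewParentMonitor(true, 3)
	pings := []bool{true, false, false, true, false, false, false}
	for i, ok := range pings {
		fmt.Printf("ping %d ok=%v root=%v\n", i+1, ok, m.Observe(ok))
	}
}
```

Note that nothing here prevents the old parent from still being alive on the far side of a partition, which is the split-brain case the summary warns about.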

II. On child failure:

  • a) Child failures should be detected. Once detected and cleared, we should allow another, different node to subscribe as our new child.

  • b) If we have a child, we replicate checkpoints to them.
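A minimal sketch of the single child slot implied by the two rules above. ChildSlot, ErrChildPresent, and the send callback are hypothetical names chosen for illustration, not vhaline's API, and the addresses are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrChildPresent is returned when a second child tries to subscribe.
var ErrChildPresent = errors.New("child already subscribed")

// ChildSlot is a hypothetical holder for the one downstream child.
type ChildSlot struct {
	addr string // "" means no child
}

// Subscribe attaches a child if the slot is free.
func (s *ChildSlot) Subscribe(addr string) error {
	if s.addr != "" {
		return ErrChildPresent
	}
	s.addr = addr
	return nil
}

// Clear frees the slot; call it once a child failure has been detected.
func (s *ChildSlot) Clear() { s.addr = "" }

// Replicate forwards a checkpoint to the child, if there is one.
// It reports whether a send was attempted.
func (s *ChildSlot) Replicate(cp []byte, send func(addr string, cp []byte) error) (bool, error) {
	if s.addr == "" {
		return false, nil
	}
	return true, send(s.addr, cp)
}

func main() {
	var slot ChildSlot
	send := func(addr string, cp []byte) error {
		fmt.Printf("sent %d bytes to %s\n", len(cp), addr)
		return nil
	}
	fmt.Println(slot.Subscribe("b:9000")) // first child is accepted
	fmt.Println(slot.Subscribe("c:9000")) // rejected while b is attached
	slot.Replicate([]byte("checkpoint"), send)
	slot.Clear()                          // b's failure detected and cleared
	fmt.Println(slot.Subscribe("c:9000")) // a different node may now subscribe
}
```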

III. Misc. notes:

Ideally we should have a means of dedup-ing the checkpoints, so that we can recognize a checkpoint we have already received and not propagate it downstream. Propagating only what is new to us is much more efficient and saves bandwidth. The blake2b function already available on the Frames will suffice if we find this critical. For now it is left undone.
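One way the dedup idea could look, as a sketch only, since the feature is not implemented. To stay within the standard library this example swaps in crypto/sha256 as a stand-in for blake2b; the Sum256 function in golang.org/x/crypto/blake2b has the same shape and could be substituted. Deduper, NewDeduper, and IsNew are hypothetical names.

```go
package main

import (
	"crypto/sha256" // stand-in for blake2b, used here to avoid a dependency
	"fmt"
)

// Deduper is a hypothetical filter that remembers checkpoint digests.
type Deduper struct {
	seen map[[sha256.Size]byte]struct{}
}

// NewDeduper returns an empty filter.
func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[[sha256.Size]byte]struct{})}
}

// IsNew reports whether this checkpoint has not been seen before,
// and records its digest so later copies are recognized.
func (d *Deduper) IsNew(checkpoint []byte) bool {
	sum := sha256.Sum256(checkpoint)
	if _, ok := d.seen[sum]; ok {
		return false
	}
	d.seen[sum] = struct{}{}
	return true
}

func main() {
	d := NewDeduper()
	for _, cp := range []string{"state-1", "state-1", "state-2"} {
		if d.IsNew([]byte(cp)) {
			fmt.Println("propagate", cp)
		} else {
			fmt.Println("skip duplicate", cp)
		}
	}
}
```

The set grows without bound in this sketch. A real version would presumably drop digests as checkpoints are rotated out.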

testing

In the shell, run ulimit -n 5000 first to raise the open-file limit. Otherwise go test may run out of file descriptors on the Test001 stress test for failure detection, particularly on OSX.

administrative

Copyright (c) 2017, Jason E. Aten.

license: MIT
