Git Product home page Git Product logo

multi-paxos-example's Introduction

Multi-Paxos Example

Purpose

Paxos and multi-paxos have been around for decades but there are few reference implementations for those learning the subjects to experiment with. This repository contains a demonstration application specifically intended to fill that role. The application maintains a single replicated value across a cluster of nodes and the code is implemented with an eye towards strong separations of concerns so that implementations for various aspects of the codebase may be easily swapped out.

This example uses very simple but functional strategies for each of the implementation-defined aspects of multi-paxos. The strategies are far too simplistic for effective real-world use but they do work and the result is a small and relatively simple code base. This will hopefully show that multi-paxos isn’t inherently all that complex or difficult to implement. When enhanced for operation in practical environments, a significant level of complexity may need to be added but as this example also demonstrates, software designs emphasizing a strong separation-of-concerns can effectively compartmentalize the areas of complexity and isolate them from one another.

This repository was created as a companion to the Understanding Paxos paper.

Implementation Approach

The code is written in Python using an asynchronous programming model. The Twisted framework is used to provide UDP networking, the asynchronous callback mechanism, function scheduling, and the overall reactor loop. To make the code easier to read by those not overly familiar with Python and/or Twisted, the use of most advanced features in both has been eschewed in favor of more straight-forward implementations. For the few that are used, a section towards the end of this readme is dedicated to providing a brief overview of how they work.

Separation-of-concerns is one of the main goals of the code and Mixin Classes are the primary means by which this is accomplished. Mixin classes are well suited for cleanly encapsulating the resolution, synchronization, and master-lease implementations. In addition to enhancing clarity, mixin classes provide a convenient means for swapping in alternative implementations and allowing conditional inclusion of optional functionality. The master lease implementation, for example, is purely an optimization and is not required for correct operation. The server implementation allows this feature to be used or excluded via a command-line flag.

The configuration of the multi-paxos chain is defined in config.py and defaults to 3 servers with UIDs of A, B, and C. A minimum of two must be used for consensus to be reached. Each server uses a state file to support recovery and it defaults to /tmp/<UID>.json (Windows users will need to adjust the file name). If a server is offline while the chain is modified, the catchup process will bring it up to date within a relatively short period of time.

Running the server
# Without Master Leases:
$ python server.py <A|B|C>
$
# With Master Leases
$ python server.py --master <A|B|C>

The client application requires a server id and a new value to propose. The provided server will be requested to set the value to the second argument via a fire-and-forget UDP packet. No error checking is done so if the server is offline or there is a networking error, it will not be reported. Also, when master leases are being used, the client will not warn in the event that the request is sent to a non-master peer. Instead, the receiving server will print an error message that states which peer is currently the master. It is assumed that all of the servers and the client will be run on the same screen but in different terminals.

Running the client
$ python client.py <A|B|C> <new_value>

Understanding the Source

To quickly get up to speed on the implementation, the recommended order to follow is:

  1. If unfamiliar with Mixin Classes or Python & Twisted read through those sections first.

  2. Read through the overviews provided here on each module to get a sense for how the pieces are intended to fit together.

  3. Look through composable_paxos.py to see how the core Paxos algorithm is implemented.

  4. Look through replicated_value.py to see how instances are chained together to maintain a single, replicated value.

  5. Look through resolution_strategy.py to see how the mixin class adds retransmissions and exponential backoffs to the passive baseclass.

  6. Look through sync_strategy.py to see how falling behind is detected and how catching up is achieved.

  7. Look through server.py to see how the mixin classes are woven together

  8. Finally, look through master_strategy.py to see how master leases and single-round-trip resolution can be added to the basic implementation.

messenger.py

The Messenger class encapsulates the message-passing strategy for the application. BaseReplicatedValue instances send messages by calling one of the Messenger’s send_<message-type> methods and receive messages by specifying a receive_<message-type> method for each message type it supports. The messenger class will dynamically look up the method for handling incoming packets by searching the class hierarchy for an appropriately named method based off of the incoming message’s message type.

To keep things simple for this example, messages are sent over UDP and use a simple JSON encoding format.

composable_paxos.py

This module, which is also available as stand alone and pip-installable package, defines the core Paxos algorithm as a set of composable classes. The classes are completely agnostic of the messaging approach used by the enclosing application. Message reception is modeled as method calls where the arguments are the message content and message transmission is modeled by returned objects that contain the message content.

replicated_value.py

This module defines the BaseReplicatedValue class and preforms three functions:

  • Maintains the multi-paxos chain by tracking the current instance and creating a new PaxosInstance object each time resolution is achieved.

  • Serves as a bridge between the Messenger class that provides access to the network and the current PaxosInstance object

  • Saves and restores state from a file prior to sending each Promise and Accept message as well as each time resolution is achieved.

Of particular note is that this class is completely passive. When a message is received from a client, this class simply converts the message into a call to the underlying composable_paxos.PaxosInstance instance and potentially sends a reply message. It does not, however, initiate the sending of any original messages or try to drive the resolution process forward. Those details as well as those of catch up and master-lease strategies are left to mixin classes.

resolution_strategy.py

This module defines a mixin class that implements all of the logic needed to ensure that a Paxos instance will eventually achieve resolution. It does so by immediately attempting to drive the process forward if it is the first to propose a value and it steps in to continue driving the process forward if another peer falls silent before completing the process. Continual inter-peer interference is avoided by using randomized sleeps within a backoff window that doubles in size each time a collision occurs.

The effective entry points to this Mixin class are the overridden propose_update and receive_accept methods. Both of these methods make the strategy aware that a value has been proposed so the resolution process must be driven to completion. In the propose_update case, the peer may immediately begin attempting to drive the process forward. In the receive_accept case, the peer delays attempting to drive the process forward until the messaging has ceased for a while; otherwise all peers would immediately conflict with the original driver.

sync_strategy.py

This module defines a mixin class that periodically sends a message to a random peer that specifies what link number it is currently on. If the receiving peer sees that the sender has fallen behind, it will respond with a message stating the current link number and the current value. The behind peer will then advance it’s current instance to match that contained in the reply and will update its current value accordingly.

master_strategy.py

This module defines a Mixin class that implements Master-Leases and resolution in a single round trip. While the lease is held, only that peer is allowed to add links to the chain and the link additions will generally require only a single round-trip to achieve consensus.

The muli-paxos chain itself is used to manage the identity of the master. The values in the chain are two-element tuples in which one element is always None. If the left element is set, it indicates a new master has been elected. If the right element is set, it indicates a new application-level value.

To avoid problems associated with clock-synchronization, each peer starts their timer for the lease hold duration when they learn about the new master. All peers will have slightly differing ideas about when the lease expires and this may delay the elections of new masters but the current master will always attempt to renew its lease prior to the expiry of the current one. As a result, leadership changes will be infrequent occurrences.

This strategy is somewhat dependent upon the resolution strategy implementation due to the need to augment the handling of the initial proposal for single-round-trip messaging semantics. While the dedicated master lease is held, the initial Prepare message and the corresponding Promise message are locally generated and injected into the PaxosInstance object. This avoids the need to actually send them over the network. The master strategy makes a small change to the resolution strategy’s handling of the initial proposal by causing it to simply drop the initial Prepare message. All subsequent messages are handled in the normal manner.

The overriding goals of this implementation are:

  • Ensure that a new master is elected if the current lease expires

  • Ensure that only the master can add links containing application-level values

  • Ensure that no two peers may simultaneously believe themselves to be the master

config.py

Defines the members of the multi-paxos group, nodes A, B, and C; and specifies which UDP port they will run on. Additionally, each node is configured to use a separate file for storing state. This file is used to during recovery and ensures that it is safe to kill the server processes at any time.

client.py

This module implements about the simplest possible client application for submitting requests for new values. The first argument to the program is the UID of the peer to send the request to and the second argument is the new value to use. The client does not wait for a response or check for errors; it simply creates the UDP request packet, sends it to the specified peer, and exits.

server.py

As the name suggests, this module implements the server component. The server takes a single argument which identifies the UID the server should use while running. The server may be stopped at any point via Ctrl+C and it will pick up where it left off the next time it’s run. The server takes an optional --master argument that enables the use of the master-leases mixin class. All peers should either use or not use the --master flag simultaneously but there is no other restriction on the use of this flag. Use of the master flag can be turned on and off for the same chain so long as all peers are taken down prior to making the switch.

All sent and received message traffic as well as the result of each resolution is printed to the console.

Uncommon Design & Feature Reference

Mixin Classes

For individuals coming from more traditional languages like C++/Java this may be something of a foreign concept. Mixin classes are not all that common even in the Python community but Scala developers familiar with trait stacking should feel right at home. The basic concept behind Mixins is to create classes that augment the behavior of a base class by overriding specific methods and having those overriding methods explicitly call up the inheritance chain. Classes that follow this pattern may then be "Mixed" together in various ways to combine those augmentations. This is subtly different from traditional multiple inheritance so working through an example may aid in understanding how it works:

class NumberQueue(object):
    def __init__(self):
        self.q = list()

    def put(self, value):
        self.q.append( value )

    def printSelf(self):
        print repr(self.q)

class DoublingMixin(object):

    def put(self, value):
        super(DoublingMixin,self).put( value * 2 )

class IncrementingMixin(object):

    def put(self, value):
        super(IncrementingMixin,self).put( value + 1 )

# In Python, calls to 'super' go left-to-right through peer super classes

class Doubler (DoublingMixin, NumberQueue):
    pass

class DoublerIncrementer (DoublingMixin, IncrementingMixin, NumberQueue):
    pass

class IncrementerDoubler (IncrementingMixin, DoublingMixin, NumberQueue):
    pass

def show(kind, q):
    q.put(2)
    q.put(3)
    q.put(4)
    print kind,
    q.printSelf()

show('Original:          ', NumberQueue())
show('Doubler:           ', Doubler())
show('DoublerIncrementer:', DoublerIncrementer())
show('IncrementerDoubler:', IncrementerDoubler())

# Outputs:
#    Original:           [2, 3, 4]
#    Doubler:            [4, 6, 8]
#    DoublerIncrementer: [5, 7, 9]
#    IncrementerDoubler: [6, 8, 10]

Python & Twisted Features

Python is often referred to as "executable pseudo code" due to its rather straight-forward nature and the ease with which even those not familiar with the language can read the source code. This example intentionally shies away from the more advanced Python & Twisted features in order to enhance readability but it does use a few features that are not particularly intuitive. The following list provides some additional context on them.

Python’s super() command

Python’s super() command is typically invoked as super(<class_name>,self).<method_name> and it searches up the inheritance hierarchy for the requested method. The wrinkle, as compared to C++ and Java, is that it searches left-to-right through peer inherited classes rather than going straight up at each tier in the hierarchy. The Mixin example demonstrates how software designs may take advantage of this.

Twisted’s task.LoopingCall

The task.LoopingCall() constructor accepts a function object as an argument and returns an object that is capable of repeatedly calling the function object at set intervals. The interval is begun with the start(<duration>, [now=True|False]) method. When the optional, now argument is supplied, the function will be immediately invoked without first waiting for duration to elapse. This implementation makes use of both the delayed initial invocation as well as the immediate invocation. + Of particular note is that all scheduling calls in Twisted, such as start(<duration>), use floating-point times with seconds as the dimension. Fractional values are permitted so to call a function at 60Hz one could use start(1.0/60.0).

Python’s lambda command

Python’s lambda command creates an unnamed function that returns whatever the right-hand side evaluates to. In this example, lambda is used to create callback functions that, when called, will invoke another function with a certain set of arguments. For example, the following call will print "Hello World" every 5 seconds: +

from __future__ import print_function
from twisted.internet import task

task.LoopingCall( lambda : print("Hello World") ).start(5.0)
Twisted’s reactor.callLater

This schedules a function object to be called at some point in the future. the method signature is reactor.callLater(<duration_in_seconds>,<function>) and it returns an object that may be used to cancel the delayed invocation. As with LoopingCall, the duration is a floating-point value and may be used to specify delays of less than a second.

multi-paxos-example's People

Contributors

cocagne avatar estensen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

multi-paxos-example's Issues

Run server code on multiple devices

Hi, Tom,

How are you doing? I really appreciate that you can create this repo to show me the theories behind multi-paxos. I am trying to run the server code on different devices and make them talk to each other. In the read me your say, it is assumed that all of the servers and the client will be run on the same screen but in different terminals. I wonder is that a must that you need to run all code on the same machine or it is possible to run on different machines through changing the config file.
You help will be much appreciated.

Best Regards,
Jeffery

Question about sync_strategy

As far as I can tell the sync_strategy seems to synchronize the node with the current state of the multi-paxos chain, but doesn't get up to date about previous resolved values on the chain, right?

Concurrent proposals

I've noticed that in your implementation of a replicated state machine the instance number is only advanced upon resolution.
So presumably clients can only issue a new proposal once the current one has been resolved.
Is there a simple way of adding support for multiple concurrent proposals?

Question about persisting master election result

I've noticed that a consequence of using the paxos chain to manage the master lease is that we end up with mixed application level and election level data in the log.

If we were to add a new parameter to def advance_instance in replicated_value.py so that the caller can specify if the value should be persisted to the log or not. And we'd keep track of the paxos instance number separately, so having it not matching the file length is not an issue. Do you see any disadvantage with this approach?

Is there any use case in where it'd be useful to read the election result from the log?

Some question about code

there are lots of methods which call the methods of superclass that there exactly isn't this method in the superclass and the names of these two methods are always the same. Is this a special feature for Python?
eg. super(SimpleSynchronizationStrategyMixin,self).set_messenger(messenger)
the superclass of SimpleSynchronizationStrategyMixin is object not containing the method "set_messenger". Exactly the name of method containing this line of code is also "set_messenger"

Question about advancing the paxos instance

When advancing the paxos instance:

self.advance_instance( self.instance_number + 1, proposal_value )

Seems like like you're setting the value for the next instance instead of the current one.

I understand that the instance number needs to be advanced, but I'd have expected to happen only after saving the current consented value for the current instance. But looking into the self.advance_instance implementation that doesn't seem to be what's happening.

ps: Thanks for putting this repo together, it's been a great help!

Possible deadlock when receiving nack?

Hi! First let me say thank you for this fantastic project. It's been immensely helpful for me learning how Paxos works.

I'm also still far from an expert here, and I haven't actually tried reproducing this bug. It's purely theoretical and I might be wrong.

In receive_nack, the proposer waits until it receives a quorum of nacks before retrying with prepare(). In a scenario where at least one peer is partitioned, it's possible for the responses to be split between successful promises and nacks such that a quorum of neither ever happens. In that case, the algorithm can get stuck, even though an overall quorum of peers is responding.

Concrete example: 5 nodes total (ABCDE). 3 are up (ABC) and 2 are offline (DE). A sends a prepare message to BC. B accepts the prepare and responds with a promise. C has already seen a higher prepare (perhaps from D before it went offline) and nacks. Now A has agreement from B and disagreement from C, and is stuck waiting for another vote from D or E in order to reach a three node (including itself) quorum.

There are a couple of potential ways this could be resolved. The simplest would be to send a new prepare as soon as we receive the first nack, although that could be inefficient.

Am I missing something or is this a real problem?

Thanks,
Ben

another two more questions. : )

first, where a value is judged to confirm in your code?
and, if one of the servers certain a value is confirmed, how the other servers know? By the listener? Could you tell me where is it in you code?
Thx!

something wrong when I run "python server.py A"

Hi Tom:

How is everything going?

I listened your suggestion and read your code
https://github.com/cocagne/multi-paxos-example

but something wrong when I try to run "python server.py A"
It said

Traceback (most recent call last):
  File "server.py", line 45, in <module>
    m = Messenger(args.uid, config.peers, r)
  File "C:\Users\xiaoj\Documents\multi-paxos-example\messenger.py", line 19, in __init__
    for k,v in self.addrs.items():
RuntimeError: dictionary changed size during iteration

and the line 19 of messenger.py is here:

        # provide two-way mapping between endpoints and server names
        for k,v in self.addrs.items():
            self.addrs[v] = k

I don't understand what's the meaning of those two line, you could help me about that? Thank you!

best wishes!

Jin

something wrong when I try to propose a value from client

Tom,

Hi!

I have configed multi-paxos-example in my computer windows7 64bit.

# (IP,UDP Port Number)
peers = dict( A=('127.0.0.1',1234),
              B=('127.0.0.1',1235),
              C=('127.0.0.1',1236) )

# State files for crash recovery. Windows users will need to modify
# these.
state_files = dict( A='E:\\multi-paxos-example\\tmp\\A.json',
                    B='E:\\multi-paxos-example\\tmp\\B.json',
                    C='E:\\multi-paxos-example\\tmp\\C.json' )

And try to run it. But when I run a client
python client.py B eee
It comes up a Error.

Error processing packet:  accept {"instance_number": 0, "proposal_id": [1, "B"],
 "proposal_value": "eee"}
Traceback (most recent call last):
  File "E:\multi-paxos-example\messenger.py", line 55, in datagramReceived
    handler(from_uid, **kwargs)
  File "E:\multi-paxos-example\resolution_strategy.py", line 82, in receive_acce
pt
    super(ExponentialBackoffResolutionStrategyMixin,self).receive_accept(from_ui
d, instance_number, proposal_id, proposal_value)
  File "E:\multi-paxos-example\replicated_value.py", line 176, in receive_accept

    proposal_id, proposal_value)
  File "E:\multi-paxos-example\replicated_value.py", line 72, in save_state
    os.rename(tmp, self.state_file)
WindowsError: [Error 183]

I google it and it means file existed, but I don't know how to solve it. Can you help me about that?
Thanks!

Jin

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.