WuExchange

WuExchange is a securities trading platform written in Elixir for educational purposes: learn about a bit about markets, and Erlang OTP concepts: GenServer, GenStage, Dynamic Process Registration.

A Trader places buy limit / sell limit orders against listed securities. When an order is placed by a Trader, the Matching Engine attempts to match an incoming order with an existing order that satisfies our matching criteria. When a matching order is found, it is then executed and later transcribed by our TransactionScribe. If the order is not matched, it is placed in a queue using a Price-Time Priority Algorithm

The Trader, MatchingEngine and TransactionScribe exist within their own execution runtime using GenServer provided the Erlang OTP Framework. Seperating the runtime for each of these processes provides us with isolation guaruntees that allow us to grow functionality irrespective to dependencies one system may have on another, not to mention the extremely desired behavior: system failures will not bring down non-dependant parts of the application. tl;dr: a trader process failing shouldn't crash the Matching Engine and vice versa. we can think of our system as a collection of small, independent threads of logic that communicate with other processes through an agreed upon interface.

The Trader, MatchingEngine, and TransactionScribe exist within the boundaries of our wu_exchange_backend umbrella application and is responsible for all business logic powering the interactions between Traders and the Exchange.

The wu_exchange_web umbrella app is a stripped Phoenix Web application

Some thoughts on design goals

An exchange platform that behaves deterministically and favors: correctness, developer agility, and stability. Performance: throughput, latency, execution speed are not overlooked, but viewed as secondary. Let's build something simple / correct system and then optimize for performance.

linearizability ...

Examples

Scenario 1

Performance Characteristics

The Erlang Virtual Machine runs as one OS process per available core. Code execution is done using erlang light-weight processes inside the erlang virtual machine. we can think of the erlang virtual machine as the operating system and an erlang process as an operating system process.

Some Early Benchamarking

Hardware / Software Specs:

Elixir: 1.5.0

Erlang: 20.2

Operating system: macOS

Available memory: 16 GB

CPU Information: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz

Number of Available Cores: 8

Benchmarking is done using benchee, a very simple function benchmarking tool written in elxiir.

At a quick glance, we can see that it takes 5 microseconds for an insert operation. In these benchmarks, all initialization (creating process, data structures) are done prior to executing the benchmarks. Should be a fairly good indicator of how long an operation should take under optimal conditions. These are synthetic benchmarks and would not reflect real-world performance and only serve to give us a baseline of what to expect.

Inserting a Buy / Sell Limit Order

An insert operation, which is a O(1) map lookup, and O(1) queue insert should take same amount of time regardless of queue size (number of queued and unmatched orders). This is supported by our benchmarks which show a single insert taking 5μs, and 1000 inserts at roughly 9200μs. The high deviation could be attributed to multiple factors: First, we are running this benchmark on my laptop which means there exists process contention between the erlang VM and other operating system processes (chrome, messengers, etc). Second: the erlang VM garbage collector. the erlang GC is not a stop-the-world GC, but a stop-the-process GC meaning each time garbage collection criterion is met, the process must stop and allow for garbage collection to occur. Here's a blog post by Sasa Juric detailing how to optimize for latency in long lived processes. Each light-weight process runs it's own garbage collector and garbage collection occurs when the stack meets the heap (they grow towards each other)

This means that as we continue to insert orders in to our queue, heap space increases and GC criterion will be met causing the process to GC. I'm not 100% positive on this, but this could explain why we see run-time spikes amongst multiple runs -- the GC process kicking in and re-sizing the heap.

We could test this hypothesis by warming up the process or by increasing the default heap size of a process and re-running the benchmarks. Another thing we could do is side-step the GC issues entirely by implementing (another thing to do!) our data store in ETS (Erlang Term Storage). ETS data is stored in a seperate process which means our process won't need to as many references causing individual GC times to be greatly decreased. see examples of this here

Final Thoughts on Performance

The benchmarking we have done only measures a very small surface area of an example trading system. We have not (yet) benchmarked other areas of the system inside the system that may incur noticeable latency.

This is what a end to end trader interaction may look like: A client connects to and maintains state with our server over a websocket channel (erlang light weight process). messages are pushed from the browser client to our phoenix endpoint, which then passes a message (buy / sell / cancel) to our routing logic layer, which will then pass another message to our Matching Engine.

[Browser Client] <-- Websocket over TCP --> [Phoenix Endpoint] --> [Channel / Controller Logic] -> [Matching Engine]

The next logical step would be to benchmark performance between our endpoint and the matching engine (TODO!) and determine where we should optimize next. I imagine the matching engine itself will not be the bottlneck, but making sure messages are efficiently passed from process to process to be where the real latency issue lie.

MatchingEngine GenServer Process

This is the meat. code here If the algorithm is incorrect, everything else doesn't matter.

Improvements TODO:

re-think underlying data structures that represent the order book. currently we are using a map of queues, implemented using lists which give us O(1) insert, O(1) matching but O(n) deleting (which would not be acceptable in a real trading system). The solution seems to be implement using some sort of balanced tree: red black, gb_trees or AVL tree. Using a tree should give us worst case insert, search, and delete performance of O(log n) for a specific price point.

Additional Resources

here a description of Exchanges and Matching Engines
Read this for much better explanation of erlang gc
single threaded matching example by martin folower
Order Book Imp in Python
Order Book Imp in Go
Trading Summary (this is good)

brianwu02 / wuexchange Goto Github PK

wuexchange's Introduction

WuExchange

Some thoughts on design goals

Examples

Scenario 1

Performance Characteristics

Some Early Benchamarking

Hardware / Software Specs:

Inserting a Buy / Sell Limit Order

Final Thoughts on Performance

MatchingEngine GenServer Process

Additional Resources

wuexchange's People

Contributors

Stargazers

Watchers

Forkers

wuexchange's Issues

Recommend Projects

Recommend Topics

Recommend Org