jaybeams's Introduction

JayBeams Documentation

Join the chat at https://gitter.im/jaybeams/Lobby

Another project to have fun coding.

The JayBeams library performs relative time delay estimation between market data feeds. That is, given two feeds carrying the same inside quote data, it estimates, in near real time, which one is faster and by how much.

  • Licensing details are found in the LICENSE file.
  • The installation instructions are in the INSTALL file.

Motivation

The US equity markets have many redundant feeds carrying essentially the same information, or where one feed is a superset of another. Given two such feeds, an interesting question is how much faster one feed is than the other, or rather, how much faster it is right now, because the latency changes over time. There are (usually) no message IDs or other simple ways to match events in one feed against events in the other, so the problem of "time-delay estimation" requires heavier computation, and doing this in real time is extra challenging. The current implementation is based on the cross-correlation of the two signals, using FFTs on GPUs to implement the cross-correlations efficiently.

You can find more details about the motivation and the performance requirements in my posts.

Really, that is the motivation?

I confess, I wanted to learn how to program GPUs, and given my background this appeared as an interesting application.

So where is the code?

I am pushing the code slowly. I want to make sure it compiles and passes its tests on at least a couple of Linux variants. This can be a challenge given the dependency on OpenCL libraries.

What is with the name?

JayBeams is named after a WWII electronic warfare system. The name was selected more or less at random from the Wikipedia list of such systems, and is not meant to represent anything in particular. It sounds cool, in a Flash Gordon kind of way.

Licensing and Copyright Notice

Copyright 2015-2017 Carlos O'Ryan

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

jaybeams's People

Contributors

coryan, gfariasr, gitter-badger


jaybeams's Issues

Create a representation of inside_quote

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We need a representation of an inside_quote. The only change from the design doc is that we do not want the types to be tied to the ITCH-5.0 representations.

We need to implement #57 and #58 first though.

Create a program to compute how deep are events

I am interested in computing how deep each event is in the book. That is, for each event (add, modify, or delete) we want to compute the number of price levels between the price of the order affected by that event and the price at the inside. The number of levels would be collected to report statistics about them.

We can modify an existing program or create a new one, both solutions are acceptable to me.
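
For illustration, a minimal sketch of the computation, using hypothetical types rather than the JayBeams book classes: for a BUY side stored in a std::map ordered from best to worst, the depth of an event is the number of levels strictly better than the event price.

#include <cstdint>
#include <functional>
#include <map>

using price4_t = std::int64_t;  // price in units of 1/10000 dollars
using buy_side = std::map<price4_t, int, std::greater<price4_t>>;

// Count the price levels between the inside and the level touched by an
// event; 0 means the event happened at the inside.
int event_depth(buy_side const& book, price4_t event_price) {
  int depth = 0;
  for (auto const& level : book) {
    if (level.first <= event_price) break;  // reached the event level
    ++depth;
  }
  return depth;
}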

Change the travis.yml configuration so it can be shared with other users

At the end of this task the .travis.yml file will be shareable with other users that have forked the repository. Currently the configuration file defines several environment variables (GIT_NAME, GIT_EMAIL, GH_TOKEN) whose values only make sense when the build runs from the coryan/jaybeams repository. Those variables should be defined in the travis-ci.org configuration instead, making the file reusable by all.

Benchmark time to upload OpenCL buffers

At the end of this task we will have a benchmark to measure the time it takes to upload a buffer to an OpenCL device. The benchmark should estimate the delay of uploading a 0-sized buffer (by regression if necessary), as well as the cost per byte/word.
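
A sketch of the regression step, under the assumption that the upload time is roughly linear in the buffer size: fit time(n) ≈ a + b·n by ordinary least squares, so a approximates the 0-sized buffer delay and b the cost per byte. The measurement loop itself (creating the OpenCL buffer and timing clEnqueueWriteBuffer) is omitted here.

#include <cstddef>
#include <utility>
#include <vector>

// Returns {a, b}: a approximates the 0-sized buffer cost, b the cost per byte.
std::pair<double, double> fit_linear(
    std::vector<std::pair<double, double>> const& samples) {
  double sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (auto const& s : samples) {
    sx += s.first;
    sy += s.second;
    sxx += s.first * s.first;
    sxy += s.first * s.second;
  }
  double const n = double(samples.size());
  double const b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  double const a = (sy - b * sx) / n;
  return {a, b};
}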

Change jb::itch5::decoder<> to use Boost.Endian

Boost.Endian provides a far cleaner and probably faster way to decode ITCH5 messages. Unfortunately, Ubuntu 14.04 (which I still want to support) includes boost-1.55, and Boost.Endian was introduced in boost-1.58.

I will keep the ugly code for now (it is fairly isolated anyway) and fix it once I have other motivations to abandon boost-1.55 (and probably Ubuntu 14.04 along the way).

Benchmark time to create OpenCL kernels

At the end of this task we will have a benchmark to measure the cost of launching an OpenCL kernel, including the cost per kernel when multiple kernels are sent in a pipeline. That is, a sequence of kernels is uploaded to an OpenCL device, but each kernel is scheduled to start once the previous one has finished.

Create configuration class and helper functions to create threads

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

I have some classes in my personal subversion repository to create threads with all kinds of configuration parameters (affinity, priority, scheduling class, etc). No sense in reinventing that wheel.

Create program to modify the timestamps in an inside feed file.

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We want to take an inside feed file, such as the ones generated by itch5inside, and modify the output timestamps by adding some noise. The range for the noise should be configurable. The noise should be such that no messages appear to be out of order in the output file.
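
A sketch, with hypothetical names, of how the noise could be added while keeping the output monotonically non-decreasing: clamp each noisy timestamp against the previous output timestamp.

#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

using timestamp = std::int64_t;  // e.g. nanoseconds since midnight

std::vector<timestamp> add_noise(
    std::vector<timestamp> const& input, timestamp max_noise,
    unsigned int seed) {
  std::mt19937_64 generator(seed);
  std::uniform_int_distribution<timestamp> noise(0, max_noise);  // configurable range
  std::vector<timestamp> output;
  output.reserve(input.size());
  timestamp last = 0;
  for (auto ts : input) {
    // Clamping against the previous output guarantees that no messages
    // appear out of order in the modified file.
    last = std::max(last, ts + noise(generator));
    output.push_back(last);
  }
  return output;
}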

Prototype time delay estimation using cross-correlation and GPUs

At the end of this task we will have an implementation of a time delay estimator that uses cross-correlation to estimate the delay and pushes as much of the computation as possible to a GPU, including all FFTs and finding the maximum.

The objective is to compare the performance of this approach with the solution implemented in #7, to verify that GPUs do improve performance; it would be sad if they did not, but one has to check.

Change jb::fftw::time_delay_estimator to operate asynchronously

A longer description can be found in the design doc.

If there is a conflict, the description in this issue overrides the description in the document.

The initial version of jb::fftw::time_delay_estimator is expected to block until the TDE computation ends. To better approximate how GPUs operate, we will need to modify the class to perform the computations on a separate thread, and somehow (TBD) communicate the results back.
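
A minimal sketch of one possible mechanism (the actual interface is still TBD): run the blocking estimator on its own thread with std::async and hand the result back through a std::future.

#include <future>
#include <utility>

// Run any blocking estimator callable on a separate thread; the caller
// collects the result (or the exception) through the returned future.
template <typename Estimator, typename... Args>
auto run_async(Estimator&& estimator, Args... args) {
  return std::async(std::launch::async, std::forward<Estimator>(estimator),
                    std::move(args)...);
}

// Usage sketch:
//   auto result = run_async(estimator, timeseries_a, timeseries_b);
//   ... do other work ...
//   auto delay = result.get();  // blocks only when the result is needed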

Create function to automatically pick the right FFTW flags

The jb::fftw::plan wrappers use a very conservative set of flags to feed into the planner. I think we can determine most of the flag values using the type system, for example, vectors and arrays that are properly aligned can stop using the FFTW_UNALIGNED flag.
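
A sketch of what picking flags from alignment could look like; the 16-byte boundary is an assumption for illustration (the SIMD alignment FFTW wants depends on the instruction set), not the JayBeams policy.

#include <fftw3.h>
#include <cstdint>

unsigned planner_flags(void const* ptr) {
  unsigned flags = FFTW_MEASURE;
  // If the data is aligned to the assumed SIMD boundary, drop FFTW_UNALIGNED
  // and let the planner use vectorized code paths.
  bool const aligned = reinterpret_cast<std::uintptr_t>(ptr) % 16 == 0;
  if (not aligned) {
    flags |= FFTW_UNALIGNED;
  }
  return flags;
}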

Implement a program to gather per-symbol throughput statistics about the ITCH-5.0 inside

At the end of this task we will have a program that reads the output of itch5inside and computes:

  • The throughput distribution (per second, millisecond, and microsecond) of the inside changes
  • The distribution of message inter-arrival times for inside changes
  • The throughput distribution per symbol (per second, millisecond, and microsecond)
  • The distribution of message inter-arrival times per symbol.

The data should be output to stdout (or a file) in CSV or another format suitable for loading into R.

Create class / function to sample irregular inside_quote timeseries into a regular timeseries

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We need a function that takes a std::vector<inside_quote> as input and produces an array of samples in float_type[F][S][4][N] as output. The function should be generic on float_type. The function must take F, S, and N as parameters. The function must also accept T0 (the initial timestamp for the timeseries range) and T (the size of each timeseries sample) as parameters.

The function should have the usual unit tests. We should probably have a benchmark also.
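
A simplified sketch of the sampling step for a single series, with hypothetical names; the real function would apply this across the [F][S][4] dimensions and be generic on float_type as described above.

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using timestamp = std::int64_t;  // e.g. nanoseconds since midnight

// Sample one irregular (timestamp, value) sequence onto a regular grid of
// N points starting at T0, spaced T apart, carrying the last observation
// forward into each bucket.
template <typename float_type>
std::vector<float_type> sample_regular(
    std::vector<std::pair<timestamp, float_type>> const& events,
    timestamp T0, timestamp T, std::size_t N) {
  std::vector<float_type> out(N, float_type(0));
  std::size_t e = 0;
  float_type last = float_type(0);
  for (std::size_t i = 0; i != N; ++i) {
    timestamp const bucket_end = T0 + timestamp(i + 1) * T;
    while (e != events.size() and events[e].first < bucket_end) {
      last = events[e].second;
      ++e;
    }
    out[i] = last;
  }
  return out;
}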

Create an efficient representation for a stock or security

We need a representation for a stock or security. Nothing too elaborate, but efficient enough to be used in the implementation of time delay estimators, and not specifically tied to ITCH-5.0 (i.e. jb::itch5::stock_t is not what we need).
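
For illustration, a sketch of one possible representation (the name is hypothetical, not jb::itch5::stock_t): a fixed-size, space-padded character array that is cheap to copy, compare, and hash.

#include <algorithm>
#include <array>
#include <cstring>
#include <string>

class security_id {
public:
  security_id() { value_.fill(' '); }
  explicit security_id(std::string const& symbol) {
    value_.fill(' ');
    std::memcpy(value_.data(), symbol.data(),
                std::min(symbol.size(), value_.size()));
  }
  bool operator==(security_id const& rhs) const { return value_ == rhs.value_; }
  bool operator<(security_id const& rhs) const { return value_ < rhs.value_; }

private:
  std::array<char, 8> value_;  // long enough for US equity symbols
};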

Benchmark time to download an OpenCL buffer

At the end of this task we need a benchmark to measure the time necessary to download an OpenCL buffer from the device to the host, including an estimate of the cost for a 0-sized buffer and the cost per byte/word.

Create classes to represent batches of timeseries

A longer description can be found in the design doc.

If there is a conflict, the description in this issue overrides the description in the document.

We want nice wrappers to represent a k-dimensional array of timeseries. That way we can iterate over the timeseries easily, we can pass the whole array to the FFTW wrappers and they know how to interpret it, etc. We want to be able to do things like this:

// Desired usage (sketch); dimension<> describes the rank of each index.
typedef dimension<3> index_dimension;       // the batch indices, e.g. (F, S, 4)
typedef dimension<1> timeseries_dimension;  // the samples within one timeseries
typedef timeseries_array<index_dimension, timeseries_dimension,
                         aligned_vector<float>> array_type;

index_dimension batch_rank(F, S, 4);
timeseries_dimension ts_rank(N);

array_type a(batch_rank, ts_rank);

for (auto idx : batch_rank) {
  float* buffer = a(idx);      // contiguous buffer holding one timeseries
  for (auto i : ts_rank) {
    auto value = a(idx, i);    // access an individual sample
  }
}

auto plan = jb::fftw::create_plan_forward(a);

Create robot account to push the automatically generated documentation.

With the current configuration the Travis builds are using my account to commit and push any documentation changes. I think it would be better to use a robot account for these purposes, however, I do not immediately know how to do this, and it is not important enough to slow down the other changes.

itch5bookdepth may be underestimating the depth of book

Consider a book with the following price levels on the BUY side:

10.05
10.04
10.01
10.00

If we want to use an array to store the book, indexed by price at a penny tick, we would need 6 positions in the array to span 10.00 through 10.05, even though only 4 of them are occupied. However, itch5bookdepth would report that only 4 levels are needed.

Implement a program to compute the inside of the ITCH-5.0 feed

At the end of this task we will have a program that computes the inside for an ITCH-5.0 file. The output would be an ASCII file with the following information per line:

  • The original timestamp of the ITCH-5.0 message that triggered the inside update.
  • The security identifier (symbol)
  • The bid and offer (both price and quantities)
  • Prices should be represented as jb::inside::price4_t values.

Fields should be separated by spaces.

Implement a time delay estimator based on FFTW

At the end of this task we will have an implementation of a time delay estimator that uses cross-correlation to estimate the delay, and uses the FFTW library to compute the FFT and FFT inverse.

If possible, we will try to vectorize the computation of the maximum value.

The objective is to get some empirical numbers about computational costs.
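
For illustration only, here is a minimal single-threaded sketch of the technique (not the JayBeams implementation), using FFTW's single-precision API: transform both signals, multiply one spectrum by the conjugate of the other, and invert; the argmax of the result is the delay estimate.

#include <fftw3.h>
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

// Estimate the delay (in samples) between two series of the same length N.
std::size_t estimate_delay(std::vector<float> const& a,
                           std::vector<float> const& b) {
  std::size_t const N = a.size();
  std::size_t const nfreq = N / 2 + 1;
  std::vector<std::complex<float>> A(nfreq), B(nfreq), C(nfreq);
  std::vector<float> corr(N);

  // Forward FFTs of both signals (out-of-place r2c transforms preserve
  // their input by default, so the const_cast is safe here).
  fftwf_plan pa = fftwf_plan_dft_r2c_1d(
      int(N), const_cast<float*>(a.data()),
      reinterpret_cast<fftwf_complex*>(A.data()), FFTW_ESTIMATE);
  fftwf_plan pb = fftwf_plan_dft_r2c_1d(
      int(N), const_cast<float*>(b.data()),
      reinterpret_cast<fftwf_complex*>(B.data()), FFTW_ESTIMATE);
  fftwf_execute(pa);
  fftwf_execute(pb);

  // Multiply one spectrum by the conjugate of the other ...
  for (std::size_t i = 0; i != nfreq; ++i) {
    C[i] = A[i] * std::conj(B[i]);
  }

  // ... and the inverse FFT yields the (circular, unnormalized)
  // cross-correlation; the missing scaling does not affect the argmax.
  fftwf_plan pc = fftwf_plan_dft_c2r_1d(
      int(N), reinterpret_cast<fftwf_complex*>(C.data()), corr.data(),
      FFTW_ESTIMATE);
  fftwf_execute(pc);

  fftwf_destroy_plan(pa);
  fftwf_destroy_plan(pb);
  fftwf_destroy_plan(pc);

  // The delay is the argmax, modulo N (lags past N/2 wrap to negative lags).
  return std::size_t(
      std::max_element(corr.begin(), corr.end()) - corr.begin());
}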

Create jb::fftw::plan for batches of timeseries

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We need to create FFTW plans that can operate on batches of timeseries. That is, given a vector of samples (for example std::vector<float>) and a description of how to interpret the vector as an array (say, a class describing the number of dimensions and the rank of each dimension, or maybe simply two integers M and N), we need to create an FFTW plan to compute the FFT (and the inverse FFT) treating the vector as an array of value_type[M][N]. The functions that create the plans should verify that the dimensions are correct. The plan may take into account whether the vectors are properly aligned for vectorized operations.
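
A sketch using FFTW's advanced ("many") interface: treat a flat std::vector<float> of size M * N as M timeseries of length N and create a single plan that transforms all of them. The names here are illustrative, not the planned jb::fftw API.

#include <fftw3.h>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

fftwf_plan create_batched_forward_plan(
    std::vector<float>& in, std::vector<std::complex<float>>& out,
    int M, int N) {
  // Verify the dimensions match the vector sizes, as the issue requires.
  if (in.size() != std::size_t(M) * std::size_t(N) or
      out.size() != std::size_t(M) * std::size_t(N / 2 + 1)) {
    throw std::invalid_argument("dimensions do not match vector sizes");
  }
  int const n[1] = {N};
  return fftwf_plan_many_dft_r2c(
      /*rank=*/1, n, /*howmany=*/M,
      in.data(), /*inembed=*/nullptr, /*istride=*/1, /*idist=*/N,
      reinterpret_cast<fftwf_complex*>(out.data()),
      /*onembed=*/nullptr, /*ostride=*/1, /*odist=*/N / 2 + 1,
      FFTW_ESTIMATE);
}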

Compute ITCH5 book depth statistics

We should create a project with the scope proposed below. I do not have permission to create anything but tickets in your repo. In my fork I cannot create any tickets, which makes sense (it points in the direction of: this is a fork, work on the tickets documented in the actual repo), but I can see them (like this one, for instance).

So, if you create the project (or somehow allow me to do so) I can create my tickets here and assign them to the project. I think that should be enough organization until we understand the common practices.

The project is to create a new program, based on:

  • https://github.com/coryan/jaybeams/blob/master/tools/itch5stats.cpp
  • https://github.com/coryan/jaybeams/blob/master/tools/itch5inside.cpp

that generates statistics about depth of book:

  • What is the maximum number of levels across all symbols?
  • What is the p95 of depth? The median?
  • What about per symbol: what is the maximum for each symbol? The p95 for each symbol?

The program should output a CSV file that answers all those questions.

Requirements:

Req1) Generate the following statistics for Depth of Book aggregated by symbols:

  • minDepthOfBook: the minimum for the depth of book
  • p25DepthOfBook: the 25th percentile for the depth of book
  • p50DepthOfBook: the 50th percentile for the depth of book
  • p75DepthOfBook: the 75th percentile for the depth of book
  • p90DepthOfBook: the 90th percentile for the depth of book
  • p99DepthOfBook: the 99th percentile for the depth of book
  • p999DepthOfBook: the 99.9th percentile for the depth of book
  • p9999DepthOfBook: the 99.99th percentile for the depth of book
  • maxDepthOfBook: the maximum for the depth of book

Req2) Generate the same set of statistics documented on Req1, for each symbol.

Req3) Output these statistics to a CSV file.

Req4) Generate statistics per event. An event is defined as the reception of a message that changes the order book (therefore a known message type). The motivation for this requirement: suppose we want to handle 99.9% of the events with a fixed-size array for the book depth instead of a hash table or a binary tree; how deep does the array need to be? What if I want to handle 95% of the events? (A percentile sketch follows these requirements.) The idea is to design a book class that makes very few allocations (allocations being expensive operations) and exploits arrays because they are more efficient from a cache-utilization perspective.

Req5) This is the second use case (the first being Compute ITCH5 Inside Statistics) that extends the functionality currently implemented (as of 2016-10-15). Therefore we do not intend to perform any refactoring of the code; this project will implement the new functionality in classes copied (and renamed) from the current code base.
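
A sketch of the percentile computation over the collected per-event depth samples; the real program would keep one such accumulator for the aggregate and one per symbol.

#include <algorithm>
#include <cstddef>
#include <vector>

// Index-based percentile over a copy of the samples; assumes depths is not empty.
int depth_percentile(std::vector<int> depths, double p) {
  std::sort(depths.begin(), depths.end());
  std::size_t const rank =
      std::size_t(p / 100.0 * double(depths.size() - 1));
  return depths[rank];
}

// Usage sketch: depth_percentile(depths, 99.9) yields p999DepthOfBook,
// depth_percentile(depths, 0) yields minDepthOfBook, and so on.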

Create a class to represent stock prices

Stock prices can be easily represented by floating point numbers, but it is often better to represent them using fixed-point arithmetic. We want a representation that is not tied to ITCH-5.0 (so jb::itch5::price4_t is out).
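
For illustration, a sketch (with a hypothetical name) of a feed-independent fixed-point price: an integer count of 1/10000ths of a dollar, so $10.05 is stored as 100500.

#include <cstdint>

class fixed_price {
public:
  explicit fixed_price(std::int64_t units) : units_(units) {}
  static fixed_price from_double(double px) {
    return fixed_price(static_cast<std::int64_t>(px * 10000.0 + 0.5));
  }
  double as_double() const { return units_ / 10000.0; }
  bool operator<(fixed_price const& rhs) const { return units_ < rhs.units_; }
  bool operator==(fixed_price const& rhs) const { return units_ == rhs.units_; }

private:
  std::int64_t units_;  // price in units of 1/10000 dollars
};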

Create class or function to normalize a batch of regular timeseries

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We need a function (or class) that takes an array of regular timeseries in sample_type[F][S][4][N] and normalizes the timeseries. That is, we find the average of sample_type[0][S][4][N] and subtract that average from sample_type[1][S][4][N] and sample_type[0][S][4][N].
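
A simplified sketch of the mechanics, treating a flat vector as consecutive series of length N and subtracting each series' own mean; as described above, the actual normalization uses the average from the first feed for both feeds, but the inner loop is the same.

#include <cstddef>
#include <numeric>
#include <vector>

template <typename sample_type>
void normalize_batch(std::vector<sample_type>& data, std::size_t N) {
  for (std::size_t offset = 0; offset + N <= data.size(); offset += N) {
    // Mean of one series of length N ...
    sample_type const mean =
        std::accumulate(data.begin() + offset, data.begin() + offset + N,
                        sample_type(0)) /
        sample_type(N);
    // ... subtracted from every sample in that series.
    for (std::size_t i = 0; i != N; ++i) {
      data[offset + i] -= mean;
    }
  }
}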

Understand why using FIFO scheduling and higher priorities seems to slow benchmarks down

In an effort to make microbenchmarks more reproducible I am setting the scheduling attributes of the main thread as part of the microbenchmark configuration. The results are completely counter to what I expected: setting FIFO scheduling at the maximum priority produces slower results. For example, running a benchmark at regular priority produces these results:

make jb/itch5/bm_order_book && /usr/bin/time ./jb/itch5/bm_order_book --seed=3966899719 --microbenchmark.iterations=1000 --microbenchmark.test-case=array:buy
make: 'jb/itch5/bm_order_book' is up to date.
185410.804140 [0x00007f3a429a7740] [    INFO] Running benchmark for array:buy with SEED=3966899719 (../jb/itch5/bm_order_book.cpp:264)
185435.122173 [0x00007f3a429a7740] [    INFO] array:buy summary min=21700, p25=21988, p50=22028, p75=22102, p90=22143, p99=22314, p99.9=22692, max=22692, N=1000 (../jb/itch5/bm_order_book.cpp:272)
24.13user 0.00system 0:24.33elapsed 99%CPU (0avgtext+0avgdata 11680maxresident)k
0inputs+0outputs (0major+1360minor)pagefaults 0swaps

The same program with FIFO scheduling produces:

$ make jb/itch5/bm_order_book && /usr/bin/time chrt -f 99 ./jb/itch5/bm_order_book --seed=3966899719 --microbenchmark.iterations=1000 --microbenchmark.test-case=array:buy
make: 'jb/itch5/bm_order_book' is up to date.
185557.798159 [0x00007f6335e52740] [    INFO] Running benchmark for array:buy with SEED=3966899719 (../jb/itch5/bm_order_book.cpp:264)
185651.996007 [0x00007f6335e52740] [    INFO] array:buy summary min=26098, p25=51618, p50=51806, p75=52197, p90=52565, p99=53500, p99.9=54764, max=54764, N=1000 (../jb/itch5/bm_order_book.cpp:272)
53.49user 0.15system 0:54.21elapsed 98%CPU (0avgtext+0avgdata 11996maxresident)k
0inputs+0outputs (0major+1424minor)pagefaults 0swaps

I have disabled real-time scheduling in the benchmarks until we can figure out what is happening here.
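
For reference, a sketch (Linux-specific, not the JayBeams configuration code) of how a process can request FIFO scheduling at the maximum priority from within the program, equivalent to running it under chrt -f 99.

#include <sched.h>
#include <cstdio>

bool set_fifo_scheduling() {
  sched_param param{};
  param.sched_priority = sched_get_priority_max(SCHED_FIFO);
  // pid 0 means the calling process; this typically requires CAP_SYS_NICE or root.
  if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
    std::perror("sched_setscheduler");
    return false;
  }
  return true;
}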
