jaybeams's Introduction

JayBeams Documentation

Join the chat at https://gitter.im/jaybeams/Lobby

Another project to have fun coding.

The JayBeams library performs relative time delay estimation between market data feeds. That is, given two feeds carrying the same inside quote data, it estimates, in near real time, which one is faster and by how much.

  • Licensing details are found in the LICENSE file.
  • The installation instructions are in the INSTALL file.

Motivation

The US equity markets have many redundant feeds carrying essentially the same information, or where one feed is a superset of another. Given two such feeds, an interesting question is how much faster one feed is than the other, or rather, how much faster it is right now, because the latency changes over time. There are (usually) no message IDs or other simple ways to match events in one feed against events in the other, so the problem of "time-delay estimation" requires heavier computation, and doing this in real time is extra challenging. The current implementation is based on the cross-correlation of the two signals, using FFTs on GPUs to implement the cross-correlations efficiently.

You can find more details about the motivation and the performance requirements in my posts.

Really, that is the motivation?

I confess, I wanted to learn how to program GPUs, and given my background this appeared as an interesting application.

So where is the code?

I am pushing the code slowly. I want to make sure it compiles and passes its tests on at least a couple of Linux variants. This can be a challenge given the dependency on OpenCL libraries.

What is with the name?

JayBeams is named after a WWII electronic warfare system. The name was selected more or less at random from the Wikipedia list of such systems, and is not meant to represent anything in particular. It sounds cool, in a Flash Gordon kind of way.

Licensing and Copyright Notice

Copyright 2015-2017 Carlos O'Ryan

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

jaybeams's People

Contributors

coryan, gfariasr, gitter-badger


jaybeams's Issues

Create a representation of inside_quote

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We need a representation of an inside_quote. The only change from the design doc is that we do not want the types to be tied to the ITCH-5.0 representations.

We need to implement #57 and #58 first though.

Create a program to compute how deep are events

I am interested in computing how deep each event is in the book. That is, for each event (add, modify, or delete) we want to compute the number of price levels between the price of the order affected by that event and the price at the inside. The number of levels would be collected to report statistics about them.

We can modify an existing program or create a new one, both solutions are acceptable to me.
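
For illustration, a minimal sketch of the computation, using hypothetical types rather than the JayBeams book classes: for a BUY side stored in a std::map ordered from best to worst, the depth of an event is the number of levels strictly better than the event price.

#include <cstdint>
#include <functional>
#include <map>

using price4_t = std::int64_t;  // price in units of 1/10000 dollars
using buy_side = std::map<price4_t, int, std::greater<price4_t>>;

// Count the price levels between the inside and the level touched by an
// event; 0 means the event happened at the inside.
int event_depth(buy_side const& book, price4_t event_price) {
  int depth = 0;
  for (auto const& level : book) {
    if (level.first <= event_price) break;  // reached the event level
    ++depth;
  }
  return depth;
}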

Change the travis.yml configuration so it can be shared with other users

At the end of this task the .travis.yml file will be shareable with other users that have forked the repository. Currently the configuration file defines several environment variables (GIT_NAME, GIT_EMAIL, GH_TOKEN) whose values only make sense when the build runs from the coryan/jaybeams repository. Those variables should be defined in the travis-ci.org configuration instead, making the file reusable by all.

Benchmark time to upload OpenCL buffers

At the end of this task we will have a benchmark to measure the time it takes to upload a buffer to an OpenCL device. The benchmark should estimate the delay of uploading a 0-sized buffer (by regression if necessary), as well as the cost per byte/word.
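
A sketch of the regression step, under the assumption that the upload time is roughly linear in the buffer size: fit time(n) ≈ a + b·n by ordinary least squares, so a approximates the 0-sized buffer delay and b the cost per byte. The measurement loop itself (creating the OpenCL buffer and timing clEnqueueWriteBuffer) is omitted here.

#include <cstddef>
#include <utility>
#include <vector>

// Returns {a, b}: a approximates the 0-sized buffer cost, b the cost per byte.
std::pair<double, double> fit_linear(
    std::vector<std::pair<double, double>> const& samples) {
  double sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (auto const& s : samples) {
    sx += s.first;
    sy += s.second;
    sxx += s.first * s.first;
    sxy += s.first * s.second;
  }
  double const n = double(samples.size());
  double const b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  double const a = (sy - b * sx) / n;
  return {a, b};
}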

Change jb::itch5::decoder<> to use Boost.Endian

Boost.Endian provides a far cleaner and probably faster way to decode ITCH5 messages. Unfortunately, Ubuntu 14.04 (which I still want to support) includes boost-1.55, and Boost.Endian was introduced in boost-1.58.

I will keep the ugly code for now (it is fairly isolated anyway) and fix it once I have other motivations to abandon boost-1.55 (and probably Ubuntu 14.04 along the way).

Benchmark time to create OpenCL kernels

At the end of this task we will have a benchmark to measure the cost of launching an OpenCL kernel, including the cost per kernel when multiple kernels are sent in a pipeline. That is, a sequence of kernels is uploaded to an OpenCL device, but each kernel is scheduled to start once the previous one has finished.

Create configuration class and helper functions to create threads

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

I have some classes in my personal subversion repository to create threads with all kinds of configuration parameters (affinity, priority, scheduling class, etc). No sense in reinventing that wheel.

Create program to modify the timestamps in an inside feed file.

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We want to take an inside feed file, such as the ones generated by itch5inside, and modify the output timestamps by adding some noise. The range for the noise should be configurable. The noise should be such that no messages appear to be out of order in the output file.
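
A sketch, with hypothetical names, of how the noise could be added while keeping the output monotonically non-decreasing: clamp each noisy timestamp against the previous output timestamp.

#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

using timestamp = std::int64_t;  // e.g. nanoseconds since midnight

std::vector<timestamp> add_noise(
    std::vector<timestamp> const& input, timestamp max_noise,
    unsigned int seed) {
  std::mt19937_64 generator(seed);
  std::uniform_int_distribution<timestamp> noise(0, max_noise);  // configurable range
  std::vector<timestamp> output;
  output.reserve(input.size());
  timestamp last = 0;
  for (auto ts : input) {
    // Clamping against the previous output guarantees that no messages
    // appear out of order in the modified file.
    last = std::max(last, ts + noise(generator));
    output.push_back(last);
  }
  return output;
}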

Prototype time delay estimation using cross-correlation and GPUs

At the end of this task we will have an implementation of a time delay estimator that uses cross-correlation to estimate the delay and pushes as much of the computation as possible to a GPU, including all FFTs and finding the maximum.

The objective is to compare the performance of this approach with the solution implemented in #7, to verify that GPUs do improve performance; it would be sad if they did not, but one has to check.

Change jb::fftw::time_delay_estimator to operate asynchronously

A longer description can be found in the design doc.

If there is a conflict, the description in this issue overrides the description in the document.

The initial version of jb::fftw::time_delay_estimator is expected to block until the TDE computation ends. To better approximate how GPUs operate, we will need to modify the class to perform the computations on a separate thread, and somehow (TBD) communicate the results back.
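
A minimal sketch of one possible mechanism (the actual interface is still TBD): run the blocking estimator on its own thread with std::async and hand the result back through a std::future.

#include <future>
#include <utility>

// Run any blocking estimator callable on a separate thread; the caller
// collects the result (or the exception) through the returned future.
template <typename Estimator, typename... Args>
auto run_async(Estimator&& estimator, Args... args) {
  return std::async(std::launch::async, std::forward<Estimator>(estimator),
                    std::move(args)...);
}

// Usage sketch:
//   auto result = run_async(estimator, timeseries_a, timeseries_b);
//   ... do other work ...
//   auto delay = result.get();  // blocks only when the result is needed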

Create function to automatically pick the right FFTW flags

The jb::fftw::plan wrappers use a very conservative set of flags to feed into the planner. I think we can determine most of the flag values using the type system, for example, vectors and arrays that are properly aligned can stop using the FFTW_UNALIGNED flag.
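
A sketch of what picking flags from alignment could look like; the 16-byte boundary is an assumption for illustration (the SIMD alignment FFTW wants depends on the instruction set), not the JayBeams policy.

#include <fftw3.h>
#include <cstdint>

unsigned planner_flags(void const* ptr) {
  unsigned flags = FFTW_MEASURE;
  // If the data is aligned to the assumed SIMD boundary, drop FFTW_UNALIGNED
  // and let the planner use vectorized code paths.
  bool const aligned = reinterpret_cast<std::uintptr_t>(ptr) % 16 == 0;
  if (not aligned) {
    flags |= FFTW_UNALIGNED;
  }
  return flags;
}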

Implement a program to gather per-symbol throughput statistics about the ITCH-5.0 inside

At the end of this task we will have a program that reads the output of itch5inside and computes:

  • The throughput distribution (per second, millisecond, and microsecond) of the inside changes
  • The distribution of message inter-arrival times for inside changes
  • The throughput distribution per symbol (per second, millisecond, and microsecond)
  • The distribution of message inter-arrival times per symbol.

The data should be output to stdout (or a file) in CSV or another format suitable for loading into R.

Create class / function to sample irregular inside_quote timeseries into a regular timeseries

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We need a function that takes a std::vector<inside_quote> as input and produces an array of samples in float_type[F][S][4][N] as output. The function should be generic on float_type. The function must take F, S, and N as parameters. The function must also accept T0 (the initial timestamp for the timeseries range) and T (the size of each timeseries sample) as parameters.

The function should have the usual unit tests. We should probably have a benchmark also.
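
A simplified sketch of the sampling step for a single series, with hypothetical names; the real function would apply this across the [F][S][4] dimensions and be generic on float_type as described above.

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using timestamp = std::int64_t;  // e.g. nanoseconds since midnight

// Sample one irregular (timestamp, value) sequence onto a regular grid of
// N points starting at T0, spaced T apart, carrying the last observation
// forward into each bucket.
template <typename float_type>
std::vector<float_type> sample_regular(
    std::vector<std::pair<timestamp, float_type>> const& events,
    timestamp T0, timestamp T, std::size_t N) {
  std::vector<float_type> out(N, float_type(0));
  std::size_t e = 0;
  float_type last = float_type(0);
  for (std::size_t i = 0; i != N; ++i) {
    timestamp const bucket_end = T0 + timestamp(i + 1) * T;
    while (e != events.size() and events[e].first < bucket_end) {
      last = events[e].second;
      ++e;
    }
    out[i] = last;
  }
  return out;
}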

Create an efficient representation for a stock or security

We need a representation for a stock or security. Nothing too elaborate, but efficient enough to be used in the implementation of time delay estimators, and not specifically tied to ITCH-5.0 (i.e. jb::itch5::stock_t is not what we need).
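
For illustration, a sketch of one possible representation (the name is hypothetical, not jb::itch5::stock_t): a fixed-size, space-padded character array that is cheap to copy, compare, and hash.

#include <algorithm>
#include <array>
#include <cstring>
#include <string>

class security_id {
public:
  security_id() { value_.fill(' '); }
  explicit security_id(std::string const& symbol) {
    value_.fill(' ');
    std::memcpy(value_.data(), symbol.data(),
                std::min(symbol.size(), value_.size()));
  }
  bool operator==(security_id const& rhs) const { return value_ == rhs.value_; }
  bool operator<(security_id const& rhs) const { return value_ < rhs.value_; }

private:
  std::array<char, 8> value_;  // long enough for US equity symbols
};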

Benchmark time to download an OpenCL buffer

At the end of this task we need a benchmark to measure the time necessary to download an OpenCL buffer from the device to the host, including an estimate of the cost for a 0-sized buffer and the cost per byte/word.

Create classes to represent batches of timeseries

A longer description can be found in the design doc.

If there is a conflict, the description in this issue overrides the description in the document.

We want nice wrappers to represent a k-dimensional array of timeseries. That way we can iterate over the timeseries easily, we can pass the whole array to the FFTW wrappers and they know how to interpret it, etc. We want to be able to do things like this:

// Desired usage (sketch); dimension<> describes the rank of each index.
typedef dimension<3> index_dimension;       // the batch indices, e.g. (F, S, 4)
typedef dimension<1> timeseries_dimension;  // the samples within one timeseries
typedef timeseries_array<index_dimension, timeseries_dimension,
                         aligned_vector<float>> array_type;

index_dimension batch_rank(F, S, 4);
timeseries_dimension ts_rank(N);

array_type a(batch_rank, ts_rank);

for (auto idx : batch_rank) {
  float* buffer = a(idx);      // contiguous buffer holding one timeseries
  for (auto i : ts_rank) {
    auto value = a(idx, i);    // access an individual sample
  }
}

auto plan = jb::fftw::create_plan_forward(a);

Create robot account to push the automatically generated documentation.

With the current configuration the Travis builds are using my account to commit and push any documentation changes. I think it would be better to use a robot account for these purposes, however, I do not immediately know how to do this, and it is not important enough to slow down the other changes.

itch5bookdepth may be underestimating the depth of book

Consider a book with the following price levels on the BUY side:

10.05
10.04
10.01
10.00

If we want to use an array to store the book, indexed by price at a penny tick, we would need 6 positions in the array to span 10.00 through 10.05, even though only 4 of them are occupied. However, itch5bookdepth would report that only 4 levels are needed.

Implement a program to compute the inside of the ITCH-5.0 feed

At the end of this task we will have a program that computes the inside for an ITCH-5.0 file. The output would be an ASCII file with the following information per line:

  • The original timestamp of the ITCH-5.0 message that triggered the inside update.
  • The security identifier (symbol)
  • The bid and offer (both price and quantities)
  • Prices should be represented as jb::inside::price4_t values.

Fields should be separated by spaces.

Implement a time delay estimator based on FFTW

At the end of this task we will have an implementation of a time delay estimator that uses cross-correlation to estimate the delay, and uses the FFTW library to compute the FFT and FFT inverse.

If possible, we will try to vectorize the computation of the maximum value.

The objective is to get some empirical numbers about computational costs.
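
For illustration only, here is a minimal single-threaded sketch of the technique (not the JayBeams implementation), using FFTW's single-precision API: transform both signals, multiply one spectrum by the conjugate of the other, and invert; the argmax of the result is the delay estimate.

#include <fftw3.h>
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

// Estimate the delay (in samples) between two series of the same length N.
std::size_t estimate_delay(std::vector<float> const& a,
                           std::vector<float> const& b) {
  std::size_t const N = a.size();
  std::size_t const nfreq = N / 2 + 1;
  std::vector<std::complex<float>> A(nfreq), B(nfreq), C(nfreq);
  std::vector<float> corr(N);

  // Forward FFTs of both signals (out-of-place r2c transforms preserve
  // their input by default, so the const_cast is safe here).
  fftwf_plan pa = fftwf_plan_dft_r2c_1d(
      int(N), const_cast<float*>(a.data()),
      reinterpret_cast<fftwf_complex*>(A.data()), FFTW_ESTIMATE);
  fftwf_plan pb = fftwf_plan_dft_r2c_1d(
      int(N), const_cast<float*>(b.data()),
      reinterpret_cast<fftwf_complex*>(B.data()), FFTW_ESTIMATE);
  fftwf_execute(pa);
  fftwf_execute(pb);

  // Multiply one spectrum by the conjugate of the other ...
  for (std::size_t i = 0; i != nfreq; ++i) {
    C[i] = A[i] * std::conj(B[i]);
  }

  // ... and the inverse FFT yields the (circular, unnormalized)
  // cross-correlation; the missing scaling does not affect the argmax.
  fftwf_plan pc = fftwf_plan_dft_c2r_1d(
      int(N), reinterpret_cast<fftwf_complex*>(C.data()), corr.data(),
      FFTW_ESTIMATE);
  fftwf_execute(pc);

  fftwf_destroy_plan(pa);
  fftwf_destroy_plan(pb);
  fftwf_destroy_plan(pc);

  // The delay is the argmax, modulo N (lags past N/2 wrap to negative lags).
  return std::size_t(
      std::max_element(corr.begin(), corr.end()) - corr.begin());
}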

Create jb::fftw::plan for batches of timeseries

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We need to create FFTW plans that can operate on batches of timeseries. That is, given a vector of samples (for example std::vector<float>) and a description of how to interpret the vector as an array (say, a class describing the number of dimensions and the rank of each dimension, or maybe simply two integers M and N), we need to create an FFTW plan to compute the FFT (and the inverse FFT) treating the vector as an array of value_type[M][N]. The functions that create the plans should verify that the dimensions are correct. The plan may take into account whether the vectors are properly aligned for vectorized operations.
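
A sketch using FFTW's advanced ("many") interface: treat a flat std::vector<float> of size M * N as M timeseries of length N and create a single plan that transforms all of them. The names here are illustrative, not the planned jb::fftw API.

#include <fftw3.h>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

fftwf_plan create_batched_forward_plan(
    std::vector<float>& in, std::vector<std::complex<float>>& out,
    int M, int N) {
  // Verify the dimensions match the vector sizes, as the issue requires.
  if (in.size() != std::size_t(M) * std::size_t(N) or
      out.size() != std::size_t(M) * std::size_t(N / 2 + 1)) {
    throw std::invalid_argument("dimensions do not match vector sizes");
  }
  int const n[1] = {N};
  return fftwf_plan_many_dft_r2c(
      /*rank=*/1, n, /*howmany=*/M,
      in.data(), /*inembed=*/nullptr, /*istride=*/1, /*idist=*/N,
      reinterpret_cast<fftwf_complex*>(out.data()),
      /*onembed=*/nullptr, /*ostride=*/1, /*odist=*/N / 2 + 1,
      FFTW_ESTIMATE);
}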

Compute ITCH5 book depth statistics

We should create a project with the scope proposed below. I do not have permission to create anything but tickets in your repo. In my fork I cannot create any tickets, which makes sense (it points in the direction of: this is a fork, work on the tickets documented in the actual repo), but I can see them (like this one, for instance).

So, if you create the project (or somehow allow me to do so) I can create my tickets here and assign them to the project. I think that should be enough organization until we understand the common practices.

The project is to create a new program, based on:

  • https://github.com/coryan/jaybeams/blob/master/tools/itch5stats.cpp
  • https://github.com/coryan/jaybeams/blob/master/tools/itch5inside.cpp

that generates statistics about depth of book:

  • What is the maximum number of levels across all symbols?
  • What is the p95 of depth? The median?
  • What about per symbol: what is the maximum for each symbol? The p95 for each symbol?

The program should output a CSV file that answers all those questions.

Requirements:

Req1) Generate the following statistics for Depth of Book aggregated by symbols:

  • minDepthOfBook: the minimum for the depth of book
  • p25DepthOfBook: the 25th percentile for the depth of book
  • p50DepthOfBook: the 50th percentile for the depth of book
  • p75DepthOfBook: the 75th percentile for the depth of book
  • p90DepthOfBook: the 90th percentile for the depth of book
  • p99DepthOfBook: the 99th percentile for the depth of book
  • p999DepthOfBook: the 99.9th percentile for the depth of book
  • p9999DepthOfBook: the 99.99th percentile for the depth of book
  • maxDepthOfBook: the maximum for the depth of book

Req2) Generate the same set of statistics documented on Req1, for each symbol.

Req3) Output these statistics to a CSV file.

Req4) Generate statistics per event. An event is defined as the reception of a message that changes the order book (therefore a known message type). The motivation for this requirement: suppose we want to handle 99.9% of the events with a fixed-size array for the book depth instead of a hash table or a binary tree; how deep does the array need to be? What if I want to handle 95% of the events? (A percentile sketch follows these requirements.) The idea is to design a book class that makes very few allocations (allocations being expensive operations) and exploits arrays because they are more efficient from a cache-utilization perspective.

Req5) This is the second use case (the first being Compute ITCH5 Inside Statistics) that extends the functionality currently implemented (as of 2016-10-15). Therefore we do not intend to perform any refactoring of the code; this project will implement the new functionality in classes copied (and renamed) from the current code base.
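
A sketch of the percentile computation over the collected per-event depth samples; the real program would keep one such accumulator for the aggregate and one per symbol.

#include <algorithm>
#include <cstddef>
#include <vector>

// Index-based percentile over a copy of the samples; assumes depths is not empty.
int depth_percentile(std::vector<int> depths, double p) {
  std::sort(depths.begin(), depths.end());
  std::size_t const rank =
      std::size_t(p / 100.0 * double(depths.size() - 1));
  return depths[rank];
}

// Usage sketch: depth_percentile(depths, 99.9) yields p999DepthOfBook,
// depth_percentile(depths, 0) yields minDepthOfBook, and so on.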

Create a class to represent stock prices

Stock prices can be easily represented by floating point numbers, but it is often better to represent them using fixed-point arithmetic. We want a representation that is not tied to ITCH-5.0 (so jb::itch5::price4_t is out).
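
For illustration, a sketch (with a hypothetical name) of a feed-independent fixed-point price: an integer count of 1/10000ths of a dollar, so $10.05 is stored as 100500.

#include <cstdint>

class fixed_price {
public:
  explicit fixed_price(std::int64_t units) : units_(units) {}
  static fixed_price from_double(double px) {
    return fixed_price(static_cast<std::int64_t>(px * 10000.0 + 0.5));
  }
  double as_double() const { return units_ / 10000.0; }
  bool operator<(fixed_price const& rhs) const { return units_ < rhs.units_; }
  bool operator==(fixed_price const& rhs) const { return units_ == rhs.units_; }

private:
  std::int64_t units_;  // price in units of 1/10000 dollars
};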

Create class or function to normalize a batch of regular timeseries

A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.

We need a function (or class) that takes an array of regular timeseries in sample_type[F][S][4][N] and normalizes the timeseries. That is, we find the average of sample_type[0][S][4][N] and subtract that average from sample_type[1][S][4][N] and sample_type[0][S][4][N].
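
A simplified sketch of the mechanics, treating a flat vector as consecutive series of length N and subtracting each series' own mean; as described above, the actual normalization uses the average from the first feed for both feeds, but the inner loop is the same.

#include <cstddef>
#include <numeric>
#include <vector>

template <typename sample_type>
void normalize_batch(std::vector<sample_type>& data, std::size_t N) {
  for (std::size_t offset = 0; offset + N <= data.size(); offset += N) {
    // Mean of one series of length N ...
    sample_type const mean =
        std::accumulate(data.begin() + offset, data.begin() + offset + N,
                        sample_type(0)) /
        sample_type(N);
    // ... subtracted from every sample in that series.
    for (std::size_t i = 0; i != N; ++i) {
      data[offset + i] -= mean;
    }
  }
}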

Understand why using FIFO scheduling and higher priorities seems to slow benchmarks down

In an effort to make microbenchmarks more reproducible I am setting the scheduling attributes of the main thread as part of the microbenchmark configuration. The results are completely counter to what I expected: setting FIFO scheduling at the maximum priority produces slower results. For example, running a benchmark at regular priority produces these results:

make jb/itch5/bm_order_book && /usr/bin/time ./jb/itch5/bm_order_book --seed=3966899719 --microbenchmark.iterations=1000 --microbenchmark.test-case=array:buy
make: 'jb/itch5/bm_order_book' is up to date.
185410.804140 [0x00007f3a429a7740] [    INFO] Running benchmark for array:buy with SEED=3966899719 (../jb/itch5/bm_order_book.cpp:264)
185435.122173 [0x00007f3a429a7740] [    INFO] array:buy summary min=21700, p25=21988, p50=22028, p75=22102, p90=22143, p99=22314, p99.9=22692, max=22692, N=1000 (../jb/itch5/bm_order_book.cpp:272)
24.13user 0.00system 0:24.33elapsed 99%CPU (0avgtext+0avgdata 11680maxresident)k
0inputs+0outputs (0major+1360minor)pagefaults 0swaps

The same program with FIFO scheduling produces:

$ make jb/itch5/bm_order_book && /usr/bin/time chrt -f 99 ./jb/itch5/bm_order_book --seed=3966899719 --microbenchmark.iterations=1000 --microbenchmark.test-case=array:buy
make: 'jb/itch5/bm_order_book' is up to date.
185557.798159 [0x00007f6335e52740] [    INFO] Running benchmark for array:buy with SEED=3966899719 (../jb/itch5/bm_order_book.cpp:264)
185651.996007 [0x00007f6335e52740] [    INFO] array:buy summary min=26098, p25=51618, p50=51806, p75=52197, p90=52565, p99=53500, p99.9=54764, max=54764, N=1000 (../jb/itch5/bm_order_book.cpp:272)
53.49user 0.15system 0:54.21elapsed 98%CPU (0avgtext+0avgdata 11996maxresident)k
0inputs+0outputs (0major+1424minor)pagefaults 0swaps

I have disabled real-time scheduling in the benchmarks until we can figure out what is happening here.
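
For reference, a sketch (Linux-specific, not the JayBeams configuration code) of how a process can request FIFO scheduling at the maximum priority from within the program, equivalent to running it under chrt -f 99.

#include <sched.h>
#include <cstdio>

bool set_fifo_scheduling() {
  sched_param param{};
  param.sched_priority = sched_get_priority_max(SCHED_FIFO);
  // pid 0 means the calling process; this typically requires CAP_SYS_NICE or root.
  if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
    std::perror("sched_setscheduler");
    return false;
  }
  return true;
}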
