coryan / jaybeams
JayBeams: A project to have fun coding, and maybe measure relative delays in market feeds.
License: Apache License 2.0
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
I have some code in my personal SVN repository; we simply need to resurrect it (and its tests).
At the end of this task we will have (a) a design for comparing the latency of accessing memory-mapped buffers from the host, and (b) a benchmark implementing that design.
Or maybe the output should be a structure showing the maximum and the areas of the timeseries under analysis, or maybe a functor that makes the decision (with a sensible default). We will see how this is used and decide later.
The jb::fftw::plan wrappers use a very conservative set of flags to feed into the planner. I think we can determine most of the flag values using the type system, for example, vectors and arrays that are properly aligned can stop using the FFTW_UNALIGNED flag.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
This will be a very simple version of the program: single FFTW-based time-delay estimator, blocking (not asynchronous or multithreaded) computations.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
We need a function that takes a std::vector<inside_quote> as input and produces an array of samples in float_type[F][S][4][N] as output. The function should be generic over float_type. It must take F, S, and N as parameters, and must also accept T0 (the initial timestamp of the timeseries range) and T (the duration of each timeseries sample) as parameters.
The function should have the usual unit tests. We should probably have a benchmark also.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
We need a representation of an inside_quote. The only change from the design doc is that we do not want the types to be tied to the ITCH-5.0 representations.
The computation of max/argmax in jb::fftw::time_delay_estimator could probably benefit from SSE2-style instructions. I say "probably" because I do not know for certain; this would need some good benchmarks to convince myself that it does help.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
With the current configuration the Travis builds use my account to commit and push any documentation changes. It would be better to use a robot account for this purpose; however, I do not immediately know how to set that up, and it is not important enough to slow down the other changes.
At the end of this task we will have a benchmark to measure the cost of launching an OpenCL kernel, including the cost per kernel when multiple kernels are sent in a pipeline. That is, a sequence of kernels is uploaded to an OpenCL device, but each kernel is scheduled to start once the previous one has finished.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
The initial version of jb::fftw::time_delay_estimator is expected to block until the TDE computation ends. To better approximate how GPUs operate, we will need to modify the class to perform the computations on a separate thread, and somehow (TBD) communicate the results back.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
We need a function (or class) that takes an array of regular timeseries in sample_type[F][S][4][N] and normalizes the timeseries. That is, we find the average of sample_type[0][S][4][N] and subtract that average from sample_type[1][S][4][N] and sample_type[0][S][4][N].
Consider a book with the following price levels on the BUY side:
10.05
10.04
10.01
10.00
If we wanted to use an array to store the book, we would need 6 positions in the array (one per penny tick from 10.00 to 10.05, inclusive). However, itch5bookdepth would report that only 4 levels are needed.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
We want to take an inside feed file, such as the ones generated by itch5inside, and modify the output timestamps by adding some noise. The range for the noise should be configurable. The noise should be such that no messages appear to be out of order in the output file.
At the end of this task we need a benchmark to measure the time necessary to download an OpenCL buffer from the device to the host, including an estimate of the cost for a 0-sized buffer and the cost per byte/word.
At the end of this task we will have an implementation of a time delay estimator that uses cross-correlation to estimate the delay, and pushes as much of the computation as possible to a GPU, including all FFTs and finding the maximum.
The objective is to compare the performance of this approach with the solution implemented in #7, to verify that GPUs do improve performance. It would be sad if they did not, but one has to check.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
In principle the queue should have a (configurable) limited capacity. It should be possible to efficiently remove many entries from the queue. The queue should be strictly FIFO. The first implementation can use a simple mutex/condition-variable combo.
Boost.Endian provides a far cleaner and probably faster way to decode ITCH5 messages. Unfortunately, Ubuntu 14.04 (which I still want to support) includes boost-1.55, and Boost.Endian was introduced in boost-1.58.
I will keep the ugly code for now (it is fairly isolated anyway) and fix it once I have other motivations to abandon boost-1.55 (and probably Ubuntu 14.04 along the way).
We need a representation for a stock or security. Nothing too elaborate, but efficient enough to be used in the implementation of time delay estimators, and not specifically tied to ITCH-5.0 (i.e. jb::itch5::stock_t is not what we need).
We will use the C++11 style 'using blah = foo;' instead of 'typedef foo blah;'.
Clean up the old typedefs.
Create a document describing the "style guide" that is not enforced by clang-format.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
I have some classes in my personal subversion repository to create threads with all kinds of configuration parameters (affinity, priority, scheduling class, etc). No sense in reinventing that wheel.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
I accidentally used 'int' when the C library uses unsigned. It is confusing to change the type back and forth. We may want to consider using the C++11 enums for this purpose too.
I am interested in computing how deep in the book each event is. That is, for each event (add, modify, or delete) we want to compute the number of price levels between the price of the order affected by that event and the price at the inside. The number of levels would be collected to report statistics about them.
We can modify an existing program or create a new one; both solutions are acceptable to me.
At the end of this task we will have an implementation of a time delay estimator that uses cross-correlation to estimate the delay, and uses the FFTW library to compute the FFT and FFT inverse.
If possible, we will try to vectorize the computation of the maximum value.
The objective is to get some empirical numbers about computational costs.
At the end of this task we will have a document describing all the steps to set up a build for jaybeams on travis-ci.org, including code coverage on coveralls.io.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
At the end of this task we will have a program that reads the output of itch5inside and computes:
The data should be output to stdout (or a file) in CSV or another format suitable for loading into R.
We should create a project with the scope proposed below. I do not have permissions to create anything but tickets in your repo. In my fork I cannot create any tickets, which makes sense (pointing in the direction of: this is a fork, work on the tickets documented in the actual repo), but I can see them (like this one, for instance).
So, if you create the project (or somehow allow me to do so) I can create my tickets here and assign them to the project. I think this should be enough organization until we understand the common practices.
The project is to create a new program, based on:
- https://github.com/coryan/jaybeams/blob/master/tools/itch5stats.cpp
- https://github.com/coryan/jaybeams/blob/master/tools/itch5inside.cpp

that generates statistics about depth of book:
- What is the maximum number of levels across all symbols?
- What is the p95 of depth? The median?
- What about per-symbol: what is the maximum for each symbol? The p95 for each symbol?

The program should output a CSV file that answers all those questions.
Requirements:
Req1) Generate the following statistics for depth of book, aggregated across all symbols:
Req2) Generate the same set of statistics documented on Req1, for each symbol.
Req3) Output these statistics to a CSV file.
Req4) Generate statistics per every event. Event is defined as the reception of a message that changes the order book (therefore a known message). The motivation for this requirement is: Suppose we want to handle 99.9% of the events with a fixed-sized array for the book depth instead of a hash table or a binary tree. How deep does the array need to be? What if I want to handle 95% of the events? The idea is to design a book class that makes very few allocations (allocations being expensive operations), and to exploit arrays because they are more efficient from a cache utilization perspective.
Req5) This is the second use case (Compute ITCH5 Inside Statistics being the first) that extends the functionality currently implemented (as of 2016-10-15). Therefore we do not intend to perform any refactoring of the code; instead, this project will implement the new functionality in classes copied (and renamed) from the current code base.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
We need to create FFTW plans that can operate on batches of timeseries. That is, given a vector of samples, for example std::vector<float>, and a description of how to interpret the vector as an array (say, a class describing the number of dimensions and the rank of each dimension, or maybe simply two integers M and N), we need to create an FFTW plan to compute the FFT (and the inverse FFT) treating the vector as an array of value_type[M][N]. The functions that create the plans should verify that the dimensions are correct. The plan may take into account whether the vectors are properly aligned for vectorized operations.
In an effort to make microbenchmarks more reproducible I am setting the scheduling attributes of the main thread as part of the microbenchmark configuration. The results are completely counter to what I expected: setting FIFO scheduling at the maximum priority produces slower results. For example, running a benchmark at regular priority produces these results:
$ make jb/itch5/bm_order_book && /usr/bin/time ./jb/itch5/bm_order_book --seed=3966899719 --microbenchmark.iterations=1000 --microbenchmark.test-case=array:buy
make: 'jb/itch5/bm_order_book' is up to date.
185410.804140 [0x00007f3a429a7740] [ INFO] Running benchmark for array:buy with SEED=3966899719 (../jb/itch5/bm_order_book.cpp:264)
185435.122173 [0x00007f3a429a7740] [ INFO] array:buy summary min=21700, p25=21988, p50=22028, p75=22102, p90=22143, p99=22314, p99.9=22692, max=22692, N=1000 (../jb/itch5/bm_order_book.cpp:272)
24.13user 0.00system 0:24.33elapsed 99%CPU (0avgtext+0avgdata 11680maxresident)k
0inputs+0outputs (0major+1360minor)pagefaults 0swaps
the same program at FIFO scheduling, produces:
$ make jb/itch5/bm_order_book && /usr/bin/time chrt -f 99 ./jb/itch5/bm_order_book --seed=3966899719 --microbenchmark.iterations=1000 --microbenchmark.test-case=array:buy
make: 'jb/itch5/bm_order_book' is up to date.
185557.798159 [0x00007f6335e52740] [ INFO] Running benchmark for array:buy with SEED=3966899719 (../jb/itch5/bm_order_book.cpp:264)
185651.996007 [0x00007f6335e52740] [ INFO] array:buy summary min=26098, p25=51618, p50=51806, p75=52197, p90=52565, p99=53500, p99.9=54764, max=54764, N=1000 (../jb/itch5/bm_order_book.cpp:272)
53.49user 0.15system 0:54.21elapsed 98%CPU (0avgtext+0avgdata 11996maxresident)k
0inputs+0outputs (0major+1424minor)pagefaults 0swaps
I have disabled real-time scheduling in the benchmarks until we can figure out what is happening here.
At the end of this task the jb::fftw::plan class will only accept, in its execute() function, the vector types used to create it. It should resemble the (still under development) jb::clfft::plan template class.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
We need an object / function / something to receive the results of the asynchronous computations.
The default image on travis-ci.org has Doxygen 1.7.*, which does not support Markdown and a number of other features. I would need to install an updated version (anything over 1.8.9 would probably do).
When setting up an attribute that holds a sequence of config objects, the decision to either add at least one set of options or leave the sequence empty should be configurable via the attribute descriptor.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
We want nice wrappers to represent a k-dimensional array of timeseries. That way we can iterate over the timeseries easily, we can pass the whole array to the FFTW wrappers and they know how to interpret it, etc. We want to be able to do things like this:
using timeseries_dimension = dimension<1>;
using index_dimension = dimension<3>;

index_dimension batch_rank(F, S, 4);
timeseries_dimension ts_rank(N);
timeseries_array<index_dimension, timeseries_dimension, aligned_vector<float>> a(batch_rank, ts_rank);
for (auto idx : batch_rank) {
  float* buffer = a(idx);
  for (auto i : ts_rank) {
    auto value = a(idx, i);
  }
}
auto plan = jb::fftw::create_plan_forward(a);
Stock prices can be easily represented by floating point numbers, but it is often better to represent them using fixed-point arithmetic. We want a representation that is not tied to ITCH-5.0 (so jb::itch5::price4_t is out).
I just realized that the code in jb::itch5::record_latency_stats() ignores cancel/replace operations that move orders from the inside to levels outside the inside.
At the end of this task we will have a benchmark to measure the time it takes to upload a buffer to an OpenCL device. The benchmark should estimate the delay of uploading a 0-sized buffer (by regression if necessary), as well as the cost per byte/word.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
We need a function to compute a single estimate, just a weighted average as described in the design doc will do.
The code is fairly naive, it works for simple YAML nodes, but not for very complex ones. Needs fixing.
At the end of this task the .travis.yml
file will be shareable with other users that have forked the repository. Currently the configuration file defines several environment variables (GIT_NAME
, GIT_EMAIL
, GH_TOKEN
), their values only make sense when the build is running out of the coryan/jaybeams
repository. Those variables should be defined in the travis-ci.org configuration instead, making the file reusable by all.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.
We need to compute the TDE (FFT + convolution in the frequency domain + inverse FFT + find max) using FFTW for a batch of timeseries.
At the end of this task we will have a program that computes the inside for an ITCH-5.0 file. The output would be an ASCII file with the following information per line:
Fields should be separated by spaces.
A longer description can be found in the design doc.
If there is a conflict, the description in this issue overrides the description in the document.