store and analyze binance crytocurrency exchange trade data.
- tectonicdb initial test, store only btcusdt trade tick data [100%]
- tectonicdb stress test, store all trade pairs [100%]
- tectonicdb custom datastore format, to allow more info per row [10%]
- store orderbook at any moment [0%]
- map tectonicdb datastore into traditional database, allow time search [0%]
- use plotly dash to visualize data [0%]
- spin up server to host database [100%]
- chaos engineering and fault tolerance [0%]
- backup server [0%]
- docker image [0%]
- ml or dl to analyze data [0%]
- simple logging [100%]
if you are on free tier, pick 30gb storage, the entire installation should take about 16gb this gives you some leftover storage to store data
sudo apt-get update
sudo apt-get upgrade
install rust: https://www.rust-lang.org/en-US/install.html
clone tectonicdb, my branch has google cloud upload turned off: https://github.com/mingrui/tectonicdb.git
master branch: https://github.com/rickyhan/tectonicdb
before building tectonicdb, install gcc: https://gist.github.com/application2000/73fd6f4bf1be6600a2cf9f56315a2d91
GCC 8.1.0 on Ubuntu 14.04 & 16.04 & 18.04:
sudo apt-get update -y &&
sudo apt-get upgrade -y &&
sudo apt-get dist-upgrade -y &&
sudo apt-get install build-essential software-properties-common -y &&
sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y &&
sudo apt-get update -y &&
sudo apt-get install gcc-8 g++-8 -y &&
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 60 --slave /usr/bin/g++ g++ /usr/bin/g++-8 &&
sudo update-alternatives --config gcc
select gcc-8
Install OpenSSL, after installation reboot EC2 instance, otherwise OpenSSL path will not register. https://github.com/sfackler/rust-openssl
sudo apt-get install pkg-config libssl-dev
Setup rust nightly: https://github.com/rust-lang-nursery/rustup.rs#working-with-nightly-rust
Build tectonicdb
install anaconda: https://conda.io/docs/user-guide/install/linux.html
mainly using this for the easy to use tools that come along with it.
https://github.com/fastai/fastai
source activate fastai
, then install this binance api helper
https://github.com/sammchardy/python-binance
clone this repo
export gpg key: https://askubuntu.com/questions/648857/how-to-share-the-public-openpgp-key-using-gnupg
download key file from ec2: https://stackoverflow.com/questions/9441008/how-can-i-download-a-file-from-ec2
when creating a new gpg key on your aws, you need to use a DIFFERENT email address, otherwise git secret reveal
will not work
To tell someone a secret, make sure you have gawk installed sudo apt-get install gawk
remember to use fastai
source activate fastai
change liblibtectonic.so location to use absolute path, in ffi.py
lib_path = path.normpath(path.join(cwd, '/mnt/960EVO/workspace/tectonicdb/target/debug/liblibtectonic.so'))
python binance_orderbook.py
install https://github.com/jupyterlab/jupyterlab
to connect to jupyter lab running on ec2 instance https://towardsdatascience.com/setting-up-and-using-jupyter-notebooks-on-aws-61a9648db6c5
Use tmux
to manage your terminal windows
tectonic-server
is built to tectonicdb/target/debug
. For tectonic-server, i just set write interval to 1000 rows / flush
(flush is tectonicdb's term for batch write operation) ./tectonic-server -vv -a -i 1000
tectonicdb: https://github.com/rickyhan/tectonicdb
Remember to set tectonicdb upload to gcloud delete after upload to false
python-binance: https://github.com/sammchardy/python-binance
binance websocket api
fastai: https://github.com/fastai/fastai
For python environment, I just use the same as fastai, I find the script for setting up conda env extremely easy to use, and basically for any data science project, you have almost all the libraries that you will need.
git-secret: http://git-secret.io/
Use this to store your api keys
Dash by Plotly https://plot.ly/products/dash/
//!
//! File format for Dense Tick Format (DTF)
//!
//!
//! File Spec:
//! Offset 00: ([u8; 5]) magic value 0x4454469001
//! Offset 05: ([u8; 20]) Symbol
//! Offset 25: (u64) number of records
//! Offset 33: (u32) max ts
//! Offset 80: -- records - see below --
//!
//!
//! Record Spec:
//! Offset 81: bool for `is_snapshot`
//! 1. if is true
//! 4 bytes (u32): reference ts
//! 2 bytes (u32): reference seq
//! 2 bytes (u16): how many records between this snapshot and the next snapshot
//! 2. record
//! dts (u16): $ts - reference ts$, 2^16 = 65536 - ~65 seconds
//! dseq (u8) $seq - reference seq$ , 2^8 = 256
//! `is_trade & is_bid`: (u8): bitwise and to store two bools in one byte
//! price: (f32)
//! size: (f32)
-
Each
record
is ordered into rows, dts is u16, can store about 65 seconds. This might be problematic for infrequent token pairs. Need to insert empty rows if 65 seconds limit is reached. -
Convert row datastore into column datastore