This repository contains the code and datasets used to build the machine learning models in the research paper titled "Time-series forecasting of Bitcoin prices using high-dimensional features: a machine learning approach".
After creating the master BTC_Data.csv file, it needs to be broken down into the respective indicator files for the different intervals (1, 2, 3) and periods (1, 7, 30, 90 days, etc.). There seems to be a loose framework for the interval file generation in the Feature_Selection notebooks, but I just want to confirm the methodology before proceeding.
Do you already have this code in a loop that will generate each file automatically, or do the notebooks require manual editing for each iteration? If the latter, can you please clarify which lines need to be updated in Feature_Collection_reg.ipynb and Feature_Collection_cls.ipynb to generate all the different combinations of technical indicators on each run?
Hello,
I would like to ask: when you use the LSTM algorithm, do you use timestep=1 in the input shape of the LSTM? Shouldn't LSTM models use a larger timestep, given their memory state?
Secondly, for the n-th day prediction, shouldn't this timestep be n? For example, when predicting the 7th day, shouldn't we use timestep=7? What is the difference between timestep=1 for a one-day forecast and timestep=1 for a 7th-day forecast?
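To make my question concrete, here is a minimal sketch (my own illustration, not code from this repo) of how a sliding window with timestep=7 would reshape a price series into the (samples, timesteps, features) input an LSTM expects:

```python
import numpy as np

def make_windows(series, timesteps, horizon):
    """Build (samples, timesteps, 1) inputs and targets `horizon` days ahead."""
    X, y = [], []
    for i in range(len(series) - timesteps - horizon + 1):
        X.append(series[i:i + timesteps])          # the previous `timesteps` days
        y.append(series[i + timesteps + horizon - 1])  # the day being predicted
    return np.array(X)[..., np.newaxis], np.array(y)

prices = np.arange(100, 130, dtype=float)  # toy price series, 30 days
X, y = make_windows(prices, timesteps=7, horizon=1)
print(X.shape)  # each sample carries the previous 7 days as context
```

With timestep=1 each sample is a single day and the recurrence never unrolls, whereas with timestep=7 the model sees a week of context per sample; the horizon (which future day is the target) is a separate choice from the window length.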
Assalamualaikum Warahmatullahi Wabarakatuh and Hello,
I've read the paper associated with this GitHub repository, and it says "Removing about 10% of the outliers increased model performance for most of the ML models. A few models performed well despite the outliers". But I'm unable to find the code for the outlier removal itself.
I'm currently writing my undergraduate thesis on the same topic; may I ask what exactly you did with the outliers? Or, better yet, may I ask for the code?
Thank you beforehand, wassalamualaikum warahmatullahi wabarakatuh
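For reference, this is the kind of IQR-based trimming I have tried on my end. This is purely my own guess at the method, since the paper does not specify how the ~10% of outliers were identified; the column name and threshold are examples:

```python
import numpy as np
import pandas as pd

def drop_iqr_outliers(df, column, k=1.5):
    """Drop rows whose `column` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

df = pd.DataFrame({"price": [10, 11, 12, 11, 10, 500]})  # 500 is an obvious outlier
print(len(drop_iqr_outliers(df, "price")))  # the extreme row is removed
```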
I'm trying to reproduce your results as reported in the Feature_Selection_reg notebook, but I'm getting slightly different results, starting with running X=cmns.drop_high_vif(df_reduced,thresh=5) on line 130, even though I'm using the same BTC_Data_736_features_raw.csv file that was available at commit b80f8913e0. My guess is that this comes from slightly different versions of Python (I'm running 3.8) and related packages compared to what was used for your manuscript.
Do you have an Anaconda environment (or other virtualenv) file from your original workflow that you could share, so I can better understand how these discrepancies arise?
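In the meantime, a small script like this (my own, not from this repo; the package list is just an example to adjust to whatever the notebooks import) can capture the versions on each side for a diff:

```python
import sys
from importlib.metadata import version, PackageNotFoundError

def report_versions(packages):
    """Return 'name version' strings for each package, for side-by-side diffing."""
    lines = [f"Python {sys.version.split()[0]}"]
    for name in packages:
        try:
            lines.append(f"{name} {version(name)}")
        except PackageNotFoundError:
            lines.append(f"{name} not installed")
    return lines

# Adjust the list to whatever the notebooks actually import.
print("\n".join(report_versions(["numpy", "pandas", "scikit-learn"])))
```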
I found your manuscript for this repository to be really interesting - thanks for publishing! I'm now trying to independently recreate the results to better understand how LSTM and Keras work within Python.
It appears that a small fix to datacollector.py is required for scraping from bitinfocharts.com due to changes on the remote side. Line 100 should now be values=soup.find_all('script')[4].string when using Python 3.8 and BS4 >= 4.9.3.
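For anyone else hitting this, here is a toy sketch of why the index needed bumping: the data-bearing script tag is selected by position, so the index shifts whenever the site adds a script above it. The HTML below is my own mock-up, not the real bitinfocharts markup:

```python
from bs4 import BeautifulSoup

# Mock page: four loader scripts now precede the inline script holding the data.
html = """
<html><head>
<script src="a.js"></script><script src="b.js"></script>
<script src="c.js"></script><script src="d.js"></script>
<script>var data = [[1,2],[3,4]];</script>
</head></html>
"""
soup = BeautifulSoup(html, "html.parser")
values = soup.find_all('script')[4].string  # index depends on the live page layout
print(values.strip())
```

A positional index like [4] will keep breaking on remote changes; matching on the script's content (e.g. the one whose text contains the data variable) would be more robust.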
I have a question regarding the Jupyter notebooks, for example "Training_LSTM_cls.ipynb":
there are read functions called on files not present in the repository (for instance "pca_75_clas.csv");
is there a way to generate them by running another part of the code?
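My guess is that such files come from a dimensionality-reduction step. Here is a rough sketch of how a file like that could be regenerated with PCA via SVD; everything here (the component count, the file name, the random stand-in data) is an assumption of mine, not taken from the repo:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # stand-in for the scaled feature matrix

# PCA via SVD: centre the data, decompose, keep the leading components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
n_components = 5                 # the '75' in pca_75_clas.csv may be such a count
scores = Xc @ Vt[:n_components].T

pd.DataFrame(scores).to_csv("pca_guess.csv", index=False)
print(scores.shape)              # (samples, kept components)
```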
Within the feature selection and training notebooks, the import statements specifically reference a commons package or module which doesn't appear to be available in the repo:
import commons as cmns
A quick search through the pip and conda libraries suggests this is a custom package - can you please clarify and supply it if available?
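For others blocked on this, here is a rough stand-in for one function from that module referenced elsewhere in these issues, cmns.drop_high_vif. This is my own reimplementation from the function name alone; the real commons module may well behave differently:

```python
import numpy as np
import pandas as pd

def _vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2) of X[:, j] on the rest."""
    y = X[:, j]
    A = np.column_stack([np.delete(X, j, axis=1), np.ones(len(X))])  # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return np.inf if r2 >= 1 else 1.0 / (1.0 - r2)

def drop_high_vif(df, thresh=5):
    """Iteratively drop the column with the highest VIF until all VIFs <= thresh."""
    cols = list(df.columns)
    while len(cols) > 1:
        X = df[cols].to_numpy(float)
        vifs = [_vif(X, j) for j in range(len(cols))]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= thresh:
            break
        cols.pop(worst)
    return df[cols]

rng = np.random.default_rng(1)
a = rng.normal(size=300)
demo = pd.DataFrame({"a": a,
                     "b": a + 0.01 * rng.normal(size=300),  # near-duplicate of 'a'
                     "c": rng.normal(size=300)})
print(list(drop_high_vif(demo).columns))  # one of the near-duplicates is removed
```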
In Figure 9 of your paper, you show test results for forecasting prices after 31-12-2019.
I can find no code related to this in the notebooks. The largest dataset I can find runs until 2-2-2020, while the graph extends to 5-2020.
Could you upload your remaining code?