Deep Learning Stock Volatility with Google Domestic Trends

TensorFlow/Keras implementation of the [paper].


Trend displayed in Google Domestic Trends

Status

Work accomplished so far:

  • End-to-end implementation
  • Test the data workflow pipeline
  • Sanity check of the model
  • Train the models
  • Reproduction of the results in the paper

Plot


MAPE on the train, validation, and test sets, along with the dummy benchmark (future value = last value)

A new predictor is added every 600 epochs. We start with only the historical volatility as a predictor. At 600 epochs, the second predictor is added: returns. At 1200 we add Trend COMPUT; at 1800, Trend CRCARD; at 2400, Trend INVEST; and so forth.
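The schedule above can be sketched as a small helper. This is a minimal illustration, not the repository's code; the predictor names and the 600-epoch step are taken from the description above.

```python
# Hypothetical sketch of the predictor schedule: start with volatility
# only, then activate one more predictor every 600 epochs.
PREDICTORS = ['sigma', 'returns', 'Trend COMPUT', 'Trend CRCARD', 'Trend INVEST']

def active_predictors(epoch, step=600):
    """Return the list of predictors in use at a given epoch."""
    count = min(epoch // step + 1, len(PREDICTORS))
    return PREDICTORS[:count]
```

For example, at epoch 0 only `sigma` is active, and from epoch 2400 onwards all five predictors are in use.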

We found that the model starts to overfit with more than 5 predictors. The dataset is indeed incredibly small.

The lowest model MAPE loss on the test set coincides with the lowest MAPE loss on the validation set. Both have a comparable value (around 25), in agreement with the results reported in the paper.
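The dummy benchmark from the plot (future value = last value) can be written down in a few lines. This is a sketch under the stated assumption that MAPE is the usual mean absolute percentage error in percent, not the repository's actual implementation.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def dummy_mape(series):
    """Naive benchmark: predict that each next value equals the last one."""
    series = np.asarray(series, dtype=float)
    return mape(series[1:], series[:-1])
```

A model is only interesting here if its test MAPE beats `dummy_mape` on the same series.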

In summary, the 5 best predictors are:

  • sigma (volatility)
  • returns
  • Trend COMPUT
  • Trend CRCARD
  • Trend INVEST

In conclusion, I'm curious how they trained a model with 30 predictors on such a tiny dataset. The paper gives no details about the model; it is simply composed of an LSTM layer. I'm also a bit skeptical about this approach in general.

How to run it?

# might require python3.6.
git clone https://github.com/philipperemy/stock-volatility-google-trends.git svgt
cd svgt
pip3 install -r requirements.txt
python3 run_model.py

stock-volatility-google-trends's People

Contributors

philipperemy


stock-volatility-google-trends's Issues

Key Error with Numpy

I'm trying to run this code but I keep getting:

Traceback (most recent call last):
  File "/Users/username/stock-volatility-google-trends/run_model.py", line 27, in <module>
    sigma_mean = d['tr_col_mean']
  File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 237, in __getitem__
    raise KeyError("%s is not a file in the archive" % key)
KeyError: 'tr_col_mean is not a file in the archive'

I tried running it with Python 3.5 and 3.6 and updated NumPy, but it doesn't work. Is it an issue with the .npz file missing a column? Thanks. Really cool repo!
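One way to narrow down this kind of KeyError is to list which arrays the .npz archive actually contains before indexing it. The sketch below is self-contained (it writes a throwaway `example.npz`, not the repo's archive); with the real file, `d.files` would show under which key, if any, `tr_col_mean` was saved.

```python
import numpy as np

# Throwaway example: write a small .npz, then list the arrays it holds.
# On the real archive, inspect d.files before doing d['tr_col_mean'].
np.savez('example.npz', tr_col_mean=np.array([0.5]))
with np.load('example.npz') as d:
    print(d.files)  # -> ['tr_col_mean']
```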

test MAPE around 400 not 40?

Hi Philippe!
I am grateful for your replication of the paper !

Here is my question:
I cloned your code and basically didn't change anything (I only commented out some pd.set_option lines in data_reader.py), then pasted run_model.py into a Jupyter notebook and executed it there.

I am using tf-nightly-gpu dev20190205 and Keras 2.2.4, with the TensorFlow backend.

The result of the first predictor seems fine.

Now we have 1/29 predictors.
[0000] test = 400.717, test_dummy = 35.730, train = 250.169, val = 361.254.
[0001] test = 229.488, test_dummy = 35.730, train = 196.438, val = 218.495.
[0002] test = 95.848, test_dummy = 35.730, train = 128.429, val = 67.883.
[0003] test = 131.620, test_dummy = 35.730, train = 86.672, val = 87.068.
[0004] test = 127.860, test_dummy = 35.730, train = 83.004, val = 86.102.
[0005] test = 121.149, test_dummy = 35.730, train = 79.777, val = 83.238.
[0006] test = 117.143, test_dummy = 35.730, train = 76.550, val = 82.130.
[0007] test = 116.458, test_dummy = 35.730, train = 73.160, val = 83.352.
[0008] test = 109.620, test_dummy = 35.730, train = 69.687, val = 80.677.
[0009] test = 96.495, test_dummy = 35.730, train = 65.771, val = 74.193.
[0010] test = 89.862, test_dummy = 35.730, train = 61.930, val = 72.068.
[0011] test = 87.194, test_dummy = 35.730, train = 57.443, val = 72.924.
[0012] test = 82.010, test_dummy = 35.730, train = 53.082, val = 72.291.
[0013] test = 70.244, test_dummy = 35.730, train = 50.106, val = 67.062.
[0014] test = 75.681, test_dummy = 35.730, train = 48.663, val = 72.526.
[0015] test = 64.220, test_dummy = 35.730, train = 47.879, val = 66.390.
[0016] test = 64.904, test_dummy = 35.730, train = 47.895, val = 67.356.
[0017] test = 70.042, test_dummy = 35.730, train = 47.376, val = 70.178.
[0018] test = 72.951, test_dummy = 35.730, train = 47.125, val = 72.526.
[0019] test = 62.617, test_dummy = 35.730, train = 47.048, val = 66.888.
[0020] test = 62.307, test_dummy = 35.730, train = 47.006, val = 66.946.
[0021] test = 63.614, test_dummy = 35.730, train = 46.580, val = 67.927.
[0022] test = 64.639, test_dummy = 35.730, train = 46.480, val = 68.664.
[0023] test = 63.164, test_dummy = 35.730, train = 46.315, val = 68.060.
[0024] test = 62.632, test_dummy = 35.730, train = 46.085, val = 67.720.
[0025] test = 63.270, test_dummy = 35.730, train = 46.297, val = 68.020.
[0026] test = 57.431, test_dummy = 35.730, train = 45.896, val = 64.943.
[0027] test = 62.448, test_dummy = 35.730, train = 45.619, val = 67.574.
[0028] test = 62.062, test_dummy = 35.730, train = 45.559, val = 67.545.
[0029] test = 66.334, test_dummy = 35.730, train = 45.404, val = 69.701.
[0030] test = 61.042, test_dummy = 35.730, train = 45.344, val = 66.581.
[0031] test = 66.224, test_dummy = 35.730, train = 45.410, val = 69.639.
[0032] test = 62.496, test_dummy = 35.730, train = 45.056, val = 67.591.
[0033] test = 56.331, test_dummy = 35.730, train = 44.912, val = 64.265.
[0034] test = 66.461, test_dummy = 35.730, train = 44.732, val = 69.361.
[0035] test = 63.552, test_dummy = 35.730, train = 44.615, val = 67.774.
[0036] test = 61.630, test_dummy = 35.730, train = 44.543, val = 66.960.
[0037] test = 57.977, test_dummy = 35.730, train = 44.369, val = 64.616.
[0038] test = 59.852, test_dummy = 35.730, train = 44.320, val = 65.562.
[0039] test = 60.299, test_dummy = 35.730, train = 44.227, val = 65.858.
[0040] test = 64.031, test_dummy = 35.730, train = 44.120, val = 67.900.
[0041] test = 60.272, test_dummy = 35.730, train = 43.909, val = 65.952.
[0042] test = 62.131, test_dummy = 35.730, train = 43.859, val = 66.975.
[0043] test = 57.825, test_dummy = 35.730, train = 43.709, val = 64.676.
[0044] test = 57.408, test_dummy = 35.730, train = 43.567, val = 64.322.
[0045] test = 64.868, test_dummy = 35.730, train = 43.395, val = 68.591.
[0046] test = 59.758, test_dummy = 35.730, train = 43.315, val = 65.871.
[0047] test = 63.039, test_dummy = 35.730, train = 43.768, val = 67.353.
[0048] test = 62.418, test_dummy = 35.730, train = 43.128, val = 67.107.
[0049] test = 60.137, test_dummy = 35.730, train = 43.138, val = 65.651.
[0050] test = 59.793, test_dummy = 35.730, train = 42.966, val = 65.658.
[0051] test = 62.317, test_dummy = 35.730, train = 42.779, val = 67.082.
[0052] test = 54.273, test_dummy = 35.730, train = 42.689, val = 62.452.
[0053] test = 58.119, test_dummy = 35.730, train = 42.697, val = 64.138.
[0054] test = 56.406, test_dummy = 35.730, train = 42.457, val = 63.394.
[0055] test = 58.466, test_dummy = 35.730, train = 42.403, val = 64.424.
[0056] test = 55.233, test_dummy = 35.730, train = 42.242, val = 62.729.
[0057] test = 56.947, test_dummy = 35.730, train = 42.130, val = 63.660.
[0058] test = 59.724, test_dummy = 35.730, train = 42.152, val = 64.817.
[0059] test = 53.021, test_dummy = 35.730, train = 41.852, val = 61.323.
[0060] test = 51.008, test_dummy = 35.730, train = 41.977, val = 60.155.
[0061] test = 54.600, test_dummy = 35.730, train = 41.845, val = 61.994.
[0062] test = 49.409, test_dummy = 35.730, train = 41.783, val = 59.586.
...

However,

Now we have 27/29 predictors.
[0000] test = 353.166, test_dummy = 35.730, train = 29.346, val = 144.498.
[0001] test = 350.791, test_dummy = 35.730, train = 19.965, val = 145.339.
[0002] test = 351.112, test_dummy = 35.730, train = 16.450, val = 151.287.
[0003] test = 349.421, test_dummy = 35.730, train = 14.289, val = 152.010.
[0004] test = 353.675, test_dummy = 35.730, train = 13.170, val = 152.310.
[0005] test = 352.262, test_dummy = 35.730, train = 12.588, val = 149.141.
[0006] test = 357.054, test_dummy = 35.730, train = 10.941, val = 153.442.
[0007] test = 354.914, test_dummy = 35.730, train = 10.309, val = 157.844.
[0008] test = 342.204, test_dummy = 35.730, train = 12.243, val = 154.254.
[0009] test = 344.954, test_dummy = 35.730, train = 9.946, val = 152.557.
[0010] test = 348.354, test_dummy = 35.730, train = 9.138, val = 154.026.
[0011] test = 347.432, test_dummy = 35.730, train = 9.539, val = 154.585.
[0012] test = 353.018, test_dummy = 35.730, train = 9.318, val = 154.395.
[0013] test = 357.550, test_dummy = 35.730, train = 8.292, val = 155.190.
[0014] test = 353.914, test_dummy = 35.730, train = 7.926, val = 154.913.
[0015] test = 355.490, test_dummy = 35.730, train = 7.585, val = 156.675.
[0016] test = 354.821, test_dummy = 35.730, train = 8.850, val = 155.362.
[0017] test = 354.489, test_dummy = 35.730, train = 7.336, val = 152.145.
[0018] test = 356.071, test_dummy = 35.730, train = 8.081, val = 154.395.
[0019] test = 352.757, test_dummy = 35.730, train = 7.413, val = 154.405.
[0020] test = 354.936, test_dummy = 35.730, train = 9.924, val = 156.035.
[0021] test = 348.511, test_dummy = 35.730, train = 7.375, val = 151.506.
[0022] test = 348.432, test_dummy = 35.730, train = 7.027, val = 154.944.
[0023] test = 347.811, test_dummy = 35.730, train = 6.719, val = 150.221.
[0024] test = 351.702, test_dummy = 35.730, train = 7.312, val = 155.571.
[0025] test = 350.940, test_dummy = 35.730, train = 6.994, val = 152.205.
[0026] test = 352.935, test_dummy = 35.730, train = 6.384, val = 153.850.
[0027] test = 356.619, test_dummy = 35.730, train = 6.487, val = 156.018.

It seems that the training loss keeps decreasing, but the test MAPE remains large.
Is this normal? And why does test_dummy never change?
I am a real beginner in deep learning and look forward to your reply.
