The carbon footprint of the experiments and GPU model training is 4.06 kg of CO2, as calculated with the Eco2AI library.
In this work, we aim to test the hypothesis that semantic features and context are important in predicting financial market trends, and compare our approach to baseline sentiment-based solutions.
In the conducted experiment, the correlation between sentiment score and price volatility in financial markets was analyzed. The study used a dataset of historical market data covering a 5-year period, together with sentiment scores obtained from the Twitter social network. The sentiment scores were calculated from the sentiments expressed in social media posts throughout the day.
The correlation analysis showed a strong positive relationship between sentiment score and price volatility: as the sentiment score increases, price volatility increases as well. This relationship was found to be statistically significant.
This study provides evidence that sentiment score and price volatility are highly correlated and can be used together to improve the performance of predictive models for financial markets. The results suggest that incorporating the sentiment score as a feature in predictive models leads to improved predictions of price movements and can provide valuable insight into the underlying market dynamics.
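As an illustration of the correlation check described above, here is a minimal sketch in pure Python. The toy sentiment and volatility series are hypothetical and not taken from the actual dataset:

```python
import math

def pearson_corr(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: daily sentiment scores vs. daily price volatility.
sentiment = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
volatility = [0.02, 0.05, 0.04, 0.09, 0.08, 0.10]
r = pearson_corr(sentiment, volatility)
print(f"Pearson r = {r:.3f}")  # close to 1 for a strong positive relationship
```

A value of r near +1 corresponds to the strong positive relationship reported above; significance would additionally require a hypothesis test (e.g. a t-test on r).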
The goal of this paper was to investigate whether sentence embeddings would yield better results in stock price prediction than a sentiment analysis approach. The experiments showed that the sentiment polarity extraction approach outperformed sentence embeddings in both accuracy and training time when predicting the stock closing price 3 and 5 days ahead.
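A minimal sketch of how targets for 3- and 5-day-ahead prediction can be constructed by shifting the closing-price series. The helper name and toy prices are illustrative, not taken from the project code:

```python
def make_horizon_pairs(closes, horizon):
    """Pair each day's value with the closing price `horizon` days ahead.

    Returns (inputs, targets): inputs[i] is the price at day i and
    targets[i] is the price at day i + horizon.
    """
    inputs = closes[:-horizon]
    targets = closes[horizon:]
    return inputs, targets

closes = [100.0, 101.5, 99.8, 102.3, 103.1, 104.0, 102.9, 105.2]
x3, y3 = make_horizon_pairs(closes, 3)   # targets 3 days ahead
x5, y5 = make_horizon_pairs(closes, 5)   # targets 5 days ahead
print(len(x3), len(x5))  # 5 3
```

Note that a longer horizon leaves fewer training pairs, which is one reason accuracy typically degrades as the forecast horizon grows.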
- `Project_Files`
  - `Financial` contains notebooks regarding stock market data
  - `Twitter` contains notebooks for Twitter dataset exploration, sentence embeddings, and exploration of the sentiment score feature (in a separate notebook)
  - `Preprocessed_Files` stores files produced by the prediction and preprocessing functions, saved in pickle format for later use so that calculations do not have to be repeated every time:
    - sentence embeddings
    - historical predictions of validation datasets
    - sentiment scores from multiple models
    - BERTopic models for each company
    - total dataframes combining all the data required for training
    - Twitter files for all companies
    - etc.
- `TimeSeries_Prediction`
  - `darts_logs` stores checkpoint files for trained models
  - Jupyter notebooks used for model training
  - `helper_funcs` contains multiple helper functions used throughout the project to collect data, preprocess it, and make predictions
  - `models` stores the init files of TFT, N-Linear, and other models used for prediction
  - `scinet` contains an implementation of the SCINet paper, used for time series prediction without covariates
- `Others` contains residual files
  - `png` stores pictures generated by predictions and other visualizations
- `Datasets`
  - `kaggle` contains the Twitter dataset downloaded from Kaggle
  - `market` contains market data downloaded manually from Yahoo Finance
- `emission.csv` is a file generated by the Eco2AI library containing information about the CO2 emissions produced during the training process
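The `Preprocessed_Files` folder effectively serves as a cache. A minimal sketch of that pattern is shown below; the file name, helper name, and toy data are illustrative, not the project's actual code:

```python
import os
import pickle

def cached(path, compute):
    """Load a pickled result from `path`, or compute it and pickle it once."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

# Usage: an expensive step runs once, then is reloaded from disk on reruns.
scores = cached("sentiment_scores.pkl", lambda: {"AAPL": 0.42, "TSLA": -0.13})
```

On the second call with the same path, the `compute` function is skipped entirely, which is what saves the repeated calculations mentioned above.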
To run this project, install the dependencies locally using either conda or pip. The `environment.yml` file stored in the root folder of this repository lists all Python libraries on which the notebooks depend; you can replicate the environment using the following conda commands.
- First, create the environment:
  `conda env create -f environment.yml`
- Then, activate it:
  `conda activate hse-stock`
- Verify that the new environment was installed correctly:
  `conda env list`
Alternatively, the following command installs the packages according to the configuration file `requirements.txt` stored in the root folder of this repository:
`pip install -r requirements.txt`
Some of the files could not be uploaded to the GitHub repository due to storage limitations. For this reason, all the notebooks stored in the project repository are saved with their outputs and do not need to be rerun to see the results. All of the missing files will be generated automatically if you run the functions, but this will take considerable time.