purvasingh96 / stockgram-intelligent-portfolio-manager

📈 This repo describes a framework that uses the sentiment stability of a company's financial 10-K reports as the trading signal (alpha factor).

stockgram-intelligent-portfolio-manager's Introduction

Research Paper

My research paper, Intelligent Portfolio Management via NLP Analysis of Financial 10-K Statements, was published in the November issue of the International Journal of Artificial Intelligence and Applications.

Overview

The project analyzes whether the sentiment stability of a company's financial 10-K reports over time can predict its future mean returns. A diverse portfolio of stocks was selected to test this hypothesis. The proposed framework downloads the companies' 10-K reports from the SEC's EDGAR database and passes them through a preprocessing pipeline that extracts the critical sections of each filing for NLP analysis. Using the Loughran and McDonald sentiment word lists, the framework generates sentiment TF-IDF vectors from the 10-K documents, calculates the cosine similarity between each pair of consecutive 10-K reports, and uses this cosine similarity as the alpha factor. To analyze how well the alpha factor predicts future returns, the framework uses the alphalens library to perform factor return analysis and turnover analysis, and to compare the Sharpe ratios of the candidate alpha factors. The results show a strong correlation between the sentiment stability of the portfolio's 10-K statements and its future mean returns.

System Architecture

The figure below gives a high-level overview of how this intelligent portfolio manager works.

Quandl Dataset

Quandl end of day US Stock Prices database, Accessed: 2020-10

How to use Quandl data?

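# Run in a Jupyter notebook cell to install the Quandl client library
!pip install quandl

import quandl

# Replace the placeholder with your own Quandl API key
quandl.ApiConfig.api_key = "YOURAPIKEY"

# Fetch end-of-day prices for Amazon (AMZN) and Nike (NKE)
data = quandl.get(['EOD/AMZN', 'EOD/NKE'])
data.head()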

Portfolio

We test our hypothesis, that the sentiment stability of financial 10-K reports can be a potential trading signal, on a diverse portfolio of 7 stocks.

The SEC EDGAR Database

To extract the financial 10-K reports of the stocks in our universe, we use a pre-defined SEC API and each stock's CIK number. Details on how to extract the 10-K reports from the SEC EDGAR database and pre-process them can be found in this notebook.
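As a rough illustration of how a CIK number maps to a list of 10-K filings, the sketch below only builds a request URL and does not download anything. It assumes EDGAR's public company-browse endpoint, which may differ from the API used in the notebook; the function name `build_10k_feed_url` and the example CIK value are hypothetical and should be checked against EDGAR before use.

```python
from urllib.parse import urlencode

# Assumption: EDGAR's public company-browse endpoint. The notebook in this
# repo may use a different API; treat this as an illustration only.
EDGAR_BROWSE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"


def build_10k_feed_url(cik, count=40):
    """Return the URL of an Atom feed listing a company's 10-K filings."""
    params = {
        "action": "getcompany",
        "CIK": str(cik).zfill(10),  # CIKs are conventionally zero-padded to 10 digits
        "type": "10-K",
        "owner": "exclude",
        "count": count,
        "output": "atom",
    }
    return f"{EDGAR_BROWSE_URL}?{urlencode(params)}"


# Hypothetical example value; look up the real CIK for each stock on EDGAR.
url = build_10k_feed_url(1018724)
print(url)
```

Fetching the URL is deliberately left out. The SEC asks automated clients to identify themselves with a User-Agent header and to keep request rates low, so any real downloader should handle both.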

Loughran McDonald Sentiment Word List

The Loughran and McDonald (LM) word lists cover 6 sentiments (negative, positive, uncertainty, litigious, strong modal, and weak modal). They were curated by examining words that appear in at least 5% of 10-Ks (i.e., annual reports) filed during 1994-2008. The lists are relatively exhaustive, which makes it much harder for a filer to avoid the listed words. Each list is based on the most likely interpretation of a word in a business context. The lists are quite extensive: they contain 354 positive and 2,329 negative words. You can find the CSV version used in this project here.
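To make the idea of sentiment TF-IDF and cosine similarity concrete, here is a minimal, dependency-free sketch. It is not the project's implementation: the four-word vocabulary is a hypothetical stand-in for one LM sentiment list, the toy filings are invented, and the smoothed IDF formula is just one common variant.

```python
import math
from collections import Counter

# Hypothetical stand-in for one LM sentiment list; the real lists are far longer.
SENTIMENT_WORDS = ["loss", "decline", "litigation", "impairment"]


def term_counts(text, vocabulary):
    """Count only the words that belong to the sentiment vocabulary."""
    counts = Counter(w for w in text.lower().split() if w in vocabulary)
    return [counts[w] for w in vocabulary]


def tfidf_vectors(docs, vocabulary):
    """Return one TF-IDF vector per document, restricted to the vocabulary."""
    counts = [term_counts(d, vocabulary) for d in docs]
    n_docs = len(docs)
    idf = []
    for j in range(len(vocabulary)):
        doc_freq = sum(1 for row in counts if row[j] > 0)
        # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1
        idf.append(math.log((1 + n_docs) / (1 + doc_freq)) + 1)
    return [[tf * w for tf, w in zip(row, idf)] for row in counts]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy "filings", ordered by year.
filings = [
    "loss and decline reported amid litigation",
    "loss and decline reported amid litigation",
    "impairment charges recorded",
]
vectors = tfidf_vectors(filings, SENTIMENT_WORDS)
# Similarity between each pair of consecutive filings: the candidate alpha factor.
similarities = [
    cosine_similarity(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
]
print(similarities)
```

In this toy data the first two filings use identical sentiment words, so their similarity is close to 1, while the third shares none of them with the second, so that similarity is 0.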

Code

You can find the PyTorch implementation of the framework here

Evaluation and Results

Factor Returns

Factor returns directly measure the returns our portfolio would earn if its weights were determined purely by the alpha factor. Alphalens requires two mandatory arguments to estimate future mean returns: factors and prices. In this project, the factor data is the cosine similarity between two consecutive 10-K reports, and the pricing data is the year-end adjusted closing prices of the stocks in our portfolio.

After generating the factor data frame and setting the pricing data, we pass both arguments to the alphalens function get_clean_factor_and_forward_returns, which accepts factor data, pricing data, quantiles, bins, and periods. This function returns a merged, multi-indexed data frame, indexed by date at level 0 and by stock/asset at level 1. The data frame contains the values of a single alpha factor, the forward returns for each period, and the quantile/bin to which each factor value belongs.
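The notion of a factor return described above can be sketched for a single period without alphalens. The example assumes one common convention: demean the factor values and scale them so the absolute weights sum to 1. The tickers and numbers are hypothetical, and the weighting used in this project's notebooks may differ.

```python
def factor_weights(factor_values):
    """Demean the factor and scale it so the absolute weights sum to 1."""
    mean = sum(factor_values.values()) / len(factor_values)
    demeaned = {asset: value - mean for asset, value in factor_values.items()}
    gross = sum(abs(value) for value in demeaned.values())
    if gross == 0:
        # Every asset has the same factor value, so there is nothing to bet on.
        return {asset: 0.0 for asset in demeaned}
    return {asset: value / gross for asset, value in demeaned.items()}


def factor_return(factor_values, forward_returns):
    """Return of a portfolio whose weights come purely from the factor."""
    weights = factor_weights(factor_values)
    return sum(weights[asset] * forward_returns[asset] for asset in weights)


# Hypothetical tickers, cosine similarities, and one-period forward returns.
factor = {"AAA": 0.9, "BBB": 0.6, "CCC": 0.3}
forward = {"AAA": 0.10, "BBB": 0.02, "CCC": -0.04}

weights = factor_weights(factor)
period_return = factor_return(factor, forward)
print(weights, period_return)
```

With these numbers the portfolio is long the most stable filing and short the least stable one, and the middle stock gets roughly zero weight.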



Turnover Analysis

Turnover analysis estimates the fraction of the portfolio's total value that is traded in a period. One way to measure turnover is factor rank autocorrelation, which measures how stable the ranked alpha factor values are from one period to the next. A high factor rank autocorrelation indicates lower turnover; a low or even negative autocorrelation is a proxy for higher turnover. If two alpha factors have similar quintile performance and similar factor returns, we prefer the one with lower turnover.



We prefer the alpha factor with lower turnover because it reduces transaction costs and makes the trades feasible to execute, provided the stocks are liquid. Excessive turnover could also imply that the alpha factor is only capturing noise.
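Factor rank autocorrelation can be illustrated as the Spearman rank correlation between the factor values of the same assets in two consecutive periods. The sketch below is a plain-Python approximation of that idea with hypothetical factor values; it is not alphalens code.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN when either input has no variation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def factor_rank_autocorrelation(previous, current):
    """Spearman correlation of factor values in consecutive periods.

    Both lists must hold the same assets in the same order.
    """
    return pearson(ranks(previous), ranks(current))


# Hypothetical factor values for four assets.
last_year = [0.91, 0.75, 0.60, 0.42]
same_ranking = [0.88, 0.70, 0.65, 0.40]      # ranks unchanged: low turnover
reversed_ranking = [0.40, 0.65, 0.70, 0.88]  # ranks flipped: high turnover

stable = factor_rank_autocorrelation(last_year, same_ranking)
unstable = factor_rank_autocorrelation(last_year, reversed_ranking)
print(stable, unstable)
```

A value near 1 means the ranking barely changed, so few positions need to be traded; a value near -1 means the ranking flipped and most of the portfolio would turn over.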

Sharpe Ratio

The Sharpe ratio measures return per unit of risk. Usually, a ratio under 1.0 is considered sub-optimal. Investors consider a Sharpe ratio greater than 1.0 acceptable to good, a ratio higher than 2.0 very good, and a ratio of 3.0 or higher excellent. Looking at the Sharpe ratios of our alpha factors, the 10-K filings that convey the 'interesting' sentiment have the highest Sharpe ratio, 4.10, followed by the 10-K documents that express a positive view, with a Sharpe ratio of 1.02.
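For reference, a Sharpe ratio of this kind can be computed from a series of factor returns as the mean excess return divided by its standard deviation. The sketch below assumes yearly returns (so the annualization factor is 1), a zero risk-free rate, and hypothetical return values; the exact settings behind the figures above are not stated here.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=1):
    """Annualized Sharpe ratio of a series of periodic returns."""
    excess = [r - risk_free_rate for r in returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        return float("nan")
    return math.sqrt(periods_per_year) * statistics.mean(excess) / volatility


# Hypothetical yearly factor returns, not results from this project.
yearly_factor_returns = [0.04, 0.06, 0.05, 0.07, 0.03]
ratio = sharpe_ratio(yearly_factor_returns)
print(round(ratio, 2))
```

For daily or monthly returns, `periods_per_year` would be set to the number of periods in a year so that ratios from different frequencies are comparable.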



Contributing

Please feel free to open a pull request to contribute to this repository. If you think any section needs more or better explanation, please let me know through the issue tracker.

Support

If you like this repo and find it useful, please consider (★) starring it (on top right of the page) so that it can reach a broader audience.
