personal-coding / stock-earnings-call-transcript-natural-language-processing

Natural Language Processing on Stocks' Earnings Call Transcripts: An Investment Strategy Backtest Based on S&P Global Papers.

Natural Language Processing on Earnings Call Transcripts

Investment strategy using natural language processing on earnings call transcripts. Based on the S&P Global papers "Natural Language Processing – Part II: Stock Selection" and "Natural Language Processing – Part III: Feature Engineering".

Background

S&P Global released two papers that apply natural language processing to stocks' earnings call transcripts and report backtests of an outperforming investment strategy.

  • The Part II paper suggested that sentiment scores could be created using the Loughran and McDonald Sentiment Word Lists. The investment strategy is built on the net positive score: the number of positive words minus the number of negative words, with that difference divided by the total number of words in the transcript. The strategy takes the top quintile (top 20%) of transcript scores over a four-month lookback period. The stocks chosen are equal weighted and rebalanced at month-end. The paper suggested that a long-only strategy yielded a 2.35% monthly average return, while a long-short strategy yielded a 4.14% monthly average return.

  • The Part III paper suggested that scores could be created using descriptor tags (e.g. revenue, earnings, profitability) along with positive or negative keywords within each transcript sentence. The investment strategy is built on the net positive score: the number of positive descriptor tag sentences minus the number of negative descriptor tag sentences, with that difference divided by the total number of sentences in the transcript. The strategy takes the top quintile (top 20%) of transcript scores over a four-month lookback period. The stocks chosen are equal weighted and rebalanced at month-end. The paper suggested that a long-only strategy yielded a 4.24% monthly average return, while a long-short strategy yielded a 9.16% monthly average return.
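As a rough illustration of the Part II scoring and selection steps described above, here is a minimal sketch. It is not the code from this repo or from the papers: the tiny word sets stand in for the Loughran and McDonald lists, and the function names, tickers, and scores are hypothetical.

```python
import re

# Hypothetical stand-ins for the Loughran and McDonald word lists; the real
# lists are far longer and are not reproduced here.
POSITIVE_WORDS = {"strong", "improved", "growth"}
NEGATIVE_WORDS = {"decline", "weak", "loss"}


def net_positive_score(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """(positive words - negative words) / total words in the transcript."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(1 for w in words if w in positive)
    neg = sum(1 for w in words if w in negative)
    return (pos - neg) / len(words)


def top_quintile(scores):
    """Return the tickers whose scores rank in the top 20% of the universe."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_select = max(1, len(ranked) // 5)
    return ranked[:n_select]


transcript = "Revenue growth was strong but margins were weak"
score = net_positive_score(transcript)  # 2 positive, 1 negative, 8 words

# Hypothetical transcript scores for a five-stock universe.
scores = {"AAA": 0.04, "BBB": 0.01, "CCC": -0.02, "DDD": 0.03, "EEE": 0.00}
longs = top_quintile(scores)
weights = {ticker: 1 / len(longs) for ticker in longs}  # equal weighting
```

The Part III score has the same shape, with sentences in place of words: count the sentences tagged positive and negative for a descriptor, and divide the difference by the total number of sentences.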

My tests focus on S&P500 stocks from Apr 2012 - Aug 2022. I did not account for stocks that moved in or out of the S&P500 during the investment strategy period; I only looked at stocks that were in the S&P500 as of Sep 2022. If the indicators truly predicted market outperformance, I did not expect small differences in index membership over time to yield a significantly different result.

Results

Over the investment strategy period, a buy-and-hold strategy would have generated a 4.07x return (again, based on stocks that were in the S&P500 as of Sep 2022 and not accounting for stocks that moved in or out of the index over time).

My results are much worse than those suggested in the papers. The long-only strategy only slightly outperforms the buy-and-hold strategy. The long-short strategy fails to generate a profit, as losses on the short side of the trades eat away all of the long trades' profits.
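To make the long-short effect concrete, here is a minimal sketch of how a short leg can cancel out the long leg's gains. The monthly returns are made up for illustration, and the sketch assumes equally sized long and short legs, which may not match the back test in this repo.

```python
def compound(monthly_returns):
    """Turn monthly returns into a cumulative multiple (1.0 means flat)."""
    multiple = 1.0
    for r in monthly_returns:
        multiple *= 1.0 + r
    return multiple


# Hypothetical monthly returns of the stocks held long and the stocks sold short.
long_basket = [0.02, 0.03, -0.01, 0.02]
short_basket = [0.02, 0.04, -0.02, 0.02]

# A short position gains when the shorted basket falls, so its return is negated.
# Assumption: the long and short legs are sized equally each month.
long_short = [l - s for l, s in zip(long_basket, short_basket)]

long_only_multiple = compound(long_basket)   # above 1.0: the longs made money
long_short_multiple = compound(long_short)   # below 1.0: the shorts ate the gains
```

In this made-up series the shorted stocks rise about as much as the long stocks, so the combined multiple ends up just under 1.0x even though the long-only multiple is above 1.0x.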

I am surprised that the short trades' results differed so much from the papers' results. Perhaps I programmed the back test incorrectly, although I reviewed the back test logic thoroughly.

All of the earnings calls and stock data are in this repo, so feel free to run the tests yourself.

Back test                                                      Return  Long Hit Rate  Short Hit Rate
NLP II long-only                                               4.36x   59.1%          N/A
NLP II long-short                                              0.98x   59.1%          42.0%
NLP III long-only (revenue topic + directionally positive)     4.10x   59.1%          N/A
NLP III long-short (revenue topic + directionally positive)    0.99x   59.1%          42.2%
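The hit rates above are not defined in this README. One plausible reading, sketched below as an assumption, is the share of trades that made money: a long trade wins when the stock rises, and a short trade wins when it falls. The function name and the sample returns are hypothetical.

```python
def hit_rate(stock_returns, side="long"):
    """Share of trades that made money; None when there are no trades."""
    if not stock_returns:
        return None
    if side == "long":
        wins = sum(1 for r in stock_returns if r > 0)
    else:
        # Assumption: a short trade wins when the stock's return is negative.
        wins = sum(1 for r in stock_returns if r < 0)
    return wins / len(stock_returns)


# Hypothetical per-trade stock returns over the holding period.
example = [0.05, -0.02, 0.01, -0.03, 0.04]
long_rate = hit_rate(example, "long")    # 3 of 5 trades rose
short_rate = hit_rate(example, "short")  # 2 of 5 trades fell
```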

Programs

There are several programs in this repository:
