This repository contains code for calculating the historical returns of stock trading strategies from daily price data, as well as a generated dataset with the values over time of four portfolios, each following a different strategy.
The included strategies are based on simple moving averages and trade stocks that are members of the S&P500 (Standard and Poor's 500) Index, a list of 500 "leading" publicly traded companies. For more information on S&P500 inclusion criteria, see this article.
Each file is described below. For a quick-start guide, you can skip to the "How to Use This Repository" section.
This python file fetches some basic information about each stock currently in the S&P500. It reads the data from a Wikipedia table that is regularly updated.
It creates a csv called "sp500-info.csv" with quite a few features. The most notable are the 'tickers' and 'dates_added' features, which give each stock's symbol and the date it was added to the S&P500 index, respectively.
This python file creates the historical daily data for each stock listed in sp500-info.csv, the csv generated by the previous program.
It uses the yfinance API to fetch the data and creates a separate csv with each stock's price history. The history goes back up to 10 years, or only to the date the stock was added to the S&P500 if it was added in the last 10 years.
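The start-date rule described above can be sketched as follows. This is a minimal illustration rather than the code in this repository: the function name `history_start` and its parameters are hypothetical, and it assumes the date a stock was added is available as a `datetime.date` (or `None` when unknown).

```python
from datetime import date
from typing import Optional


def history_start(date_added: Optional[date], today: date, max_years: int = 10) -> date:
    """Return the earliest date for which price data should be requested.

    Hypothetical helper: the window is `max_years` long, but never starts
    before the date the stock joined the index.
    """
    try:
        window_start = today.replace(year=today.year - max_years)
    except ValueError:
        # `today` is Feb 29 and the target year is not a leap year.
        window_start = today.replace(year=today.year - max_years, day=28)

    if date_added is None:
        # Unknown addition date: fall back to the full window.
        return window_start
    return max(window_start, date_added)
```

A stock added within the window gets a shorter history, while one added earlier gets the full 10 years.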
Note that running this program may take a while.
See the next section for the features this data has.
This is a folder where the historical price data for the S&P stocks will be stored.
This historical data has some useful features that can be used to make indicator calculations:
- 'Date' - the date to which the stock price data corresponds
- 'Open' - the price at the start of the trading day
- 'High' - the highest price of the given stock during the trading day
- 'Low' - the lowest price of the given stock during the trading day
- 'Close' - the price at the end of the trading day
- 'Volume' - the number of shares traded during the trading day
- 'Dividend' - indicates whether a dividend was paid out, and how much was paid out per share (0 for no dividend, or an amount per share)
- 'Stock Splits' - indicates whether a stock split occurred (0 for no split, or the factor by which the number of shares in circulation was multiplied)
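As an example of an indicator calculation on these features, the sketch below computes a simple moving average of the 'Close' column. It is written in plain Python so it runs without any of the packages listed later; the function name `simple_moving_average` is hypothetical, and the sample rows are made-up values used only to show the shape of the data.

```python
import csv
import io
from typing import List, Optional

# Made-up sample rows in the same column layout as the historical csvs.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
2024-01-02,10,10.5,9.5,10,1000
2024-01-03,10,11.5,10,11,1200
2024-01-04,11,12.5,11,12,900
2024-01-05,12,13.5,12,13,1100
2024-01-08,13,14.5,13,14,1300
"""


def simple_moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Return the trailing average of the last `window` values.

    Entries are None until a full window of values is available.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages: List[Optional[float]] = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


closes = [float(row["Close"]) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
sma_3 = simple_moving_average(closes, window=3)
```

With pandas installed, the same result can be obtained from a price DataFrame using `rolling(window).mean()` on the 'Close' column.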
First, make sure your environment is configured with the necessary packages installed. Here is a list of the ones you will need:
- requests
- bs4
- pandas
- time (part of the Python standard library, so no installation is needed)
- yfinance
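One way to confirm the environment is ready is to check that each package can be imported. The snippet below is a small sketch of that check; the helper name `missing_packages` is hypothetical. Note that `bs4` is the import name of the package published on PyPI as `beautifulsoup4`.

```python
import importlib.util
from typing import Iterable, List

# Import names of the third-party packages listed above.
REQUIRED = ["requests", "bs4", "pandas", "yfinance"]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```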
Second, run the SP500-get-info program to generate sp500-info.csv, a csv with information on each S&P500 stock.
Third, you need to run sp500_create_10yr_hist_data_csvs.py to create the csvs with the historical data of stocks that are currently in the S&P500 index. This will create historical data (up to 10 years) for each of the stocks in their own csv files within the sp500-10-yr-hist-data folder. Note that this may take a while.
Fourth, you can add indicators or compose new strategies based on the class templates in sp500_strategy_tester.py. The portfolio equity of each strategy you define is calculated over time and stored as its own feature in a csv called "sp500-sma-strategy-performance-data.csv" (you can change the name depending on what types of strategies you are testing). This may take a while to run, possibly up to an hour or so.
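To give a rough idea of what "portfolio equity over time" means for a moving-average strategy, here is a simplified single-stock sketch. It does not reproduce the class templates in sp500_strategy_tester.py: the function names, the rule (hold the stock for a day only if the previous close was above its moving average, otherwise hold cash), and the starting cash are all assumptions made for illustration. Transaction costs, dividends, and stock splits are ignored.

```python
from typing import List, Optional


def sma(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until a full window exists."""
    return [
        sum(values[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(values))
    ]


def equity_curve(closes: List[float], window: int, starting_cash: float = 1000.0) -> List[float]:
    """Portfolio value per day for a hypothetical price-above-SMA rule.

    The decision for day t uses only data up to day t-1, so the rule
    never looks ahead.
    """
    averages = sma(closes, window)
    equity = [starting_cash]
    for t in range(1, len(closes)):
        previous_average = averages[t - 1]
        invested = previous_average is not None and closes[t - 1] > previous_average
        # When invested, equity follows the stock's daily change; otherwise it stays flat.
        growth = closes[t] / closes[t - 1] if invested else 1.0
        equity.append(equity[-1] * growth)
    return equity


curve = equity_curve([10.0, 10.0, 12.0, 15.0, 12.0, 6.0], window=2)
```

Each entry of `curve` plays the role of one row of a strategy's equity feature; a full tester would repeat this across many stocks and strategies.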
The strategy calculations do not take into account historical data for companies that were previously in the S&P500 index but have since been removed. They only track the historical performance of stocks that are currently part of the index, so the results are subject to survivorship bias.