Using standard released financial data along with machine learning classification models to predict an overall health/investment rating for a business
Individual investors need more comprehensive and reliable tools to assess a company's financial stability and potential for sustainable growth. Relying on traditional financial metrics and analysis alone, many investors struggle to identify healthy companies to invest in, because these metrics capture only part of the picture of a company's long-term viability.
Objective: Our focus is to assist individual investors by building a predictive machine learning model (Business Health Predictor) that assigns each company a business health grade as a suggestive indicator of investment worthiness.
Method: We used a Random Forest classification model as the final predictor for our Future Business Health target variable.
Success Criteria: Can we predict with an acceptable level of accuracy the business health target variable for an individual public company using the reported year-end financial data from the prior year?
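The setup described above can be illustrated with a minimal sketch. It is not the code from the final notebook: the column names (`company_name`, `year`, `health_grade`, and the three ratio columns), the synthetic data, and the hyperparameters are all assumptions chosen for illustration. It shows one way to pair each year's financials with the following year's grade and fit a scikit-learn `RandomForestClassifier`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def add_future_target(df, id_col="company_name", year_col="year",
                      grade_col="health_grade"):
    """Attach next year's grade to each row, so that year-t features
    are paired with the year-(t+1) grade. Assumes consecutive years."""
    out = df.sort_values([id_col, year_col]).copy()
    out["future_health_grade"] = out.groupby(id_col)[grade_col].shift(-1)
    # The last year of each company has no following grade, so drop it.
    return out.dropna(subset=["future_health_grade"])


# Synthetic stand-in data (hypothetical columns, random values).
rng = np.random.default_rng(0)
n_companies, n_years = 60, 5
n_rows = n_companies * n_years
demo = pd.DataFrame({
    "company_name": np.repeat([f"C_{i}" for i in range(n_companies)], n_years),
    "year": np.tile(np.arange(2014, 2014 + n_years), n_companies),
    "current_ratio": rng.normal(1.5, 0.5, n_rows),
    "debt_ratio": rng.uniform(0.1, 0.9, n_rows),
    "net_margin": rng.normal(0.05, 0.1, n_rows),
    "health_grade": rng.integers(3, 13, size=n_rows),
})

model_df = add_future_target(demo)
feature_cols = ["current_ratio", "debt_ratio", "net_margin"]
X = model_df[feature_cols]
y = model_df["future_health_grade"].astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Accuracy on random data is meaningless; this only shows the mechanics.
accuracy = accuracy_score(y_test, clf.predict(X_test))
```

A random train/test split is used here for brevity. Splitting by year, so that the model is always evaluated on later years than it was trained on, may be a closer match to the success criteria.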
The data source for this project is a Kaggle.com repository for American companies listed on the New York Stock Exchange and NASDAQ. The dataset comprises financial data from 8,000+ distinct companies recorded during the period spanning from 1999 to 2018.
https://www.kaggle.com/datasets/utkarshx27/american-companies-bankruptcy-prediction-dataset
- The data is anonymized, so we don't know the actual names of the companies.
- There are no industry categories or stock history data provided.
- There are 8,262 distinct companies in the dataset.
- There are no null values in the dataset to discard.
- All data features are in numeric format except for the company name and status label fields. These must either be removed from the model or converted to a numeric value using an encoding process.
- All numeric monetary features are in the same format and rounded to the same precision.
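The checks and the encoding step listed above could look roughly like the following sketch. The column names `company_name` and `status_label`, the label values, and the feature name `X1` are assumptions about the CSV layout, and the small frame stands in for the real file.

```python
import pandas as pd


def prepare_features(df, id_col="company_name", label_col="status_label"):
    """Encode the status label as integer codes and drop the two
    non-numeric columns (column names are assumed, not confirmed)."""
    out = df.copy()
    # Category codes follow the sorted label values.
    out["status_encoded"] = out[label_col].astype("category").cat.codes
    return out.drop(columns=[id_col, label_col])


# Tiny stand-in for the real dataset (hypothetical values).
raw = pd.DataFrame({
    "company_name": ["C_1", "C_1", "C_2"],
    "status_label": ["alive", "alive", "failed"],
    "year": [1999, 2000, 1999],
    "X1": [511.3, 485.9, 120.4],
})

null_count = int(raw.isnull().sum().sum())      # expect no nulls
n_companies = raw["company_name"].nunique()     # distinct companies
prepared = prepare_features(raw)
```

Dropping the label column instead of encoding it is the other option mentioned above; in that case the `status_encoded` line can simply be removed.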
Data required minimal cleanup and preparation
- We renamed most column headings so that it is clear what each column contains.
- We dropped the categorical columns that do not affect model performance, as well as the columns that were used to compute the ratios behind the ratings.
We created ratios and ratings from the financial data in the dataset. We chose these ratios to cover three major aspects of a business: solvency, liquidity, and profitability. Each rating was determined by comparing a company's result to those of other businesses. The solvency, liquidity, and profitability ratings are then added up to give the overall business health of the organization. This overall business health is the target variable for the model.
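As a rough illustration of this rating scheme, the sketch below grades each company against its peers and sums the three grades. The specific ratios (debt ratio, current ratio, net profit margin), the four-level percentile grading, and all column names are assumptions made for the example; the notebook may use different ratios and cut-offs.

```python
import numpy as np
import pandas as pd


def rate_against_peers(values, higher_is_better=True, n_grades=4):
    """Grade each value from 1 (worst) to n_grades (best) by its
    percentile rank among all companies."""
    pct = values.rank(pct=True, ascending=higher_is_better)
    return np.ceil(pct * n_grades).astype(int)


# Hypothetical year-end figures for four companies.
fin = pd.DataFrame({
    "current_assets": [200, 100, 50, 400],
    "current_liabilities": [100, 100, 100, 100],
    "total_liabilities": [300, 500, 900, 100],
    "total_assets": [1000, 1000, 1000, 1000],
    "net_income": [50, 10, -20, 120],
    "revenue": [500, 500, 500, 500],
})

# One illustrative ratio per aspect.
current_ratio = fin["current_assets"] / fin["current_liabilities"]  # liquidity
debt_ratio = fin["total_liabilities"] / fin["total_assets"]         # solvency
net_margin = fin["net_income"] / fin["revenue"]                     # profitability

ratings = pd.DataFrame({
    "liquidity": rate_against_peers(current_ratio),
    "solvency": rate_against_peers(debt_ratio, higher_is_better=False),
    "profitability": rate_against_peers(net_margin),
})

# Overall business health is the sum of the three ratings.
ratings["business_health"] = ratings.sum(axis=1)
print(ratings["business_health"].tolist())  # [9, 6, 3, 12]
```

With three ratings on a 1 to 4 scale, the summed grade ranges from 3 to 12. A lower debt ratio is treated as better here, which is why the solvency rating ranks in the opposite direction.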
- Refresh the dataset to obtain financial information for the current period from 2019-2022
- Review for changes in predictive modeling around impactful business events (pre and post-Covid, recessionary periods, etc.)
- Obtain industry data for companies to review how the models are affected when used for specific industry categories.
- Include outstanding shares and year-end stock price to review any effects on the model as built.
Please review the full analysis in our Jupyter Notebook or [presentation deck](<Group 09_Final Capstone Presentation.pptx>).
MAIN
├── DATA <- Kaggle repository download for American Companies Financial Report Data
│ ├── american_bankruptcy.csv <- American Companies Financial Data
│ ├── american_bankruptcy_datafile_original.zip <- Downloaded zip file of dataset from Kaggle.com
│ ├── american_bankruptcy_updated.csv <- American Companies Financial Data (**working version used in final notebook**)
├── IMAGES <- Folder containing visualizations used throughout the project
├── GROUP 9_JUNE 23.pdf <- PDF version of project proposal.
├── README.md <- Project README file
├── GROUP 9_FINAL_NOTEBOOK.ipynb <- Technical and narrative documentation in Jupyter Notebook
├── GROUP 09_Final Capstone Presentation.pdf <- PDF version of final project presentation
├──(Branches) <- Individual Branches for each project member
│ ├── DATA
│ ├── IMAGES
│ ├── notebookname.ipynb <- Individual notebook for each project member