Predicting the future health and investability of public companies using published financial data.

This project uses standard published financial data and machine learning classification models to predict an overall health and investment rating for a business.

Business Problem

Individual investors lack comprehensive, reliable tools for assessing a company's financial stability and potential for sustainable growth. Traditional financial metrics and analysis only partially capture a company's long-term viability, so many investors struggle to identify healthy companies to invest in.

The Solution

Objective: Our focus is to assist individual investors by building a predictive machine learning model (Business Health Predictor) that assigns each company a business health grade as a suggestive indicator of investment worthiness.

Method: We used a Random Forest classification model as the final predictor for our Future Business Health target variable.
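
As a rough illustration of this kind of model, the sketch below trains a scikit-learn `RandomForestClassifier` on synthetic data. The feature matrix, the five-level grade, and every hyperparameter are assumptions made for the example; they are not the features or settings used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared dataset (assumption): 500 company-years,
# 6 numeric financial features, and a health grade from 0 to 4.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))

# Hypothetical target: a score driven by the first two features,
# binned into five equally sized grades.
score = X[:, 0] + 0.5 * X[:, 1]
y = np.digitize(score, bins=np.quantile(score, [0.2, 0.4, 0.6, 0.8]))

# Hold out 25% of the rows for evaluation, keeping grade proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Hold-out accuracy: {accuracy:.3f}")
```

Accuracy on a held-out split is one simple way to check the model against the success criteria; a confusion matrix per grade would show where misclassifications concentrate.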

Success Criteria: Can we predict with an acceptable level of accuracy the business health target variable for an individual public company using the reported year-end financial data from the prior year?
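
The success criteria imply that each training row pairs one year's reported figures with the following year's health grade. A minimal sketch of that pairing, in plain Python, is shown here. The record layout, the `current_ratio` feature, and the grades are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (company, fiscal year, feature dict, health grade).
records = [
    ("C_1", 2016, {"current_ratio": 1.8}, 7),
    ("C_1", 2017, {"current_ratio": 1.6}, 6),
    ("C_1", 2018, {"current_ratio": 1.9}, 8),
    ("C_2", 2017, {"current_ratio": 0.9}, 3),
    ("C_2", 2018, {"current_ratio": 1.1}, 4),
]


def build_lagged_pairs(rows):
    """Pair each company's year-t features with its year-(t+1) health grade."""
    by_company = defaultdict(dict)
    for company, year, features, grade in rows:
        by_company[company][year] = (features, grade)

    pairs = []
    for company, years in by_company.items():
        for year in sorted(years):
            if year + 1 in years:  # skip years with no following-year report
                features, _ = years[year]
                _, next_grade = years[year + 1]
                pairs.append((company, year, features, next_grade))
    return pairs


pairs = build_lagged_pairs(records)
for company, year, features, target in pairs:
    print(company, year, features, "-> next-year grade", target)
```

The last reported year for each company has no following-year grade, so it produces no training row.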

Data Understanding and Preparation

The data source for this project is a Kaggle.com repository for American companies listed on the New York Stock Exchange and NASDAQ. The dataset comprises financial data from 8,000+ distinct companies recorded during the period spanning from 1999 to 2018.

Source of Data

https://www.kaggle.com/datasets/utkarshx27/american-companies-bankruptcy-prediction-dataset

Data Dictionary

Data Investigation Findings

The data is anonymized, so we don't know the actual names of the companies.
There are no industry categories or stock history data provided.
There are 8,262 distinct companies in the dataset.
There are no null values in the dataset to discard.
All data features are in numeric format except for the company name and status label fields. These must either be removed from the model or converted to a numeric value using an encoding process.
All numeric monetary features are in the same format and rounded to the same precision.
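
The checks behind these findings can be sketched with the standard library alone. The inline sample below stands in for the Kaggle CSV; its column names (`company_name`, `status_label`, `year`, `X1`, `X2`), its values, and the `alive`/`failed` labels are assumptions for illustration.

```python
import csv
import io

# Tiny inline sample standing in for the real file (all values invented).
sample_csv = """company_name,status_label,year,X1,X2
C_1,alive,1999,511.267,833.107
C_1,alive,2000,485.856,713.811
C_2,failed,1999,12.100,40.250
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Count distinct companies and empty cells.
distinct_companies = {row["company_name"] for row in rows}
null_count = sum(1 for row in rows for value in row.values() if value in ("", None))


def is_numeric(text):
    """Return True when the cell text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Columns with at least one value that does not parse as a number.
non_numeric = [
    col for col in rows[0]
    if not all(is_numeric(row[col]) for row in rows)
]

# Encode the status column with an explicit mapping (assumed labels).
status_codes = {"alive": 0, "failed": 1}
for row in rows:
    row["status_code"] = status_codes[row["status_label"]]

print(len(distinct_companies), null_count, non_numeric)
```

An explicit mapping keeps the encoding readable; the alternative named above, dropping the non-numeric columns, avoids the encoding step entirely.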

Modeling and Evaluation

Data required minimal cleanup and preparation

We renamed most column headings so that each one clearly describes what the column contains.
We dropped the categorical columns that do not affect model performance, as well as the columns that were used to compute the ratios that determine the ratings.

Creating Ratios to use for Target Prediction

We created ratios and ratings from the financial data in the dataset. We chose these ratios to cover three major aspects of a business: solvency, liquidity, and profitability. Each rating was determined by comparing a company's results to those of other businesses. The solvency, liquidity, and profitability ratings are then added up to get the overall business health of the organization. This overall business health is the target variable for the model.
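
The rating scheme described above can be sketched as follows. The specific ratios (current ratio, equity share of assets, net profit margin), the 1 to 4 quartile-based rating, and all figures are assumptions chosen for illustration; the notebook's actual ratios and thresholds may differ.

```python
from statistics import quantiles

# Hypothetical year-end figures for four companies (all values invented).
companies = {
    "C_1": {"current_assets": 600, "current_liabilities": 200,
            "total_liabilities": 300, "total_assets": 1000,
            "net_income": 150, "revenue": 1000},
    "C_2": {"current_assets": 400, "current_liabilities": 200,
            "total_liabilities": 500, "total_assets": 1000,
            "net_income": 80, "revenue": 1000},
    "C_3": {"current_assets": 300, "current_liabilities": 250,
            "total_liabilities": 700, "total_assets": 1000,
            "net_income": 20, "revenue": 1000},
    "C_4": {"current_assets": 150, "current_liabilities": 300,
            "total_liabilities": 900, "total_assets": 1000,
            "net_income": -50, "revenue": 1000},
}


def compute_ratios(f):
    """One illustrative ratio per aspect; higher is better for each."""
    return {
        "liquidity": f["current_assets"] / f["current_liabilities"],  # current ratio
        "solvency": 1 - f["total_liabilities"] / f["total_assets"],   # equity share of assets
        "profitability": f["net_income"] / f["revenue"],              # net profit margin
    }


def rate_against_peers(value, peer_values):
    """Rating from 1 to 4: one point plus one per peer quartile cut exceeded."""
    cuts = quantiles(peer_values, n=4)
    return 1 + sum(value > cut for cut in cuts)


ratio_table = {name: compute_ratios(f) for name, f in companies.items()}

# Overall business health = solvency rating + liquidity rating + profitability rating.
health = {}
for name, own in ratio_table.items():
    health[name] = sum(
        rate_against_peers(own[aspect], [r[aspect] for r in ratio_table.values()])
        for aspect in ("solvency", "liquidity", "profitability")
    )

print(health)
```

With three aspects rated 1 to 4, the summed health score in this sketch ranges from 3 to 12, which gives the classifier a small set of ordered classes to predict.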

Conclusion

Future Work

Refresh the dataset with financial information for the more recent 2019-2022 period.
Review for changes in predictive modeling around impactful business events (pre and post-Covid, recessionary periods, etc.)
Obtain industry data for companies to review how the models are affected when used for specific industry categories.
Include shares outstanding and year-end stock price to review any effects on the model as built.

For More Information

Please review the full analysis in our Jupyter Notebook or [presentation deck](Group 09_Final Capstone Presentation.pptx).

Repository Navigation

MAIN
├── DATA                                          <- Kaggle repository download for American Companies Financial Report Data
│   ├── american_bankruptcy.csv                        <- American Companies Financial Data
│   ├── american_bankruptcy_datafile_original.zip      <- Downloaded zip file of dataset from Kaggle.com
│   ├── american_bankruptcy_updated.csv                <- American Companies Financial Data (**working version used in final notebook**)
├── IMAGES                                        <- folder containing the visualizations used throughout the project
├── GROUP 9_JUNE 23.pdf                           <- PDF version of project proposal
├── README.md                                     <- Project README file
├── GROUP 9_FINAL_NOTEBOOK.ipynb                  <- Technical and narrative documentation in Jupyter Notebook
├── GROUP 09_Final Capstone Presentation.pdf      <- PDF version of final project presentation
├──(Branches)                                     <- Individual Branches for each project member
│   ├── DATA
│   ├── IMAGES
│   notebookname.ipynb                                 <- Individual notebook for each project member
