group09_feb06_capstone's Introduction

Predicting the future health and investability of public companies using published financial data.

Uses publicly released financial data and machine learning classification models to predict an overall health/investment rating for a business.

Business Problem

Individual investors lack comprehensive and reliable tools for assessing a company's financial stability and potential for sustainable growth. Investors who rely on traditional financial metrics and analysis often struggle to identify healthy companies to invest in, because these metrics capture only part of the picture of a company's long-term viability.

The Solution

Objective: Our focus is to assist individual investors by building a predictive machine learning model (Business Health Predictor) that assigns individual companies a business health grade as a suggestive indicator of investment worthiness.

Method: We used a Random Forest classification model as our final predictor for the Future Business Health target variable.

Success Criteria: Can we predict with an acceptable level of accuracy the business health target variable for an individual public company using the reported year-end financial data from the prior year?
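
Because the target is the following year's health grade while the features come from the prior year's year-end report, each company's rows have to be paired across consecutive years. The pandas sketch below shows one way to build that lagged target; the column names (`company`, `year`, `current_ratio`, `health`) and the values are hypothetical, and the notebook may construct the target differently.

```python
import pandas as pd

# Hypothetical toy panel: one row per company per fiscal year.
# Company C_2 has no 2016 filing, so its 2015 row has no adjacent next year.
df = pd.DataFrame({
    "company": ["C_1", "C_1", "C_1", "C_2", "C_2", "C_2"],
    "year": [2016, 2017, 2018, 2015, 2017, 2018],
    "current_ratio": [1.2, 1.5, 1.1, 0.8, 0.9, 1.3],
    "health": [5, 7, 6, 3, 4, 6],
})

# Sort so that shift() walks along each company's own timeline.
df = df.sort_values(["company", "year"])

# Pull the next record's health grade and year up onto the current row.
df["future_health"] = df.groupby("company")["health"].shift(-1)
df["next_year"] = df.groupby("company")["year"].shift(-1)

# Keep only rows whose next record is exactly one year later.
labeled = df[df["next_year"] == df["year"] + 1].copy()
labeled["future_health"] = labeled["future_health"].astype(int)

print(labeled[["company", "year", "current_ratio", "future_health"]])
```

Requiring `next_year == year + 1` skips pairs that straddle a missing filing year, which a plain `shift(-1)` would silently mislabel.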

Data Understanding and Preparation

The data source for this project is a Kaggle.com dataset of American companies listed on the New York Stock Exchange and NASDAQ. It contains financial data for 8,000+ distinct companies over the period 1999 to 2018.

Source of Data

https://www.kaggle.com/datasets/utkarshx27/american-companies-bankruptcy-prediction-dataset

Data Dictionary

(Image: dataset data dictionary)

Data Investigation Findings
  • The data is anonymized, so the actual company names are not known.
  • No industry categories or stock history data are provided.
  • There are 8,262 distinct companies in the dataset.
  • The dataset contains no null values, so no rows need to be discarded.
  • All features are numeric except for the company name and status label fields. These must either be removed from the model or converted to numeric values through encoding.
  • All numeric monetary features are in the same format and rounded to the same precision.
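
Findings like these typically come from a few routine pandas checks. A minimal sketch on a small hypothetical sample follows; the column names `company_name`, `status_label`, and `X1` and the label values `alive`/`failed` are assumptions modeled on the Kaggle file rather than taken from the notebook.

```python
import pandas as pd

# Hypothetical sample shaped like the Kaggle file; the real columns may differ.
df = pd.DataFrame({
    "company_name": ["C_1", "C_1", "C_2", "C_3"],
    "status_label": ["alive", "alive", "failed", "alive"],
    "year": [2017, 2018, 2018, 2018],
    "X1": [511.267, 485.856, 888.5, 120.0],
})

# Total count of missing values across every column.
null_total = int(df.isna().sum().sum())

# Number of distinct companies.
n_companies = df["company_name"].nunique()

# Columns that are not numeric and must be removed or encoded.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

# Option 1: encode the status label as a 0/1 flag.
df["failed"] = (df["status_label"] == "failed").astype(int)

# Option 2: drop the original text columns before modeling.
model_df = df.drop(columns=non_numeric)

print(null_total, n_companies, non_numeric)
print(model_df.columns.tolist())
```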

Modeling and Evaluation

(Screenshot: Python libraries used)

(Screenshot: data loading code)

The data required minimal cleanup and preparation:

  • We renamed most column headings so that the contents of each column are clear.
  • We dropped the categorical columns that do not affect model performance, as well as the columns that were used to calculate the ratios that determine the ratings.
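
These two cleanup steps map onto pandas `rename` and `drop`. In this sketch the code-to-name mapping (`X1` as current assets, `X14` as total current liabilities, `X6` as net income) is our assumption about the Kaggle data dictionary, and the single ratio computed before the drop is purely illustrative.

```python
import pandas as pd

# Hypothetical sample using Kaggle-style anonymized column codes.
raw = pd.DataFrame({
    "company_name": ["C_1", "C_2"],
    "status_label": ["alive", "failed"],
    "year": [2018, 2018],
    "X1": [500.0, 120.0],   # assumed: current assets
    "X14": [250.0, 150.0],  # assumed: total current liabilities
    "X6": [40.0, -10.0],    # assumed: net income
})

# Step 1: give the coded columns readable names.
readable = {
    "X1": "current_assets",
    "X14": "current_liabilities",
    "X6": "net_income",
}
df = raw.rename(columns=readable)

# A ratio is derived from the raw inputs before those inputs are dropped.
df["current_ratio"] = df["current_assets"] / df["current_liabilities"]

# Step 2: drop the categorical columns and the raw inputs behind the ratio.
categorical = ["company_name", "status_label"]
ratio_inputs = ["current_assets", "current_liabilities"]
df = df.drop(columns=categorical + ratio_inputs)

print(df)
```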

Creating Ratios to Use for Target Prediction

We created ratios and ratings from the financial data in the dataset. The ratios cover three major aspects of a business: solvency, liquidity, and profitability. Each rating was determined by comparing a company's results to those of other businesses. The solvency, liquidity, and profitability ratings are then summed to produce the overall business health of the organization, which serves as the model's target variable.
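
The ratio, rating, and health-sum pipeline can be sketched as follows. The specific ratios (debt ratio for solvency, current ratio for liquidity, net margin for profitability), the tercile-based 1 to 3 peer rating, and all column names are assumptions for illustration; the project's actual ratios and rating thresholds may differ.

```python
import pandas as pd

# Hypothetical year-end figures for six companies (readable column names assumed).
df = pd.DataFrame({
    "total_liabilities":   [30, 80, 50, 20, 90, 60],
    "total_assets":        [100, 100, 100, 100, 100, 100],
    "current_assets":      [150, 90, 120, 200, 60, 110],
    "current_liabilities": [100, 100, 100, 100, 100, 100],
    "net_income":          [4, 15, -5, 10, 6, -2],
    "net_sales":           [100, 100, 100, 100, 100, 100],
})

# One illustrative ratio per aspect of the business.
df["debt_ratio"] = df["total_liabilities"] / df["total_assets"]         # solvency
df["current_ratio"] = df["current_assets"] / df["current_liabilities"]  # liquidity
df["net_margin"] = df["net_income"] / df["net_sales"]                   # profitability


def peer_rating(ratio: pd.Series, higher_is_better: bool) -> pd.Series:
    """Rate each company 1 to 3 by which tercile of its peers it falls in."""
    labels = [1, 2, 3] if higher_is_better else [3, 2, 1]
    return pd.qcut(ratio, q=3, labels=labels).astype(int)


# A lower debt ratio means stronger solvency, so that rating is reversed.
df["solvency_rating"] = peer_rating(df["debt_ratio"], higher_is_better=False)
df["liquidity_rating"] = peer_rating(df["current_ratio"], higher_is_better=True)
df["profitability_rating"] = peer_rating(df["net_margin"], higher_is_better=True)

# Overall business health: the sum of the three ratings (range 3 to 9).
df["business_health"] = (
    df["solvency_rating"] + df["liquidity_rating"] + df["profitability_rating"]
)
print(df[["solvency_rating", "liquidity_rating",
          "profitability_rating", "business_health"]])
```

`pd.qcut` raises an error when many tied values produce duplicate bin edges; ranking the ratio first (for example `ratio.rank(method="first")`) is one common workaround.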

(Screenshot: code creating the ratio fields)

(Screenshot: code dropping columns)

(Screenshots: solvency rating, liquidity and profitability ratings, and business health calculation code)

(Screenshots: train/test split and LinearSVC models 1 to 3)

(Screenshots: Random Forest models 1 to 3)
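
The modeling step splits the data into training and test sets and compares LinearSVC and Random Forest classifiers. A minimal scikit-learn sketch of that comparison follows; it runs on synthetic data from `make_classification` as a stand-in for the project's ratio features and health classes, and the hyperparameters are illustrative rather than the ones the team used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the ratio features (X) and health classes (y).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4,
    n_redundant=0, n_classes=3, random_state=0,
)

# Hold out 20% of rows, keeping class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # LinearSVC is sensitive to feature scale, so standardize first.
    "LinearSVC": make_pipeline(
        StandardScaler(), LinearSVC(max_iter=5000, random_state=42)
    ),
    # Tree ensembles do not need scaled inputs.
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Accuracy alone can hide weak performance on rare grades, so `sklearn.metrics.classification_report` is a useful companion when the health classes are imbalanced.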

Conclusion

Future Work

  1. Refresh the dataset with financial information for the more recent period, 2019 to 2022.
  2. Review how the model's predictions change around impactful business events (pre- and post-COVID, recessionary periods, etc.).
  3. Obtain industry data for companies to review how the models are affected when applied to specific industry categories.
  4. Include shares outstanding and the year-end (YE) stock price to review their effect on the model as built.

For More Information

Please review the full analysis in our Jupyter Notebook (GROUP 9_FINAL_NOTEBOOK.ipynb) or the presentation deck (Group 09_Final Capstone Presentation.pptx).

Repository Navigation

MAIN
├── DATA                                          <- Kaggle repository download for American Companies Financial Report Data
│   ├── american_bankruptcy.csv                        <- American Companies Financial Data
│   ├── american_bankruptcy_datafile_original.zip      <- Downloaded zip file of dataset from Kaggle.com
│   ├── american_bankruptcy_updated.csv                <- American Companies Financial Data (**working version used in final notebook**)
├── IMAGES                                        <- Folder containing the visualizations produced throughout the project
├── GROUP 9_JUNE 23.pdf                           <- PDF version of project proposal
├── README.md                                     <- Project README file
├── GROUP 9_FINAL_NOTEBOOK.ipynb                  <- Technical and narrative documentation in Jupyter Notebook
├── GROUP 09_Final Capstone Presentation.pdf      <- PDF version of final project presentation
├── (Branches)                                    <- Individual branches for each project member
│   ├── DATA
│   ├── IMAGES
│   ├── notebookname.ipynb                             <- Individual notebooks for each project member

Contributors

scotthills-deloitte, zmtillery, ndsecond
