Designed and built a machine learning training pipeline that evaluates several algorithms, including a neural network, on an organisation's customer statistics, enabling straightforward data classification and distribution.
The code builds, evaluates, and compares machine learning models for churn prediction, step by step.
Let's break down the code into detailed steps:
- Import necessary libraries: `numpy`, `pandas`, `matplotlib`, `seaborn`.
- Suppress warnings using `warnings.filterwarnings("ignore")`.
- Load the dataset from an Excel file named 'Customer Churn Data.xlsx' into a Pandas DataFrame (`cf`).
- Display the first 5 rows and last 5 rows of the dataset to get an initial view of the data.
- Check the data types of each column in the dataset using `cf.dtypes`.
- Check for missing values in the dataset using `cf.isnull().sum()`.
- Handle missing values by replacing them with the median (for numerical columns) and mode (for categorical columns).
- Drop unwanted columns from the dataset using `cf.drop`.
- Convert object columns (categorical features) into float using `pd.to_numeric`.
- Convert the 'Churn' column to float explicitly.
- Check for and handle any remaining missing values.
- Explore the dataset by displaying unique values for each column and visualizing the data. This includes creating scatter plots, count plots, and correlation matrices to better understand the data distribution and relationships.
- Sample the data to understand its structure and characteristics.
- Examine various attributes, including:
  - `churn`: Account churn flag (the target variable)
  - `Tenure`: Tenure of the account
  - `City_Tier`: Tier of the primary customer's city
  - `cc_contacted_Ly`: Number of times the customer has contacted customer care in the last 12 months
  - `Service_Score`: Satisfaction score given by customers on the service provided
  - `Account_user_count`: Number of customers associated with the account
  - `account_segment`: Account segmentation based on spending
  - `CC_Agent_Score`: Satisfaction score given by customers on customer care service
  - `rev_per_month`: Monthly average revenue generated by the account in the last 12 months
  - `complain_ly`: Whether the account raised any complaints in the last 12 months
  - `rev_growth_yoy`: Revenue growth percentage of the account (last 12 months vs. last 24 to 13 months)
  - `Day_Since_CC_connect`: Number of days since a customer in the account last contacted customer care
- Use the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset.
- Split the dataset into training and testing sets using
train_test_split
. - Perform feature scaling on the data using
StandardScaler
.
- Build three different machine learning models: Decision Tree (CART), Random Forest, and Neural Network (MLP).
- Perform Grid Search to find the best hyperparameters for the models.
- Fit the models on the training data.
- Calculate model accuracy, precision, recall, F1-score, and plot Receiver Operating Characteristic (ROC) curves.
- Evaluate model performance on both training and test datasets using classification reports and confusion matrices.
- Compare the model performance metrics, including accuracy, recall, precision, and F1-score for the three models on both training and test datasets.
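The preparation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the small DataFrame is a synthetic stand-in for 'Customer Churn Data.xlsx', `pd.get_dummies` is an assumed way of encoding text categories, and `smote_like` is a hypothetical, simplified SMOTE-style helper written here so the example does not depend on the imbalanced-learn package. The sketch also oversamples only the training split after scaling, which differs from the order in the list; that is a common precaution to keep synthetic samples out of the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption); the real data would come from
# pd.read_excel("Customer Churn Data.xlsx").
rng = np.random.default_rng(0)
n = 200
cf = pd.DataFrame({
    "Churn": (rng.random(n) < 0.2).astype(int),
    "Tenure": rng.integers(0, 40, n).astype(object),  # numbers stored as object
    "rev_per_month": rng.normal(6.0, 2.0, n),
    "account_segment": rng.choice(["Regular", "Super", "HNI"], n),
})
cf.loc[[3, 10], "Tenure"] = "#"             # junk token that is not a number
cf.loc[[5, 20], "rev_per_month"] = np.nan
cf.loc[[7], "account_segment"] = np.nan

print(cf.dtypes)
print(cf.isnull().sum())

# Coerce numeric-looking object columns to float; junk tokens become NaN.
cf["Tenure"] = pd.to_numeric(cf["Tenure"], errors="coerce")
cf["Churn"] = cf["Churn"].astype(float)

# Median for numerical columns, mode for categorical columns.
for col in cf.columns:
    if pd.api.types.is_numeric_dtype(cf[col]):
        cf[col] = cf[col].fillna(cf[col].median())
    else:
        cf[col] = cf[col].fillna(cf[col].mode()[0])

# Assumed encoding: one-hot columns for the remaining text categories.
X = pd.get_dummies(cf.drop(columns="Churn"), dtype=float).to_numpy()
y = cf["Churn"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_train)      # fit on the training split only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)


def smote_like(X, y, minority=1.0, k=5, seed=0):
    """Simplified SMOTE-style oversampling: interpolate between a minority
    sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours: column 0 is normally the point itself.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, k + 1, n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


X_bal, y_bal = smote_like(X_train, y_train)
print("class counts after balancing:", np.bincount(y_bal.astype(int)))
```

In practice the `SMOTE` class from imbalanced-learn would normally replace the hand-written helper; the helper only illustrates the interpolation idea.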
The code concludes by displaying a comparison of the model performance metrics in a tabular format, allowing you to assess and compare the models' performance at each step. It's important to select the model that best aligns with the business goals and constraints based on the evaluation results.
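A comparison table of this kind might be assembled along the following lines. This is a sketch under stated assumptions rather than the original implementation: the data comes from scikit-learn's `make_classification` instead of the churn dataset, the hyperparameter grids are small illustrative choices (the grids actually searched are not given), and ROC AUC is reported as a numeric summary in place of plotted ROC curves.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared churn data (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Illustrative grids only; not the grids used in the original code.
candidates = {
    "CART": (DecisionTreeClassifier(random_state=0),
             {"max_depth": [3, 5], "min_samples_leaf": [5, 20]}),
    "Random Forest": (RandomForestClassifier(n_estimators=50, random_state=0),
                      {"max_depth": [4, 8]}),
    "Neural Network": (MLPClassifier(max_iter=1000, random_state=0),
                       {"hidden_layer_sizes": [(16,), (32,)]}),
}

rows = []
for name, (estimator, grid) in candidates.items():
    # Grid search with 3-fold cross-validation on the training split.
    search = GridSearchCV(estimator, grid, cv=3).fit(X_train, y_train)
    best = search.best_estimator_
    # Score the tuned model on both splits to expose overfitting.
    for split, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
        pred = best.predict(Xs)
        rows.append({
            "model": name,
            "split": split,
            "accuracy": accuracy_score(ys, pred),
            "precision": precision_score(ys, pred),
            "recall": recall_score(ys, pred),
            "f1": f1_score(ys, pred),
            "roc_auc": roc_auc_score(ys, best.predict_proba(Xs)[:, 1]),
        })

comparison = pd.DataFrame(rows).set_index(["model", "split"]).round(3)
print(comparison)
```

Placing the train and test rows side by side for each model makes a large gap between the two easy to spot, which is one signal of overfitting to weigh alongside the headline scores.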