Churn Customer Problem ML Project

Designed and built an ML training pipeline that evaluates machine learning models, including a neural network, on an organisation's customer account data in order to classify accounts as churned or retained.

MODEL GENERATION

The code builds, evaluates, and compares machine learning models for churn prediction, one step at a time.

Let's break down the code into detailed steps:

Step 1: Importing Libraries and Loading Data

Import necessary libraries: numpy, pandas, matplotlib, seaborn.
Suppress warnings using warnings.filterwarnings("ignore").
Load the dataset from an Excel file named 'Customer Churn Data.xlsx' into a Pandas DataFrame (cf).
Display the first 5 rows and last 5 rows of the dataset to get an initial view of the data.
Check the data types of each column in the dataset using cf.dtypes.
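As a rough illustration of Step 1, the loading and first-look calls might look like the sketch below. The helper names load_churn_data and preview are hypothetical, and the small stand-in DataFrame (including its values) is invented so the example runs without the real workbook.

```python
import warnings

import pandas as pd

# Silence library warnings, as the walkthrough describes.
warnings.filterwarnings("ignore")


def load_churn_data(path="Customer Churn Data.xlsx"):
    """Read the workbook into a DataFrame (pandas needs an Excel engine such as openpyxl)."""
    return pd.read_excel(path)


def preview(cf, n=5):
    """Return the first n rows, the last n rows, and the column dtypes."""
    return cf.head(n), cf.tail(n), cf.dtypes


# Hypothetical stand-in rows; with the real file this would be cf = load_churn_data().
cf = pd.DataFrame(
    {
        "Churn": [1, 0, 0, 1, 0, 0],
        "Tenure": ["4", "0", "12", "7", "#", "3"],  # numbers stored as text
        "City_Tier": [3.0, 1.0, 1.0, 3.0, 1.0, 2.0],
    }
)

first_rows, last_rows, column_types = preview(cf)
print(first_rows)
print(last_rows)
print(column_types)
```

A text-typed column such as Tenure in this stand-in is the kind of thing cf.dtypes is meant to surface before cleaning starts.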

Step 2: Data Cleaning

Check for missing values in the dataset using cf.isnull().sum().
Handle missing values by replacing them with the median (for numerical columns) and mode (for categorical columns).
Drop unwanted columns from the dataset using cf.drop.
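The cleaning rules in Step 2 can be sketched as follows. The sample rows, the segment labels, and the "AccountID" column used as the unwanted column are all assumptions made for illustration; the original column list to drop is not given here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; "AccountID" stands in for an unwanted column.
cf = pd.DataFrame(
    {
        "AccountID": [1, 2, 3, 4, 5],
        "Tenure": [4.0, np.nan, 12.0, 7.0, 1.0],
        "account_segment": ["Super", "Regular", None, "Super", "HNI"],
    }
)

# Count missing values per column.
print(cf.isnull().sum())

# Fill numeric columns with the median and all other columns with the mode.
for col in cf.columns:
    if cf[col].isnull().any():
        if pd.api.types.is_numeric_dtype(cf[col]):
            cf[col] = cf[col].fillna(cf[col].median())
        else:
            cf[col] = cf[col].fillna(cf[col].mode()[0])

# Drop the column that is not needed for modelling.
cf = cf.drop(columns=["AccountID"])
print(cf)
```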

Step 3: Data Preprocessing

Convert object columns (categorical features) into float using pd.to_numeric.
Convert the 'Churn' column to float explicitly.
Check for and handle any remaining missing values.
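A minimal sketch of Step 3 is shown below. It assumes the text-typed columns hold numbers stored as strings with occasional stray symbols, that unparseable entries should become NaN (errors="coerce"), and that the remaining gaps are filled with the column median; none of these choices is confirmed by the original code.

```python
import pandas as pd

# Hypothetical sample: numeric values stored as text, with stray symbols.
cf = pd.DataFrame(
    {
        "Tenure": ["4", "12", "#", "7"],
        "rev_per_month": ["9", "+", "6", "8"],
        "Churn": [1, 0, 0, 1],
    }
)

# Convert every non-numeric column; entries that cannot be parsed become NaN.
text_cols = [c for c in cf.columns if not pd.api.types.is_numeric_dtype(cf[c])]
for col in text_cols:
    cf[col] = pd.to_numeric(cf[col], errors="coerce").astype(float)

# Convert the target explicitly.
cf["Churn"] = cf["Churn"].astype(float)

# Coercion can introduce new NaNs, so check again and fill them (median is an assumption).
print(cf.isnull().sum())
cf = cf.fillna(cf.median())
print(cf.dtypes)
```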

Step 4: Exploratory Data Analysis (EDA)

Explore the dataset by displaying unique values for each column and visualizing the data. This includes creating scatter plots, count plots, and correlation matrices to better understand the data distribution and relationships.
Sample the data to understand its structure and characteristics.
Examine various attributes, including:
- churn: Account churn flag (the target variable)
- Tenure: Tenure of the account
- City_Tier: Tier of the primary customer's city
- cc_contacted_Ly: Number of times the customer has contacted customer care in the last 12 months
- Service_Score: Satisfaction score given by customers on the service provided
- Account_user_count: Number of customers associated with the account
- account_segment: Account segmentation based on spending
- CC_Agent_Score: Satisfaction score given by customers on customer care service
- rev_per_month: Monthly average revenue generated by the account in the last 12 months
- complain_ly: Any complaints raised by the account in the last 12 months
- rev_growth_yoy: Revenue growth percentage of the account (last 12 months vs. last 24 to 13 months)
- Day_Since_CC_connect: Number of days since any customer in the account last contacted customer care
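The exploration in Step 4 could be sketched along these lines, using a few of the attributes listed above. The sample values are invented, the target is assumed to be named "Churn", and the plots use plain matplotlib rather than seaborn to keep the example's dependencies small.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample using a few of the attributes listed above.
cf = pd.DataFrame(
    {
        "Churn": [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
        "Tenure": [1.0, 12.0, 20.0, 2.0, 15.0, 9.0],
        "complain_ly": [1.0, 0.0, 0.0, 1.0, 0.0, 1.0],
        "CC_Agent_Score": [2.0, 4.0, 5.0, 1.0, 3.0, 3.0],
    }
)

# Unique values per column.
for col in cf.columns:
    print(col, sorted(cf[col].unique()))

# Class balance of the target and pairwise Pearson correlations.
churn_counts = cf["Churn"].value_counts()
corr = cf.corr()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Scatter plot coloured by the target.
axes[0].scatter(cf["Tenure"], cf["CC_Agent_Score"], c=cf["Churn"])
axes[0].set_xlabel("Tenure")
axes[0].set_ylabel("CC_Agent_Score")

# Count plot of the target.
churn_counts.sort_index().plot.bar(ax=axes[1], title="Churn counts")

# Correlation matrix as a heatmap.
im = axes[2].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[2].set_xticks(range(len(corr.columns)))
axes[2].set_xticklabels(corr.columns, rotation=45, ha="right")
axes[2].set_yticks(range(len(corr.columns)))
axes[2].set_yticklabels(corr.columns)
fig.colorbar(im, ax=axes[2])

fig.tight_layout()
plt.close(fig)
```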

Step 5: Data Balancing

Use the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset.

Step 6: Splitting Data

Split the dataset into training and testing sets using train_test_split.
Perform feature scaling on the data using StandardScaler.
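Step 6 might look like the sketch below. The 75/25 split, stratification, and random_state are assumptions, and the data is synthetic. The scaler is fitted on the training set only and then applied to the test set, so that test-set statistics do not influence training.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned churn features and target.
X, y = make_classification(n_samples=400, n_features=6, random_state=42)

# Hold out 25% of the rows, keeping the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Learn mean and standard deviation from the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```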

Step 7: Model Building and Evaluation

Build three different machine learning models: Decision Tree (CART), Random Forest, and Neural Network (MLP).
Perform Grid Search to find the best hyperparameters for the models.
Fit the models on the training data.
Calculate accuracy, precision, recall, and F1-score for each model, and plot Receiver Operating Characteristic (ROC) curves.
Evaluate model performance on both training and test datasets using classification reports and confusion matrices.
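The model-building loop in Step 7 can be sketched as below. The hyperparameter grids are deliberately small and hypothetical, as are cv=3 and the F1 scoring choice, and the data is synthetic. The ROC points are computed but not drawn; fpr and tpr are the arrays a plotting call would take.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared churn data.
X, y = make_classification(n_samples=400, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Hypothetical, deliberately small search grids for the three model families.
candidates = {
    "CART": (
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [3, 5], "min_samples_leaf": [5, 10]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [3, 5]},
    ),
    "Neural Network": (
        MLPClassifier(max_iter=1000, random_state=42),
        {"hidden_layer_sizes": [(16,), (32,)], "alpha": [1e-4, 1e-2]},
    ),
}

fitted, roc_points, test_auc = {}, {}, {}
for name, (estimator, grid) in candidates.items():
    # Cross-validated grid search; the best estimator is refitted on all training data.
    search = GridSearchCV(estimator, grid, cv=3, scoring="f1")
    search.fit(X_train, y_train)
    model = search.best_estimator_
    fitted[name] = model
    print(name, "best params:", search.best_params_)

    # Confusion matrix and classification report on both splits.
    for split, X_part, y_part in (("train", X_train, y_train), ("test", X_test, y_test)):
        preds = model.predict(X_part)
        print(f"{name} ({split})")
        print(confusion_matrix(y_part, preds))
        print(classification_report(y_part, preds))

    # ROC curve points and area under the curve on the test split.
    proba = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, proba)
    roc_points[name] = (fpr, tpr)
    test_auc[name] = roc_auc_score(y_test, proba)
    print(f"{name} test ROC AUC: {test_auc[name]:.3f}")
```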

Step 8: Model Comparison

Compare the model performance metrics, including accuracy, recall, precision, and F1-score for the three models on both training and test datasets.
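One way to assemble such a comparison is a DataFrame with one column per model and split. The sketch below uses synthetic data and fixed, hypothetical hyperparameters in place of the tuned models, purely to show how the table is put together.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared churn data.
X, y = make_classification(n_samples=400, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Hypothetical fixed settings standing in for the grid-searched models.
models = {
    "CART": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=42),
}
metrics = {
    "Accuracy": accuracy_score,
    "Recall": recall_score,
    "Precision": precision_score,
    "F1": f1_score,
}

# One column per model and split, one row per metric.
columns = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    for split, X_part, y_part in (("Train", X_train, y_train), ("Test", X_test, y_test)):
        preds = model.predict(X_part)
        columns[f"{name} {split}"] = {m: fn(y_part, preds) for m, fn in metrics.items()}

comparison = pd.DataFrame(columns).round(3)
print(comparison)
```

A large gap between a model's Train and Test columns is a sign of overfitting, which is why both splits appear side by side.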

The code concludes by displaying the models' performance metrics in a table so they can be compared side by side. The model to select is the one that best aligns with the business goals and constraints, judged on these evaluation results.

ayushgupta16 / churn_customer_problem-ml_project