DATA SET: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
I built a Support Vector Machine (SVM) classifier with scikit-learn, using the Radial Basis Function (RBF) kernel. The data set, taken from the UCI Machine Learning Repository, contains both continuous and categorical variables and is used to predict whether or not a patient has heart disease.
Concepts covered:
Programming in Python, and the concepts behind Support Vector Machines, the Radial Basis Function kernel, regularization, cross validation, and confusion matrices.
Steps Involved:
Import the modules that will do all the work
Import the data
Identify Missing Data
Deal With Missing Data
Split the Data into Dependent and Independent Variables
One-Hot Encoding
Centering and Scaling
Build A Preliminary Support Vector Machine
Optimize Parameters with Cross Validation
Build, Evaluate, Draw, and Interpret the Final Support Vector Machine
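The later steps above (from splitting the variables through to the confusion matrix) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data frame is synthetic stand-in data generated at random, only three of the predictor columns are used, and the parameter grid values are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the UCI records), so the sketch runs offline.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "age": rng.integers(30, 78, size=n),
    "chol": rng.integers(150, 350, size=n),
    "cp": rng.integers(1, 5, size=n),  # categorical code 1-4
})
# Toy label tied to age and cp so the classifier has something to learn.
df["hd"] = ((df["age"] > 55) & (df["cp"] >= 3)).astype(int)

# Split the data into independent (X) and dependent (y) variables.
X = df.drop(columns="hd")
y = df["hd"]

# One-hot encode the categorical column.
X_encoded = pd.get_dummies(X, columns=["cp"], dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, random_state=42, stratify=y
)

# Center and scale, fitting the scaler on the training data only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Preliminary SVM with the RBF kernel and default parameters.
prelim_svm = SVC(kernel="rbf", random_state=42).fit(X_train_scaled, y_train)

# Optimize C (regularization) and gamma with 5-fold cross validation.
# The grid values below are assumptions for illustration.
param_grid = {"C": [0.5, 1, 10, 100], "gamma": ["scale", 1, 0.1, 0.01, 0.001]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X_train_scaled, y_train)

# Final SVM (refit on the full training set) and its confusion matrix.
final_svm = search.best_estimator_
cm = confusion_matrix(y_test, final_svm.predict(X_test_scaled), labels=[0, 1])
print(search.best_params_)
print(cm)
```

Rows of the confusion matrix are the true classes and columns are the predicted classes. Wrapping the scaler and the SVM in a scikit-learn Pipeline would keep the scaling inside each cross-validation fold; the sketch follows the step order listed above instead.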
DATASET OVERVIEW:
age
sex
cp, chest pain type
restbp, resting blood pressure (in mm Hg)
chol, serum cholesterol (in mg/dl)
fbs, fasting blood sugar
restecg, resting electrocardiographic results
thalach, maximum heart rate achieved
exang, exercise induced angina
oldpeak, ST depression induced by exercise relative to rest
slope, the slope of the peak exercise ST segment
ca, number of major vessels (0-3) colored by fluoroscopy
thal, short for thallium heart scan
hd, diagnosis of heart disease, the predicted attribute
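As a rough sketch of how the columns above map onto the import and missing-data steps, the example below reads a few rows into pandas and counts the missing values. The rows are made up for illustration and are not real patient records. The "?" marker for missing values and the binary recoding of hd are assumptions about how the file is laid out, so check them against the actual data.

```python
import io

import pandas as pd

# Column names in the order listed in the overview above.
COLUMNS = ["age", "sex", "cp", "restbp", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "hd"]

# Made-up rows for illustration only (NOT real patient records).
# Assumption: missing values are written as "?".
SAMPLE_CSV = """\
50,1,2,130,220,0,0,160,0,1.0,1,0,3,0
61,0,4,150,270,0,2,120,1,2.0,2,2,7,1
58,1,4,140,250,1,2,130,1,1.5,2,?,7,2
39,0,1,118,200,0,0,175,0,0.0,1,0,?,0
"""

# Import the data, treating "?" as missing.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), header=None, names=COLUMNS,
                 na_values="?")

# Identify missing data: count missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column[missing_per_column > 0])

# Deal with missing data: here, simply drop the incomplete rows.
df_clean = df.dropna().copy()

# Assumption: hd == 0 means no heart disease and any positive value means
# heart disease, so the target is collapsed to a binary label.
df_clean["hd_binary"] = (df_clean["hd"] > 0).astype(int)
print(df_clean[["hd", "hd_binary"]])
```

Dropping rows is only one option; imputing the missing values is another, and which is preferable depends on how many rows are affected.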