We want to predict labels for data using classification in machine learning.
The classifier could be an SVM, a decision tree, a random forest, logistic regression, and so on.
What do we observe? Every classifier produces both right and wrong predictions. The wrong predictions fall into two categories:
- False Positive (Type I error)
- False Negative (Type II error)
A false positive occurs when we predict that something happens/occurs but it didn't happen/occur (rejection of a true null hypothesis). Example: we predict that an earthquake will occur, but no earthquake happens.
A false negative occurs when we predict that something won't happen/occur but it does happen/occur (non-rejection of a false null hypothesis). Example: we predict that there will be no earthquake, but an earthquake occurs.
Type I errors are usually considered less critical than Type II errors.
But in fields like medicine and agriculture, both kinds of error can be critical.
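The two error types can be counted directly from predictions. As a small sketch, using hypothetical earthquake labels (the data here is made up for illustration):

```python
# Hypothetical labels: 1 = "earthquake", 0 = "no earthquake".
y_actual    = [0, 0, 1, 1, 0, 1, 0, 0]
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0]

# Type I error (false positive): predicted 1, but actually 0.
false_positives = sum(1 for a, p in zip(y_actual, y_predicted) if a == 0 and p == 1)

# Type II error (false negative): predicted 0, but actually 1.
false_negatives = sum(1 for a, p in zip(y_actual, y_predicted) if a == 1 and p == 0)

print(false_positives, false_negatives)  # 2 type I errors, 1 type II error
```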
A 2x2 matrix recording the right and wrong predictions can help us analyse the rate of success.
This matrix is called the Confusion Matrix.
|              | Predicted 0 | Predicted 1 |
|--------------|-------------|-------------|
| **Actual 0** | TN          | FP          |
| **Actual 1** | FN          | TP          |
The horizontal axis corresponds to the predicted values (y_predicted) and the vertical axis corresponds to the actual values (y_actual).
- Cell [1][1] (TN) holds the values which are predicted to be false and are actually false.
- Cell [1][2] (FP) holds the values which are predicted to be true but are actually false.
- Cell [2][1] (FN) holds the values which are predicted to be false but are actually true.
- Cell [2][2] (TP) holds the values which are predicted to be true and are actually true.
The confusion matrix can be computed in Python by importing the metrics module from sklearn:

```python
from sklearn import metrics

cm = metrics.confusion_matrix(y_actual, y_predicted)
```
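As a minimal sketch (reusing the hypothetical earthquake labels from the example above), the four cells of the matrix can be unpacked with `ravel()`, whose row-major order matches the 2x2 layout shown earlier:

```python
from sklearn import metrics

# Hypothetical labels: 1 = "earthquake", 0 = "no earthquake".
y_actual    = [0, 0, 1, 1, 0, 1, 0, 0]
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0]

cm = metrics.confusion_matrix(y_actual, y_predicted)
# Row-major unpacking: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 3 2 1 2
```

Note that sklearn follows the same convention as the table above: rows are actual values, columns are predicted values.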
We can calculate the rate of success (accuracy) as the fraction of correct predictions among all predictions:

r = (TN + TP) / (TN + TP + FN + FP)
It can also be obtained directly:

```python
print("Accuracy =", metrics.accuracy_score(y_actual, y_predicted) * 100)
```
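To verify that the formula and the library agree, here is a short sketch (again with the hypothetical earthquake labels) computing accuracy both ways:

```python
from sklearn import metrics

# Hypothetical labels: 1 = "earthquake", 0 = "no earthquake".
y_actual    = [0, 0, 1, 1, 0, 1, 0, 0]
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = metrics.confusion_matrix(y_actual, y_predicted).ravel()
manual  = (tn + tp) / (tn + tp + fn + fp)              # accuracy from the formula
library = metrics.accuracy_score(y_actual, y_predicted)  # accuracy from sklearn

print(manual, library)  # both 0.625 (5 correct out of 8)
```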