In this lab, we'll put into practice the skills covered in the previous code-along. We'll use a simple dataset from Kaggle, the "Petrol Consumption Dataset", which records petrol consumption for a number of examples along with driver-related features.
You will be able to:
- Conduct a regression experiment using CART trees
- Evaluate the model fit and study the impact of hyperparameters on the final tree
- Understand the training, prediction, evaluation, and visualization steps required to run regression experiments using trees
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Read the dataset and view head and dimensions
# Code here
(48, 5)
| | Petrol_tax | Average_income | Paved_Highways | Population_Driver_licence(%) | Petrol_Consumption |
|---|---|---|---|---|---|
| 0 | 9.0 | 3571 | 1976 | 0.525 | 541 |
| 1 | 9.0 | 4092 | 1250 | 0.572 | 524 |
| 2 | 9.0 | 3865 | 1586 | 0.580 | 561 |
| 3 | 7.5 | 4870 | 2351 | 0.529 | 414 |
| 4 | 8.0 | 4399 | 431 | 0.544 | 410 |
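A minimal sketch of this loading step. In the lab the data would come from the downloaded Kaggle CSV (the file name `petrol_consumption.csv` is a placeholder); here an inline stand-in built from the rows shown above keeps the snippet self-contained:

```python
import io
import pandas as pd

# In the lab: dataset = pd.read_csv('petrol_consumption.csv')  # placeholder path
# Self-contained stand-in using the first rows shown above:
csv_data = io.StringIO("""Petrol_tax,Average_income,Paved_Highways,Population_Driver_licence(%),Petrol_Consumption
9.0,3571,1976,0.525,541
9.0,4092,1250,0.572,524
9.0,3865,1586,0.580,561
7.5,4870,2351,0.529,414
8.0,4399,431,0.544,410""")
dataset = pd.read_csv(csv_data)

# View dimensions and the first few rows
print(dataset.shape)
print(dataset.head())
```

The same `dataset` object also supports `dataset.describe()` for the summary statistics shown below.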
# Describe the dataset
# Code here
| | Petrol_tax | Average_income | Paved_Highways | Population_Driver_licence(%) | Petrol_Consumption |
|---|---|---|---|---|---|
| count | 48.000000 | 48.000000 | 48.000000 | 48.000000 | 48.000000 |
| mean | 7.668333 | 4241.833333 | 5565.416667 | 0.570333 | 576.770833 |
| std | 0.950770 | 573.623768 | 3491.507166 | 0.055470 | 111.885816 |
| min | 5.000000 | 3063.000000 | 431.000000 | 0.451000 | 344.000000 |
| 25% | 7.000000 | 3739.000000 | 3110.250000 | 0.529750 | 509.500000 |
| 50% | 7.500000 | 4298.000000 | 4735.500000 | 0.564500 | 568.500000 |
| 75% | 8.125000 | 4578.750000 | 7156.000000 | 0.595250 | 632.750000 |
| max | 10.000000 | 5342.000000 | 17782.000000 | 0.724000 | 968.000000 |
As with the classification task, we will divide our data into attributes/features and labels and consequently into training and test sets.
# Create datasets for training and test
# Code here
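One way to sketch this step, assuming the DataFrame is named `dataset` (a tiny inline stand-in replaces the CSV so the snippet runs on its own; the split proportions are illustrative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Inline stand-in for the loaded dataset
dataset = pd.DataFrame({
    'Petrol_tax': [9.0, 9.0, 9.0, 7.5, 8.0],
    'Average_income': [3571, 4092, 3865, 4870, 4399],
    'Paved_Highways': [1976, 1250, 1586, 2351, 431],
    'Population_Driver_licence(%)': [0.525, 0.572, 0.580, 0.529, 0.544],
    'Petrol_Consumption': [541, 524, 561, 414, 410],
})

# Split into features (X) and label (y), then into training and test sets
X = dataset.drop('Petrol_Consumption', axis=1)
y = dataset['Petrol_Consumption']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)
```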
As mentioned earlier, for a regression task we'll use a different sklearn class than we did for the classification task. The class we'll be using here is `DecisionTreeRegressor`, as opposed to the `DecisionTreeClassifier` from before.
# Train a regression tree model with training data
# Code here
DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
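A minimal sketch of the training step. The toy arrays below stand in for the train/test split from above, purely so the snippet is self-contained:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy stand-in data (the lab uses X_train / y_train from the split above)
rng = np.random.default_rng(0)
X_train = rng.random((40, 4))
y_train = rng.random(40) * 500 + 300

# Fit a regression tree with default hyperparameters
regressor = DecisionTreeRegressor(random_state=42)
regressor.fit(X_train, y_train)
```

With no `max_depth` or leaf constraints, the tree grows until every leaf is pure, which is why the default fit tends to overfit small datasets like this one.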
To evaluate the performance of a regression algorithm, the commonly used metrics are mean absolute error, mean squared error, and root mean squared error. The sklearn library's `metrics` package contains functions that calculate these values for us.
# Predict and evaluate the predictions
# Code here
Mean Absolute Error: 55.6
Mean Squared Error: 6286.2
Root Mean Squared Error: 79.28555984540942
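The metric calls can be sketched as follows. The prediction values below are hypothetical, just to show the `sklearn.metrics` functions in use (in the lab, `y_pred` would come from `regressor.predict(X_test)`):

```python
import numpy as np
from sklearn import metrics

# Hypothetical ground truth and predictions for illustration
y_test = np.array([541, 524, 561, 414, 410])
y_pred = np.array([500, 530, 600, 400, 450])

mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # root of MSE
print('Mean Absolute Error:', mae)
print('Mean Squared Error:', mse)
print('Root Mean Squared Error:', rmse)
```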
Let's visualize our learnt tree as we have been doing in previous lessons and labs.
# Visualize the decision tree using graph viz library
# Code here
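A sketch of the visualization step using sklearn's `export_graphviz`, which produces DOT source text; rendering it to an image additionally requires the `graphviz` package. The toy tree below stands in for the fitted regressor:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Toy fitted tree standing in for the lab's regressor
rng = np.random.default_rng(0)
X = rng.random((30, 4))
y = rng.random(30)
regressor = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X, y)

# Export the tree structure as DOT source text
dot_data = export_graphviz(
    regressor,
    out_file=None,
    feature_names=['Petrol_tax', 'Average_income', 'Paved_Highways',
                   'Population_Driver_licence(%)'],
    filled=True,
    rounded=True,
)
# To render in a notebook: graphviz.Source(dot_data)
print(dot_data[:50])
```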
- In order to understand and interpret a tree structure, we need some domain knowledge about the process that generated the data. That knowledge can help us inspect each leaf and investigate/prune the tree based on qualitative analysis.
- Look at the hyperparameters used in the regression tree, check their value ranges in the official documentation, and try running some optimization by growing a number of trees in a loop.
- Use a dataset that you are familiar with and run a tree regression to see if you can interpret the results.
- Check for outliers, try normalization, and see the impact on the output.
In this lab, we developed a regression tree, trained it, and predicted values for unseen data. We saw that with a vanilla approach the results were not so great, and that the model requires further tuning (what we described as hyperparameter optimization, or pruning in the case of trees).