Introduction
This lesson summarizes the topics we'll be covering in section 10 and why they'll be important to you as a data scientist.
Objectives
You will be able to:
- Understand and explain what is covered in this section
- Understand and explain why the section will help you to become a data scientist
Linear Regression
In this section we're going to introduce our first machine learning model - linear regression. It's really just a fancy way of saying "(straight) line of best fit", but it will introduce a number of concepts that will be important as we continue to explore more sophisticated models in modules 2 and 3.
Covariance and Correlation
We start the section by covering covariance and correlation, both of which describe how strongly two variables tend to change together. For example, with houses, it wouldn't be too surprising if the number of rooms and the price of a house were positively correlated (in general, more rooms == more expensive).
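As a rough illustration, here is a minimal sketch of both measures written from their textbook definitions. It assumes the sample (n - 1) form of covariance and Pearson's correlation coefficient. The rooms and price values are hypothetical toy data, not from any real data set.

```python
import math

def covariance(x, y):
    """Sample covariance: summed products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance rescaled by both sample standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical toy data: number of rooms vs. price (in $1000s)
rooms = [2, 3, 3, 4, 5, 6]
price = [150, 180, 200, 240, 290, 330]

print(covariance(rooms, price))   # positive: the two tend to rise together
print(correlation(rooms, price))  # always between -1 and 1
```

Covariance depends on the units of the variables, so its size is hard to compare across data sets. Correlation divides that out, which is why it always lands between -1 and 1.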
Statistical Learning Theory
We then explore statistical learning theory and how dependent and independent variables relate to it.
Linear Regression
Next, we look at simple linear regression and work out how to calculate the "line of best fit".
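To make "line of best fit" concrete, here is a minimal sketch of the closed-form least-squares solution for a single input variable. It assumes a model of the form y = slope * x + intercept. The function name `fit_line` is hypothetical, chosen for illustration.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # lies exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # → 2.0 1.0
```

On real, noisy data the points won't sit exactly on the line. The same formulas still give the slope and intercept that minimize the sum of squared vertical distances between the points and the line.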
Coefficient of Determination
We then introduce "R squared", the coefficient of determination, to quantify how well a particular line fits a particular data set.
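R squared compares the error left over by the model with the error you would get by always predicting the mean of y. Here is a minimal sketch using the common definition 1 - SS_res / SS_tot. The function name and the example values are illustrative assumptions.

```python
def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # variation the model leaves unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                 # total variation around the mean
    return 1 - ss_res / ss_tot

y = [3, 5, 7, 9, 11]
perfect = [3, 5, 7, 9, 11]    # predictions that match every point
mean_only = [7, 7, 7, 7, 7]   # always predict the mean of y
print(r_squared(y, perfect))    # → 1.0
print(r_squared(y, mean_only))  # → 0.0
```

A perfect fit scores 1, and a line that does no better than the mean scores 0. Most real models fall somewhere in between.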
A Complete Regression
From there, we look at calculating a complete linear regression using just code and cover some of the assumptions that must hold for a least squares regression. We then introduce Ordinary Least Squares (OLS) in statsmodels, along with some tools for diagnosing your linear regression, such as Q-Q plots, the Jarque-Bera test for normality of residuals and the Goldfeld-Quandt test for heteroscedasticity. Finally, we look at how to interpret significance and p-values, and finish up by building a regression model of the Boston Housing data set.
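To show what a residual-normality diagnostic actually computes, here is a minimal from-scratch sketch of the Jarque-Bera statistic. It combines the skewness and kurtosis of the residuals, and under normality it is approximately chi-squared with 2 degrees of freedom. The lesson itself uses statsmodels, which provides its own implementation (for example `statsmodels.stats.stattools.jarque_bera`). The pure-Python version below is an illustrative assumption, not that library's code, and the simulated residuals are made up for demonstration.

```python
import math
import random

def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments with 1/n denominators
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurtosis - 3) ** 2 / 4)
    # Chi-squared with 2 degrees of freedom has survival function exp(-x / 2)
    p_value = math.exp(-jb / 2)
    return jb, p_value

# Simulated residuals: one roughly normal set, one strongly right-skewed set
rng = random.Random(42)
normal_resid = [rng.gauss(0, 1) for _ in range(1000)]
skewed_resid = [rng.expovariate(1.0) for _ in range(1000)]

jb_normal, p_normal = jarque_bera(normal_resid)
jb_skewed, p_skewed = jarque_bera(skewed_resid)
print(jb_normal, p_normal)
print(jb_skewed, p_skewed)  # the skewed residuals should give a far larger statistic
```

A small p-value suggests the residuals are unlikely to be normally distributed, which would violate one of the least squares assumptions mentioned above.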
Summary
Congratulations! You've made it through much of the introductory material, and we've finally got enough context to take our first look at a machine learning model. Along the way, we'll broaden our experience of both coding and math so that we can introduce more sophisticated machine learning models as the course progresses.