Grow your machine learning skills with scikit-learn and discover how to use this popular Python library to train models using labeled data. In this course, you'll learn how to make powerful predictions, such as whether a customer will churn from your business, whether an individual has diabetes, and even how to classify the genre of a song. Using real-world datasets, you'll find out how to build predictive models, tune their parameters, and determine how well they will perform on unseen data.
- Using supervised learning techniques to build predictive models for both regression and classification problems
- Underfitting and overfitting
- How to split data
- Cross-validation
- Data preprocessing techniques
- Model selection
- Hyperparameter tuning
- Model performance evaluation
- Using pipelines
In this chapter, you'll be introduced to classification problems and learn how to solve them using supervised learning techniques. You'll learn how to split data into training and test sets, fit a model, make predictions, and evaluate accuracy. You’ll discover the relationship between model complexity and performance, applying what you learn to a churn dataset, where you will classify the churn status of a telecom company's customers.
- Binary Classification
- k-Nearest Neighbors
- Measuring Model Performance
- Train/test split and computing accuracy
- Overfitting and underfitting
- Visualizing Model Complexity
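The workflow this chapter describes (split, fit, predict, score, then compare model complexity) can be sketched roughly as follows. This is a minimal illustration rather than the course's own code: the data is a synthetic stand-in generated with `make_classification`, not the telecom churn dataset, and the range of `k` values is an arbitrary assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic binary-classification data standing in for a churn dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)

# Hold out 30% of the rows for testing; stratify keeps the class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit one k-Nearest Neighbors model per value of k and record both accuracies.
train_accuracies = {}
test_accuracies = {}
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracies[k] = knn.score(X_train, y_train)
    test_accuracies[k] = knn.score(X_test, y_test)

# Small k means a more complex model (risk of overfitting);
# large k means a simpler model (risk of underfitting).
best_k = max(test_accuracies, key=test_accuracies.get)
print(f"Best k on the test set: {best_k} (accuracy {test_accuracies[best_k]:.3f})")
```

Plotting `train_accuracies` and `test_accuracies` against `k` gives a model complexity curve: a widening gap between the two lines at small `k` is a common sign of overfitting.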
In this chapter, you will be introduced to regression, and build models to predict sales values using a dataset on advertising expenditure. You will learn about the mechanics of linear regression and common performance metrics such as R-squared and root mean squared error. You will perform k-fold cross-validation, and apply regularization to regression models to reduce the risk of overfitting.
- Introduction to Regression
- Linear Regression
- Cross-validation
- Regularized Regression
- Lasso Regression
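As a rough sketch of the topics listed above, the example below fits a linear regression, computes R-squared and root mean squared error, runs 5-fold cross-validation, and fits a lasso model. The data comes from `make_regression` as a stand-in for the advertising dataset, and the `alpha` value and fold count are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Assumption: synthetic regression data standing in for sales vs. advertising spend.
X, y = make_regression(
    n_samples=300, n_features=5, n_informative=3, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Ordinary least squares: score() returns R-squared for regressors.
reg = LinearRegression().fit(X_train, y_train)
r_squared = reg.score(X_test, y_test)
rmse = np.sqrt(mean_squared_error(y_test, reg.predict(X_test)))

# k-fold cross-validation: one R-squared value per fold.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=kf)

# Lasso adds an L1 penalty that can shrink some coefficients to exactly zero.
lasso = Lasso(alpha=1.0).fit(X_train, y_train)
lasso_coef = lasso.coef_

print(f"R-squared: {r_squared:.3f}, RMSE: {rmse:.2f}")
print(f"Mean CV R-squared: {cv_scores.mean():.3f}")
```

Inspecting `lasso_coef` shows which features the penalty has shrunk toward zero, which is why lasso is often used as a rough guide to feature importance.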
Now that you have trained models, you will learn how to evaluate them. In this chapter, you will be introduced to several metrics, along with a visualization technique, for analyzing classification model performance using scikit-learn. You will also learn how to optimize classification and regression models through hyperparameter tuning.
- How good is your model?
- Logistic Regression and the ROC curve
- Hyperparameter tuning
- Hyperparameter tuning with GridSearchCV
- Hyperparameter tuning with RandomizedSearchCV
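The evaluation and tuning steps named above might look something like the sketch below. It is illustrative only: the data is synthetic, and the candidate values for the regularization strength `C`, the fold count, and `n_iter` are assumptions rather than values taken from the course.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Assumption: synthetic binary-classification data.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Logistic regression outputs probabilities, which the ROC curve needs.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = logreg.predict_proba(X_test)[:, 1]

cm = confusion_matrix(y_test, logreg.predict(X_test))  # rows: true, columns: predicted
fpr, tpr, thresholds = roc_curve(y_test, y_prob)       # points for plotting the ROC curve
auc = roc_auc_score(y_test, y_prob)

# Grid search tries every candidate value with 5-fold cross-validation.
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

# Randomized search samples only n_iter of the candidates, which scales better
# when the search space is large.
rand = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_grid,
    n_iter=3,
    cv=5,
    random_state=1,
)
rand.fit(X_train, y_train)

print(f"AUC: {auc:.3f}")
print(f"Grid search best C: {grid.best_params_['C']}")
print(f"Randomized search best C: {rand.best_params_['C']}")
```

Both search objects expose `best_params_` and `best_score_` after fitting, and the tuned model should still be scored on the held-out test set, for example with `grid.score(X_test, y_test)`.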
Learn how to impute missing values, convert categorical data to numeric values, scale data, evaluate multiple supervised learning models simultaneously, and build pipelines to streamline your workflow!
- Preprocessing data
- Creating dummy variables
- Handling / Dropping missing data
- Centering and Scaling (Regression / Classification)
- Evaluating multiple models
- Visualizing Regression model performance
- Visualizing Classification model performance
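To make the preprocessing steps in this chapter concrete, here is a small sketch that creates dummy variables, imputes missing values, scales features, and chains everything in a pipeline. The DataFrame, its column names (`monthly_charge`, `plan`, `churn`), and its values are hypothetical and exist only for illustration. The choice of mean imputation and logistic regression is likewise an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one numeric column with gaps, one categorical column.
df = pd.DataFrame({
    "monthly_charge": [20.0, 35.5, np.nan, 80.0, 55.0, 42.0,
                       np.nan, 95.0, 30.0, 60.0, 25.0, 70.0],
    "plan": ["basic", "basic", "plus", "premium", "plus", "basic",
             "plus", "premium", "basic", "plus", "basic", "premium"],
    "churn": [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1],
})

# Convert the categorical column to dummy variables; drop_first avoids a
# redundant column, and dtype=float keeps every feature numeric.
X = pd.get_dummies(
    df.drop(columns="churn"), columns=["plan"], drop_first=True, dtype=float
)
y = df["churn"]

# Imputation and scaling are fitted inside each cross-validation fold,
# so no information leaks from the validation rows.
pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression()),
])

scores = cross_val_score(pipeline, X, y, cv=3)
print(f"Cross-validated accuracy per fold: {scores}")
```

Swapping the final step for another estimator (for example `KNeighborsClassifier`) and comparing the resulting `scores` is one simple way to evaluate multiple models on the same preprocessing.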