- a. Lesson Notes
- b. Exercises
Acquiring and importing the data we will be using
- acquire.py file
Preparing and cleaning our imported data
- prepare.py file
- data should be tabular (made up of rows and columns)
- there should only be one value per cell
- each variable should be one column
- each observation should be one row
Melt is required when one variable is spread across multiple columns.
Pivot is required when one column contains multiple variables.
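The two reshaping cases above can be sketched with pandas. This is a minimal illustration, not the contents of prepare.py; the tables and column names (`student`, `year`, `score`, `measure`, `value`) are made up for the example.

```python
import pandas as pd

# Hypothetical "wide" table: the variable "year" is spread across two columns.
wide_df = pd.DataFrame({
    "student": ["Ana", "Ben"],
    "2019": [88, 92],
    "2020": [91, 85],
})

# melt: gather the year columns into one "year" column and one "score" column
tidy = wide_df.melt(id_vars="student", var_name="year", value_name="score")

# Hypothetical "long" table: the "measure" column holds two different variables.
long_df = pd.DataFrame({
    "student": ["Ana", "Ana", "Ben", "Ben"],
    "measure": ["height", "weight", "height", "weight"],
    "value": [160, 55, 175, 70],
})

# pivot: give each variable (height, weight) its own column
pivoted = long_df.pivot(index="student", columns="measure", values="value").reset_index()
pivoted.columns.name = None  # drop the leftover "measure" label on the column axis

print(tidy)
print(pivoted)
```

After both steps, each table has one observation per row and one variable per column.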
EDA | In this step we determine which features to feed into our model
- initial investigations
- discover patterns
- spot anomalies
- formulate and test hypotheses
- check assumptions
- summary statistics
- graphical representations
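As a rough sketch of the summary-statistics and hypothesis-testing steps, the example below compares one feature across the two classes of a target. The data and the column names `tenure` and `churned` are hypothetical, and Welch's t-test is just one possible test choice.

```python
import pandas as pd
from scipy import stats

# Hypothetical training data: one numeric feature and a binary target
train = pd.DataFrame({
    "tenure": [1, 3, 2, 24, 36, 30, 4, 48],
    "churned": [1, 1, 1, 0, 0, 0, 1, 0],
})

# Summary statistics of the feature, split by target class
summary = train.groupby("churned")["tenure"].describe()
print(summary)

# Hypothesis: mean tenure differs between customers who churned and those who stayed
churned = train[train["churned"] == 1]["tenure"]
stayed = train[train["churned"] == 0]["tenure"]

# Welch's t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(churned, stayed, equal_var=False)

alpha = 0.05
reject_null = p_value < alpha
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, reject null: {reject_null}")
```

A feature whose distribution differs clearly between classes is a reasonable candidate to feed into the model.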
X_train: Feature variable columns, drop target variable column
y_train: Series with our target variable column
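A minimal sketch of building X_train and y_train, assuming the prepared data is a single DataFrame. The data and the target name `survived` are hypothetical, and the 80/20 split with stratification is one common choice rather than a requirement.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical prepared data; "survived" stands in for the target column
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 54, 2, 27, 14, 58, 20],
    "fare": [7.25, 71.28, 7.92, 53.1, 51.86, 21.07, 11.13, 30.07, 26.55, 8.05],
    "survived": [0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
})

# Hold out 20% of the rows as a test set, keeping class proportions similar
train, test = train_test_split(
    df, test_size=0.2, random_state=123, stratify=df["survived"]
)

# X_train: feature columns only (target dropped); y_train: the target as a Series
X_train = train.drop(columns=["survived"])
y_train = train["survived"]

X_test = test.drop(columns=["survived"])
y_test = test["survived"]

print(X_train.shape, y_train.shape)
```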
How we evaluate our classification model's performance
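The evaluation step can be sketched with the two sklearn.metrics functions imported in these notes. The labels below are made up; in practice `y_pred` would come from a fitted model's `predict` method.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical actual and predicted labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # of predicted positives, how many are real
recall = tp / (tp + fn)           # of real positives, how many were found

# Per-class precision, recall, and f1-score in one table
report = classification_report(y_true, y_pred, output_dict=True)
print(classification_report(y_true, y_pred))
```

Which metric matters most depends on the cost of false positives versus false negatives for the problem at hand.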
Visualize (Decision Tree)
Feature Importance (Random Forest)
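A small sketch of both ideas, under stated assumptions: the data and the feature names `signal` and `noise` are hypothetical, `RandomForestClassifier` is not among the imports listed in these notes, and `export_text` is swapped in for `export_graphviz` here only because it prints the tree without needing Graphviz installed.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: "signal" determines the target, "noise" is unrelated to it
X = pd.DataFrame({
    "signal": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],
})
y = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])

# Visualize a decision tree as indented text rules
tree = DecisionTreeClassifier(max_depth=3, random_state=123)
tree.fit(X, y)
rules = export_text(tree, feature_names=list(X.columns))
print(rules)

# Feature importance from a random forest, labeled by column name
forest = RandomForestClassifier(n_estimators=100, random_state=123)
forest.fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

The importances sum to 1, so each value can be read as that feature's share of the total impurity reduction across the forest.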
# ignore warnings
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
from scipy import stats
import os
# files/data
from pydataset import data
import env
import acquire
import prepare
# visualizations
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# sklearn
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
np.random.seed(123)