- Perform an Exploratory Data Analysis with visualizations.
- Use a linear regression model (OLS) to predict the number of bikes rented.
- Pandas
- Numpy
- Matplotlib
- Seaborn
- Time
- scikit-learn
- statsmodels
- Trello
- MIRO
London bike sharing dataset https://www.kaggle.com/hmavrodiev/london-bike-sharing-dataset
Train R^2: 0.628 - Train Adjusted R^2: 0.628
Test R^2: 0.655 - Test Adjusted R^2: 0.582
- Difference in R^2 between train and test: 2.7%
- Difference in Adjusted R^2 between train and test: 4.6%, which is less than 5%.
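For reference, adjusted R^2 penalizes R^2 for the number of predictors relative to the number of observations. The sketch below shows the standard formula; the sample sizes and feature count are hypothetical values chosen for illustration and are not taken from this project.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("n_samples must be greater than n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical sizes (assumptions): with many rows the penalty is tiny,
# with few rows the same number of features costs much more.
large_sample = adjusted_r2(0.628, n_samples=10_000, n_features=15)
small_sample = adjusted_r2(0.655, n_samples=100, n_features=15)
print(round(large_sample, 3), round(small_sample, 3))
```

The adjusted value is never higher than the plain R^2, and the gap grows as the number of features approaches the number of rows.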
- Design:
- I have created a MIRO board with the story mapping. https://miro.com/app/board/o9J_lRe3F5E=/
- I have created a Trello board with the epics of requirements needed to deliver. https://trello.com/b/S7aR17IA
- Clean and manipulate the data, and create the visualizations.
- Exploratory Data Analysis
- Create visualizations
- Create dummy variables
- Recursive Feature Elimination (RFE)
- Create the linear regression model.
- Validate the assumptions (Linearity, Autocorrelation, Sub-Normality, Normality, Multicollinearity)
- Analyze the model performance
- Create a storytelling presentation
To see the presentation, click the picture below.