Take a look at the project: https://nbviewer.org/github/CeloCruz/ForecastingCustomers/blob/ddc205b3a5c6ecd0763e146c44da69cdb7548758/forecasting_notebook.ipynb
Objective: This project applies machine learning to a real-world forecasting problem. It uses time series data and an XGBoost model to forecast weekly customer numbers across multiple restaurants, so that weekly purchase budgets can be optimized.
Business Impact:
- Precision in forecasting to minimize operational waste and optimize resource allocation.
- Transition from traditional human-centric forecasting to machine learning-driven approaches for enhanced adaptability.
- Proactive adjustment to customer demand, ensuring optimized operations and budgets.
Project Scope: Focused on supervised regression machine learning for forecasting weekly customer numbers.
Key Metric: Mean Absolute Error (MAE) used to quantify prediction accuracy.
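
For readers unfamiliar with the metric, MAE is the average of the absolute differences between actual and forecast values, expressed in the same unit as the target (here, customers per week). The snippet below is a minimal sketch with made-up numbers; the function and variable names are illustrative and not taken from the notebook.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up weekly customer counts, for illustration only.
actual_customers = [120, 135, 150]
forecast_customers = [110, 140, 150]

# Absolute errors are 10, 5 and 0, so the mean is 5.0 customers per week.
mae = mean_absolute_error(actual_customers, forecast_customers)
print(mae)
```
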
Models Explored:
- ETS
- ARMA
- SARIMA
- XGBoost
Tools Utilized:
- Data Processing: Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Statistical Analysis: Statsmodels
- Machine Learning: scikit-learn, XGBoost
Skills Required:
- Business Understanding
- Critical Thinking & Problem Solving
- Domain Knowledge in Time Series Forecasting
- Data Analysis
- Data Transformation for Relevant Insights
- Data Visualization
- Data Preprocessing
- Feature Selection and Engineering
- Supervised Regression Machine Learning
- Predictive Analysis
- Time Series Analysis and Cross-Validation
- Fine-Tuning Techniques
- Ongoing Model Refinement
- Communication of Results to Non-Technical Audiences
Data Preparation:
- Rigorous feature engineering and time series analysis.
- Data integrity ensured through careful preprocessing.
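
To give a rough idea of what feature engineering for a weekly series can look like, here is a minimal sketch that builds lag and rolling-mean features with Pandas. The column name `customers`, the chosen lags, and the window size are assumptions for illustration; the notebook's actual features may differ.

```python
import pandas as pd


def add_lag_features(df, target="customers", lags=(1, 2), window=4):
    """Add lagged values, a rolling mean of past weeks, and week of year."""
    out = df.copy()
    for lag in lags:
        out[f"{target}_lag_{lag}"] = out[target].shift(lag)
    # shift(1) before rolling so the window only sees earlier weeks
    # and the current week's value never leaks into its own features.
    out[f"{target}_roll_mean_{window}"] = (
        out[target].shift(1).rolling(window).mean()
    )
    out["week_of_year"] = out.index.isocalendar().week.astype(int)
    # Rows without a full history contain NaN and are dropped.
    return out.dropna()


# Made-up weekly data, for illustration only.
weekly = pd.DataFrame(
    {"customers": [100, 110, 120, 130, 140, 150]},
    index=pd.date_range("2023-01-02", periods=6, freq="W-MON"),
)
features = add_lag_features(weekly)
print(features)
```

Shifting before computing the rolling mean is the important design choice here: every feature for a given week is derived only from earlier weeks.
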
Model Selection:
- Exploration of ETS, ARMA, SARIMA, and XGBoost models.
- XGBoost identified as the top-performing model through rigorous cross-validation.
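
Cross-validation for time series has to respect temporal order: each fold trains on earlier weeks and tests on later ones. The sketch below shows one way to do this with scikit-learn's `TimeSeriesSplit`, scoring each fold by MAE. It uses synthetic data and `LinearRegression` as a stand-in model; any scikit-learn-compatible regressor (such as XGBoost's `XGBRegressor`) could be passed instead. The helper name `time_series_cv_mae` is hypothetical and not from the notebook.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit


def time_series_cv_mae(model, X, y, n_splits=4):
    """Return the MAE of each expanding-window fold."""
    splitter = TimeSeriesSplit(n_splits=n_splits)
    scores = []
    for train_idx, test_idx in splitter.split(X):
        # Fit a fresh copy per fold; training rows always precede test rows.
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        predictions = fitted.predict(X[test_idx])
        scores.append(mean_absolute_error(y[test_idx], predictions))
    return scores


# Synthetic trend-plus-noise series, for illustration only.
rng = np.random.default_rng(0)
weeks = np.arange(60, dtype=float).reshape(-1, 1)
customers = 200 + 3 * weeks.ravel() + rng.normal(0, 5, size=60)

fold_mae = time_series_cv_mae(LinearRegression(), weeks, customers)
mean_mae = float(np.mean(fold_mae))
print(fold_mae, mean_mae)
```

Comparing the mean fold MAE across candidate models is one straightforward way to rank them on the same splits.
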
Evaluation:
- MAE used as the primary performance measure.
- XGBoost demonstrated superior performance compared to other models.
Fine-tuning:
- Thorough fine-tuning and feature selection for optimal model performance.
- Scikit-learn pipeline employed for streamlined data processing.
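
As an illustration of how a scikit-learn pipeline and hyperparameter tuning can fit together, here is a minimal sketch that wraps preprocessing and the model in a `Pipeline` and tunes it with `GridSearchCV` over time-ordered folds. The data is synthetic, and the imputation step, parameter grid, and fold count are assumptions rather than the notebook's actual configuration. The sketch falls back to scikit-learn's `GradientBoostingRegressor` if `xgboost` is not installed.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline

try:
    from xgboost import XGBRegressor as Regressor
except ImportError:
    # Fallback so the sketch still runs without xgboost installed.
    from sklearn.ensemble import GradientBoostingRegressor as Regressor

# Synthetic stand-in data (not the project's dataset): 80 weeks, 3 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 3))
y = 200 + 30 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 5, size=80)
X[rng.random(X.shape) < 0.05] = np.nan  # a few missing values to impute

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("model", Regressor(random_state=0)),
])

# Illustrative grid; "model__" routes each parameter to the pipeline step.
param_grid = {"model__n_estimators": [50, 100], "model__max_depth": [2, 3]}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),      # folds keep temporal order
    scoring="neg_mean_absolute_error",   # scikit-learn maximizes, so MAE is negated
)
search.fit(X, y)

best_mae = -search.best_score_
print(search.best_params_, best_mae)
```

Keeping preprocessing inside the pipeline means it is refit on each training fold, which avoids leaking information from the test weeks into the imputation step.
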
Contributions and feedback are highly valued for this project. Feel free to explore the code, datasets, and documentation. I appreciate your interest in my project!