This project focuses on analyzing and predicting sales data for PlayStation 4 (PS4) games. It aims to provide insights into the sales trends of PS4 games and build a predictive model to forecast sales based on the release year of the games.
- Introduction
- Project Overview
- Dataset
- Installation
- Usage
- Results
- Contributing
- License
The gaming industry has witnessed tremendous growth, and analyzing sales data can provide valuable insights for game developers, publishers, and enthusiasts. This project focuses on analyzing sales data for PS4 games and developing a predictive model to forecast future sales based on the release year of the games.
The project follows these major steps:
Importing Libraries: Required libraries such as numpy, pandas, tensorflow, matplotlib.pyplot, and plotly.express are imported to facilitate data manipulation, visualization, and machine learning tasks.
Loading the Dataset: The dataset named "List of best-selling PlayStation 4 video games.csv" is loaded using the pandas library. It contains information about the best-selling PS4 games, including their titles, release dates, genres, and copies sold.
Data Preprocessing: The dataset is preprocessed to ensure it is in a suitable format for analysis. Column names are standardized, and the "CopiesSold" column is transformed to numeric values. The "ReleaseYear" column is converted to a datetime format and then extracted as the year.
Data Analysis: Exploratory data analysis (EDA) is performed on the dataset to gain insights into the sales trends of PS4 games. Visualizations, such as bar plots and pie charts, are used to showcase the copies sold for each game and the distribution of copies sold across different genres.
Prediction: A linear regression model is trained to predict the number of copies sold based on the release year of the games. The model is fitted with the input feature (ReleaseYear) and target variable (CopiesSold). Using the trained model, sales are predicted for the years 2021, 2022, 2023, and 2024. A plot is generated to visualize the actual sales data and the predicted sales for these years.
The dataset used in this project, "List of best-selling PlayStation 4 video games.csv," contains information about the best-selling PS4 games, including the game title, release date, genre(s), and the number of copies sold. The dataset is provided in CSV format and is loaded using the pandas library.
To run this project locally, please follow these steps:
Clone the repository:
git clone <repository_url>
Install the required libraries:
pip install numpy pandas tensorflow matplotlib plotly
Run the Jupyter Notebook:
jupyter notebook List_of_best_selling_PS4_games.ipynb
The Jupyter Notebook, "List_of_best_selling_PS4_games.ipynb," serves as the main entry point for the project. The notebook contains the code for loading the dataset, performing data preprocessing, conducting data analysis, training the predictive model, and visualizing the results. Each code cell in the notebook is accompanied by comments explaining its purpose and functionality.
To use this project, you can modify and enhance the existing code according to your specific requirements. You can explore different visualizations, try alternative machine learning models, or incorporate additional features from the dataset for analysis and prediction.
The project provides insights into the sales trends of PS4 games and predicts future sales based on the release year of the games. The data analysis section includes visualizations, such as bar plots and pie charts, to showcase the copies sold for each game and the distribution of copies sold across different genres. The predictive model, based on linear regression, is used to forecast sales for the years 2021, 2022, 2023, and 2024. The actual sales data and the predicted sales for these years are visualized using a line plot.
Contributions to this project are welcome. If you have any suggestions, ideas, or improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.