This portfolio contains data-related projects.
- Scraped the Boxofficemojo website using Scrapy in Python.
- Checked all the movies released in the US during selected periods and extracted useful information about each movie.
- Scraped Domestic Revenues, Worldwide Revenues, Distributor, Opening, Budget, MPAA, Genres, and In Release for each movie.
- Scraped the Boxofficemojo and Traileraddict websites to get movie information.
- Explored movie features such as budget, distributors, MPAA, and genres.
- Examined whether the variation in the promotion period is related to such features.
- Analyzed the factors related to housing prices in Melbourne and predicted housing prices using several machine learning techniques.
- Employed Linear Regression, Ridge Regression, K-Nearest Neighbors, and Decision Tree.
- Found the optimal hyperparameter values for each model using cross-validation and grid search.
- Compared the results to find the best machine learning model to predict the housing prices in Melbourne.
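
The hyperparameter search described above can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses only the standard library, a made-up one-feature dataset, and hypothetical helper names (`knn_predict`, `cv_mse`, `grid_search`) to tune `k` for a K-Nearest Neighbors regressor with k-fold cross-validation. In practice a library routine such as scikit-learn's `GridSearchCV` would typically do the same job.

```python
from statistics import mean

def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)

def cv_mse(data, k, n_folds=5):
    """Mean squared error of KNN regression, averaged over n_folds folds."""
    fold_errors = []
    for fold in range(n_folds):
        # Every n_folds-th point (offset by `fold`) is held out for validation.
        valid = data[fold::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        fold_errors.append(mean((knn_predict(train, x, k) - y) ** 2 for x, y in valid))
    return mean(fold_errors)

def grid_search(data, k_grid, n_folds=5):
    """Return (best_k, scores), where scores maps each candidate k to its CV error."""
    scores = {k: cv_mse(data, k, n_folds) for k in k_grid}
    return min(scores, key=scores.get), scores

# Hypothetical toy data: (feature, target) pairs following y = 2x plus alternating noise.
toy_data = [(x, 2 * x + (1 if x % 2 else -1)) for x in range(20)]
best_k, cv_scores = grid_search(toy_data, k_grid=[1, 2, 3, 5, 8])
```

The same loop extends to several models by running one grid per model and comparing the best cross-validated error of each.
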
- Converted the data in a single spreadsheet into a relational database for SQL.
- Ran several SQL queries against the database.
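
As a rough sketch of the spreadsheet-to-database step, the example below splits a flat table into two related tables using Python's built-in `sqlite3` module. The column names, the sample rows, and the `build_database` helper are all hypothetical; the actual spreadsheet and schema are not described here.

```python
import csv
import io
import sqlite3

# Hypothetical flat spreadsheet exported as CSV: customer details repeat on every order row.
FLAT_CSV = """order_id,customer_name,customer_city,amount
1,Ann,Boston,20.0
2,Bob,Denver,35.5
3,Ann,Boston,12.5
"""

def build_database(csv_text):
    """Load a flat CSV into an in-memory SQLite database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            city TEXT NOT NULL,
            UNIQUE (name, city)
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
            amount REAL NOT NULL
        );
    """)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Insert each distinct customer once, then look up its key.
        conn.execute(
            "INSERT OR IGNORE INTO customers (name, city) VALUES (?, ?)",
            (row["customer_name"], row["customer_city"]),
        )
        (customer_id,) = conn.execute(
            "SELECT customer_id FROM customers WHERE name = ? AND city = ?",
            (row["customer_name"], row["customer_city"]),
        ).fetchone()
        conn.execute(
            "INSERT INTO orders (order_id, customer_id, amount) VALUES (?, ?, ?)",
            (int(row["order_id"]), customer_id, float(row["amount"])),
        )
    conn.commit()
    return conn

conn = build_database(FLAT_CSV)
# Example query: total spend per customer, joining the two tables.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders AS o JOIN customers AS c USING (customer_id)
    GROUP BY c.name ORDER BY c.name
""").fetchall()
```
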
- Scraped over 3000 job postings for 'Data Analyst' from the Glassdoor website using the Selenium library in Python.
- Cleaned the scraped data using Python.
- Converted the data into a relational database format so it could be stored in SQL.
- Visualized the data using Tableau, showing the salary distributions by state, city, sector, and skills.
- Implemented a cohort analysis using eCommerce data from the UCI Machine Learning Repository.
- Showed how to create the matrix for cohort analysis from the raw eCommerce data.
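
One way to build such a cohort matrix is sketched below with the standard library only. The transactions, the monthly cohort definition (month of first purchase), and the `cohort_matrix` helper are assumptions made for illustration; the real dataset has more columns and the original analysis may define cohorts differently.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transactions: (customer_id, purchase_date).
transactions = [
    ("c1", date(2021, 1, 5)), ("c1", date(2021, 2, 9)), ("c1", date(2021, 3, 1)),
    ("c2", date(2021, 1, 20)), ("c2", date(2021, 3, 15)),
    ("c3", date(2021, 2, 2)), ("c3", date(2021, 3, 30)),
]

def month_index(d):
    """Map a date to a running month number so months can be subtracted."""
    return d.year * 12 + d.month

def cohort_matrix(rows):
    """Return {cohort_month: {months_since_first_purchase: active_customers}}."""
    first_month = {}
    for customer, day in rows:
        m = month_index(day)
        first_month[customer] = min(m, first_month.get(customer, m))

    active = defaultdict(set)  # (cohort, offset) -> customers who bought in that month
    for customer, day in rows:
        cohort = first_month[customer]
        active[(cohort, month_index(day) - cohort)].add(customer)

    matrix = defaultdict(dict)
    for (cohort, offset), customers in sorted(active.items()):
        # Convert the running month number back to a "YYYY-MM" label.
        label = f"{(cohort - 1) // 12}-{(cohort - 1) % 12 + 1:02d}"
        matrix[label][offset] = len(customers)
    return dict(matrix)

matrix = cohort_matrix(transactions)
```

Each row of the result is a cohort and each column is the number of months since that cohort's first purchase; dividing every row by its month-0 value gives retention rates.
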
- Used a MovieLens movie dataset containing 9742 movies.
- Quantified the movie features using term frequency-inverse document frequency (TF-IDF).
- Calculated the similarities between movies using cosine similarity.
- Added a 'Did you mean...?' function to the recommender to make searching easier.
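
The content-based pipeline above can be sketched as follows. This is an illustrative stand-in, not the project's implementation: the four-movie catalogue, the use of genre text as the feature, the smoothed idf formula, and the use of `difflib.get_close_matches` for the 'Did you mean...?' step are all assumptions.

```python
import difflib
import math
from collections import Counter

# Hypothetical mini catalogue: title -> genre text used as the movie's features.
MOVIES = {
    "Toy Story": "adventure animation children comedy",
    "Jumanji": "adventure children fantasy",
    "Heat": "action crime thriller",
    "GoldenEye": "action adventure thriller",
}

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = {title: text.split() for title, text in docs.items()}
    doc_freq = Counter(term for terms in tokenized.values() for term in set(terms))
    n_docs = len(docs)
    vectors = {}
    for title, terms in tokenized.items():
        counts = Counter(terms)
        vectors[title] = {
            # Smoothed idf (an assumption; other tf-idf variants exist).
            term: (count / len(terms)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def recommend(query, docs, top_n=2):
    """Return (matched_title, similar_titles); the query is fuzzy-matched first."""
    matches = difflib.get_close_matches(query, list(docs), n=1, cutoff=0.6)
    if not matches:
        return None, []
    title = matches[0]
    vectors = tfidf_vectors(docs)
    ranked = sorted(
        (other for other in docs if other != title),
        key=lambda other: cosine(vectors[title], vectors[other]),
        reverse=True,
    )
    return title, ranked[:top_n]

# A misspelled query still resolves to the intended title.
matched, similar = recommend("Toy Stroy", MOVIES)
```
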
- Used a sample rating dataset of 10 movies and 10 users.
- Found movies similar to a selected movie using NearestNeighbors() from the sklearn library with the cosine similarity metric.
- Predicted a user's unknown rating for a movie as the weighted average of that user's ratings for similar movies.
- Built a movie recommender using the algorithm and applied it to the real movie dataset.
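
The rating prediction described above can be sketched as below. To keep the example dependency-free it computes cosine similarity directly rather than calling sklearn's `NearestNeighbors`, and it treats an unrated movie as 0 when building item vectors; the ratings table and the `predict` helper are hypothetical.

```python
import math

# Hypothetical ratings: user -> {movie: rating}; a missing movie means "not rated".
RATINGS = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "C": 1},  # B is unknown for u4
}

def item_vector(movie, ratings, users):
    """Ratings of one movie across the given users, with 0 for unrated (an assumption)."""
    return [ratings[u].get(movie, 0) for u in users]

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict(user, movie, ratings, k=2):
    """Weighted average of the user's ratings for the k movies most similar to `movie`."""
    users = sorted(ratings)
    target = item_vector(movie, ratings, users)
    rated = ratings[user]
    # Rank the movies this user has rated by their similarity to the target movie.
    neighbours = sorted(
        ((cosine(target, item_vector(other, ratings, users)), other) for other in rated),
        reverse=True,
    )[:k]
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight == 0:
        return None
    return sum(sim * rated[other] for sim, other in neighbours) / total_weight

predicted = predict("u4", "B", RATINGS)
```

Here movie B is rated much like movie A by the other users, so the prediction for `u4` is pulled toward that user's rating of A rather than C.
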