YohanJeong_Portfolio

This portfolio contains data-related projects.

Project 1: Scraping Box Office Info Using Scrapy

Medium Post

Scraped the Boxofficemojo website using Scrapy in Python.
Collected all the movies released in the US during selected periods and extracted useful information about each movie.
For each movie, scraped Domestic Revenues, Worldwide Revenues, Distributor, Opening, Budget, MPAA, Genres, and In Release.
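Scraped money fields usually arrive as display strings, so a cleaning step is typically needed before analysis. The following is a minimal sketch of such a step, not the repository's spider: the function names `parse_money` and `clean_movie`, the dictionary keys, and the sample string formats are all assumptions made for illustration.

```python
import re

def parse_money(text):
    """Convert a display string such as '$1,234,567' to an int.

    Returns None when the field is missing or contains no digits.
    """
    if text is None:
        return None
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None

def clean_movie(raw):
    """Normalize one scraped record (hypothetical keys, for illustration only)."""
    return {
        "title": raw.get("title", "").strip(),
        "domestic": parse_money(raw.get("domestic")),
        "worldwide": parse_money(raw.get("worldwide")),
        "budget": parse_money(raw.get("budget")),
        # Split a comma-separated genre string into a clean list.
        "genres": [g.strip() for g in raw.get("genres", "").split(",") if g.strip()],
    }
```

A record with a placeholder budget such as `"-"` comes back with `budget` set to `None`, which keeps missing values distinguishable from a budget of zero.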

Project 2: Analysis of Movies and their Trailer Release Dates

Medium Post

Scraped the Boxofficemojo and Traileraddict websites to get movie information.
Explored movie features such as budget, distributors, MPAA, and genres.
Examined whether the variation in the promotion period is related to such features.
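As a rough illustration of the comparison described above, the sketch below assumes the promotion period is the number of days between the first trailer release and the movie release, and averages it per feature value. That definition, the record keys, and the sample dates are assumptions, not taken from the project.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def promotion_days(trailer_date, release_date):
    """Days between trailer release and movie release (assumed definition)."""
    return (release_date - trailer_date).days

def mean_promotion_by(records, feature):
    """Average promotion period for each value of the given feature."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[feature]].append(
            promotion_days(rec["trailer_date"], rec["release_date"])
        )
    return {value: mean(days) for value, days in groups.items()}

# Made-up sample records, for illustration only.
movies = [
    {"genre": "Action", "trailer_date": date(2019, 1, 1), "release_date": date(2019, 4, 11)},
    {"genre": "Action", "trailer_date": date(2019, 3, 1), "release_date": date(2019, 5, 10)},
    {"genre": "Drama", "trailer_date": date(2019, 6, 1), "release_date": date(2019, 7, 1)},
]
by_genre = mean_promotion_by(movies, "genre")
```

The same helper can be called with any other categorical key, such as a distributor or rating field, if the records carry one.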

Project 3: Predicting Housing Prices using Cross Validation and Grid Search in Regression Models

Medium Post

Analyzed the factors related to housing prices in Melbourne and predicted the prices using several machine learning techniques.
Employed Linear Regression, Ridge Regression, K-Nearest Neighbors, and Decision Tree.
Found the optimal hyperparameter values for each model using Cross Validation and Grid Search.
Compared the results to find the best machine learning model to predict the housing prices in Melbourne.
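To show how cross validation and grid search fit together, here is a dependency-free sketch that tunes the number of neighbors for a one-feature K-Nearest Neighbors regressor. It is not the project's code: the synthetic data, the candidate grid, and the helper names are assumptions, and scikit-learn's `GridSearchCV` provides the same workflow for real models.

```python
import random
from statistics import mean

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)

def cv_mse(data, k, n_folds=5):
    """Mean squared error of k-NN regression, averaged over n_folds folds."""
    fold_errors = []
    for i in range(n_folds):
        # Every n_folds-th point (starting at i) is held out for validation.
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        fold_errors.append(
            mean((knn_predict(train, x, k) - y) ** 2 for x, y in test)
        )
    return mean(fold_errors)

def grid_search(data, k_values, n_folds=5):
    """Score every candidate k by cross validation and return the best one."""
    scores = {k: cv_mse(data, k, n_folds) for k in k_values}
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Synthetic data: a noisy linear relationship (illustrative only).
random.seed(0)
data = [(x, 2.0 * x + random.gauss(0, 1)) for x in range(50)]
random.shuffle(data)
best_k, scores = grid_search(data, k_values=[1, 3, 5, 10, 25])
```

Each candidate is scored only on folds it was not fitted to, so the chosen value reflects held-out error rather than training error.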

Project 4: Creating a Database and Practicing SQL Queries

Medium Post 1 / Medium Post 2

Converted the data in a single spreadsheet into a relational database for SQL.
Performed several SQL queries using the database.
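The sketch below illustrates the general idea of splitting flat spreadsheet rows into related tables and then querying them with a join. The table names, columns, and sample rows are hypothetical, and SQLite is used here only because it ships with Python; the project may have used a different database.

```python
import sqlite3

# Hypothetical flat rows, as they might appear in a single spreadsheet:
# (customer name, city, product, price)
rows = [
    ("Alice", "Seoul", "Laptop", 1200),
    ("Alice", "Seoul", "Mouse", 25),
    ("Bob", "Busan", "Laptop", 1200),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        city TEXT
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        product TEXT,
        price INTEGER
    );
""")

for name, city, product, price in rows:
    # Store each customer once, then link every order to that customer.
    conn.execute(
        "INSERT OR IGNORE INTO customers (name, city) VALUES (?, ?)", (name, city)
    )
    (customer_id,) = conn.execute(
        "SELECT id FROM customers WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO orders (customer_id, product, price) VALUES (?, ?, ?)",
        (customer_id, product, price),
    )

# Example query: total spending per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.price)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

Separating customers from orders removes the repeated name and city values that a flat sheet carries on every row.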

Project 5: The Analysis for Glassdoor Job Postings

Scraped over 3000 job postings for 'Data Analyst' from the Glassdoor website using the Selenium library in Python.
Cleaned the scraped data using Python.
Converted the data into a relational format and stored it in a SQL database.
Visualized the data using Tableau, showing the salary distributions by state, city, sector, and skills.

Project 6: Cohort Analysis

Implemented a cohort analysis using eCommerce data from the UCI Machine Learning Repository.
Showed how to create the matrix for cohort analysis from the raw eCommerce data.
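One common way to build such a matrix is to assign each customer to the month of their first purchase and count how many customers from each cohort are active in later months. The sketch below follows that approach under assumed inputs: the `(customer_id, (year, month))` record shape and the sample orders are illustrative, not the project's data.

```python
from collections import defaultdict

def month_index(year_month):
    """Map (year, month) to a single integer so months can be subtracted."""
    year, month = year_month
    return year * 12 + month

def cohort_matrix(orders):
    """Build {cohort_month: {months_since_first_purchase: active_customers}}.

    orders is an iterable of (customer_id, (year, month)) pairs.
    """
    orders = list(orders)

    # A customer's cohort is the month of their earliest order.
    first = {}
    for customer, ym in orders:
        if customer not in first or month_index(ym) < month_index(first[customer]):
            first[customer] = ym

    # Collect distinct customers per (cohort, month offset) cell.
    active = defaultdict(lambda: defaultdict(set))
    for customer, ym in orders:
        cohort = first[customer]
        offset = month_index(ym) - month_index(cohort)
        active[cohort][offset].add(customer)

    return {
        cohort: {offset: len(customers) for offset, customers in sorted(cells.items())}
        for cohort, cells in sorted(active.items())
    }

# Made-up orders, for illustration only.
orders = [
    ("a", (2011, 1)), ("a", (2011, 2)), ("a", (2011, 3)),
    ("b", (2011, 1)), ("b", (2011, 3)),
    ("c", (2011, 2)), ("c", (2011, 3)),
]
matrix = cohort_matrix(orders)
```

Dividing each row by its offset-0 count turns the counts into retention rates.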

Project 7: Making a Content-based Movie Recommender

Medium Post

Used a movie dataset from MovieLens, which has 9742 movies.
Quantified the movie features using term frequency-inverse document frequency (TF-IDF).
Calculated the similarities between movies using cosine similarity.
Added a 'Did you mean...?' function to the recommender to make searching easier.
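The three pieces above can be sketched together in plain Python. This is an illustration under stated assumptions rather than the project's implementation: the four-movie catalogue and its genre strings are made up, the TF-IDF weighting is the textbook `tf * log(N / df)` variant (libraries such as scikit-learn use a smoothed form), and the 'Did you mean...?' step is approximated with the standard library's `difflib.get_close_matches`.

```python
import math
from collections import Counter
from difflib import get_close_matches

# Hypothetical toy catalogue: title -> space-separated feature words.
movies = {
    "Toy Story": "animation children comedy",
    "Jumanji": "adventure children fantasy",
    "Heat": "action crime thriller",
    "Casino": "crime drama",
}

def tfidf_vectors(docs):
    """Return a sparse TF-IDF vector (term -> weight) for every title."""
    tokenized = {title: text.split() for title, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[title] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(query, docs, top_n=2):
    """Return (matched_title, most similar titles); fuzzy-match unknown queries."""
    if query not in docs:
        guesses = get_close_matches(query, list(docs), n=1, cutoff=0.6)
        if not guesses:
            return None, []
        query = guesses[0]
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query], vec), title)
        for title, vec in vectors.items()
        if title != query
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return query, [title for _, title in scores[:top_n]]
```

Calling `recommend("Toy Stroy", movies)` resolves the misspelled title to "Toy Story" before ranking, which is the behaviour the 'Did you mean...?' feature is meant to provide.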

Project 8: Making an Item-based Collaborative Filtering Recommender

Medium Post

Used a sample rating dataset with 10 movies and 10 users.
Found the movies most similar to a selected movie using NearestNeighbors() from the sklearn library with the cosine similarity metric.
Predicted a user's unknown rating for a movie as the weighted average of that user's ratings for the similar movies.
Built a movie recommender using the algorithm and applied it to the real movie dataset.
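The prediction step can be sketched without any libraries. The example below is an assumption-laden illustration, not the project's code: the ratings are made up, the helper names are hypothetical, and the neighbor search is done by brute force in place of sklearn's `NearestNeighbors`. Missing ratings are treated as zeros when computing cosine similarity, which mirrors what happens with a zero-filled rating matrix.

```python
import math

# Hypothetical ratings: user -> {movie: rating}. User u4 has not rated "B".
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "C": 1},
}

def item_vector(movie, ratings):
    """Ratings given to one movie, keyed by user."""
    return {user: r[movie] for user, r in ratings.items() if movie in r}

def cosine(u, v):
    """Cosine similarity; users missing from a vector count as rating 0."""
    dot = sum(u[user] * v[user] for user in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(user, movie, ratings, k=2):
    """Weighted average of the user's ratings for the k most similar movies."""
    target = item_vector(movie, ratings)
    neighbors = [
        (cosine(target, item_vector(other, ratings)), rating)
        for other, rating in ratings[user].items()
        if other != movie
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    total_sim = sum(sim for sim, _ in top)
    if total_sim == 0:
        return None  # no similar movie rated by this user
    return sum(sim * rating for sim, rating in top) / total_sim

predicted = predict("u4", "B", ratings)
```

Because "B" is rated more like "A" than like "C" by the other users, the prediction for u4 is pulled toward that user's rating for "A".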
