YouTube Clickbait Detector

Problem statement

YouTube has more than 2 billion monthly active users and more than 122 million daily active users, who watch more than a billion hours of video per day. As the world's second most visited website, with over 26 billion videos, its influence is apparent. A video has four primary components: a thumbnail, a title, comments, and statistics such as likes, dislikes, and views.

Some content providers on the platform create videos with purposefully deceptive titles and thumbnails to get users to click on them, much as some news headlines misrepresent the content of an article. Such overblown videos are called clickbait.

Because different people interpret clickbait differently, the definition used in this project is: a video that uses an exaggerated title and thumbnail to lure the audience into watching, and then does not deliver on what the title and thumbnail promised.

Objective: Help viewers spend less time and attention on misleading YouTube videos.

Main Notebooks to check out:

  • Cognitive evidences
  • Data Exploration
  • Ensemble Learning
  • Data Preprocessing & ML estimators
  • Title Classifier

Getting the Data

I wrote a function that retrieves the thumbnail, title, and statistics for a given video using the YouTube Data API v3. The non-clickbait videos were chosen at random from the Explore page and verified to make sure they weren't deceptive.
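A minimal sketch of what such a retrieval function might look like, using only the standard library. The function names `build_request_url` and `parse_video` are hypothetical and not taken from the project; the endpoint and the `snippet`/`statistics` field names follow the public `videos.list` resource. Note that public responses may omit `dislikeCount`, so the parser treats every count as optional.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_request_url(video_id, api_key):
    """Build a videos.list request URL asking for snippet and statistics."""
    query = urlencode({"part": "snippet,statistics", "id": video_id, "key": api_key})
    return f"{API_URL}?{query}"


def parse_video(response):
    """Extract title, thumbnail URL, and counts from a videos.list response dict."""
    items = response.get("items", [])
    if not items:
        return None  # no video was returned for this ID
    snippet = items[0].get("snippet", {})
    stats = items[0].get("statistics", {})
    thumbs = snippet.get("thumbnails", {})
    # Prefer the largest of the commonly available thumbnail sizes.
    thumb = thumbs.get("high") or thumbs.get("medium") or thumbs.get("default") or {}

    def count(name):
        # The API returns counts as strings; a missing count stays None.
        value = stats.get(name)
        return int(value) if value is not None else None

    return {
        "title": snippet.get("title"),
        "thumbnail_url": thumb.get("url"),
        "views": count("viewCount"),
        "likes": count("likeCount"),
        "dislikes": count("dislikeCount"),
        "comments": count("commentCount"),
    }
```

The URL from `build_request_url` can be fetched with any HTTP client, and the decoded JSON body passed to `parse_video`.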

Obtaining the clickbait videos was a Catch-22: the ultimate objective of this project is a machine learning system that can automatically identify clickbait and non-clickbait videos, yet building such a system first requires examples of both.

A video was identified as clickbait based on the combination of its title and thumbnail, and to lessen bias the videos were chosen manually by two separate people. To avoid giving any one theme too much weight, videos were sampled from a range of genres.

Key Insights:

  • Non-clickbait video titles contain nouns like "game", "official", and "highlights", and their comments are relevant to the context of the video.
  • Clickbait video titles contain terms like "prank", "hack", or "gone wrong", and their comments are along the lines of "fake video", "do not agree", and "wrong thumbnail".

Data Preprocessing & Feature Engineering:

I developed a new feature, the dislike-to-like ratio, because clickbait videos sometimes have a high dislike count. All of the statistics are scaled to a normal distribution and the rows are shuffled randomly. Other video metadata, including ID and Favorites, is dropped. Emojis and other non-ASCII characters are removed from all titles since they can cause problems in file names on Windows file systems.
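The preprocessing steps above could be sketched roughly as follows. This is an illustration under assumptions, not the project's code: all function names and dictionary keys are hypothetical, "scaled to a normal distribution" is interpreted as z-score standardization, and likes are floored at 1 to avoid division by zero.

```python
import random
import statistics


def dislike_like_ratio(likes, dislikes):
    """Dislikes divided by likes; likes are floored at 1 (an assumption)."""
    return dislikes / max(likes, 1)


def standardize(values):
    """Z-score a column: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - mean) / std for v in values]


def strip_non_ascii(title):
    """Drop emojis and other non-ASCII characters, then tidy whitespace."""
    ascii_only = title.encode("ascii", errors="ignore").decode("ascii")
    return " ".join(ascii_only.split())


def preprocess(rows, seed=0):
    """Return shuffled rows with cleaned titles and standardized statistics."""
    ratios = [dislike_like_ratio(r["likes"], r["dislikes"]) for r in rows]
    columns = {
        "views": standardize([r["views"] for r in rows]),
        "likes": standardize([r["likes"] for r in rows]),
        "dislikes": standardize([r["dislikes"] for r in rows]),
        "dislike_like_ratio": standardize(ratios),
    }
    out = []
    for i, r in enumerate(rows):
        features = {name: col[i] for name, col in columns.items()}
        # Keys such as "id" and "favorites" are not copied, so they are dropped.
        out.append({"title": strip_non_ascii(r["title"]), "label": r["label"], **features})
    random.Random(seed).shuffle(out)  # seeded so the shuffle is reproducible
    return out
```

In a real pipeline the scaling parameters would normally be fitted on the training split only and then reused for new videos.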

Machine Learning Models

  • Ensemble of ML estimators (Random Forest, K Nearest Neighbors, Support Vector Classifier, XGBoost, Logistic Regression, Gaussian Naive Bayes) for video statistics classification, using soft voting because we want to display probabilities.

  • Feedforward neural network for video title classification using the Google Universal Sentence Encoder, which produces 512-dimensional embeddings and is trained with a Deep Averaging Network (DAN) encoder for language classification tasks. It is a 1 GB model, so it takes a while to load at first.

  • Combining both with my own custom ensemble model, using a referenced method to convert logits to logs.
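To illustrate the soft-voting idea in the list above: each estimator outputs a clickbait probability, and the ensemble averages those probabilities instead of counting hard votes, which keeps a probability available for display. The sketch below is a plain-Python illustration with hypothetical names (`soft_vote`, `predict`). The equal weights and the 0.5 threshold are assumptions; the source does not specify how the custom ensemble combines the two models.

```python
def soft_vote(probabilities, weights=None):
    """Return the (optionally weighted) mean of per-model clickbait probabilities."""
    if not probabilities:
        raise ValueError("need at least one probability")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have the same length")
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


def predict(stat_model_probs, title_prob, threshold=0.5):
    """Combine the statistics ensemble with the title classifier.

    stat_model_probs: one probability per statistics estimator.
    title_prob: probability from the title classifier.
    Returns (combined probability, is_clickbait flag).
    """
    stats_prob = soft_vote(stat_model_probs)
    combined = soft_vote([stats_prob, title_prob])  # equal weights (assumption)
    return combined, combined >= threshold
```

For the statistics estimators alone, scikit-learn's `VotingClassifier` with `voting="soft"` performs the same kind of probability averaging.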

Skills Learned:

  • Analyzing distributions and computing correlations for both numerical and natural language data
  • Fetching & Parsing through JSON data with the YouTube API
  • Ensembling machine learning estimators and TensorFlow Hub NLP models together

A potential software solution is a web application that any YouTube user could use. Its UI would be optimised for mobile devices, given that more than 70% of YouTube viewing time occurs on mobile, though it would also work fine on desktop.

Authors

atharva21-stack
