
citadel_correlation_one_global_phd_datathon_2023

Quantifying and ranking user engagement with clickbait articles using NLP-created features

In this highly selective global competition for PhD students, I individually tackled a confidential problem statement and documented my findings in a detailed report. The main findings are presented in the report itself; the code serves only as supplementary material.

For more details about the competition, check out these links:

Research Overview

This research examined the textual characteristics of clickbait, focusing on how they affect user engagement. Using Natural Language Processing (NLP) techniques, I analyzed the sentiments, emotions, and topics present in clickbait articles. I then statistically evaluated and ranked these factors by their effect on user interaction, and developed two null models to validate the reliability of this ranking. My methodological approach is encapsulated in the Clickbait Defender product concept:

[Image: Clickbait Defender product concept]
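The README does not say which two null models the report uses. As an illustration only, here is one common choice. Binary text features are ranked by the difference in mean click-through rate (CTR) between articles that have the feature and articles that do not. Each observed difference is then compared against a permutation null in which CTR values are shuffled across articles. The feature names, the CTR values, and the functions `effect`, `rank_features`, and `permutation_p_value` are hypothetical and are not taken from the project's notebooks.

```python
import random
from statistics import mean


def effect(ctrs, has_feature):
    """Difference in mean CTR between articles with and without a feature."""
    with_f = [c for c, f in zip(ctrs, has_feature) if f]
    without_f = [c for c, f in zip(ctrs, has_feature) if not f]
    if not with_f or not without_f:
        raise ValueError("feature must be present in some articles and absent in others")
    return mean(with_f) - mean(without_f)


def rank_features(ctrs, features):
    """Rank feature names by observed effect on CTR, largest first."""
    effects = {name: effect(ctrs, flags) for name, flags in features.items()}
    ranking = sorted(effects, key=effects.get, reverse=True)
    return ranking, effects


def permutation_p_value(ctrs, has_feature, n_perm=2000, seed=0):
    """Two-sided p-value under a null model that shuffles CTRs across articles,
    breaking any link between the feature and engagement."""
    rng = random.Random(seed)
    observed = abs(effect(ctrs, has_feature))
    shuffled = list(ctrs)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(effect(shuffled, has_feature)) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (as_extreme + 1) / (n_perm + 1)


# Made-up toy data: one CTR per article and two binary features.
ctrs = [0.012, 0.015, 0.011, 0.020, 0.018, 0.010, 0.019, 0.013]
features = {
    "positive_sentiment": [False, True, False, True, True, False, True, False],
    "surprise_emotion": [True, False, False, True, False, False, False, True],
}

ranking, effects = rank_features(ctrs, features)
p_values = {name: permutation_p_value(ctrs, flags) for name, flags in features.items()}
print(ranking)  # ['positive_sentiment', 'surprise_emotion']
print(p_values)
```

Shuffling CTRs keeps both the overall distribution of engagement and the number of articles carrying each feature fixed. Only the association between feature and engagement is destroyed, so a small p-value indicates that a feature's position in the ranking is unlikely to arise by chance.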

Technical Overview

My technical work on this project is divided into four main parts:

  1. Google Analytics Analysis: Leveraging Google Analytics, I extracted and analyzed user engagement metrics. This involved studying user behavior patterns, click-through rates, and other relevant metrics to understand how users interact with clickbait content. The notebook google_analytics_analysis.ipynb details this process.

  2. Data Cleaning, Processing and Exploratory Data Analysis: I refined the dataset used in the original study, focusing on cleaning, categorizing, and preparing the data for deeper analysis. The notebook data_cleaning_processing_eda.ipynb contains the entire process.

  3. NLP Classification: I developed algorithms that classify the text of clickbait articles by sentiment, emotion, and topic. They are implemented in nlp_text_classyfing_algorithms.ipynb.

  4. Statistical Rank Analysis, Null Models and Insights: This part applies statistical models to the processed data to rank the extracted sentiment, emotion, and topic factors by their effect on user engagement, checks that ranking against null models, and draws insights from the results. The Jupyter notebook text_analysis_sentiment_emotion_topic.ipynb outlines this analysis.
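The engagement, classification, and aggregation steps can be sketched in a few lines. This is not the code from the notebooks. It swaps in tiny hypothetical keyword lexicons (`SENTIMENT_LEXICON`, `EMOTION_LEXICON`, `TOPIC_KEYWORDS`) for real NLP models and computes click-through rate as clicks divided by impressions. It also assumes that each article arrives as a `(headline, impressions, clicks)` tuple. The result is a mean CTR per sentiment, emotion, and topic label, which is the kind of per-feature engagement table a ranking step could consume.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons standing in for trained sentiment, emotion and topic models.
SENTIMENT_LEXICON = {"amazing": 1, "best": 1, "love": 1, "sad": -1, "terrible": -1, "worst": -1}
EMOTION_LEXICON = {"shocking": "surprise", "unbelievable": "surprise",
                   "furious": "anger", "heartbreaking": "sadness"}
TOPIC_KEYWORDS = {"health": {"doctor", "cancer", "health"},
                  "family": {"mom", "dad", "kids", "goodbye"}}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_label(tokens):
    """Sum lexicon polarities and map the sign of the score to a label."""
    score = sum(SENTIMENT_LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def emotion_labels(tokens):
    """Return every emotion triggered by at least one token (possibly none)."""
    return {EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON}


def topic_label(tokens):
    """Pick the topic with the largest keyword overlap, or 'other' if none match."""
    token_set = set(tokens)
    best = max(TOPIC_KEYWORDS, key=lambda topic: len(token_set & TOPIC_KEYWORDS[topic]))
    return best if token_set & TOPIC_KEYWORDS[best] else "other"


def mean_ctr_by_label(rows):
    """rows: iterable of (headline, impressions, clicks). Returns mean CTR per label."""
    buckets = defaultdict(list)
    for headline, impressions, clicks in rows:
        if impressions == 0:
            continue  # CTR is undefined without impressions
        ctr = clicks / impressions
        tokens = tokenize(headline)
        buckets["sentiment=" + sentiment_label(tokens)].append(ctr)
        buckets["topic=" + topic_label(tokens)].append(ctr)
        for emotion in emotion_labels(tokens):
            buckets["emotion=" + emotion].append(ctr)
    return {label: sum(values) / len(values) for label, values in buckets.items()}


# Made-up example rows, not data from the competition.
rows = [
    ("The best thing you will read today", 1000, 20),
    ("This shocking story will make you furious", 1000, 15),
    ("A sad goodbye", 500, 5),
    ("What happened next", 2000, 30),
]
result = mean_ctr_by_label(rows)
for label, value in sorted(result.items()):
    print(f"{label}: {value:.4f}")
```

The keyword lexicons are only stand-ins. A trained classifier could replace `sentiment_label`, `emotion_labels`, or `topic_label` without changing the aggregation step.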

Data

I cannot share the processed datasets obtained for the competition, but their source is The Upworthy Research Archive: https://upworthy.natematias.com/

Contributors

lukablagoje
