The widespread popularity of short-form videos, fueled by personalized recommendations from machine learning algorithms, raises concerns about their impact on societal well-being. Short-form videos dominate social media platforms, especially among youth, yet their emotional impact needs to be understood more deeply. Alongside their entertainment value, short-form videos can carry harmful content such as hate speech and violence. Such content has detrimental effects on youth, perpetuating intolerance, hostility, and aggression. Hate speech, in particular, promotes discriminatory attitudes and undermines mental well-being, while the propagation of violence in short videos can desensitize viewers and contribute to societal tensions. The unchecked proliferation of hate speech and violence in short-form videos therefore poses a significant threat to the emotional and social development of youth.
This research investigates the social consequences of short-form videos by applying sentiment analysis techniques to uncover the emotional nuances of this content. By identifying emotionally charged content and its potential effects, particularly in formats such as Reels, the study seeks to provide insights that can mitigate societal conflicts arising from the consumption of short-form videos. Using machine learning, we aim to address concerns about the harmful effects of certain short videos and to foster a more informed approach to their creation and consumption.
Our proposed model for detecting harmful short videos takes a multi-modal approach, integrating audio and video analysis to improve the accuracy of assessing content suitability for young audiences. The video component is examined for instances of violence using motion analysis based on convolutional neural networks (CNNs). In parallel, speech is extracted from the audio component so that potential hate speech in the video can be identified. We then combine the outputs of the two models using weighted averaging to derive the final assessment.
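The weighted-averaging step can be illustrated with a minimal sketch. The abstract does not state the weights, the decision threshold, or the output format of the two models, so everything here is an assumption for illustration: each model is taken to return a probability in [0, 1], and the names `fuse_scores`, `video_weight`, and `threshold`, along with their default values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionResult:
    score: float   # fused harmfulness score in [0, 1]
    harmful: bool  # True if the score reaches the threshold


def fuse_scores(violence_prob: float,
                hate_speech_prob: float,
                video_weight: float = 0.5,
                threshold: float = 0.5) -> FusionResult:
    """Combine the two model outputs by weighted averaging.

    violence_prob and hate_speech_prob are assumed to be probabilities
    from the video and audio models. video_weight and threshold are
    hypothetical parameters; the source does not give their values.
    """
    for name, p in (("violence_prob", violence_prob),
                    ("hate_speech_prob", hate_speech_prob),
                    ("video_weight", video_weight)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")

    # The two weights sum to 1, so the fused score stays in [0, 1].
    audio_weight = 1.0 - video_weight
    score = video_weight * violence_prob + audio_weight * hate_speech_prob
    return FusionResult(score=score, harmful=score >= threshold)


if __name__ == "__main__":
    # Strong violence signal, weak hate-speech signal, video weighted more.
    print(fuse_scores(0.9, 0.2, video_weight=0.6))
```

In practice the weight and threshold would be tuned on validation data. A weighted average can also dilute a strong signal from a single modality, which is one reason to choose these values carefully.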