Intensity Analysis: Emotion Classification

This project aims to predict emotions from text data using various machine learning and deep learning models. It includes preprocessing steps, different vectorization techniques, and a Streamlit web application for interactive predictions.

Project Overview
Dataset
Data Preprocessing
Models Used
Project Structure
Installation
Usage
Streamlit App
Conclusion

Project Overview

The main goal of this project is to predict emotions such as happiness, anger, and sadness from textual data. We compared several models and vectorization techniques to find the best-performing combination.

Dataset

The dataset used in this project consists of WhatsApp chat data from Indian users. This data presents unique challenges because many Indian users type their native languages in the English (Latin) script, with a lot of slang and colloquial expressions. For example, the word "khatarnaak" (खतरनाक), which means "dangerous" in Hindi, is often used to describe something intense or impressive in a positive way. This linguistic mix makes it hard for models to interpret and predict emotions accurately.

Despite these challenges, the best-performing model, a linear SVM on Word2Vec features, achieved a validation accuracy of 73%.

Data Preprocessing

The text preprocessing pipeline includes the following steps:

Convert to lowercase
Remove whitespace
Remove newline characters
Remove ".com" substrings
Remove URLs
Remove punctuation
Remove HTML tags
Remove emojis
Handle problematic characters within words (â€™, iâm, ðŸ™ and so on)
Convert acronyms
Expand contractions
Handle slang and abbreviations
Correct spelling
Lemmatize text
Discard non-alphabetic characters
Keep specific parts of speech
Remove stopwords
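
A few of the steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's text_normalization.py: the function name normalize, the tiny CONTRACTIONS and STOPWORDS tables, and the regular expressions are all assumptions made for the example. The order is chosen so the sketch works (URLs and HTML tags are stripped before punctuation, and contractions are expanded while apostrophes still exist) and may differ from the order used in the project.

```python
import re
import string

# Hypothetical, deliberately tiny lookup tables for illustration only;
# a real pipeline would use much fuller resources.
CONTRACTIONS = {"i'm": "i am", "don't": "do not", "can't": "cannot"}
STOPWORDS = {"a", "an", "the", "is", "am", "are", "i", "to", "at", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")


def normalize(text: str) -> str:
    """Apply a small subset of the listed preprocessing steps to one message."""
    text = text.lower()                      # convert to lowercase
    text = URL_RE.sub(" ", text)             # remove URLs
    text = HTML_TAG_RE.sub(" ", text)        # remove HTML tags
    # Expand contractions; split() also collapses whitespace and newlines.
    words = [CONTRACTIONS.get(word, word) for word in text.split()]
    text = " ".join(words)
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Discard non-alphabetic tokens and stopwords.
    tokens = [t for t in text.split() if t.isalpha() and t not in STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize("I'm SO happy!!  <b>Look</b> at this: https://example.com\n"))
```

Steps such as spelling correction, lemmatization, and part-of-speech filtering need external libraries (the installation section mentions NLTK and spaCy) and are left out of this sketch.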

Text-Preprocessing Pipeline Flowchart

graph TD;
    A[Input Text] --> B[Convert to lowercase];
    B --> C[Remove whitespace];
    C --> D[Remove newline characters];
    D --> E[Remove .com];
    E --> F[Remove URLs];
    F --> G[Remove punctuation];
    G --> H[Remove HTML tags];
    H --> I[Remove emojis];
    I --> J[Handle problematic characters];
    J --> K[Convert acronyms];
    K --> L[Expand contractions];
    L --> M[Handle slang and abbreviations];
    M --> N[Correct spelling];
    N --> O[Lemmatize text];
    O --> P[Discard non-alphabetic characters];
    P --> Q[Keep specific parts of speech];
    Q --> R[Remove stopwords];
    R --> S[Preprocessed Text];

Models Used

Ten models were tested for each vectorization method. The best performer for each method was:

TF-IDF Vectorizer with XGBoost
Bag of Words (BoW) Vectorizer with XGBoost
Word2Vec with Linear SVM
GloVe with Bidirectional LSTM
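
For the Word2Vec with Linear SVM combination, each message has to become one fixed-length feature vector before a linear classifier can use it. The source does not say how this is done; a common approach, assumed here, is to average the vectors of the words in the message. The sketch below uses a hypothetical three-dimensional embedding table (TOY_VECTORS) in place of trained Word2Vec vectors, and the function name document_vector is made up for the example.

```python
from typing import Dict, List, Sequence

# Hypothetical 3-dimensional embeddings for illustration only; real Word2Vec
# vectors are learned from a corpus and usually have 100 or more dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "happy": [1.0, 0.0, 0.0],
    "sad": [0.0, 1.0, 0.0],
    "angry": [0.0, 0.0, 1.0],
    "very": [0.5, 0.5, 0.5],
}


def document_vector(
    tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        # No token has an embedding: fall back to a zero vector.
        return [0.0] * dim
    # zip(*known) groups the values of each dimension together.
    return [sum(column) / len(known) for column in zip(*known)]


if __name__ == "__main__":
    # "khatarnaak" is not in the toy table, so it does not affect the average.
    print(document_vector(["very", "happy", "khatarnaak"], TOY_VECTORS, dim=3))
```

Out-of-vocabulary words, which are frequent in transliterated chat text, contribute nothing under this scheme, which is one reason such data is hard for embedding-based models.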

Project Structure

├── Datasets/
│   ├── angriness.csv
│   ├── happiness.csv
│   └── sadness.csv
├── assets/
│   ├── comment.png
│   └── streamlit app overview.png
├── streamlit app/
│   ├── pkl files/
│   │   ├── best_xgb_model.pkl
│   │   └── bow_vectorizer.pkl
│   ├── main.py
│   ├── text_normalization.py
│   └── requirement.txt
├── emotion_classificatiom.ipynb
├── project report.pdf
└── README.md

Installation

Clone the repository:

git clone https://github.com/vn33/Intensity-Analysis-EmotionClassification.git

Install the dependencies:

pip install -r "streamlit app/requirement.txt"

Download necessary NLTK data:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')

Download the spaCy model:

python -m spacy download en_core_web_sm

Usage

Run the Streamlit app:

streamlit run "streamlit app/main.py"

Enter text into the input box and click "Predict" to see the emotion prediction.

Streamlit App

The Streamlit app allows users to input text and get an emotion prediction. It uses the pre-trained models and vectorizers to preprocess the text and make predictions.
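
The prediction path behind the app can be sketched roughly as follows. The file names come from the project tree, but everything else is an assumption rather than the contents of main.py: the function names load_artifacts and predict_emotion, the idea that the model returns an integer class index, and the order of the label list are all hypothetical.

```python
import pickle
from pathlib import Path

# Location taken from the project tree; adjust if your checkout differs.
PKL_DIR = Path("streamlit app") / "pkl files"


def load_artifacts(pkl_dir: Path = PKL_DIR):
    """Load the pickled vectorizer and model. Only unpickle files you trust."""
    with open(pkl_dir / "bow_vectorizer.pkl", "rb") as handle:
        vectorizer = pickle.load(handle)
    with open(pkl_dir / "best_xgb_model.pkl", "rb") as handle:
        model = pickle.load(handle)
    return vectorizer, model


def predict_emotion(text, normalize, vectorizer, model, labels):
    """Normalize the text, vectorize it, and map the predicted class to a label.

    Assumes `vectorizer.transform` and `model.predict` follow the scikit-learn
    convention, and that the model predicts an integer index into `labels`.
    """
    cleaned = normalize(text)
    features = vectorizer.transform([cleaned])
    class_index = int(model.predict(features)[0])
    return labels[class_index]
```

In a Streamlit script, a function like predict_emotion would typically be called when the user clicks "Predict", with the result written back to the page.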

Conclusion

This project demonstrates the use of various text preprocessing techniques and machine learning models to predict emotions from text. Despite the challenges posed by the linguistic characteristics of the dataset, the best model achieved a validation accuracy of 73%. The Streamlit app provides an interactive way to test the models and see their predictions in real time.
