vn33 / intensity-analysis-emotionclassification

Predict emotions (happiness, anger, sadness) from WhatsApp chat data using machine learning and deep learning models. Includes text normalization, vectorization (TF-IDF, BoW, Word2Vec, GloVe), and model evaluation.

Jupyter Notebook 99.20% Python 0.80%
bidirectional-lstm countvectorizer deep-learning emotion-classification glove-embeddings hyperparameter-tuning machine-learning natural-language-processing text-classification text-normalization tf-idf-vectorizer word2vec word2vec-embeddinngs

Intensity Analysis: Emotion Classification

This project aims to predict emotions from text data using various machine learning and deep learning models. It includes preprocessing steps, different vectorization techniques, and a Streamlit web application for interactive predictions.

Project Overview

The main goal of this project is to predict emotions such as happiness, anger, and sadness from textual data. We evaluated several models and vectorization techniques to find the best-performing combination.

Dataset

The dataset used in this project consists of WhatsApp chat data from Indian users. This data presents unique challenges as many Indian users often type in their native languages using the English script, which includes a lot of slang and colloquial expressions. For example, the word "khatarnaak" (खतरनाक), which means "dangerous" in Hindi, is often used to describe something intense or impressive in a positive way. This linguistic mix makes it challenging for models to accurately interpret and predict emotions.

Despite these challenges, the best-performing model, a Linear SVM with Word2Vec features, achieved a validation accuracy of 73%.

Data Preprocessing

The text preprocessing pipeline includes the following steps:

  1. Convert to lowercase
  2. Remove whitespace
  3. Remove newline characters
  4. Remove ".com" substrings
  5. Remove URLs
  6. Remove punctuation
  7. Remove HTML tags
  8. Remove emojis
  9. Handle problematic characters within words (’, iâm, 🙠, and so on)
  10. Convert acronyms
  11. Expand contractions
  12. Handle slang and abbreviations
  13. Correct spelling
  14. Lemmatize text
  15. Discard non-alphabetic characters
  16. Keep specific parts of speech
  17. Remove stopwords
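
A few of the rule-based steps above can be sketched with the standard library alone. This is a minimal illustration, not the repository's `text_normalization.py`: the function name `basic_normalize` and the regular expressions are assumptions. The sketch also strips HTML tags and URLs before punctuation, since removing `<`, `>`, `:` and `/` first would leave nothing for those patterns to match. Dictionary- and model-based steps (acronyms, contractions, slang, spelling, lemmatization, part-of-speech filtering, stopwords) are left out.

```python
import re
import string

# Hypothetical patterns chosen for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def basic_normalize(text: str) -> str:
    """Apply a handful of the rule-based cleaning steps to one message."""
    text = text.lower()                     # convert to lowercase
    text = TAG_RE.sub(" ", text)            # remove HTML tags
    text = URL_RE.sub(" ", text)            # remove URLs
    text = text.replace(".com", " ")        # remove leftover ".com" substrings
    text = text.translate(PUNCT_TABLE)      # remove ASCII punctuation
    text = re.sub(r"[^a-z\s]", " ", text)   # drop emojis, digits, other non-alphabetic characters
    return " ".join(text.split())           # collapse newlines and repeated whitespace


print(basic_normalize("Check <b>THIS</b> out!!\nhttps://example.com it's khatarnaak 😂"))
# prints: check this out its khatarnaak
```

Note that stripping the apostrophe turns "it's" into "its", which is one reason the order of punctuation removal and contraction expansion matters in a pipeline like this.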

Text-Preprocessing Pipeline Flowchart

graph TD;
    A[Input Text] --> B[Convert to lowercase];
    B --> C[Remove whitespace];
    C --> D[Remove newline characters];
    D --> E[Remove .com];
    E --> F[Remove URLs];
    F --> G[Remove punctuation];
    G --> H[Remove HTML tags];
    H --> I[Remove emojis];
    I --> J[Handle problematic characters];
    J --> K[Convert acronyms];
    K --> L[Expand contractions];
    L --> M[Handle slang and abbreviations];
    M --> N[Correct spelling];
    N --> O[Lemmatize text];
    O --> P[Discard non-alphabetic characters];
    P --> Q[Keep specific parts of speech];
    Q --> R[Remove stopwords];
    R --> S[Preprocessed Text];

Models Used

Ten models were tested for each vectorization method. The best performer for each method was:

  1. TF-IDF Vectorizer with XGBoost
  2. Bag of Words (BoW) Vectorizer with XGBoost
  3. Word2Vec with Linear SVM
  4. GloVe with Bidirectional LSTM
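
To make the difference between the first two vectorizers concrete, here is a small standard-library sketch of Bag of Words counts and a TF-IDF reweighting of those counts. It is an illustration only: the repository's topics suggest the project relies on library vectorizers, and the helper names `bow_vectors` and `tfidf_vectors`, the whitespace tokenization, and the particular smoothed IDF formula are assumptions made for this example.

```python
import math
from collections import Counter


def bow_vectors(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """Reweight raw counts by a smoothed inverse document frequency."""
    vocab, counts = bow_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # One common smoothed IDF variant, chosen here for illustration.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


docs = ["feel happy today", "feel sad today", "so happy so glad"]
vocab, bow = bow_vectors(docs)
_, tfidf = tfidf_vectors(docs)
print(vocab)   # ['feel', 'glad', 'happy', 'sad', 'so', 'today']
print(bow[2])  # [0, 1, 1, 0, 2, 0]
```

In the second document, "sad" appears in only one message while "feel" appears in two, so TF-IDF gives "sad" the larger weight even though both have a raw count of 1.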

Project Structure

├── Datasets/
│   ├── angriness.csv
│   ├── happiness.csv
│   └── sadness.csv
├── assets/
│   ├── comment.png
│   └── streamlit app overview.png
├── streamlit app/
│   ├── pkl files/
│   │   ├── best_xgb_model.pkl
│   │   └── bow_vectorizer.pkl
│   ├── main.py
│   ├── text_normalization.py
│   └── requirement.txt
├── emotion_classificatiom.ipynb
├── project report.pdf
└── README.md

Installation

  1. Clone the repository and move into it:
git clone https://github.com/vn33/Intensity-Analysis-EmotionClassification.git
cd Intensity-Analysis-EmotionClassification
  2. Install the dependencies (the path contains a space, so it must be quoted):
pip install -r "streamlit app/requirement.txt"
  3. Download the necessary NLTK data (run in a Python session):
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
  4. Download the spaCy model:
python -m spacy download en_core_web_sm

Usage

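Run the Streamlit app from the repository root. The entry point listed under Project Structure is main.py inside the "streamlit app" folder:

streamlit run "streamlit app/main.py"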

Enter text into the input box and click "Predict" to see the emotion prediction.

Streamlit App

The Streamlit app allows users to input text and get an emotion prediction. It uses the pre-trained models and vectorizers to preprocess the text and make predictions.
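
The prediction path described here (normalize, vectorize, classify) might look roughly like the sketch below. It assumes the pickled vectorizer and model expose scikit-learn-style `transform` and `predict` methods; the function `predict_emotion`, the stand-in classes, and the label order are all hypothetical and are not taken from the repository's `main.py`.

```python
def predict_emotion(text, normalize, vectorizer, model, labels):
    """Normalize one message, vectorize it, and map the model output to a label."""
    cleaned = normalize(text)
    features = vectorizer.transform([cleaned])  # list of texts in, rows of features out
    class_index = int(model.predict(features)[0])
    return labels[class_index]


class KeywordVectorizer:
    """Stand-in for the pickled vectorizer: counts three hypothetical keywords."""

    def transform(self, texts):
        keywords = ("happy", "angry", "sad")
        return [[t.split().count(k) for k in keywords] for t in texts]


class ArgmaxModel:
    """Stand-in for the pickled classifier: picks the largest feature."""

    def predict(self, rows):
        return [max(range(len(r)), key=r.__getitem__) for r in rows]


LABELS = ["happiness", "anger", "sadness"]  # hypothetical label order

print(predict_emotion("So HAPPY today", str.lower,
                      KeywordVectorizer(), ArgmaxModel(), LABELS))
# prints: happiness
```

In the real app, the two stand-ins would presumably be replaced by the objects unpickled from `best_xgb_model.pkl` and `bow_vectorizer.pkl`, and `str.lower` by the full normalization pipeline.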

Conclusion

This project demonstrates the use of various text preprocessing techniques and machine learning models to predict emotions from text. Despite the challenges posed by the unique linguistic characteristics of the dataset, we achieved a validation accuracy of 73% with our best model. The Streamlit app provides an interactive way to test the models and see their predictions in real time.
