This project forked from ramos-iyer/comparison-of-hybrid-neural-network-methodologies-for-sentiment-emotion-analysis

Comparison-of-Hybrid-Neural-Network-Methodologies-for-Sentiment-Emotion-Analysis

Masters in Data Analytics Project

Project: Comparison of Hybrid Neural Network Methodologies for Sentiment & Emotion Analysis

Overview

Tweets play an important role for many organisations. This project analyses English-language tweets and categorises them by the sentiment and emotion expressed by the user. The literature survey showed promising results for hybrid methodologies in sentiment and emotion analysis, so four hybrid methodologies are applied here to tweets belonging to various categories. Classification and regression approaches are combined with deep learning models, namely Bidirectional LSTM, LSTM and convolutional neural network (CNN), to perform sentiment and emotion analysis of the tweets. A novel approach combining the Vader and NRC lexicons is used to generate the sentiment and emotion polarity scores and categories. Model performance is evaluated using accuracy, mean absolute error and mean squared error. One business use case for these models is helping organisations understand customer opinion of their business so that they can improve their service. Contrary to the suggestion of Google’s S/W ratio method, LSTM models performed better than CNN models on both the classification and regression problems.

Methodology

The diagram below shows the methodology followed for the project and its analysis:

Screenshot1

Components

Data Extraction and Pre-Processing

File 'Data Cleaning and Pre-Processing.ipynb' :

  • Imports the full dataset of Twitter tweets for one day (01-Aug-2019)
  • Filters the data by language, retweets and hashtags
  • Exports the final filtered data to a .csv file
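
The filtering step above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code. It assumes each tweet is a dict in the Twitter v1.1 JSON layout (`lang`, `text`, `retweeted_status`, `entities.hashtags`). The rule of keeping English, non-retweet tweets that carry at least one hashtag is also an assumption, since the exact criteria are not stated here. The function names and output columns are hypothetical.

```python
import csv
import io


def keep_tweet(tweet):
    """Return True for English, non-retweet tweets with at least one hashtag.

    All three criteria are assumptions made for illustration only.
    """
    is_english = tweet.get("lang") == "en"
    is_retweet = "retweeted_status" in tweet
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return is_english and not is_retweet and len(hashtags) > 0


def tweets_to_csv(tweets):
    """Return CSV text for the kept tweets (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "hashtags"])
    for tweet in tweets:
        if keep_tweet(tweet):
            tags = " ".join(h["text"] for h in tweet["entities"]["hashtags"])
            writer.writerow([tweet["text"], tags])
    return buffer.getvalue()


# Four made-up tweets: only the first passes all three filters.
sample = [
    {"lang": "en", "text": "Great service today",
     "entities": {"hashtags": [{"text": "happy"}]}},
    {"lang": "fr", "text": "Bonjour",
     "entities": {"hashtags": [{"text": "salut"}]}},
    {"lang": "en", "text": "RT something", "retweeted_status": {},
     "entities": {"hashtags": [{"text": "news"}]}},
    {"lang": "en", "text": "No tags here",
     "entities": {"hashtags": []}},
]
csv_text = tweets_to_csv(sample)
print(csv_text)
```

Keeping the predicate (`keep_tweet`) separate from the export makes it easy to swap in different filter rules without touching the CSV code.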

Emotion Classification using NRC Lexicon and LSTM based DNN

File 'NRC_Emotion Category.ipynb' :

  • Imports the final filtered tweet data
  • Performs text analysis on the data
  • Applies the NRC Lexicon to generate the emotions for each tweet
  • Trains an LSTM-based DNN model that predicts the emotion of a tweet from its text
  • Generates evaluation metrics for comparison
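
To illustrate how a word-to-emotion lexicon can label a tweet, here is a minimal sketch. The notebook's package list includes `nrclex`, but this example deliberately avoids it and uses a tiny hand-written dictionary (`TOY_LEXICON`, hypothetical) so that it runs on its own. The tie-breaking rule and the `"none"` fallback label are also assumptions.

```python
import re
from collections import Counter

# Hypothetical stand-in for a word-to-emotions lexicon.
# The real NRC lexicon is far larger; these entries are illustrative only.
TOY_LEXICON = {
    "happy": ["joy"],
    "afraid": ["fear"],
    "furious": ["anger"],
    "win": ["joy", "anticipation"],
}


def emotion_counts(text):
    """Count how often each emotion is triggered by the words in text."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        counts.update(TOY_LEXICON.get(word, []))
    return counts


def top_emotion(text, default="none"):
    """Return the most frequent emotion, or default if no word matches."""
    counts = emotion_counts(text)
    if not counts:
        return default
    # Highest count first; ties are broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0][0]


print(top_emotion("So happy about the win, happy days"))
```

The label produced this way can then serve as the training target for the neural network, which is the hybrid idea described above.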

Emotion Classification using Vader Lexicon and LSTM+CNN based DNN

File 'Vader_Emotion Category.ipynb' :

  • Imports the final filtered tweet data
  • Performs text analysis on the data
  • Applies the Vader Lexicon along with clustering to generate the emotions for each tweet
  • Trains an LSTM and CNN based DNN model that predicts the emotion of a tweet from its text
  • Generates evaluation metrics for comparison

Sentiment Classification using Vader Lexicon and LSTM+CNN based DNN

File 'Sentiment Category.ipynb' :

  • Imports the final filtered tweet data
  • Performs text analysis on the data
  • Applies the Vader Lexicon to generate the sentiment for each tweet
  • Trains an LSTM and CNN based DNN model that predicts the sentiment of a tweet from its text
  • Generates evaluation metrics for comparison
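
The step from a Vader score to a sentiment category can be sketched as below. The ±0.05 cut-offs are the thresholds commonly suggested in the VADER documentation; whether the notebook uses the same values is an assumption. In practice the compound score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package. Here scores are passed in directly so the sketch has no dependencies, and the function name is hypothetical.

```python
def sentiment_category(compound, threshold=0.05):
    """Map a Vader compound score in [-1, 1] to a three-way label.

    The default threshold follows the commonly cited VADER convention;
    it is an assumption that the notebook uses the same value.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Made-up compound scores for illustration.
for score in (0.62, -0.40, 0.01):
    print(score, sentiment_category(score))
```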

Sentiment Polarity Analysis using Vader Lexicon and Bi-Directional LSTM based DNN

File 'Sentiment Polarity.ipynb' :

  • Imports the final filtered tweet data
  • Performs text analysis on the data
  • Applies the Vader Lexicon to generate the sentiment polarity scores for each tweet
  • Trains a Bi-Directional LSTM based DNN model that predicts the sentiment polarity of a tweet from its text
  • Generates evaluation metrics for comparison
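
The evaluation metrics named in the overview (accuracy for the classification models, mean absolute error and mean squared error for the polarity regression) can be written out in a few lines. This is a dependency-free sketch for clarity; the notebooks may well use library implementations instead, and the sample values are made up.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average of |true - predicted|."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE: average of (true - predicted) squared."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels_true, labels_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("inputs must have the same length")
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)


# Made-up polarity scores: absolute errors are 0.25, 0.0, 0.5 and 0.5.
polarity_true = [0.5, -0.25, 0.0, 1.0]
polarity_pred = [0.25, -0.25, 0.5, 0.5]
print(mean_absolute_error(polarity_true, polarity_pred))
print(mean_squared_error(polarity_true, polarity_pred))
```

Because MSE squares each error, it penalises a few large misses more heavily than MAE does, which is why reporting both gives a fuller picture of a regression model.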

Running the Code

Download the base dataset from the link below and store it in the same folder as the notebooks: https://archive.org/details/twitterstream?and[]=year%3A"2019"

(Download only the zip file for 01-Aug-2019.)

  1. Execute the "Data Cleaning and Pre-Processing.ipynb" file to generate the final dataset used for analysis.

  2. Execute the relevant model .ipynb files to perform the analysis and see the results.

Screenshots

Screenshot2 Screenshot3 Screenshot4 Screenshot5 Screenshot6 Screenshot7 Screenshot8 Screenshot9 Screenshot10 Screenshot11 Screenshot12 Screenshot13 Screenshot14 Screenshot15 Screenshot16

System Configuration Steps

The following are required to run the code:

  • Python and Jupyter Notebook: The code for data extraction and merging is written in Python, so Python is required, with Jupyter Notebook as the IDE. The notebooks depend on the following packages:

os, tarfile, pandas, pyspark, vaderSentiment, matplotlib, numpy, re, tensorflow, sklearn, bs4, string, nltk, emoji, nrclex, seaborn, keras, itertools, scikitplot, gensim, operator, pickle, pathlib, nlp_utils
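
Before opening the notebooks, it can help to check which of these packages are missing from the current environment. The sketch below uses only the standard library; the `missing_packages` helper is hypothetical and not part of the project.

```python
import importlib.util

# Import names copied from the package list above.
REQUIRED = [
    "os", "tarfile", "pandas", "pyspark", "vaderSentiment", "matplotlib",
    "numpy", "re", "tensorflow", "sklearn", "bs4", "string", "nltk",
    "emoji", "nrclex", "seaborn", "keras", "itertools", "scikitplot",
    "gensim", "operator", "pickle", "pathlib", "nlp_utils",
]


def missing_packages(names):
    """Return the import names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing:", missing_packages(REQUIRED))
```

Note that some import names differ from the names used to install them: `sklearn`, `bs4` and `scikitplot` are distributed on PyPI as `scikit-learn`, `beautifulsoup4` and `scikit-plot` respectively.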

File Descriptions

The project contains the following files and folders:

  1. Cleaned Data:
  • August01_Tweets_Final.csv: Contains the data used for analysis after filtering the raw tweets.
  2. Code:
  • Data Cleaning and Pre-Processing.ipynb: Contains the code to clean, pre-process and filter the raw Twitter tweets data
  • NRC_Emotion Category.ipynb: Contains the code to apply Emotion Classification using NRC Lexicon and LSTM based DNN model
  • Sentiment Category.ipynb: Contains the code to apply Sentiment Classification using Vader Lexicon and LSTM+CNN based DNN model
  • Sentiment Polarity.ipynb: Contains the code to apply Sentiment Polarity Analysis using Vader Lexicon and Bi-Directional LSTM based DNN model
  • Vader_Emotion Category.ipynb: Contains the code to apply Emotion Classification using Vader Lexicon and LSTM+CNN based DNN model

Credits and Acknowledgements

  • Archive Team: The Twitter Stream Grab for providing the data used for this project.
  • NCI, for a challenging project as part of the 'Data Mining and Machine Learning 2' subject of their full-time Masters in Data Analytics course.
