


dialect-llm-bachelor-project's Introduction

Hi, I am Eng. Omar Sherif Ali

Computer Science & Engineering

"Language is the bridge that connects minds, and NLP is the compass guiding us to understand and unlock its infinite potential."

Welcome to my Bachelor Thesis


Egyptian Tweet Sentiment Analysis, Forecasting, and Topic Modeling

This Bachelor project aims to develop a new approach to Arabic (Egyptian dialect) sentiment analysis, forecasting, and topic modeling using machine learning, deep learning, and Transformers.

📄 View the Thesis »
· Presentation · Demo Video · Report Bug · Be a Contributor

[Figure: model architecture]

💡 Description

In recent years, social media platforms have become increasingly popular places for individuals to express their thoughts and opinions on various topics and situations. Monitoring sentiment and understanding how topics evolve is crucial for governments that want to identify negative sentiment and respond promptly.

In this study, we developed a sentiment analysis ensemble architecture built from four Transformers: MARBERT, CaMel-bert-DA, CaMel-bert-Mix, and Alanzi. The ensemble is followed by a final classification layer consisting of a three-layer feed-forward neural network. A formula is applied to the logits output by each transformer, and the resulting logits are summed before the softmax activation function is applied. On the sentiment analysis test dataset, the model reached a test accuracy of 83%.

To analyze temporal trends in sentiment, we applied an LSTM model with a sliding window to time-stamped tweets about English news stories that generated significant discussion on social media. The English text was translated into Arabic by a translation model. We then plotted the sentiment arc and compared our results with the original results.

We also explored the effectiveness of BERTopic, a topic modeling technique, in comparison with LDA and NMF. Using various pre-trained Arabic language models as embeddings, we conducted topic modeling and aspect-based analysis. BERTopic and NMF achieved comparable, competitive results and captured meaningful topics effectively, whereas LDA performed poorly at generating coherent and informative topics. These findings favor BERTopic and NMF over LDA for topic modeling tasks.
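The logit-summing step of the ensemble described above can be illustrated with a minimal sketch. The description does not state the formula applied to each transformer's logits, so a per-model scalar weight stands in for it here as an assumption. The function names, the three-class layout, and the example numbers are all hypothetical, and the final feed-forward classification layer is left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits, weights=None):
    """Sum (optionally weighted) logits from several models, then apply softmax.

    per_model_logits: one list of class logits per model, all the same length.
    weights: assumed stand-in for the unspecified per-model formula;
             defaults to 1.0 for every model (a plain sum).
    Returns (predicted_class_index, class_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(per_model_logits)
    n_classes = len(per_model_logits[0])
    summed = [0.0] * n_classes
    for weight, logits in zip(weights, per_model_logits):
        for i, value in enumerate(logits):
            summed[i] += weight * value
    probs = softmax(summed)
    label = max(range(n_classes), key=probs.__getitem__)
    return label, probs


# Hypothetical logits from four models for one tweet,
# with an assumed class order of (negative, neutral, positive).
example_logits = [
    [2.0, 0.5, -1.0],
    [1.5, 0.2, -0.5],
    [0.1, 0.3, 0.0],
    [1.0, -0.2, 0.4],
]
label, probs = ensemble_predict(example_logits)
```

Summing logits before the softmax lets a confident model outweigh an uncertain one, which averaging hard labels would not do.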

[Figure: model architecture]

Pipeline

  • Tweet Collection, Merging, Pre-processing, Analysis, and Correctness Investigation (Colab notebook)

  • Machine Learning Sentiment Analysis (Colab notebook)

  • Deep Learning Sentiment Analysis (Colab notebook)

  • Pure Transformers Sentiment Analysis (Colab notebook)

  • Customized Transformers, Ensemble Model Sentiment Analysis, and Website (Colab notebook)

  • Sentiment Forecasting (Colab notebook)

  • Aspect-Based Analysis and Topic Modeling (Colab notebook)

  • Zero-Shot Classification (Colab notebook)
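For the Sentiment Forecasting step, the sliding-window framing mentioned in the description can be sketched as follows. This is only an illustration under assumptions: the README does not give the window length, the aggregation interval, or the prediction target, so the function name, the window size of 3, and the example scores are hypothetical. Each window of consecutive sentiment scores becomes one input sequence, and the value that follows it becomes the target a model such as an LSTM would learn to predict.

```python
def sliding_windows(series, window_size):
    """Split a time-ordered series into (inputs, targets) pairs.

    Each input is `window_size` consecutive values; its target is the
    value immediately after the window.
    """
    if window_size < 1 or window_size >= len(series):
        raise ValueError("window_size must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


# Hypothetical sentiment scores aggregated per time step.
scores = [0.1, 0.3, 0.2, 0.5, 0.4]
inputs, targets = sliding_windows(scores, window_size=3)
# inputs  -> [[0.1, 0.3, 0.2], [0.3, 0.2, 0.5]]
# targets -> [0.5, 0.4]
```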

Sentiment Analysis Website

[Figure: model architecture]

💻️ Languages & Libraries Used


⚠️ Disclaimer

This data should be used for practice only, not for commercial purposes.


Author: Omar Sherif Ali - OSA


Connect with me


Made with ❤️ by Omar Sherif Ali - OSA.

© OSA - 2022


Daily Progress

First Week

| Date | Day | Progress | Resources |
| --- | --- | --- | --- |
| 2023-03-01 | Saturday | Read the O'Reilly scikit-learn book, revised Python, and searched for extra resources | Book |
| 2023-03-01 | Sunday | Read the O'Reilly scikit-learn book and searched for extra resources | Book |
| 2023-03-01 | Monday | Learned NLP basics: tokenization, stemming, lemmatization, count vectorizer | Udemy course on Python |
| 2023-03-01 | Tuesday | Discovered Pandas, completed 5 practice notebooks on Kaggle, and finished the Kaggle Data Cleaning course | Kaggle |
| 2023-03-01 | Wednesday | Learned TF-IDF with notebook exercises and finished the Machine Learning Beginner Kaggle course | |
| 2023-03-02 | Thursday | Dug deeper into models and their hyperparameters and took part in a Kaggle competition | Kaggle competition on house prices |
| 2023-03-03 | Friday | Studied feature engineering, feature importance, and imputers; worked on the project proposal | Medium article on best practices for ML models |

Second Week

| Date | Day | Progress | Resources |
| --- | --- | --- | --- |
| 2023-03-01 | Saturday | Studied more models: SVMs, KNN, and ensemble models (XGBoost) | Coursera course on Python for Data Science |
| 2023-03-01 | Sunday | Finished the Kaggle ML Intermediate course, studied categorical encoding, and searched for more resources | Kaggle |
| 2023-03-01 | Monday | ROC, AUC, confusion matrix, covariance matrix | |
| 2023-03-01 | Tuesday | Learned about standardization and completed a project using all previously learned models | Coursera course on Python for Data Science |
| 2023-03-01 | Wednesday | Activation functions and sentiment analysis | |
| 2023-03-02 | Thursday | Model interpretation (model-agnostic methods), text summarization, random walk | |
| 2023-03-03 | Friday | | |
