Comparison of LDA Learning Algorithms

This repository contains the code and documentation for an empirical comparison of two popular LDA learning algorithms: Variational Inference (Gensim Implementation) and Gibbs Sampling (Mallet Implementation). This project aims to evaluate and compare the performance of these algorithms across various text datasets.

Project Overview

The project consists of several key stages:

Preprocessing: Text datasets are preprocessed (tokenization, stopword removal, lemmatization, and bigram/trigram creation) before modeling.
Training and Tuning: We explore different configurations of the number of topics and the source dataset to compare the two learning algorithms. For tuning, we optimized for topic coherence.
- Train LDA model using Variational Inference with Gensim.
- Train LDA model using Gibbs Sampling with Mallet.
Evaluation: The performance of the algorithms is evaluated using the intrusion test, leveraging Large Language Models (LLMs) to assess topic quality.
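
To make the evaluation stage more concrete, here is a minimal sketch of how an intrusion test item could be built and scored. It assumes the word-intrusion variant, in which the top words of one topic are mixed with a single "intruder" word from another topic and a judge has to pick the intruder. The names `IntrusionItem`, `build_intrusion_item`, `intrusion_accuracy`, and the `judge` callable are hypothetical and are not taken from this repository. In the project the judge would be an LLM; here it is any function that maps a word list to one chosen word.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class IntrusionItem:
    topic: int          # index of the topic under test
    words: List[str]    # shuffled top words plus one intruder
    intruder: str       # the word that does not belong


def build_intrusion_item(
    topics: Sequence[Sequence[str]],
    topic_idx: int,
    rng: random.Random,
    n_words: int = 5,
) -> IntrusionItem:
    """Mix the top words of one topic with one intruder from another topic."""
    top_words = list(topics[topic_idx][:n_words])
    own_words = set(topics[topic_idx])
    # Candidate intruders: top words of the other topics that never
    # occur in the word list of the topic under test.
    candidates = [
        word
        for j, other in enumerate(topics)
        if j != topic_idx
        for word in other[:n_words]
        if word not in own_words
    ]
    if not candidates:
        raise ValueError("no intruder candidate available for this topic")
    intruder = rng.choice(candidates)
    shown = top_words + [intruder]
    rng.shuffle(shown)
    return IntrusionItem(topic=topic_idx, words=shown, intruder=intruder)


def intrusion_accuracy(
    items: Sequence[IntrusionItem],
    judge: Callable[[List[str]], str],
) -> float:
    """Fraction of items for which the judge names the true intruder."""
    if not items:
        raise ValueError("at least one intrusion item is required")
    hits = sum(1 for item in items if judge(list(item.words)) == item.intruder)
    return hits / len(items)


if __name__ == "__main__":
    # Toy topics (made up for illustration): each is a ranked list of top words.
    toy_topics = [
        ["game", "team", "player", "season", "coach"],
        ["court", "law", "trial", "jury", "lawyer"],
    ]
    rng = random.Random(0)
    items = [build_intrusion_item(toy_topics, i, rng) for i in range(len(toy_topics))]
    # Stand-in judge that always picks the first word shown; an LLM call would go here.
    print(intrusion_accuracy(items, lambda words: words[0]))
```

Under this setup, a higher accuracy suggests that the topic's top words form a coherent group that makes the intruder easy to spot. Comparing the accuracy obtained on Gensim topics with that obtained on Mallet topics is one way the two algorithms could be contrasted.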

Authors

This project was created by Sicun Chen and Ayush Hate.

References

This section includes references to key documentation and tutorials that helped us in this project:

Gensim LDA Documentation - Documentation on using Gensim for topic modeling. Gensim LDA
Gensim Coherence Documentation - Documentation on using Gensim for Coherence calculation. Gensim Coherence
Gensim Tutorial - Tutorial using Gensim for topic modeling. Gensim Tutorial
Gensim LDA Tips - A discussion on hyper-parameter setting for Gensim. Gensim LDA: Tips and Tricks
Mallet Documentation - Documentation on using Mallet for topic modeling. Mallet Topic Modeling
Mallet Tutorial - Tutorial using Mallet for topic modeling. Mallet Tutorial
Mallet Python Wrapper - We adapted the wrapper functions implemented here: little-mallet-wrapper

These resources are fundamental for understanding the implementation details and enhancing the functionality of the project.
