text-topic-modelling-bertopic's Introduction

Topic Classification and Keyword Extraction from News Articles using BERTopic

This project uses the BERTopic model to predict topics for input text. It is containerized with Docker for easy deployment and execution.

The topic model was trained on news articles and journals, mainly from Bloomberg News and Foreign Affairs. Given an input article, it identifies the most closely related topics. Common topics that appear after you enter your text include Technology, Market, and Economy.

Features

  • Model Loading: Seamlessly load a pre-trained BERTopic model.
  • Prediction Endpoint: Provide any text input to identify the top related topics.
  • Cross-Platform Docker Support: Built for compatibility across both AMD64 & ARM64 architectures.
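
The model loading and prediction features listed above can be sketched in Python roughly as follows. This is a minimal sketch, not the project's actual code: the function name `predict_topic` is hypothetical, and the function only assumes a model object that exposes BERTopic's `transform` and `get_topic` methods.

```python
from typing import Any, Dict, List


def predict_topic(model: Any, text: str, top_words: int = 5) -> Dict[str, Any]:
    """Return the predicted topic id and its top keywords for one document.

    `model` is assumed to behave like a fitted BERTopic instance:
    `transform(docs)` returns (topics, probabilities) and
    `get_topic(topic_id)` returns a list of (word, score) pairs.
    """
    topics, _probs = model.transform([text])
    topic_id = int(topics[0])
    # get_topic may return a falsy value for an unknown topic id.
    words = model.get_topic(topic_id) or []
    keywords: List[str] = [word for word, _score in words[:top_words]]
    return {"topic_id": topic_id, "keywords": keywords}
```

With BERTopic installed, a saved model could be loaded with `BERTopic.load("path/to/model")` (the path is a placeholder) and passed in as `model`.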

Software And Tools Requirements

Python Environment and Dependencies

The application is written in Python and requires Python 3.8.13 on Apple Silicon devices. Dependencies are listed in the requirements.txt file in this repository and include:

  • BERTopic
  • numpy
  • pandas
  • seaborn
  • transformers
  • tokenizers
  • scikit-learn

For a complete list of dependencies and their versions, refer to requirements.txt.
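
When running outside Docker, it can help to check which of these packages are present in the active environment. The following is a small sketch using only the standard library; the function name `installed_versions` is hypothetical, and the distribution names are assumed to match the list above as published on PyPI.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names assumed to match the dependency list above.
PACKAGES = ("bertopic", "numpy", "pandas", "seaborn",
            "transformers", "tokenizers", "scikit-learn")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The pinned versions themselves still come from requirements.txt; this check only reports what is currently installed.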

Getting Started

Prerequisites

  • Docker installed on your machine.
  • Git for cloning the repository.

Pulling the Docker Image

docker pull ttonnyy789/bertopic-bb:latest

Running the Docker Container

docker run -it --rm ttonnyy789/bertopic-bb

Once you execute the command, the following output appears in your terminal. After the Enter text: prompt appears, provide any text input to get the related topics.

Make sure there are no spaces or blank lines in the input text
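
One way to prepare an article before pasting it at the prompt is to flatten it into a single line. The helper below is a sketch under an assumption: it reads the restriction above as applying to line breaks and blank lines in the pasted text, so it keeps single spaces between words. The name `to_single_line` is hypothetical and not part of this project.

```python
def to_single_line(text: str) -> str:
    """Collapse every run of whitespace, including newlines, into one space."""
    return " ".join(text.split())
```

For example, an article copied from a web page with paragraph breaks would become one continuous line that can be pasted after Enter text:.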

Image1

Here is a sample result.

Image1

Building the Docker Image Locally (Optional)

If you want to build the Docker image locally:

A multi-platform Docker image builder is required in this case. Taking an M1 Apple Silicon device as an example:

docker buildx create --use
docker buildx inspect --bootstrap

You can execute the following command to check whether this builder has been installed on your machine.

docker images

If it is successfully installed, you will find a Docker image called moby/buildkit on your local device.
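
The same check can be scripted. The sketch below asks the Docker CLI for local repository names and looks for moby/buildkit; both function names are hypothetical, and it assumes the docker command is on your PATH.

```python
import subprocess
from typing import Iterable, List


def has_buildkit_image(repositories: Iterable[str]) -> bool:
    """Return True if any repository name is exactly moby/buildkit."""
    return any(repo.strip() == "moby/buildkit" for repo in repositories)


def list_local_repositories() -> List[str]:
    """List repository names of local images via the Docker CLI."""
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Calling `has_buildkit_image(list_local_repositories())` then gives a yes/no answer instead of a table to scan by eye.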

Next, execute the commands below to clone the repository and move into it.

git clone https://github.com/TTonnyy789/Topic_Modelling.git
cd Topic_Modelling

Once you have cloned the repository from GitHub, you can run the following commands to build and run the dockerized model locally.

This image is built with --platform linux/amd64,linux/arm64. A multi-platform build does not store the resulting image on your local device by default, so --load is needed to make the image available to docker run.

Depending on your Docker version and image store configuration, --load may not accept more than one platform. If the build reports an export error, build for your machine's platform only.

You can change --load to --push if you want to push the image to Docker Hub instead.

docker buildx build --platform linux/amd64,linux/arm64 -t ttonnyy789/bertopic-bb --load .
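
If you script the build, the choice between --load and --push can be made explicit. The following is a minimal sketch that only assembles the argument list for the command above; the function name `buildx_command` and its parameters are hypothetical.

```python
from typing import List, Sequence


def buildx_command(tag: str, platforms: Sequence[str], push: bool = False,
                   context: str = ".") -> List[str]:
    """Assemble a `docker buildx build` command as an argument list."""
    if not platforms:
        raise ValueError("at least one platform is required")
    # --load keeps the image locally; --push sends it to the registry.
    output_flag = "--push" if push else "--load"
    return ["docker", "buildx", "build",
            "--platform", ",".join(platforms),
            "-t", tag, output_flag, context]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a single platform is one way to work around builders that reject --load for multi-platform images.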

Last but not least, execute this command and enjoy your topic predicting journey!

docker run -it --rm ttonnyy789/bertopic-bb

License

MIT

Acknowledgements

  • Thanks to Docker and the BERTopic community for the foundational tools and resources.
  • Special thanks to ChatGPT for project troubleshooting guidance.

Contributors

  • ttonnyy789
