This is the project that uses the BERTopic model to predict topics based on input text. This project is containerized using Docker for easy deployment and execution.
This topic model was trained in several news articles and journals mainly based on Bloom Berg News and Foreign Affairs. The expected effect is to identify the most related topics for the input article, and there are some common topics will present after you input your text, such as Technology, Market, and Economy etc.
- Features
- Software and Tools Requirements
- Python Environment and Dependencies
- Getting Started
- Building the Docker Image Locally
- License
- Acknowledgments
- Model Loading: Seamlessly load a pre-trained BERTopic model.
- Prediction Endpoint: Provide any text input to identify the top related topics.
- Cross-Platform Docker Support: Built for compatibility across both AMD64 & ARM64 architectures.
- Docker: Ensure Docker is installed on your machine. Download Docker
- VSCodeIDE: Main platform to compose your code. Download VSCodeIDE
- Git (Optional): For cloning and contributing to the repository. Download Git
The application is written in Python and requires Python 3.8.13 on Apple Silicon device. Dependencies are listed in the requirements.txt
file in this repository, which include:
- BERTopic
- numpy
- pandas
- seaborn
- transformers
- tokenizers
- scikit-learn
For a complete list of dependencies and their versions, refer to requirements.txt
.
- Docker installed on your machine.
- Git for cloning the repository.
docker pull ttonnyy789/bertopic-bb:latest
docker run -it --rm ttonnyy789/bertopic-bb
Once you execute the command, the follow result will present on your terminal. After the Enter text:
appear you can provide any text input when prompted to get the related topics.
Make sure there are no spaces or blank lines in the input text
If you want to build the Docker image locally:
The multi-platform docker image builder is required in this case. Take M1 Apple Silicon device as an example:
docker buildx create --use
docker buildx inspect --boostrap
You can execute the following commands to check this specific docker builder whether has been installed in your machine or not.
docker images
If it is successfully installed, you will be able to find a docker image on your local device called moby/buildkit
.
Next step, execute the commands below, you would be able to build and run this docker file successfully.
git clone https://github.com/TTonnyy789/Topic_Modelling.git
cd Topic_Modelling
Once cloned this repositories from Github, you can run following commands and run this dockerized model locally.
In this case, this image is built with --platform linux/arm64,linux/amd64
setting, so it would not store image to your local device automatically if you did not add --load
.
Therefore, --load
is essential in this stage, because this docker file is based on multi-platform, the initial configuration of building docker image will not store it directly on you device.
You can change --load
to --push
if you want to push this image onto your docker hub.
docker buildx build --platform linux/amd64,linux/arm64 -t ttonnyy789/bertopic-bb --load .
Last but not least, execute this command and enjoy your topic predicting journey ! !
docker run -it --rm ttonnyy789/bertopic-bb
MIT
- Thanks to Docker and the BERTopic community for the foundational tools and resources.
- Special thanks to ChatGPT for project troubleshooting guidance.