A simple chatbot that runs locally using `mlx_lm` with models from the MLX Community on Hugging Face (see here: https://huggingface.co/mlx-community). By default it uses `mlx-community/NeuralBeagle14-7B-4bit-mlx` (fast, but still a capable model). You can change this in `main_mlx.py` by editing the `MODEL` variable.
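As a rough sketch of what `main_mlx.py` does with the `MODEL` variable, here is how a model is typically loaded and queried with `mlx_lm`'s `load` and `generate` functions (real APIs, though the exact keyword arguments may vary by version; the helper name `chat_once` is made up for illustration):

```python
def chat_once(model_name: str, prompt: str, max_tokens: int = 256) -> str:
    """Load an MLX-Community model and generate a single reply."""
    # Deferred import so this sketch can be read without mlx_lm installed.
    from mlx_lm import load, generate

    # load() downloads the model from Hugging Face on first use.
    model, tokenizer = load(model_name)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


# Example (requires Apple Silicon and mlx-lm installed):
# reply = chat_once("mlx-community/NeuralBeagle14-7B-4bit-mlx", "Hello!")
```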
Install the package using the following command:

```shell
pip install mlx-lm
```
It also uses LLMLingua (https://github.com/microsoft/LLMLingua) in the background to compress previous conversation context. To install it, use the following command:

```shell
pip install llmlingua accelerate
```
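Conceptually, the background compression step looks like the sketch below, using LLMLingua's `PromptCompressor` class and its `compress_prompt` method (real APIs; the wrapper name `compress_history` and the chosen `target_token` value are illustrative assumptions). The default compressor downloads a small language model on first use:

```python
def compress_history(history_text: str, question: str, target_token: int = 300) -> str:
    """Shrink earlier chat context so it fits the model's prompt budget."""
    # Deferred import so this sketch can be read without llmlingua installed.
    from llmlingua import PromptCompressor

    compressor = PromptCompressor()  # downloads its compression model on first call
    result = compressor.compress_prompt(
        history_text,
        question=question,
        target_token=target_token,
    )
    # compress_prompt returns a dict; the shortened text is under "compressed_prompt".
    return result["compressed_prompt"]
```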
The webpage is generated by the `streamlit` package. To install it, use the following command:

```shell
pip install streamlit
```
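A minimal Streamlit chat page, in the spirit of what `main_mlx.py` renders, can be built from `st.chat_input`, `st.chat_message`, and `st.session_state` (all real Streamlit APIs; the function name `render_chat` and the callback parameter are illustrative assumptions, not the actual file's structure):

```python
def render_chat(generate_reply):
    """Render a chat UI; generate_reply maps a user prompt to a model answer."""
    # Deferred import so this sketch can be read without streamlit installed.
    import streamlit as st

    # Keep the conversation across Streamlit reruns.
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Replay the history so far.
    for msg in st.session_state.messages:
        with st.chat_message(msg["role"]):
            st.write(msg["content"])

    # Handle a new user message.
    if prompt := st.chat_input("Ask something"):
        st.session_state.messages.append({"role": "user", "content": prompt})
        with st.chat_message("user"):
            st.write(prompt)
        reply = generate_reply(prompt)
        st.session_state.messages.append({"role": "assistant", "content": reply})
        with st.chat_message("assistant"):
            st.write(reply)
```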
To run the chatbot, use the following command:

```shell
streamlit run --server.port 8501 main_mlx.py
```
The models are loaded at startup, so it may take a while to start. After that, you can chat with the bot via the webpage that opens in your browser.
This chatbot is based on a tutorial from Lightning AI (https://lightning.ai/lightning-ai/studios/run-codellama-70b-instruct?section=featured) using their Lightning Studio. That tutorial uses Ollama (http://ollama.ai/) as the LLM backend; this version uses `mlx_lm` instead.