This repository contains a Streamlit web application built on the Google Gemini Pro Vision model for multimodal tasks.
The application uses the model for tasks such as image classification, object detection, and image captioning. The Streamlit interface lets users interact with the model and view its outputs.
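As a rough illustration of how a Streamlit front end can be wired to the model, here is a minimal sketch. It is not the repository's actual `app.py`. It assumes the `google-generativeai`, `streamlit`, and `Pillow` packages, an API key in a `GOOGLE_API_KEY` environment variable, and the model identifier `gemini-pro-vision`. The helper names `build_parts`, `describe_image`, and `main` are hypothetical.

```python
import os


def build_parts(prompt, image):
    """Return the content parts for one request: optional text first, then the image."""
    prompt = (prompt or "").strip()
    return [prompt, image] if prompt else [image]


def describe_image(model, prompt, image):
    """Send the prompt and image to the model and return the response text."""
    response = model.generate_content(build_parts(prompt, image))
    return response.text


def main():
    # Third-party imports live here so the helpers above work without them.
    import google.generativeai as genai
    import streamlit as st
    from PIL import Image

    # GOOGLE_API_KEY is an assumed environment variable name.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    # "gemini-pro-vision" is the assumed model identifier.
    model = genai.GenerativeModel("gemini-pro-vision")

    st.title("Gemini Pro Vision demo")
    prompt = st.text_input("Prompt (optional)")
    uploaded = st.file_uploader("Image", type=["jpg", "jpeg", "png"])

    if uploaded is not None:
        image = Image.open(uploaded)
        st.image(image)
        if st.button("Run"):
            st.write(describe_image(model, prompt, image))
```

In a script started with `streamlit run`, a final call to `main()` would render the page. Passing the model into `describe_image` as an argument keeps that helper easy to exercise with a stand-in object when no API key is available.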
The application is deployed using Hugging Face Code Spaces, which provides a cloud-based environment for running it.
To use the application, follow these steps:
- Clone the repository to your local environment.
- Navigate to the project directory.
- Install the required dependencies by running:
pip install -r requirements.txt
- Launch the application by running:
streamlit run app.py
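If the launch step fails with an import error, a quick check like the one below can show which packages are missing after installation. This is only a sketch: the module names in `ASSUMED_MODULES` are guesses at what `requirements.txt` provides, and `missing_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed import names; adjust to match requirements.txt.
ASSUMED_MODULES = ["streamlit", "google.generativeai", "PIL"]

if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    print("Missing: " + ", ".join(missing) if missing else "All assumed modules found.")
```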
This project is licensed under the MIT License.