PaLI-GeMMA Image Analyzer is a web application that uses Google's PaLI-GeMMA (PaliGemma) vision-language model to analyze images based on user prompts. Users can upload an image or provide an image URL, then ask questions or give prompts about the image content.
- Drag and drop image upload from local files or web pages
- Image URL input support
- Real-time analysis streaming
- Responsive design with dark mode support
- Download analysis results as text files
- Python 3.9+
- Flask
- Transformers library
- PyTorch
- Hugging Face account and API token
- Clone the repository:

      git clone https://github.com/TanaroSch/pali-gemma-image-analyzer.git
      cd pali-gemma-image-analyzer

- Create a virtual environment and activate it:

      python -m venv venv
      source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

- Install the required packages:

      pip install -r requirements.txt

- Create a `.env` file in the project root, accept the model's conditions on Hugging Face, and add your Hugging Face API token:

      HUGGINGFACE_TOKEN=your_token_here
      MODEL_PATH=./model  # Optional: Set a custom path for model storage

  Note: If `MODEL_PATH` is not set, the application will use `./model` as the default path.
- Start the Flask application:

      python app.py

- On first startup, the model is downloaded if it does not already exist in the model_cache folder.
- Open a web browser and navigate to `http://localhost:5000`.
- Upload an image by dragging and dropping it onto the page, or by providing an image URL.
- Enter a prompt or question about the image in the text area.
- Click "Analyze" or press Enter to start the analysis.
- View the analysis results in real time as they stream in.
- Optionally, click "Download Result" to save the analysis as a text file.
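
The real-time streaming in the steps above can be pictured as a generator that yields pieces of the answer as they are produced, so the page can render them immediately. The following is a minimal standard-library sketch and not the application's actual code: `fake_model_tokens`, `stream_analysis`, `collect_result`, the server-sent-events framing, and the `[DONE]` marker are all assumptions made for illustration.

```python
from typing import Iterable, Iterator


def fake_model_tokens(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for the model: yields an answer word by word."""
    for word in f"Echo of prompt: {prompt}".split():
        yield word + " "


def stream_analysis(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in a server-sent-events style frame (an assumed format)."""
    for token in tokens:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"  # assumed end-of-stream marker


def collect_result(frames: Iterable[str]) -> str:
    """Rebuild the full text from frames, e.g. for a downloadable text file."""
    parts = []
    for frame in frames:
        payload = frame[len("data: "):-2]  # drop the prefix and the two newlines
        if payload != "[DONE]":
            parts.append(payload)
    return "".join(parts).strip()


if __name__ == "__main__":
    frames = list(stream_analysis(fake_model_tokens("What is in the image?")))
    print(collect_result(frames))
```

In a Flask application, a generator like `stream_analysis` could be passed to `flask.Response(..., mimetype="text/event-stream")`; whether this project streams that way is not stated here. The sketch also assumes tokens contain no newlines, which a real implementation would need to handle.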
You can specify a custom path for storing the PaLI-GeMMA model by setting the `MODEL_PATH` environment variable in your `.env` file. If not set, the application will use `./model` as the default path.
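
As a rough illustration of the fallback described above, the sketch below resolves `MODEL_PATH` with `./model` as the default and checks that a token is present. It assumes the `.env` values have already been loaded into the process environment (for example by a dotenv loader, which is an assumption); the helper names `resolve_model_path`, `require_token`, and `model_is_downloaded` are hypothetical and not taken from the project.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_MODEL_PATH = "./model"  # default stated in this README


def resolve_model_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return MODEL_PATH from the environment, or ./model if unset or empty."""
    env = os.environ if env is None else env
    return Path(env.get("MODEL_PATH") or DEFAULT_MODEL_PATH)


def require_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HUGGINGFACE_TOKEN, raising a clear error if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HUGGINGFACE_TOKEN is not set; add it to your .env file")
    return token


def model_is_downloaded(path: Path) -> bool:
    """Assumed check: the folder exists and contains at least one entry."""
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    path = resolve_model_path()
    print(f"Model path: {path} (already downloaded: {model_is_downloaded(path)})")
```

Passing the environment as a mapping keeps the helpers easy to test without touching the real process environment.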
- PaLI-GeMMA model by Google
- Hugging Face for providing the model hosting and API