PaLI-GeMMA Image Analyzer

PaLI-GeMMA Image Analyzer is a web application that uses Google's PaLI-GeMMA (PaliGemma) vision-language model to analyze images based on user prompts. Users can upload images or provide image URLs, then ask questions or enter prompts about the image content.

Features

Drag and drop image upload from local files or web pages
Image URL input support
Real-time analysis streaming
Responsive design with dark mode support
Download analysis results as text files

Prerequisites

Python 3.9+
Flask
Transformers library
PyTorch
Hugging Face account and API token
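
The prerequisites above can be checked before installation with a short script. This is a minimal sketch, not part of the project: the function name `missing_prerequisites` is hypothetical, and the import names `flask`, `transformers`, and `torch` are assumed to correspond to the packages listed.

```python
import importlib.util
import sys

# Import names assumed to match the prerequisites listed above.
REQUIRED_MODULES = ("flask", "transformers", "torch")


def missing_prerequisites(modules=REQUIRED_MODULES, min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Running the script prints one line per unmet prerequisite and prints nothing when everything is in place.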

Installation

Clone the repository:

```
git clone https://github.com/TanaroSch/pali-gemma-image-analyzer.git
cd pali-gemma-image-analyzer
```

Create a virtual environment and activate it:

```
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Install the required packages:
```
pip install -r requirements.txt
```
Accept the model's conditions on Hugging Face, then create a .env file in the project root and add your Hugging Face API token:
```
HUGGINGFACE_TOKEN=your_token_here
MODEL_PATH=./model  # Optional: Set a custom path for model storage
```
Note: If MODEL_PATH is not set, the application will use ./model as the default path.
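
The way these two settings might be read at startup can be sketched as follows. This is an illustration under stated assumptions, not the application's actual code: `load_settings` is a hypothetical helper, and it assumes the values from .env have already been loaded into the process environment (for example by a loader such as python-dotenv, which the project may or may not use).

```python
import os

DEFAULT_MODEL_PATH = "./model"  # Default stated in the note above.


def load_settings(env=None):
    """Read HUGGINGFACE_TOKEN and MODEL_PATH from an environment mapping."""
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HUGGINGFACE_TOKEN is not set; add it to your .env file")
    # Treat an unset or empty MODEL_PATH as "use the default".
    model_path = env.get("MODEL_PATH", "").strip() or DEFAULT_MODEL_PATH
    return {"token": token, "model_path": model_path}
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to test, and failing early on a missing token surfaces configuration mistakes before any model download starts.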

Usage

Start the Flask application:
```
python app.py
```
On first startup, the model is downloaded if it does not already exist in the model_cache folder.
Open a web browser and navigate to http://localhost:5000.
Upload an image by dragging and dropping it onto the page, or by providing an image URL.
Enter a prompt or question about the image in the text area.
Click "Analyze" or press Enter to start the analysis.
View the analysis results in real time as they stream in.
Optionally, click "Download Result" to save the analysis as a text file.
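
The first-startup behavior described in the steps above (download the model only when no local copy exists) can be sketched as follows. This is a minimal illustration, not the project's implementation: `ensure_model` and its `download` callable are hypothetical, and in a real application the callable would wrap whatever download routine the app uses.

```python
from pathlib import Path


def ensure_model(model_dir, download):
    """Call download(model_dir) only when model_dir is missing or empty.

    Returns True if a download was triggered, False if the cached copy was reused.
    """
    path = Path(model_dir)
    # A non-empty directory is taken to mean the model is already cached.
    if path.is_dir() and any(path.iterdir()):
        return False
    path.mkdir(parents=True, exist_ok=True)
    download(path)
    return True
```

Checking only for a non-empty directory is a simplification: it does not detect a partially downloaded model, so a more careful version might verify specific files.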

Custom Model Path

You can specify a custom path for storing the PaLI-GeMMA model by setting the MODEL_PATH environment variable in your .env file. If not set, the application will use ./model as the default path.

Acknowledgments

PaLI-GeMMA model by Google
Hugging Face for providing the model hosting and API
