Git Product home page Git Product logo

gemini-ai-toolkit's Introduction

Google Gemini AI

Gemini AI Toolkit

maintained - yes contributions - welcome

Google Gemini AI

Overview

The Gemini AI Toolkit makes it easy to use Google's 'Gemini' language models for creating chatbots, generating text, and analyzing images. It's designed for everyone, from beginners to experienced developers, allowing quick addition of AI features to projects with simple commands. While it offers simplicity and lightweight integration, it doesn't compromise on power; experienced developers can access the full suite of advanced options available via the API, ensuring robust customization and control. This toolkit is perfect for those looking to efficiently tap into advanced AI without getting bogged down in technical details, yet it still provides the depth needed for complex project requirements.

Key Features

  • Chat Functionality: Engage in interactive conversations with Gemini's advanced conversational models.
  • Image Captioning: Analyze images to generate descriptive captions or insights.
  • Text Generation: Produce creative and contextually relevant text based on prompts.
  • Command-Line Interface (CLI): Access the full suite of functionalities directly from the command line.
  • Python Wrapper: Simplify interaction with Google's Gemini models in only 2 lines of code.
  • Streamed Responses: Receive responses as they are generated for real-time interaction.
  • Safety Settings Integration: Tailor safety filters to prevent the generation of inappropriate or unsafe content.
  • Flexible Configuration: Customize the token limits, safety thresholds, stop sequences, temperature and more.
  • Minimal Dependencies: Built to be efficient and lightweight, requiring only the requests package for operation.

Prerequisites

  • Python 3.x
  • An API key from Google AI Studio

Dependencies

The following Python packages are required:

  • requests: For making HTTP requests to Google's Gemini API.

The following Python packages are optional:

  • python-dotenv: For managing API keys and other environment variables.

Installation

To use the Gemini AI Toolkit, clone the repository to your local machine and install the required Python packages.

Clone the repository:

git clone https://github.com/RMNCLDYO/gemini-ai-toolkit.git

Navigate to the repositories folder:

cd gemini-ai-toolkit

Install the required dependencies:

pip install -r requirements.txt

Configuration

  1. Obtain an API key from Google AI Studio.

  2. You have three options for managing your API key:

    Click here to view the API key configuration options
    • Setting it as an environment variable on your device (recommended for everyday use)

      • Navigate to your terminal.
      • Add your API key like so:
        export GEMINI_API_KEY=your_api_key

      This method allows the API key to be loaded automatically when using the wrapper or CLI.

    • Using an .env file (recommended for development):

      • Install python-dotenv if you haven't already: pip install python-dotenv.
      • Create a .env file in the project's root directory.
      • Add your API key to the .env file like so:
        GEMINI_API_KEY=your_api_key

      This method allows the API key to be loaded automatically when using the wrapper or CLI, assuming you have python-dotenv installed and set up correctly.

    • Direct Input:

      • If you prefer not to use a .env file, you can directly pass your API key as an argument to the CLI or the wrapper functions.

        CLI

        --api_key your_api_key

        Wrapper

        api_key="your_api_key"

      This method requires manually inputting your API key each time you initiate an API call, ensuring flexibility for different deployment environments.

Usage

The Gemini AI Toolkit can be used in three different modes: Chat, Text, and Vision. Each mode is designed for specific types of interactions with the Gemini models.

Chat Mode

Chat mode is intended for chatting with an AI model (similar to a chatbot) or building conversational applications. It supports multi-turn dialogues with the model.

Example Usage

CLI

python cli.py --chat

Wrapper

from gemini import Chat

Chat().run()

An executable version of this example can be found here. (You must move this file to the root folder before running the program.)

Text Mode

Text mode is suitable for generating text content based on a provided prompt.

Example Usage

CLI

python cli.py --text --prompt "Write a story about a magic backpack."

Wrapper

from gemini import Text

Text().run(prompt="Write a story about a magic backpack.")

An executable version of this example can be found here. (You must move this file to the root folder before running the program.)

Vision Mode

Vision mode allows for generating text based on a combination of text prompts and images.

Example Usage

CLI

python cli.py --vision --prompt "Describe this image." --image "image_path_or_url"

Wrapper

from gemini import Vision

Vision().run(prompt="Describe this image.", image="image_path_or_url")

An executable version of this example can be found here. (You must move this file to the root folder before running the program.)

Advanced Configuration

CLI and Wrapper Options

Description CLI Flag(s) CLI Usage Wrapper Usage
Enable chat mode -c, --chat --chat See mode usage above.
Enable text mode -t, --text --text See mode usage above.
Enable vision mode -v, --vision --vision See mode usage above.
User prompt -p, --prompt --prompt "Write a story about a magic backpack." prompt="Write a story about a magic backpack."
Image file path or url -i, --image --image "image_path_or_url" prompt="Describe this image.", image="image_path_or_url"
API key for authentication -a, --api_key --api_key "your_api_key" api_key="your_api_key"
Model to use -m, --model --model "gemini-1.0-pro-latest" model="gemini-1.0-pro-latest"
Enable streaming mode -s, --stream --stream stream=True
Maximum tokens to generate -mt, --max_tokens --max_tokens 1024 max_tokens=1024
Sampling temperature -tm, --temperature --temperature 0.7 temperature=0.7
Nucleus sampling threshold -tp, --top_p --top_p 0.9 top_p=0.9
Top-k sampling threshold -tk, --top_k --top_k 40 top_k=40
Number of candidates to generate -cc, --candidate_count --candidate_count 1 candidate_count=1
Stop sequences for completion -ss, --stop_sequences --stop_sequences ["\n", "."] stop_sequences=["\n", "."]
Safety categories for filtering -sc, --safety_categories --safety_categories ["HARM_CATEGORY_HARASSMENT"] safety_categories=["HARM_CATEGORY_HARASSMENT"]
Safety thresholds for filtering -st, --safety_thresholds --safety_thresholds ["BLOCK_NONE"] safety_thresholds=["BLOCK_NONE"]

Available Models

Model Name Model Parameter Max Tokens
Gemini Pro 1.0 (Latest) gemini-1.0-pro-latest 2048 tokens
Gemini Pro 1.0 (Latest Stable) gemini-1.0-pro 2048 tokens
Gemini Pro 1.0 (Stable) gemini-1.0-pro-001 2048 tokens
Gemini Pro 1.0 Vision gemini-pro-vision 4096 tokens
Gemini Pro 1.5 general api access not available yet 8192 tokens

VisionAPI Limitations and Requirements:

  • Supported MIME types: PNG, JPEG, WEBP, HEIC, HEIF.
  • Maximum 4MB of data (including images and text).
  • Images larger than 3072 x 3072 pixels are scaled down while preserving aspect ratio.

Contributing

Contributions are welcome!

Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.

Reporting Issues

Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:

  1. Check if the issue has already been reported.
  2. Use the Bug Report template to create a detailed report.
  3. Submit the report here.

Your report will help us make the project better for everyone.

Feature Requests

Got an idea for a new feature? Feel free to suggest it. Here's how:

  1. Check if the feature has already been suggested or implemented.
  2. Use the Feature Request template to create a detailed request.
  3. Submit the request here.

Your suggestions for improvements are always welcome.

Versioning and Changelog

Stay up-to-date with the latest changes and improvements in each version:

  • CHANGELOG.md provides detailed descriptions of each release.

Security

Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md. Please refrain from disclosing any vulnerabilities publicly until said vulnerability has been reported and addressed.

License

Licensed under the MIT License. See LICENSE for details.

gemini-ai-toolkit's People

Contributors

rmncldyo avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.